2024

Paper Image

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model

Yongting Zhang, Lu Chen, Guodong Zheng, Yifeng Gao, Rui Zheng, Jinlan Fu, Zhenfei Yin, Senjie Jin, Yu Qiao, Xuanjing Huang, Feng Zhao, Tao Gui2,, Jing Shao

Arxiv 2024

LanguageVisionSafety
Paper Image

Assessment of Multimodal Large Language Models in Alignment with Human Values

Zhelun Shi, Zhipin Wang, Hongxing Fan, Zaibin Zhang, Lijun Li, Yongting Zhang, Zhenfei Yin, Lu Sheng, Yu Qiao, Jing Shao

Arxiv 2024

LanguageVisionEvaluationSafety
Paper Image

CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion

Qibing Ren, Chang Gao, Jing Shao, Junchi Yan, Xin Tan, Wai Lam, Lizhuang Ma

ACL(Annual Meeting of the Association for Computational Linguistics) 2024

LanguageCodeSafety
Paper Image

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models

Chen Qian, Jie Zhang, Wei Yao, Dongrui Liu, Zhenfei Yin, Yu Qiao, Yong Liu, Jing Shao

ACL(Annual Meeting of the Association for Computational Linguistics) 2024

LanguageInterpretabilityAlignment
Paper Image

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models

Lijun Li, Bowen Dong, Ruohui Wang, Xuhao Hu, Wangmeng Zuo, Dahua Lin, Yu Qiao, Jing Shao

ACL(Annual Meeting of the Association for Computational Linguistics) 2024

LanguageEvaluationSafety
Paper Image

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities

Chaochao Lu, Chen Qian, Guodong Zheng, Hongxing Fan, Hongzhi Gao, Jie Zhang, Jing Shao, Jingyi Deng, Jinlan Fu, Kexin Huang, Kunchang Li, Lijun Li, Limin Wang, Lu Sheng, Meiqi Chen, Ming Zhang, Qibing Ren, Sirui Chen, Tao Gui, Wanli Ouyang, Yali Wang, Yan Teng, Yaru Wang, Yi Wang, Yinan He, Yingchun Wang, Yixu Wang, Yongting Zhang, Yu Qiao, Yujiong Shen, Yurong Mou, Yuxi Chen, Zaibin Zhang, Zhelun Shi, Zhenfei Yin, Zhipin Wang

Technicle Report

LanguageVisionCodeEvaluationSafetyCasual Reasoning
Paper Image

PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety

Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao

ACL(Annual Meeting of the Association for Computational Linguistics) 2024

LanguageAgentInterpretabilityAlignment