2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
arXiv, 2024
Assessment of Multimodal Large Language Models in Alignment with Human Values
arXiv, 2024
CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
ACL (Annual Meeting of the Association for Computational Linguistics), 2024
Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
ACL (Annual Meeting of the Association for Computational Linguistics), 2024
SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
ACL (Annual Meeting of the Association for Computational Linguistics), 2024
From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Technical Report
PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
ACL (Annual Meeting of the Association for Computational Linguistics), 2024
2023
ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models
arXiv, 2023
LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
NeurIPS (Conference on Neural Information Processing Systems), 2023, Datasets and Benchmarks Track