2024

SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Arxiv 2024

Assessment of Multimodal Large Language Models in Alignment with Human Values
Arxiv 2024

CodeAttack: Revealing Safety Generalization Challenges of Large Language Models via Code Completion
ACL(Annual Meeting of the Association for Computational Linguistics) 2024

Towards Tracing Trustworthiness Dynamics: Revisiting Pre-training Period of Large Language Models
ACL(Annual Meeting of the Association for Computational Linguistics) 2024

SALAD-Bench: A Hierarchical and Comprehensive Safety Benchmark for Large Language Models
ACL(Annual Meeting of the Association for Computational Linguistics) 2024

From GPT-4 to Gemini and Beyond: Assessing the Landscape of MLLMs on Generalizability, Trustworthiness and Causality through Four Modalities
Technicle Report

PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety
ACL(Annual Meeting of the Association for Computational Linguistics) 2024
2023

ChEF: A Comprehensive Evaluation Framework for Standardized Assessment of Multimodal Large Language Models
Arxiv, 2023

LAMM: Language-Assisted Multi-Modal Instruction-Tuning Dataset, Framework, and Benchmark
NeurIPS(Conference on Neural Information Processing Systems), 2023, Datasets and Benchmarks Track