PsySafe Dataset

Natural Language, Task-Label-Dimension Triplet, Trustworthiness, Agent Safety, Question Answering

Introduction

<h2 style="font-size: 21px;font-weight: bold;margin-bottom: 1rem;">Description</h2> <p>The PsySafe dataset contains 855 triplets covering both safe and unsafe tasks. It focuses on three key aspects: identifying how dark personality traits in agents can lead to risky behaviors; designing defense strategies to mitigate these risks; and evaluating the safety of multi-agent systems from both psychological and behavioral perspectives.</p> <h2 style="font-size: 21px;font-weight: bold;margin-bottom: 1rem;">Data</h2> <p>Each sample in the PsySafe dataset is a triplet of the form (task, label, dimension). The task is a detailed instruction given to the agents; the label indicates the source of the command; and the dimension identifies the specific safety threat.</p>
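The triplet structure described above could be modeled roughly as follows. This is a minimal sketch, not the dataset's official loader: the class name `Triplet`, the helper `count_by_dimension`, and every sample value are hypothetical placeholders, and the on-disk file format is not specified here.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One PsySafe-style sample (field names follow the description above)."""
    task: str        # detailed instruction given to the agents
    label: str       # source of the command
    dimension: str   # specific safety threat


def count_by_dimension(triplets):
    """Return how many triplets fall under each dimension."""
    return Counter(t.dimension for t in triplets)


# Placeholder rows for illustration only; NOT taken from the dataset.
samples = [
    Triplet("placeholder task A", "placeholder-source-1", "placeholder-dimension-x"),
    Triplet("placeholder task B", "placeholder-source-2", "placeholder-dimension-x"),
    Triplet("placeholder task C", "placeholder-source-1", "placeholder-dimension-y"),
]

counts = count_by_dimension(samples)
```

Grouping by `dimension` in this way gives a quick view of how samples are distributed across safety threats once the real triplets are loaded.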

Example Data

<p><img src="/images/dataset/PsySafe-egdata.png" alt="PsySafe example data"></p>

Paper Link

arXiv preprint arXiv:2401.11880

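Authors

Zaibin Zhang, Yongting Zhang, Lijun Li, Hongzhi Gao, Lijun Wang, Huchuan Lu, Feng Zhao, Yu Qiao, Jing Shao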

Bibtex

@article{zhang2024psysafe,
  title={PsySafe: A Comprehensive Framework for Psychological-based Attack, Defense, and Evaluation of Multi-agent System Safety},
  author={Zhang, Zaibin and Zhang, Yongting and Li, Lijun and Gao, Hongzhi and Wang, Lijun and Lu, Huchuan and Zhao, Feng and Qiao, Yu and Shao, Jing},
  journal={arXiv preprint arXiv:2401.11880},
  year={2024}
}