Two independent teams tackled AI agent security from opposite angles: one measures how safe agents are, the other makes them safer. Same name, complementary missions.
Independent (academic research)
A safety benchmark for personal AI agents under realistic prompt injection: 120 adversarial test cases that evaluate whether frontier LLMs remain safe when serving as agent backbones.
Best for: security researchers and LLM developers evaluating agent safety. If you build or deploy AI agents, this benchmark shows how vulnerable they are.