The Good, the Bad, and the Undecided: Governance of Adversarial Abusive Agents in Multi-Agent Systems

This project investigates AI safety and governance in multi-agent systems (MAS), where autonomous AI agents collaborate with, learn from, and influence one another. Moving beyond traditional approaches that focus on aligning individual models, the research examines how harmful behaviors can spread through agent networks, how social norms emerge among AI agents, and how governance mechanisms can prevent systemic risks. Through large-scale simulations and experiments with LLM-based agent societies, the project evaluates interventions such as reputation systems and peer-reporting mechanisms, with the goal of developing resilient, distributed approaches to maintaining safety and trust in increasingly autonomous AI ecosystems.

Faculty Mentors

NYU New York

NYU Shanghai

NYU Abu Dhabi