What is Agent Arena?
Agent Arena is an AI model evaluation platform developed by Arena.ai (formerly LMSYS). It lets AI researchers, developers, enterprises, and consumers evaluate and compare AI models, including language, image, and code models, through real-world human feedback. Its public leaderboards are built from anonymous side-by-side comparisons, in which users vote on which of two unidentified models gave the better response. The platform is designed to move beyond static benchmarks by assessing how AI agents perform in dynamic, multi-step workflows.

Agent Mode, introduced on June 4, 2026, allows AI agents to autonomously handle complex tasks using advanced tools. Arena.ai also launched a new leaderboard methodology for multi-component agents, based on analysis of organic user traces. A related initiative is Microsoft's open-sourced Windows Agent Arena, a benchmark that evaluates AI agents on 154 tasks performed within the Windows operating system.
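The overview does not say how pairwise votes become leaderboard rankings. As an illustration only, the sketch below assumes an Elo-style rating update, one standard way to turn side-by-side preferences into a ranked list. The function names `expected_score` and `elo_leaderboard`, the vote format, the K-factor of 32, and the starting rating of 1000 are all hypothetical choices, not Arena.ai's actual method.

```python
from collections import defaultdict


def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))


def elo_leaderboard(votes, k=32.0, initial=1000.0):
    """Build a leaderboard from (model_a, model_b, winner) votes.

    Hypothetical format: winner is "a", "b", or "tie".
    Returns (model, rating) pairs sorted from highest to lowest rating.
    """
    ratings = defaultdict(lambda: initial)  # unseen models start at `initial`
    outcome = {"a": 1.0, "b": 0.0, "tie": 0.5}
    for model_a, model_b, winner in votes:
        r_a, r_b = ratings[model_a], ratings[model_b]
        e_a = expected_score(r_a, r_b)  # A's predicted score before this vote
        s_a = outcome[winner]           # A's actual score from this vote
        # Zero-sum update: whatever A gains, B loses.
        ratings[model_a] = r_a + k * (s_a - e_a)
        ratings[model_b] = r_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical anonymous votes between three placeholder models.
    votes = [
        ("model-x", "model-y", "a"),
        ("model-y", "model-z", "tie"),
        ("model-x", "model-z", "a"),
    ]
    for model, rating in elo_leaderboard(votes):
        print(f"{model}: {rating:.1f}")
```

Because each update is zero-sum, the ratings always average to the starting value. Sequential Elo updates also depend on vote order, so a production leaderboard might instead fit a Bradley-Terry model over all votes at once.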