overview
What is SWE-Bench Pro?
SWE-Bench Pro is a benchmark developed by Scale AI for evaluating how well AI agents solve real-world software engineering tasks. It is aimed at AI/LLM researchers, AI agent developers, and software engineers, and it provides a standardized framework for testing and comparing agents and models on complex, long-horizon problems. Tasks are realistic software engineering problems, typically sourced from GitHub, and each requires the agent to generate a code patch that resolves a described issue. A task counts as resolved only if the submitted patch fixes the specific bug or implements the requested feature (the fail-to-pass tests now pass) and introduces no regressions (the pass-to-pass tests still pass).
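The resolution rule can be illustrated with a short sketch. This is not the benchmark's actual harness: the function name `is_resolved`, the shape of the `results` mapping, and the test names are all hypothetical, chosen only to show how the fail-to-pass and pass-to-pass conditions combine.

```python
from typing import Iterable, Mapping


def is_resolved(
    results: Mapping[str, bool],
    fail_to_pass: Iterable[str],
    pass_to_pass: Iterable[str],
) -> bool:
    """Apply the resolution rule to one task (illustrative only).

    results      -- hypothetical mapping of test name -> True if the test
                    passed after the candidate patch was applied
    fail_to_pass -- tests that failed before the patch and must now pass
    pass_to_pass -- tests that passed before the patch and must still pass
    """
    # A test missing from the results is treated as a failure.
    fixes_issue = all(results.get(name, False) for name in fail_to_pass)
    no_regressions = all(results.get(name, False) for name in pass_to_pass)
    return fixes_issue and no_regressions


if __name__ == "__main__":
    # Hypothetical test names for a single task.
    f2p = ["test_issue_is_fixed"]
    p2p = ["test_existing_behavior"]

    # Patch fixes the issue and keeps existing behavior intact.
    print(is_resolved(
        {"test_issue_is_fixed": True, "test_existing_behavior": True}, f2p, p2p
    ))  # True

    # Patch fixes the issue but breaks an existing test: not resolved.
    print(is_resolved(
        {"test_issue_is_fixed": True, "test_existing_behavior": False}, f2p, p2p
    ))  # False
```

Both conditions are required: a patch that makes the target tests pass while breaking previously passing tests is scored the same as a patch that fixes nothing.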