Overview
What is DeepSWE?
DeepSWE is an AI coding benchmark developed by Datacurve. It lets researchers, model providers, and engineering teams evaluate how well agentic AI systems solve realistic software development problems. To keep its assessments contamination-free, the benchmark focuses on novel, unseen scenarios and long-horizon software engineering tasks. It assesses an agent's capacity for contextual understanding, logical reasoning, and adherence to best practices in code generation.

Datacurve officially released the benchmark around May 2026, and it generated discussion because of its critique of existing benchmarks and its novel evaluation approach. It was developed to overcome perceived critical flaws in existing evaluations, such as data contamination, unrealistic prompts, and unreliable grading systems.