Overview
What is WolfBench?
WolfBench is an AI evaluation framework developed by Wolfram. It lets AI developers, researchers, and evaluators rigorously assess how consistent and reliable AI agents are across diverse, real-world tasks. Rather than reducing performance to a single average score, it reports five metrics and adds a 3D visualization of token usage, giving a more nuanced picture of agent performance.