TL;DR / Key Takeaways
- A new AI from Tokyo is outperforming giants like Claude Fable 5, and it’s not just another massive model.
- Sakana AI's Fugu Ultra uses a revolutionary 'orchestration' system that could change how we build intelligent systems.
The Frontier Isn't One Model
The prevailing paradigm of building ever-larger, monolithic AI models is giving way to a more sophisticated strategy. The new frontier involves designing intelligent systems that strategically coordinate multiple specialized AIs, promising enhanced adaptability and resilience against challenges like geopolitical export controls. This architectural shift marks a departure from singular, all-encompassing models.
At the heart of this evolution is **Sakana AI AI Fugu**, a multi-agent orchestration system. Functioning as a conductor LLM, Fugu is trained to dynamically route incoming tasks to the optimal agent from a swappable pool of other LLMs, including recursive instances of itself. This learned orchestrator, powered by a 7B-parameter RL Conductor model, autonomously handles model selection, delegation, verification, and synthesis, presenting a unified intelligence from a single API endpoint.
Sakana AI AI delivers Fugu in two distinct tiers, accessible via a single OpenAI-compatible API. The base Fugu model balances strong performance with low latency, serving as an excellent default for everyday applications like coding with Codex or interactive chatbots. For demanding, multi-step problems requiring peak accuracy and depth, **Fugu Ultra** steps in. Tuned for maximum answer quality, it coordinates a deeper pool of expert agents, boasting a 1.0M token context window and a 131K token maximum output. Early users apply it for AI research, paper production, and cybersecurity analysis.
Winning Without Playing the Same Game
Geopolitical currents now reshape the AI frontier, starkly highlighted by U.S. export controls that revoked public access to Anthropic's Claude Fable 5 and Mythos Preview on June 12, 2026. This move ignited urgent demand for AI sovereignty, compelling nations and enterprises to secure resilient, unhindered access to advanced models.
Sakana AI AI’s Fugu system, launched June 22, 2026, directly answers this imperative. Fugu is not a larger, monolithic model; instead, its core is a 7B-parameter RL Conductor model, a learned orchestrator. This system dynamically coordinates an agent pool of diverse LLMs, even recursive instances of itself, all accessible via a single OpenAI-compatible API.
This multi-agent architecture provides a crucial hedge against single-vendor lock-in and geopolitical disruptions. If a model becomes blocked or unavailable, Fugu can simply route around it, leveraging other agents in its pool. This adaptability fosters a more resilient and versatile AI infrastructure, guaranteeing continuous access to frontier capabilities and empowering businesses and nations to maintain technological independence. Fugu Ultra, its flagship variant, achieves maximum answer quality on complex, multi-step problems, further cementing Fugu's strategic value.
Dominating the Leaderboards
Fugu Ultra immediately made its mark, decisively outperforming Claude Fable 5 on LiveCodeBench, a dynamic benchmark for code-focused LLMs. Sakana AI AI’s orchestration system achieved a score of 93.2, surpassing Fable 5’s 89.8 on fresh, contamination-controlled competitive programming problems. This demonstrated Fugu Ultra’s potent command over complex coding challenges.
However, Fugu Ultra did not claim universal dominance. On SWE-Bench Pro, a benchmark designed for long-horizon software engineering tasks, Fable 5 maintained its lead. This distinction clarifies Fugu Ultra's design focus: it excels at individual complex tasks, whereas Fable 5 is purpose-built for sustained, multi-step software development.
Across a broader spectrum, Fugu Ultra consistently demonstrated its superior capabilities. It surpassed other leading models such as Opus 4.8, Gemini 3.1 Pro, and GPT 5.5 across a wide range of evaluations. These included benchmarks for coding, reasoning, and even humanities, underscoring the versatility of its multi-agent orchestration. For a deeper dive into its architecture, see Sakana AI Fugu: One Model to Command Them All.
From Theory to Reality: Fugu in Action
Fugu Ultra's capabilities extend far beyond benchmarks, demonstrating impressive real-world utility. The system conducted autonomous machine learning research, iteratively improving a small GPT model's training recipe. Over 14 hours on a single H100 GPU, Fugu ran more than 100 experiments, autonomously discovering enhancements in batch size, model depth, learning rate, and optimizer settings. This agent also achieved a notable 20% return in a financial time-series prediction test.
Enjoying this? Get one like it in your inbox each morning.
one email a day · unsubscribe in two clicks · no third-party tracking
Exhibiting superior reasoning and memory, Fugu Ultra excelled in complex cognitive tasks. In a blindfold chess challenge, it outplayed a 2100 ELO engine, showcasing its strategic depth. Furthermore, the system successfully wrote a functional Rubik's Cube solver from scratch, a feat where competing frontier models consistently failed to produce viable solutions.
Beyond abstract problem-solving, Fugu Ultra demonstrated remarkable proficiency in spatial and structural reasoning. It generated a fully functional mechanical iris in CAD, an intricate engineering design. This contrasted sharply with competing models, which produced only flawed or non-functional designs for the same task, underscoring Fugu Ultra's unique capability in practical generative design.
Frequently Asked Questions
What is Sakana Fugu Ultra?
Sakana Fugu Ultra is not a single, monolithic AI model. It is a multi-agent orchestration system that acts as a 'conductor,' intelligently delegating sub-tasks to a pool of specialized AI models to solve complex problems.
How does Fugu Ultra outperform models like Claude Fable 5?
Fugu Ultra's strength comes from its ability to select the best AI agent for each part of a task. By combining the strengths of various models and avoiding their weaknesses, it achieves superior performance on specific, complex benchmarks like LiveCodeBench.
What is the strategic advantage of Fugu's architecture?
Its architecture promotes 'AI sovereignty' by reducing dependency on a single model provider. If one model becomes unavailable due to export controls or other issues, Fugu can simply route tasks to other agents in its pool, ensuring resilience.
Is Fugu Ultra better than Claude Fable 5 at everything?
No. While Fugu Ultra excels on many benchmarks for complex, multi-step tasks, Claude Fable 5 was specifically designed for very long-running agentic tasks and still outperforms Fugu on benchmarks like SWE-Bench Pro that test this capability.
