
MiMo V2.5 Pro UltraSpeed Review

Name: MiMo V2.5 Pro UltraSpeed
Availability: Online only
Author: Stork.AI

MiMo V2.5 Pro UltraSpeed is a 1-trillion-parameter Mixture-of-Experts AI model developed by Xiaomi and TileRT, designed for extremely fast text generation on standard hardware.

Shipped: Jun 14, 2026 · AI · Freemium

Domain rating: 80 · Traffic rank: outside top 1M · AI-readable: partial


Why it matters

1. Features a 1-trillion-parameter Mixture-of-Experts (MoE) AI model architecture.

2. Achieves inference speeds of up to 1000 tokens per second (TPS), with peak demos reaching 1200 TPS.

3. Officially released on June 8, 2026, in collaboration with TileRT.

4. Includes an open-sourced MiMo-V2.5-Pro-FP4-DFlash checkpoint on Hugging Face.

Stork’s verdict on MiMo V2.5 Pro UltraSpeed

It delivers up to 1000 tokens per second for demanding tasks, but its EU AI Act compliance is currently listed as 'unknown'.

MiMo V2.5 Pro UltraSpeed reviewed by Stork AI · stork.ai/en/mimo-v2-5-pro-ultraspeed

About MiMo V2.5 Pro UltraSpeed

Business Model: Open Source
Headquarters: Beijing, China
Funding: Public
Platforms: Web, API
Target Audience: Developers and programmers
Leadership: Lei Jun, Founder & CEO


Specs

API Available: Yes, public API

overview

What is MiMo V2.5 Pro UltraSpeed?

MiMo V2.5 Pro UltraSpeed is a 1-trillion-parameter Mixture-of-Experts AI model developed by Xiaomi and TileRT for extremely fast text generation on standard hardware. It targets demanding real-time scenarios where low latency is critical, reaching inference speeds of up to 1000 tokens per second. Xiaomi positions it as its flagship AI offering for high-speed reasoning and generation. The surrounding platform, Xiaomi MiMo, aims to foster human-machine collaboration through an end-to-end AI product experience, from model inference to agent applications, and supports multimodal understanding across text, image, video, and audio.

features

Key Features of MiMo V2.5 Pro UltraSpeed

MiMo V2.5 Pro UltraSpeed integrates a suite of advanced features designed for high-performance AI applications and developer utility.

1-trillion-parameter Mixture-of-Experts (MoE) AI model architecture.
Extremely fast text generation, achieving up to 1000 tokens per second (TPS) on standard hardware.
Multimodal understanding capabilities across text, image, video, and audio inputs.
Advanced long-range reasoning for complex tasks.
Integrated Speech Synthesis (TTS) and Automatic Speech Recognition (ASR) functionalities.
Terminal-based coding agent for automated programming tasks and long-horizon code generation.
Open-sourced components, including the MiMo-V2.5-Pro-FP4-DFlash checkpoint, under an MIT license.
Developer API available for integration into custom applications and workflows.
Support for automated programming tasks and long-horizon task assistance.

use cases

Who Should Use MiMo V2.5 Pro UltraSpeed?

MiMo V2.5 Pro UltraSpeed is engineered for professionals and organizations requiring high-speed, intelligent AI capabilities across various demanding scenarios.

Developers and Engineers: For AI coding assistance in long-horizon tasks, building agentic platforms for real-world task planning, and accessing large language models (LLMs) via API.
Researchers: For high-speed agent workflows, enabling instant generation and validation of large-scale hypotheses, and reducing human-machine latency.
Quantitative Traders and Financial Analysts: For real-time decision loops, analyzing market impact, and generating trading signals within milliseconds.
Fraud Detection and Risk Assessment Specialists: For performing complex fraud reasoning and risk assessment in hundreds of milliseconds.
Enterprises and Startups: For latency-sensitive applications that require rapid output, such as live agent loops and parallel reasoning chains, where traditional models introduce unacceptable delays.
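The latency figures in these use cases can be sanity-checked with simple arithmetic. The sketch below converts a decode rate into a token budget for a given time window. It assumes a steady decode rate and ignores network round trips and time to first token, so real budgets will be smaller. The function names are illustrative and not part of any MiMo SDK.

```python
def tokens_within_budget(tokens_per_second: float, budget_ms: float) -> int:
    """Return how many tokens fit in a latency budget at a steady decode rate."""
    if tokens_per_second <= 0 or budget_ms < 0:
        raise ValueError("rate must be positive and budget non-negative")
    return int(tokens_per_second * budget_ms / 1000)


def generation_time_ms(num_tokens: int, tokens_per_second: float) -> float:
    """Return the decode time in milliseconds for a response of num_tokens."""
    if tokens_per_second <= 0:
        raise ValueError("rate must be positive")
    return num_tokens / tokens_per_second * 1000


if __name__ == "__main__":
    # At the quoted 1000 TPS, a 300 ms window leaves room for about 300 tokens.
    print(tokens_within_budget(1000, 300))
    # A 2000-token answer takes about 2 seconds of pure decode time.
    print(generation_time_ms(2000, 1000))
```

At the quoted rate, a "hundreds of milliseconds" budget covers a response of a few hundred tokens, which is enough for a short verdict or signal but not for long-form reasoning.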

how to use

How to Use MiMo V2.5 Pro UltraSpeed

MiMo V2.5 Pro UltraSpeed is available through its developer API and its open-source components. Users can integrate the model into existing systems or build new applications on it.

1. Access the MiMo V2.5 Pro UltraSpeed API via the official developer portal at https://mimo.xiaomi.com/open/docs for integration into custom applications.
2. Utilize the open-sourced MiMo-V2.5-Pro-FP4-DFlash checkpoint available on Hugging Face for local deployment, fine-tuning, or research purposes.
3. Integrate the API for high-speed text generation, multimodal understanding, and long-range reasoning in real-time applications.
4. Leverage the terminal-based coding agent for automated programming tasks, code generation, and long-horizon software engineering projects.
5. Monitor Xiaomi's official announcements for updates on API access, trial extensions, and new features for the UltraSpeed model.
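Once an integration is streaming output, it can help to measure throughput on your own connection instead of relying on the headline figure. The sketch below times any iterable of streamed chunks. It is client-agnostic and makes no assumption about the MiMo API's request format. It does assume that each chunk is one token, which may not hold for a real streaming client, and the function name `measure_stream` is hypothetical.

```python
import time
from typing import Callable, Iterable, Optional


def measure_stream(
    chunks: Iterable[str],
    clock: Callable[[], float] = time.perf_counter,
) -> dict:
    """Time a stream of generated chunks, treating each chunk as one token."""
    start = clock()
    first: Optional[float] = None  # arrival time of the first chunk
    last = start                   # arrival time of the most recent chunk
    count = 0
    for _ in chunks:
        last = clock()
        if first is None:
            first = last
        count += 1

    total = last - start
    decode = (last - first) if first is not None else 0.0
    return {
        "tokens": count,
        # Time to first token, including request and queueing delay.
        "ttft_s": (first - start) if first is not None else None,
        # Steady-state rate after the first token has arrived.
        "decode_tps": (count - 1) / decode if decode > 0 else None,
        # End-to-end rate, which is what a caller actually experiences.
        "overall_tps": count / total if total > 0 else None,
    }


if __name__ == "__main__":
    # Stand-in for a real streaming response: any iterable of strings works.
    print(measure_stream(iter(["Hello", ",", " world", "!"])))
```

Reporting time to first token separately from the decode rate matters here, because a model can sustain a high decode rate and still feel slow if requests queue before the first token arrives.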

pricing

MiMo V2.5 Pro UltraSpeed Pricing & Plans

MiMo V2.5 Pro UltraSpeed operates on a freemium model, with specific pricing for the UltraSpeed API often subject to promotional periods and application-based access. The broader MiMo platform offers tiered pricing based on token usage and subscription plans.

MiMo V2.5 Pro UltraSpeed API: Initially offered via a limited-time, application-based trial (June 9 to June 23, 2026), with extensions based on resource availability. Specific long-term pricing for UltraSpeed is not publicly detailed beyond promotional access.
MiMo-V2.5 (Standard Model): Input tokens are priced at $0.0004 per 1k tokens, and output tokens at $0.002 per 1k tokens.
MiMo-V2.5-Pro (up to 256K context): Input tokens are priced at $0.001 per 1k tokens, and output tokens at $0.003 per 1k tokens.
MiMo-V2.5-Pro (256K-1M context): Input tokens are priced at $0.002 per 1k tokens, and output tokens at $0.006 per 1k tokens.
Subscription Plans (for MiMo platform): The Lite plan offers 4.1 billion credits per month, and the Standard plan offers 11 billion credits per month. API rate limits for these plans are currently unknown.
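The per-token rates above can be turned into a quick cost estimate. The sketch below uses only the prices listed in this section. The tier keys are hypothetical labels chosen for this example, not official model identifiers. UltraSpeed and the credit-based subscription plans are left out because no per-token rate or credit conversion is published for them.

```python
# USD per 1k tokens as (input, output), copied from the list above.
# The dictionary keys are hypothetical labels, not official model IDs.
PRICES_PER_1K = {
    "mimo-v2.5": (0.0004, 0.002),
    "mimo-v2.5-pro-256k": (0.001, 0.003),  # Pro, up to 256K context
    "mimo-v2.5-pro-1m": (0.002, 0.006),    # Pro, 256K-1M context
}


def estimate_cost_usd(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request at the listed per-1k-token rates."""
    if tier not in PRICES_PER_1K:
        raise KeyError(f"unknown tier: {tier}")
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    in_price, out_price = PRICES_PER_1K[tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1000


if __name__ == "__main__":
    # 100k input tokens and 20k output tokens on the Pro tier (up to 256K context).
    print(round(estimate_cost_usd("mimo-v2.5-pro-256k", 100_000, 20_000), 4))
```

Note that the long-context Pro tier doubles both rates, so the same request costs twice as much once it crosses into the 256K-1M range.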

Pros

+ Exceptional inference speed, reaching up to 1000 tokens per second (TPS) and 1200 TPS in peak demos, for demanding real-time applications.
+ Utilizes a 1-trillion-parameter Mixture-of-Experts (MoE) architecture for efficient and scalable AI processing.
+ Designed specifically for low-latency scenarios such as real-time trading decision loops and instant coding agents.
+ Offers comprehensive multimodal understanding across text, image, video, and audio inputs.
+ Includes open-source components (MiMo-V2.5-Pro-FP4-DFlash checkpoint) providing flexibility for developers and researchers.
+ Part of Xiaomi's end-to-end AI platform, offering a broad range of AI product experiences and fostering human-machine collaboration.

Cons

− UltraSpeed API access was initially limited to an application-based trial, suggesting potential restrictions or variable availability for general use.
− Some users reported connectivity issues and API pauses (1-3 minutes) during the preview phase, which could impact reliability.
− Specific long-term pricing details for the UltraSpeed variant beyond promotional periods are not fully transparent.
− The 'provider' and 'deployer' for EU AI Act obligations are currently listed as 'unknown', indicating potential compliance clarity gaps.
− Requires integration via API, which necessitates developer resources and technical expertise for implementation.

Similar Tools

MiMo V2.5 Pro UltraSpeed vs Competitors

MiMo V2.5 Pro UltraSpeed positions itself as a leader in inference speed for large-scale AI models, particularly for real-time, low-latency applications, while also offering a comprehensive AI platform.

Mistral AI (Mixtral 8x7B)

Mistral AI offers highly efficient and powerful open-source models, including a Mixture-of-Experts (MoE) architecture that balances performance with computational efficiency.

Like MiMo V2.5 Pro UltraSpeed, Mixtral 8x7B utilizes a Mixture-of-Experts architecture, focusing on efficient and fast text generation, making it a direct architectural and performance competitor. Being open-source, it offers flexibility for deployment on various hardware, similar to MiMo's focus on standard hardware.

Google Gemini (Gemini 3.1 Flash-Lite)

Google Gemini offers a family of multimodal AI models, with Gemini 3.1 Flash-Lite specifically designed for strong performance at scale and affordability, emphasizing speed.

Gemini 3.1 Flash-Lite directly competes on speed and cost-efficiency, offering a 2.5x faster time to first answer token and a 45% increase in output speed compared to Gemini 2.5 Flash, aligning with MiMo V2.5 Pro UltraSpeed's focus on extremely fast text generation.

Anthropic (Claude 3 Haiku)

Claude 3 Haiku is Anthropic's fastest and most compact model, engineered for near-instant responsiveness and high-volume enterprise applications.

Similar to MiMo V2.5 Pro UltraSpeed, Claude 3 Haiku prioritizes speed and efficiency, aiming for near-instant text generation, making it a strong competitor for applications requiring rapid output.

OpenAI (GPT-4o)

OpenAI's GPT-4o is a leading multimodal AI model renowned for its broad capabilities in understanding and generating human-like text, with continuous optimization for speed and cost.

GPT-4o offers a highly capable and continuously optimized model for text generation, competing with MiMo V2.5 Pro UltraSpeed on overall performance and speed, and is widely accessible through a freemium model via ChatGPT.
