Gemini 3.5 Flash Review: Faster but More Expensive Than You Think

TL;DR / Key Takeaways

Google claims Gemini 3.5 Flash delivers top-tier AI performance at high speed and low cost. Third-party data from Artificial Analysis tells a different story: a coding index score of 45 and a benchmark bill of $1,552, 5.5 times that of Gemini 3 Flash.

Blazing Speed, Baffling Benchmarks

Google touts Gemini 3.5 Flash for its speed, which hits 278 tokens per second. That puts Flash well ahead of competitors like GPT-5.5 and Opus 4.7, and it massively outperforms even smaller models such as Haiku and OpenAI's open-source offerings. If raw output speed is the priority, Flash leads the pack.

Google's own benchmarks suggest Flash's coding skills align with top-tier models. The company's data puts it only a few percent behind GPT-5.5 on SWE-Bench Pro and Terminal-Bench. It reportedly beats Opus 4.7 on Terminal-Bench by around 10%, though Opus 4.7 holds the advantage on SWE-Bench Pro.

Yet independent analysis from Artificial Analysis points the other way. Its third-party evaluations cast significant doubt on Google's claims and give a less flattering assessment of Flash's capabilities in critical areas.

On Artificial Analysis's independent coding index, Flash scores a mediocre 45. That places it behind rivals like Kimi K2.6 and, surprisingly, behind Google's own older Gemini 3.1 Pro. Despite its speed, Flash's coding ability looks like a significant weakness and falls short of the "frontier performance" Google advertises.

The $1,500 Price Tag Google Didn't Mention

Google’s marketing touts Gemini 3.5 Flash as remarkably cheap, priced at just $1.50 per million input tokens. On paper, this positions Flash as a budget-friendly option, seemingly undercutting rivals like Opus 4.7 and GPT-5.5. However, this appealing claim falls apart dramatically under real-world testing, revealing a significantly different cost structure than Google advertises.

Independent analysis by Artificial Analysis exposed the true operational expenses. Running its standard intelligence benchmark with Flash cost $1,552. That is 5.5 times what the same benchmark cost on its predecessor, Gemini 3 Flash, and 75% more than on Gemini 3.1 Pro. Flash also proved more costly than higher-performing models such as GPT-5.5 on high-reasoning tasks, even though GPT-5.5 significantly outperforms it in coding.
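As a back-of-the-envelope check, the ratios above can be rearranged to estimate what the same benchmark cost on the two older models. The sketch below assumes "5.5 times" means a plain cost ratio and "75% more expensive" means a factor of 1.75; the resulting dollar amounts are derived estimates, not figures published by Artificial Analysis, and the function name is purely illustrative.

```python
# Figures quoted in this section (USD and ratios as reported).
FLASH_35_BENCH_COST = 1552.0   # cost to run the benchmark on Gemini 3.5 Flash
RATIO_VS_3_FLASH = 5.5         # assumed to mean "5.5x the cost of Gemini 3 Flash"
PREMIUM_VS_31_PRO = 0.75       # "75% more expensive than Gemini 3.1 Pro"


def implied_baselines(cost, ratio_vs_flash, premium_vs_pro):
    """Back out the implied benchmark cost of the two older models."""
    return {
        "Gemini 3 Flash": cost / ratio_vs_flash,
        "Gemini 3.1 Pro": cost / (1.0 + premium_vs_pro),
    }


if __name__ == "__main__":
    baselines = implied_baselines(
        FLASH_35_BENCH_COST, RATIO_VS_3_FLASH, PREMIUM_VS_31_PRO
    )
    for model, usd in baselines.items():
        print(f"{model}: about ${usd:,.0f}")
```

Under those assumptions, the implied bills come out to roughly $282 for Gemini 3 Flash and roughly $887 for Gemini 3.1 Pro.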

The underlying reason for this cost is the model's token-hungry behavior. During agentic evaluations, Gemini 3.5 Flash averaged 49 turns per task, one of the highest figures recorded across all models tested. Every additional turn consumes more input tokens, so a high turn count drives up the final bill and makes the low per-token price misleading.
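To see why turn count matters more than the sticker price, here is a minimal sketch of input-token billing in an agent loop. Only the $1.50 price and the 49-turn average come from the figures above. Everything else is an assumption for illustration: that the full conversation history is resent on every turn with no caching, and the hypothetical `base_context` and `added_per_turn` token counts. It is not a model of how Artificial Analysis ran its benchmark.

```python
INPUT_PRICE_PER_MILLION = 1.50  # USD per million input tokens, as quoted above
AVG_TURNS = 49                  # average turns per task, as quoted above


def input_tokens_billed(turns, base_context, added_per_turn):
    """Total input tokens if the whole history is resent each turn (assumed, no caching)."""
    # Turn k (0-indexed) sends the base context plus everything added so far.
    return sum(base_context + k * added_per_turn for k in range(turns))


def input_cost_usd(turns, base_context, added_per_turn,
                   price_per_million=INPUT_PRICE_PER_MILLION):
    """Input-side cost of one task in USD under the same assumption."""
    tokens = input_tokens_billed(turns, base_context, added_per_turn)
    return tokens * price_per_million / 1_000_000


if __name__ == "__main__":
    # Hypothetical task: 10,000-token starting context, 2,000 tokens added per turn.
    for turns in (10, AVG_TURNS):
        tokens = input_tokens_billed(turns, 10_000, 2_000)
        cost = input_cost_usd(turns, 10_000, 2_000)
        print(f"{turns} turns: {tokens:,} input tokens, ${cost:.3f}")
```

With these made-up numbers, a 10-turn task bills 190,000 input tokens while a 49-turn task bills 2,842,000: about five times the turns, but roughly fifteen times the input cost, because the context grows with every turn.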

Meet Antigravity, Google's Codex Clone

Google didn't just unveil Flash; it also launched Antigravity 2.0, a new standalone coding agent app. This application immediately strikes developers with its uncanny resemblance to existing powerhouses like Codex and Cursor. Moving away from its previous incarnation as an IDE, Antigravity 2.0 now functions as a dedicated application, offering a familiar interface for managing AI conversations and coding projects.

Initial tests reveal Antigravity 2.0 excels at simpler UI-focused tasks. When prompted to create a basic cafe website, the agent produced a visually appealing and functional design, even outperforming Opus 4.7 in a direct comparison. This suggests Flash’s underlying capabilities are particularly adept at generating clean, modern user interfaces, albeit with a slight "AI feel" characterized by common card and gradient styles.

However, the agent's performance diverges significantly when tackling more complex full-stack applications, such as a personal finance dashboard. While Antigravity 2.0 successfully generated a working application much faster, its UI often felt generic and lacked the sophisticated polish seen in competitor outputs. This indicates a disparity in its ability to handle intricate architectural design versus rapid, surface-level aesthetic generation.

The Gemini CLI is Dead. What's Next?

In a disruptive move, Google announced it will shut down the open-source Gemini CLI on June 18th. This action forces developers to transition to the new, closed-source Antigravity CLI, developed in Go. The abrupt change signals a clear strategic shift, moving away from community-contributed open-source initiatives towards Google's proprietary ecosystem.

Ultimately, Gemini 3.5 Flash emerges as a niche offering. Its speed of 278 tokens per second makes it a top choice for those prioritizing raw throughput and agentic workflows. However, its weak coding performance, a score of 45 on Artificial Analysis's coding index (below Kimi K2.6 and even Gemini 3.1 Pro), coupled with high operational costs, positions it poorly for general development. Artificial Analysis found that running its intelligence index cost $1,552, 5.5 times the cost for Gemini 3 Flash and more than GPT-5.5 on high-reasoning tasks.

This release points to a potential strategic shift for Google. The company appears to be de-emphasizing the high-end developer market and bleeding-edge AI tooling. Instead, it seems to be concentrating its AI resources on integrating these advances into its mass-market consumer products, including Search, Workspace, and Android, aiming for broad user impact rather than specialized developer adoption.

Frequently Asked Questions

What is Gemini 3.5 Flash?

Gemini 3.5 Flash is Google's latest AI model, designed for speed and efficiency. It features a 1 million token context window and multimodal capabilities, but its real-world performance and cost are subjects of debate.

Is Gemini 3.5 Flash better than GPT-5.5 or Opus 4.7?

It depends on the task. Flash is significantly faster than both. However, third-party benchmarks show its coding abilities are weaker, and while it's strong in agentic workflows, it's not a clear winner over models like Opus 4.7 in overall quality.

Why is Gemini 3.5 Flash expensive in practice?

Despite a low per-token price, the model is token-hungry. It takes many turns and consumes a large number of input tokens to complete tasks, so real-world costs for complex jobs run significantly higher than its pricing suggests.

What is Antigravity 2.0?

Antigravity 2.0 is Google's new standalone AI coding agent, replacing the previous IDE version. It functions similarly to other tools like Codex and Cursor, providing an interface for AI-assisted software development.
