Why OpenAI Declared a 'Code Red'
Code Red hit OpenAI's inbox as an internal memo from Sam Altman circulated, according to people familiar with the matter, warning that the company could not treat Google's latest AI push as just another product cycle. The message: Gemini's surge and Google's research blitz had crossed from background noise into an existential competitive threat.
Inside OpenAI, the memo landed against a backdrop of uncomfortable numbers. New third-party data shows Gemini's monthly active users climbing faster than ChatGPT's, with Google leaning on Android, Search, and Chrome distribution to pump usage across billions of devices.
Google, for its part, has stopped playing defense. Over just a few weeks, the company rolled out Titans and MIRAS for long-context memory, prepped Nano Banana 2 Flash as a cheaper image model, and quietly tested AI-written news headlines on user phones, all while shoving Gemini deeper into Workspace and Android.
None of these moves stands alone. Together they form a coordinated assault: research breakthroughs that fix core Transformer weaknesses, productized models that undercut rivals on cost, and distribution plays that exploit Google's control over mobile and the web.
For OpenAI, the Gemini growth curve may be the loudest alarm bell. ChatGPT still dominates mindshare, but Google's ability to auto-onboard users through default integrations means even a slightly weaker model can win if it sits in front of more eyeballs, more often.
That threat arrives just as Google Research starts to chip away at OpenAI's technical moat. Titans' new long-term memory system claims context windows above 2 million tokens and benchmark wins over GPT-4 and Llama 3.1 70B on long-sequence tests, hinting that Google can now handle sprawling histories without melting compute budgets.
Altman's memo reportedly urges teams to accelerate work on the company's next model, codenamed Garlic, and to rethink how quickly OpenAI can ship agents and memory systems of its own. The fear is not only losing users, but falling behind a rival that suddenly controls both the smarter architecture and the bigger audience.
Underneath the user charts and launch events, a deeper shift is brewing. Google is not just growing faster; it is betting on a fundamental change in how AI remembers, learns, and lives inside everyday devices, and that is what really triggered Code Red.
The Amnesia Plaguing Every AI
Modern AI talks a big game about "understanding," but under the hood most large language models live in a kind of five-minute fugue state. A model like GPT-4 or Gemini only "remembers" what fits inside a fixed context window: a sliding buffer of a few thousand to maybe a million tokens that behaves like short-term memory on a loop.
Imagine talking to someone who forgets everything older than the last page of chat history. You can paste a 500âpage contract or years of emails, but once you overflow that window, the oldest details vanish, replaced by whatever arrived last. No matter how smart the model, anything outside that context might as well not exist.
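A minimal sketch of that sliding buffer, with a toy 8-token window standing in for a real model's context limit:

```python
def truncated_context(tokens, window=8):
    """Keep only the most recent `window` tokens; everything older is dropped."""
    return tokens[-window:]

history = list(range(20))             # 20 tokens of accumulated conversation
visible = truncated_context(history)  # the model only ever sees tokens 12-19
```

Everything before token 12 is not compressed or summarized; as far as the model is concerned, it simply never happened.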
Blame the standard Transformer architecture that powers almost every frontier LLM. Self-attention compares every token with every other token, so compute and memory scale roughly quadratically: double the sequence length and you 4x the cost; go 10x and you're staring at ~100x more work.
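The quadratic blow-up is just arithmetic, as this back-of-the-envelope sketch shows:

```python
def attention_cost(seq_len):
    """Pairwise token comparisons in full self-attention: O(n^2)."""
    return seq_len * seq_len

base = attention_cost(1_000)        # 1,000,000 comparisons
doubled = attention_cost(2_000)     # 2x the tokens -> 4x the work
stretched = attention_cost(10_000)  # 10x the tokens -> 100x the work
```

Real implementations add constants and optimizations, but the n-squared shape of the curve is what makes million-token windows so punishing.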
Past a few hundred thousand tokens, even heavily optimized Transformers start to buckle. Latency spikes, memory blows up, and quality degrades as models resort to tricks like sparse attention or aggressive truncation that quietly drop parts of your input. That's why "2M-token context" headlines usually hide brutal hardware bills and fragile behavior at the edge.
Older ideas like RNNs and modern State-Space Models (SSMs), including Mamba-style architectures, flip the trade-off. They process streams in linear time by folding history into a compact hidden state, so they breeze through millions of tokens without melting GPUs.
The catch: compressing an entire book, codebase, or customer history into a tiny state vector smears out detail. Subtle dependencies, rare edge cases, or that one critical line in a log file get averaged away, so the model responds fast but with a kind of statistical amnesia. You gain scale and lose precision.
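An exponential moving average is the simplest possible fixed-size state, and it illustrates the smearing problem; real SSMs use learned, high-dimensional states, so treat this as a caricature:

```python
def compressed_scan(tokens, decay=0.9):
    """Fold an arbitrarily long stream into one fixed-size scalar state.
    Cost is linear in sequence length, but detail is smeared into an average."""
    state = 0.0
    for t in tokens:
        state = decay * state + (1 - decay) * t
    return state

quiet = [0.0] * 999                              # routine, unremarkable input
recent_spike = compressed_scan(quiet + [100.0])  # anomaly arrives last
buried_spike = compressed_scan([100.0] + quiet)  # anomaly arrived long ago
```

The same critical event stays vivid if it just happened and is essentially erased once 999 ordinary tokens follow it: statistical amnesia in four lines.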
This structural forgetfulness has become the biggest brake on true personalization and deep context. As long as models can't reliably carry rich long-term memory across sessions, devices, and tasks, "AI assistants" remain chat windows with goldfish brains, not systems that grow with you over months or years.
Meet Titans: The AI That Never Forgets
Google calls its new architecture Titans, and it reads like a direct answer to the "five-minute memory" problem haunting today's AI. Instead of stretching a standard Transformer until it breaks, Titans splices together two different memory systems and forces them to cooperate. The result: models that handle context windows above 2 million tokens without collapsing under their own compute.
At the core sits a familiar short-term memory: windowed self-attention over the recent chunk of text. That window stays sharp and precise, so the model can track pronouns, code variables, and subtle phrasing in the last few thousand tokens. No lossy compression, no blurry summaries.
Alongside that, Titans adds a separate, persistent long-term memory module. This module doesn't just cache raw text; it stores distilled representations of what actually mattered in earlier passages. Google describes three variants of this system, Memory-as-Context, Memory-as-Gates, and Memory-as-Layers, each wiring the stored knowledge back into the model in a different way.
The revolutionary twist: Titans updates this long-term memory during inference. While you chat, code, or feed it documents, the memory module learns on the fly which pieces are surprising, useful, or rare and writes them into its internal store. No offline fine-tune, no retraining run, just continuous adjustment as the session unfolds.
Surprise drives the write decisions. When the model encounters something that deviates strongly from its expectations (an edge-case API, a niche regulation, a user's quirky preference), it flags that as high-value and commits it to long-term memory. Less surprising, repetitive content gets a lower priority and eventually falls out of the store through smart forgetting rather than brute-force truncation.
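The write rule can be caricatured with Shannon surprisal standing in for Titans' learned surprise signal; the real module derives surprise from gradients on its memory network, so this is an illustration, not the published mechanism:

```python
import math

def surprisal_bits(prob):
    """Shannon surprisal: how unexpected an event is, in bits."""
    return -math.log2(prob)

def maybe_store(memory, item, model_prob, threshold=4.0):
    """Write to long-term memory only when the event clears the surprise bar."""
    if surprisal_bits(model_prob) >= threshold:
        memory.append(item)

memory = []
maybe_store(memory, "routine sign-off email", model_prob=0.5)  # ~1 bit: skipped
maybe_store(memory, "one-off API key", model_prob=0.01)        # ~6.6 bits: stored
```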
Benchmarks hint at how big this shift could be. A Titans model with just 760 million parameters reportedly hits over 95% accuracy on Needle-in-a-Haystack at 16,000 tokens and dominates the bAbI-Long benchmark, outscoring GPT-4, RecurrentGemma 9B, Llama 3.1 70B, and even Llama 3 paired with retrieval tools. Long sequences stop being a pathological edge case and start looking like the default workload.
That turns AI from a static, pre-trained encyclopedia into a dynamic partner that remembers what you did last week. Titans can, in principle, build up a stable working history with a team, a codebase, or a research project and refine its behavior across sessions. Google's own write-up, Titans + MIRAS: Helping AI have long-term memory, frames this as a step toward models that learn more like people do: incrementally, contextually, and without hitting reset every time you open a new chat.
The Genius Is In the 'Surprise'
Surprise sits at the heart of Titans' new memory system. Instead of hoarding every token across a 2-million-plus context window, the model assigns a surprisal score to each chunk of text, measuring how far reality deviates from what its internal language model predicts. High-surprise events get written into Titans' separate long-term memory, while predictable boilerplate scrolls past and vanishes.
That simple rule turns memory from a passive log into an active editor. A routine "Thanks, talk tomorrow" at the end of 500 emails never makes the cut; a one-off API key, a weird edge-case bug report, or a sudden policy change almost always does. Titans effectively compresses days of interaction into a sparse set of "you'll regret forgetting this" moments.
Under the hood, surprise acts like a budget. Each memory slot carries a usefulness score derived from both its initial surprisal and how often Titans later reads it back successfully. When the budget fills, the model demotes low-value entries first, pushing them out of active memory-as-context and into cheaper representations or dropping them entirely.
Google frames this as intelligent forgetting rather than deletion. Instead of a hard cutoff when you hit 128K or 1M tokens, relevance decays smoothly: a rarely used project spec slowly loses resolution, while an actively referenced design doc stays crisp. The memory module updates online during inference, so this decay happens continuously as Titans works.
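A toy version of that budgeted, usefulness-ranked eviction; the scores and the policy here are illustrative, not Google's published update rule:

```python
def evict(memory, budget):
    """Keep the `budget` highest-usefulness entries and demote the rest.
    The score stands in for initial surprisal plus successful read-backs."""
    keep = sorted(memory, key=memory.get, reverse=True)[:budget]
    return {k: memory[k] for k in keep}

store = {"active design doc": 9.2, "rarely used spec": 3.1, "routine sign-off": 0.4}
store = evict(store, budget=2)  # the routine entry falls out first
```

The key property is that nothing is dropped by position: an old but frequently referenced entry outlives a recent but useless one.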
That behavior looks uncannily human. Cognitive psychology shows that people encode novel, emotionally charged, or unexpected events far more strongly than daily routines; your first day at a new job outlives 200 ordinary Tuesdays. Titans bakes a similar bias into silicon: novelty gets a stronger write signal, repetition gets background noise treatment.
Human memory also forgets on purpose to stay efficient, and Titans mirrors that tradeoff. By allowing old, low-surprise traces to fade instead of clinging to everything, the system avoids the "five-minute genius, lifelong amnesiac" trap of classic Transformers. What remains is a long-lived narrative thread that highlights turning points, not timestamps.
Crushing the Competition: Titans vs. The World
Google did not just talk a big game with Titans; it brought benchmark receipts. On long-sequence tests that typically reduce large models to mush, a 760M-parameter Titans variant quietly posted numbers that embarrass systems more than 50x its size.
On the classic Needle-in-a-Haystack evaluation, Titans had to find a single planted fact hidden inside sprawling documents. At a 16,000-token context length, it hit over 95% accuracy, where many frontier models start dropping answers or hallucinating.
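The evaluation itself is easy to reproduce in spirit: plant one fact in a wall of filler and check whether the model can dig it back out. A minimal harness for building such prompts (the needle and filler text here are made up for illustration):

```python
import random

def needle_prompt(needle, filler_sentence, n_sentences, seed=0):
    """Bury one planted fact at a random position inside n_sentences of filler."""
    rng = random.Random(seed)
    doc = [filler_sentence] * n_sentences
    doc.insert(rng.randrange(n_sentences + 1), needle)
    return " ".join(doc)

prompt = needle_prompt(
    "The vault code is 7421.",
    "The sky was a pleasant shade of blue that day.",
    n_sentences=1000,
)
# A model passes if its answer to "What is the vault code?" contains "7421";
# accuracy is the pass rate over many needle positions and context lengths.
```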
Long-context story understanding usually exposes models that only "sort of" remember earlier passages. On bAbI-Long, which forces systems to connect facts scattered across massive synthetic narratives, Titans did not just edge out rivals; it dominated the leaderboard.
Google's paper and subsequent analyses claim Titans outperformed a brutal comparison set on these long-range tasks:
- GPT-4
- Llama 3.1 70B
- RecurrentGemma 9B
- Llama 3 paired with retrieval and search tools
That last result matters most. Retrieval-augmented setups bolt external memory and vector databases onto models like Llama to compensate for forgetfulness, yet Titans' built-in long-term memory still won. Instead of juggling embeddings and external stores, Titans keeps an internal, trainable memory that updates on the fly.
Parameter counts tell the real story. While GPT-4 and Llama 3.1 70B live in the tens or hundreds of billions of parameters, Titans' long-context star sits at just 760 million. You get performance that looks like a frontier model on multi-hundred-page inputs, at a cost profile closer to a mid-tier open-source LLM.
That efficiency unlocks deployment options the giants cannot touch. A sub-billion-parameter model that reads 2M+ tokens and still nails Needle-in-a-Haystack can run more cheaply in the cloud, fan out across fleets of GPUs, or even inch toward on-device scenarios.
Architecturally, Titans' results suggest that smarter memory beats brute-force scale for long-context reasoning. If a 760M model can out-recall GPT-4 on million-token problems, the next arms race might not be about size at all, but about who builds the best brain.
Beyond Memory: MIRAS and the Continual Learner
MIRAS arrives not as yet another model, but as a unifying theory for how sequence models should remember, forget, and adapt. Google Research frames it as a roadmap that puts Transformers, Mamba, RWKV, DeltaNet, and Titans on the same map: different answers to the same four questions about memory form, storage rules, overwrite speed, and update dynamics.
Instead of hand-waving about "long context," MIRAS forces architects to specify what kind of long-term memory they want and how aggressively it should rewrite itself. That framing directly targets catastrophic forgetting, the long-standing problem where a model fine-tuned on new skills quietly erases old ones because its parameters double as both brain and scratchpad.
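Those four design axes can be captured in a small schema. The field names and example answers below are paraphrases for illustration, not Google's official MIRAS terminology:

```python
from dataclasses import dataclass

@dataclass
class MemoryDesign:
    """One architecture's answers to the four MIRAS-style questions."""
    memory_form: str      # what shape the memory takes
    storage_rule: str     # what gets written into it
    overwrite_speed: str  # how aggressively old content decays
    update_dynamics: str  # when and how writes happen

transformer = MemoryDesign(
    "KV cache", "store every token",
    "hard truncation at the window edge", "no learning at inference",
)
titans = MemoryDesign(
    "neural long-term memory module", "surprise-gated writes",
    "smooth, usage-based decay", "online updates during inference",
)
```

Laying architectures side by side like this is the point of the framework: Mamba, RWKV, and DeltaNet each fill in the same four slots differently.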
Continual learning sits at the center of this roadmap. Rather than training once on a frozen pile of web text and calling it a day, MIRAS pushes for systems that update their memory online, during use, without wrecking previously acquired abilities.
Ilya Sutskever has described his north star as models that learn like a "talented teenager": constantly absorbing, revising, and integrating new experiences. MIRAS operationalizes that vision by treating usage as an ongoing training stream, not a read-only inference phase.
Titans becomes the first big, public step along that MIRAS path. Its surprise-driven memory module, detailed in Titans: Learning to Memorize at Test Time, already behaves like a proto-continual learner, selectively writing unexpected events into a dedicated store instead of hammering them into the base weights.
Benchmarks hint at what that shift enables. A 760-million-parameter Titans variant holds its own against GPT-4 and Llama 3.1 70B on long-sequence tasks, while updating its memory live across multi-million-token sessions.
Philosophically, MIRAS flips the script on how labs think about scale. Rather than only stacking more parameters and data, Google is betting that smarter, structured memory, and models that never really stop learning, will matter more than yet another 10 trillion tokens.
Your New Coworker Is an Agent Named Lux
Your next "AI coworker" might not be a chatbox in a sidebar, but a cursor quietly moving across your own desktop. That is the bet from the Open AGI Foundation with Lux, a new kind of model that treats the computer itself as the interface. Instead of prompting a bot and hoping an API exists, you point Lux at a screen and it just starts working.
Lux describes itself as a computer usage model, and that phrase is doing a lot of work. The system ingests raw pixels, parses buttons, menus, and forms, then issues low-level actions: clicks, scrolls, key presses, window switches. It can operate full desktops, browsers, spreadsheets, code editors, even stubborn legacy tools that never got a web API.
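The observe-act loop behind any computer-use model can be sketched generically; the `Action` type and `propose_action` callback here are hypothetical stand-ins, since Lux's actual interface is not public:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str   # "click" | "scroll" | "type" | "key"
    x: int = 0
    y: int = 0
    text: str = ""

def run_agent(screenshot, propose_action, execute, done, max_steps=50):
    """Generic observe-act loop: look at the current screen, emit one
    low-level action, apply it, and repeat until the goal is reached
    or the step budget runs out."""
    for _ in range(max_steps):
        frame = screenshot()
        if done(frame):
            return True
        execute(propose_action(frame))
    return False
```

The model supplies `propose_action`; everything else is plumbing, which is why a screen-native agent can drive legacy desktop apps that never exposed an API.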
This moves Lux out of the "assistant" category and into infrastructure territory. You can wire it into a remote VM and have it reconcile invoices in a browser, cross-check data in a desktop spreadsheet, then draft follow-up emails in Outlook. For enterprises drowning in brittle RPA scripts and half-finished integrations, a screen-native agent starts to look like a universal adapter.
Benchmark numbers back up the swagger. On Mind2Web, an online benchmark built from more than 300 real-world tasks across live websites, Lux scores 83.6, a massive jump over Google's Gemini at 69.0 and OpenAI's best model at 61.3. Same tasks, same messy web, radically different success rate.
Mind2Web is brutal by design. Agents must navigate login walls, weird layouts, infinite scroll, pop-ups, and inconsistent UI patterns to complete multi-step goals like booking travel, checking order histories, or digging through account settings. Lux's margin on this benchmark suggests it is not just memorizing flows, but actually building a working model of how interfaces behave.
That edge comes from what its creators call agentic active pre-training. Instead of only learning from static logs or synthetic instructions, Lux spends pre-training time acting inside real environments, exploring UIs, failing, and correcting. The model internalizes patterns like "filters hide behind funnel icons" or "confirmation dialogs often invert button colors," which transfer across apps.
You can think of it as the difference between reading a manual and actually driving a car. Traditional LLM agents "read the manual" of web APIs and DOM trees; Lux logs millions of hours behind the wheel of live software. That embodied experience gives it a more intuitive, humanlike grasp of user interfaces, and makes "your new coworker" sound less like hype and more like an imminent product category.
Google's Two-Pronged Attack: Speed and Controversy
Google is not betting everything on Titans' long-term memory. In parallel, the company is pushing a second front: raw distribution and cheap generative media. Internal growth data cited by third-party analytics shows Gemini's monthly active users climbing faster than ChatGPT's, and Google wants matching firepower in images and UI experiments.
Enter Nano Banana 2 Flash, a new image model tuned for cost and speed rather than leaderboard glory. Positioned as a "near-pro" version of Google's flagship image system, it aims to deliver almost Pro-level quality at a fraction of the compute cost. That matters for billions of low-margin image calls in Search, Android, Docs, and ad tooling.
Think of Nano Banana 2 Flash as Google's bulk-ink cartridge for generative art. You do not print museum pieces with it; you flood the web with thumbnails, social cards, stickers, and product mockups. If Google can undercut Midjourney, DALL·E, and Stability on price while keeping quality "good enough," it controls the mass market for AI images.
At the same time, Google quietly ran a very different experiment: AI-rewritten news headlines inside Google Discover. Instead of showing publishers' original titles, an internal model generated new ones on the fly, sometimes reframing stories with stronger emotional hooks or different emphases. Users saw these synthetic headlines without any clear label or opt-out.
Publishers noticed. Reports from Scandinavian and European outlets described headlines that distorted tone or meaning, including crime stories that sounded more sensational and political pieces that downplayed key context. Editors argued that Google's AI effectively became an unaccountable co-author sitting between their newsroom and their audience.
Backlash came fast because it hits a long-simmering fault line. Platforms already control distribution, ad markets, and now increasingly the language that frames journalism. When an AI headline can change how a corruption probe or climate report feels, editorial judgment shifts from newsrooms to ranking systems and model weights.
The Discover test shows how quickly "assistive AI" turns into editorial AI. Titans and Nano Banana 2 Flash chase scale and speed, but the headline controversy exposes the trade: tech platforms want to rewrite not just content, but how the world encounters it.
The Numbers Don't Lie: Gemini's Growth Is Real
Code Red stopped being a metaphor once the download charts arrived. According to SensorTower data cited in recent reports, Gemini's mobile app now ranks among the fastest-growing AI products ever, with monthly active users climbing at a pace that dwarfs ChatGPT's year-over-year gains.
ChatGPT still dominates on raw scale, with hundreds of millions of users and the most recognizable brand in consumer AI. But SensorTower's curves tell a different story about momentum: Gemini's MAUs grow multiple times faster month-to-month, especially in markets where Google can pre-install or aggressively surface the app.
That velocity matters more than bragging rights. Rapid MAU growth feeds a flywheel of:
- More developer interest in Gemini APIs
- More enterprise pilots that want Google-scale reliability
- More consumer trust that this isn't a dead-end experiment
For developers, Gemini's ascent means a credible alternative to OpenAI that plugs directly into Android, Chrome, and Google Cloud. When your target users already live inside Gmail, Docs, and Search, building on Google's stack starts to look less like a risk and more like an inevitability.
Enterprises read the same charts and see negotiating leverage. A fast-growing Gemini gives CIOs cover to demand better pricing, data-residency guarantees, and multi-vendor strategies that pit OpenAI, Google, Microsoft, and Anthropic against each other.
Google, meanwhile, quietly exploits its distribution machine. Gemini suggestions in Android, AI features in Workspace, and Gemini-powered search experiments all funnel ordinary users into Google's ecosystem without requiring a separate "AI app" decision.
That is the real Code Red for OpenAI: not that Gemini has already won, but that Google finally aligned research, product, and distribution. Titans, MIRAS, and the broader Gemini stack now ship into an audience counted in the billions, and every incremental feature update rides that rail. For anyone tracking the technical underpinnings, Google's long-context work sits alongside open implementations in the Google Research GitHub Repository, underscoring how quickly these ideas can propagate.
The New AI Battlefield Is Here
Code Red no longer describes a single company's panic; it describes a new AI battlefield. Titans gives Google a model that can juggle 2-million-plus token contexts with a real long-term memory, updating its memory live instead of pretending every conversation resets to zero. Benchmarks like Needle-in-a-Haystack at >95% accuracy and dominance on bAbI-Long show those gains are not just marketing slides.
Layer MIRAS on top and you get a roadmap, not a one-off model. MIRAS reframes Transformers, Mamba, RWKV, and friends as different answers to four questions about memory shape, storage rules, decay speed, and update dynamics. That turns "bigger context window" into a design space for continuously learning systems.
Meanwhile Lux attacks a different front: control. Lux looks at your actual screen, parses UI elements, and issues clicks, scrolls, and keypresses to complete real tasks across browsers, spreadsheets, and email clients. On the Mind2Web benchmark of 300+ real-world website tasks, it posts around 83.6% success, putting older "agent" demos that rely on fragile APIs to shame.
Distribution pressure comes from Gemini and Nano Banana 2 Flash. SensorTower-style data shows Gemini's monthly active users climbing faster than ChatGPT's, aided by deep Android and Chrome integration. Nano Banana 2 Flash, a cheaper, faster image model that nearly matches its Pro sibling, positions Google to flood mid-range phones and web apps with "good enough" multimodal AI.
Google now fights a multiâfront war:
1. Foundational architecture: Titans and MIRAS redefine how models remember and learn.
2. Practical agency: Lux-style computer-use agents turn LLMs into full desktop operators.
3. Market distribution: Gemini growth, Nano Banana, and AI-tuned headlines push this stack into everyday feeds and devices.
Static, once-trained-then-frozen models look increasingly like last decade's playbook. The next phase centers on agents that remember months of interaction history, adapt policies on the fly, and live inside operating systems, browsers, and productivity suites. All of that lands squarely on OpenAI's doorstep: its next-generation model, Garlic, now has to prove it can match Titans' memory, Lux-level agency, and Gemini-scale reach, or risk watching Google set the rules for AI's second act.
Frequently Asked Questions
What is Google Titans?
Titans is a new AI architecture from Google Research designed to give models a true long-term memory. It separates short-term processing from a long-term memory module that learns and updates continuously during use.
How does Titans' memory work?
Titans decides what to store based on 'surprise.' The more unexpected or novel a piece of information is, the more likely it is to be saved, allowing the AI to build a memory of key facts efficiently.
Is Google Titans better than GPT-4?
On specific long-context benchmarks, which test an AI's ability to recall information from vast amounts of text, the video and related reports claim Titans significantly outperforms models like GPT-4 and Llama 3.1.
What is MIRAS?
MIRAS is a framework introduced alongside Titans. It provides the rules and methods for models to learn continuously from new data without forgetting past knowledge, moving AI closer to a state of perpetual learning.