Why OpenAI Declared a 'Code Red'
Code Red hit OpenAI's inbox as an internal memo from Sam Altman circulated, according to people familiar with the matter, warning that the company could not treat Google's latest AI push as just another product cycle. The message: Gemini's surge and Google's research blitz had crossed from background noise into an existential competitive threat.
Inside OpenAI, the memo landed against a backdrop of uncomfortable numbers. New third-party data shows Gemini's monthly active users climbing faster than ChatGPT's, with Google leaning on Android, Search, and Chrome distribution to pump usage across billions of devices.
Google, for its part, has stopped playing defense. Over just a few weeks, the company rolled out Titans and MIRAS for long-context memory, prepped Nano Banana 2 Flash as a cheaper image model, and quietly tested AI-written news headlines on user phones, all while shoving Gemini deeper into Workspace and Android.
None of these moves stands alone. Together they form a coordinated assault: research breakthroughs that fix core Transformer weaknesses, productized models that undercut rivals on cost, and distribution plays that exploit Google's control over mobile and the web.
For OpenAI, the Gemini growth curve may be the loudest alarm bell. ChatGPT still dominates mindshare, but Google's ability to auto-onboard users through default integrations means even a slightly weaker model can win if it sits in front of more eyeballs, more often.
That threat arrives just as Google Research starts to chip away at OpenAI's technical moat. Titans' new long-term memory system claims context windows above 2 million tokens and benchmark wins over GPT-4 and Llama 3.1 70B on long-sequence tests, hinting that Google can now handle sprawling histories without melting compute budgets.
Altman's memo reportedly urges teams to accelerate work on the company's next model, codenamed Garlic, and to rethink how quickly OpenAI can ship agents and memory systems of its own. The fear is not only losing users, but falling behind a rival that suddenly controls both the smarter architecture and the bigger audience.
Underneath the user charts and launch events, a deeper shift is brewing. Google is not just growing faster; it is betting on a fundamental change in how AI remembers, learns, and lives inside everyday devices, and that is what really triggered Code Red.
The Amnesia Plaguing Every AI
Modern AI talks a big game about "understanding," but under the hood most large language models live in a kind of five-minute fugue state. A model like GPT-4 or Gemini only "remembers" what fits inside a fixed context window: a sliding buffer of a few thousand to maybe a million tokens that behaves like short-term memory on a loop.
Imagine talking to someone who forgets everything older than the last page of chat history. You can paste a 500âpage contract or years of emails, but once you overflow that window, the oldest details vanish, replaced by whatever arrived last. No matter how smart the model, anything outside that context might as well not exist.
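A minimal sketch of that sliding buffer, with a toy 8-token window standing in for a real model's context limit:

```python
def truncated_context(tokens, window=8):
    """Keep only the most recent `window` tokens; everything older is dropped."""
    return tokens[-window:]

history = list(range(20))             # 20 tokens of accumulated conversation
visible = truncated_context(history)  # the model only ever sees tokens 12-19
```

Everything before token 12 is not compressed or summarized; as far as the model is concerned, it simply never happened.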
Blame the standard Transformer architecture that powers almost every frontier LLM. Self-attention compares every token with every other token, so compute and memory scale roughly quadratically: double the sequence length and you 4x the cost; go 10x and you're staring at ~100x more work.
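The quadratic blow-up is just arithmetic, as this back-of-the-envelope sketch shows:

```python
def attention_cost(seq_len):
    """Pairwise token comparisons in full self-attention: O(n^2)."""
    return seq_len * seq_len

base = attention_cost(1_000)        # 1,000,000 comparisons
doubled = attention_cost(2_000)     # 2x the tokens -> 4x the work
stretched = attention_cost(10_000)  # 10x the tokens -> 100x the work
```

Real implementations add constants and optimizations, but the n-squared shape of the curve is what makes million-token windows so punishing.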
Past a few hundred thousand tokens, even heavily optimized Transformers start to buckle. Latency spikes, memory blows up, and quality degrades as models resort to tricks like sparse attention or aggressive truncation that quietly drop parts of your input. That's why "2M-token context" headlines usually hide brutal hardware bills and fragile behavior at the edge.
Older ideas like RNNs and modern State-Space Models (SSMs), including Mamba-style architectures, flip the trade-off. They process streams in linear time by folding history into a compact hidden state, so they breeze through millions of tokens without melting GPUs.
The catch: compressing an entire book, codebase, or customer history into a tiny state vector smears out detail. Subtle dependencies, rare edge cases, or that one critical line in a log file get averaged away, so the model responds fast but with a kind of statistical amnesia. You gain scale and lose precision.
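An exponential moving average is the simplest possible fixed-size state, and it illustrates the smearing problem; real SSMs use learned, high-dimensional states, so treat this as a caricature:

```python
def compressed_scan(tokens, decay=0.9):
    """Fold an arbitrarily long stream into one fixed-size scalar state.
    Cost is linear in sequence length, but detail is smeared into an average."""
    state = 0.0
    for t in tokens:
        state = decay * state + (1 - decay) * t
    return state

quiet = [0.0] * 999                              # routine, unremarkable input
recent_spike = compressed_scan(quiet + [100.0])  # anomaly arrives last
buried_spike = compressed_scan([100.0] + quiet)  # anomaly arrived long ago
```

The same critical event stays vivid if it just happened and is essentially erased once 999 ordinary tokens follow it: statistical amnesia in four lines.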
This structural forgetfulness has become the biggest brake on true personalization and deep context. As long as models can't reliably carry rich long-term memory across sessions, devices, and tasks, "AI assistants" remain chat windows with goldfish brains, not systems that grow with you over months or years.
Meet Titans: The AI That Never Forgets
Google calls its new architecture Titans, and it reads like a direct answer to the "five-minute memory" problem haunting today's AI. Instead of stretching a standard Transformer until it breaks, Titans splices together two different memory systems and forces them to cooperate. The result: models that handle context windows above 2 million tokens without collapsing under their own compute.
At the core sits a familiar short-term memory: windowed self-attention over the recent chunk of text. That window stays sharp and precise, so the model can track pronouns, code variables, and subtle phrasing in the last few thousand tokens. No lossy compression, no blurry summaries.
Alongside that, Titans adds a separate, persistent long-term memory module. This module doesn't just cache raw text; it stores distilled representations of what actually mattered in earlier passages. Google describes three variants of this system, Memory-as-Context, Memory-as-Gates, and Memory-as-Layers, each wiring the stored knowledge back into the model in a different way.
The revolutionary twist: Titans updates this long-term memory during inference. While you chat, code, or feed it documents, the memory module learns on the fly which pieces are surprising, useful, or rare and writes them into its internal store. No offline fine-tune, no retraining run, just continuous adjustment as the session unfolds.
Surprise drives the write decisions. When the model encounters something that deviates strongly from its expectations (an edge-case API, a niche regulation, a user's quirky preference), it flags that as high-value and commits it to long-term memory. Less surprising, repetitive content gets a lower priority and eventually falls out of the store through smart forgetting rather than brute-force truncation.
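The write rule can be caricatured with Shannon surprisal standing in for Titans' learned surprise signal; the real module derives surprise from gradients on its memory network, so this is an illustration, not the published mechanism:

```python
import math

def surprisal_bits(prob):
    """Shannon surprisal: how unexpected an event is, in bits."""
    return -math.log2(prob)

def maybe_store(memory, item, model_prob, threshold=4.0):
    """Write to long-term memory only when the event clears the surprise bar."""
    if surprisal_bits(model_prob) >= threshold:
        memory.append(item)

memory = []
maybe_store(memory, "routine sign-off email", model_prob=0.5)  # ~1 bit: skipped
maybe_store(memory, "one-off API key", model_prob=0.01)        # ~6.6 bits: stored
```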
Benchmarks hint at how big this shift could be. A Titans model with just 760 million parameters reportedly hits over 95% accuracy on Needle-in-a-Haystack at 16,000 tokens and dominates the bAbI-Long benchmark, outscoring GPT-4, RecurrentGemma 9B, Llama 3.1 70B, and even Llama 3 paired with retrieval tools. Long sequences stop being a pathological edge case and start looking like the default workload.
That turns AI from a static, pre-trained encyclopedia into a dynamic partner that remembers what you did last week. Titans can, in principle, build up a stable working history with a team, a codebase, or a research project and refine its behavior across sessions. Google's own write-up, Titans + MIRAS: Helping AI have long-term memory, frames this as a step toward models that learn more like people do: incrementally, contextually, and without hitting reset every time you open a new chat.
The Genius Is In the 'Surprise'
Surprise sits at the heart of Titans' new memory system. Instead of hoarding every token across a 2-million-plus context window, the model assigns a surprisal score to each chunk of text, measuring how far reality deviates from what its internal language model predicts. High-surprise events get written into Titans' separate long-term memory, while predictable boilerplate scrolls past and vanishes.
That simple rule turns memory from a passive log into an active editor. A routine "Thanks, talk tomorrow" at the end of 500 emails never makes the cut; a one-off API key, a weird edge-case bug report, or a sudden policy change almost always does. Titans effectively compresses days of interaction into a sparse set of "you'll regret forgetting this" moments.
Under the hood, surprise acts like a budget. Each memory slot carries a usefulness score derived from both its initial surprisal and how often Titans later reads it back successfully. When the budget fills, the model demotes low-value entries first, pushing them out of active memory-as-context and into cheaper representations or dropping them entirely.
Google frames this as intelligent forgetting rather than deletion. Instead of a hard cutoff when you hit 128K or 1M tokens, relevance decays smoothly: a rarely used project spec slowly loses resolution, while an actively referenced design doc stays crisp. The memory module updates online during inference, so this decay happens continuously as Titans works.
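A toy version of that budgeted, usefulness-ranked eviction; the scores and the policy here are illustrative, not Google's published update rule:

```python
def evict(memory, budget):
    """Keep the `budget` highest-usefulness entries and demote the rest.
    The score stands in for initial surprisal plus successful read-backs."""
    keep = sorted(memory, key=memory.get, reverse=True)[:budget]
    return {k: memory[k] for k in keep}

store = {"active design doc": 9.2, "rarely used spec": 3.1, "routine sign-off": 0.4}
store = evict(store, budget=2)  # the routine entry falls out first
```

The key property is that nothing is dropped by position: an old but frequently referenced entry outlives a recent but useless one.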
That behavior looks uncannily human. Cognitive psychology shows that people encode novel, emotionally charged, or unexpected events far more strongly than daily routines; your first day at a new job outlives 200 ordinary Tuesdays. Titans bakes a similar bias into silicon: novelty gets a stronger write signal, repetition gets background noise treatment.
Human memory also forgets on purpose to stay efficient, and Titans mirrors that tradeoff. By allowing old, low-surprise traces to fade instead of clinging to everything, the system avoids the "five-minute genius, lifelong amnesiac" trap of classic Transformers. What remains is a long-lived narrative thread that highlights turning points, not timestamps.
Crushing the Competition: Titans vs. The World
Google did not just talk a big game with Titans; it brought benchmark receipts. On long-sequence tests that typically reduce large models to mush, a 760M-parameter Titans variant quietly posted numbers that embarrass systems more than 50x its size.
On the classic Needle-in-a-Haystack evaluation, Titans had to find a single planted fact hidden inside sprawling documents. At a 16,000-token context length, it hit over 95% accuracy, where many frontier models start dropping answers or hallucinating.
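The evaluation itself is easy to reproduce in spirit: plant one fact in a wall of filler and check whether the model can dig it back out. A minimal harness for building such prompts (the needle and filler text here are made up for illustration):

```python
import random

def needle_prompt(needle, filler_sentence, n_sentences, seed=0):
    """Bury one planted fact at a random position inside n_sentences of filler."""
    rng = random.Random(seed)
    doc = [filler_sentence] * n_sentences
    doc.insert(rng.randrange(n_sentences + 1), needle)
    return " ".join(doc)

prompt = needle_prompt(
    "The vault code is 7421.",
    "The sky was a pleasant shade of blue that day.",
    n_sentences=1000,
)
# A model passes if its answer to "What is the vault code?" contains "7421";
# accuracy is the pass rate over many needle positions and context lengths.
```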
Long-context story understanding usually exposes models that only "sort of" remember earlier passages. On bAbI-Long, which forces systems to connect facts scattered across massive synthetic narratives, Titans did not just edge out rivals; it dominated the leaderboard.
Google's paper and subsequent analyses claim Titans outperformed a brutal comparison set on these long-range tasks:
- GPT-4
- Llama 3.1 70B
- RecurrentGemma 9B
- Llama 3 paired with retrieval and search tools
That last result matters most. Retrieval-augmented setups bolt external memory and vector databases onto models like Llama to compensate for forgetfulness, yet Titans' built-in long-term memory still won. Instead of juggling embeddings and external stores, Titans keeps an internal, trainable memory that updates on the fly.
Parameter counts tell the real story. While GPT-4 and Llama 3.1 70B live in the tens or hundreds of billions of parameters, Titans' long-context star sits at just 760 million. You get performance that looks like a frontier model on multi-hundred-page inputs, at a cost profile closer to a mid-tier open-source LLM.
That efficiency unlocks deployment options the giants cannot touch. A sub-billion-parameter model that reads 2M+ tokens and still nails Needle-in-a-Haystack can run more cheaply in the cloud, fan out across fleets of GPUs, or even inch toward on-device scenarios.
Architecturally, Titans' results suggest that smarter memory beats brute-force scale for long-context reasoning. If a 760M model can out-recall GPT-4 on million-token problems, the next arms race might not be about size at all, but about who builds the best brain.
Beyond Memory: MIRAS and the Continual Learner
MIRAS arrives not as yet another model, but as a unifying theory for how sequence models should remember, forget, and adapt. Google Research frames it as a roadmap that puts Transformers, Mamba, RWKV, DeltaNet, and Titans on the same map: different answers to the same four questions about memory form, storage rules, overwrite speed, and update dynamics.
Instead of hand-waving about "long context," MIRAS forces architects to specify what kind of long-term memory they want and how aggressively it should rewrite itself. That framing directly targets catastrophic forgetting, the long-standing problem where a model fine-tuned on new skills quietly erases old ones because its parameters double as both brain and scratchpad.
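Those four design axes can be captured in a small schema. The field names and example answers below are paraphrases for illustration, not Google's official MIRAS terminology:

```python
from dataclasses import dataclass

@dataclass
class MemoryDesign:
    """One architecture's answers to the four MIRAS-style questions."""
    memory_form: str      # what shape the memory takes
    storage_rule: str     # what gets written into it
    overwrite_speed: str  # how aggressively old content decays
    update_dynamics: str  # when and how writes happen

transformer = MemoryDesign(
    "KV cache", "store every token",
    "hard truncation at the window edge", "no learning at inference",
)
titans = MemoryDesign(
    "neural long-term memory module", "surprise-gated writes",
    "smooth, usage-based decay", "online updates during inference",
)
```

Laying architectures side by side like this is the point of the framework: Mamba, RWKV, and DeltaNet each fill in the same four slots differently.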
Continual learning sits at the center of this roadmap. Rather than training once on a frozen pile of web text and calling it a day, MIRAS pushes for systems that update their memory online, during use, without wrecking previously acquired abilities.
Ilya Sutskever has described his north star as models that learn like a "talented teenager": constantly absorbing, revising, and integrating new experiences. MIRAS operationalizes that vision by treating usage as an ongoing training stream, not a read-only inference phase.
Titans becomes the first big, public step along that MIRAS path. Its surprise-driven memory module, detailed in Titans: Learning to Memorize at Test Time, already behaves like a proto-continual learner, selectively writing unexpected events into a dedicated store instead of hammering them into the base weights.
Benchmarks hint at what that shift enables. A 760-million-parameter Titans variant holds its own against GPT-4 and Llama 3.1 70B on long-sequence tasks, while updating its memory live across multi-million-token sessions.
Philosophically, MIRAS flips the script on how labs think about scale. Rather than only stacking more parameters and data, Google is betting that smarter, structured memory, and models that never really stop learning, will matter more than yet another 10 trillion tokens.
Your New Coworker Is an Agent Named Lux
Your next "AI coworker" might not be a chatbox in a sidebar, but a cursor quietly moving across your own desktop. That is the bet from the Open AGI Foundation with Lux, a new kind of model that treats the computer itself as the interface. Instead of prompting a bot and hoping an API exists, you point Lux at a screen and it just starts working.
Lux describes itself as a computer usage model, and that phrase is doing a lot of work. The system ingests raw pixels, parses buttons, menus, and forms, then issues low-level actions: clicks, scrolls, key presses, window switches. It can operate full desktops, browsers, spreadsheets, code editors, even stubborn legacy tools that never got a web API.
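The observe-act loop behind any computer-use model can be sketched generically; the `Action` type and `propose_action` callback here are hypothetical stand-ins, since Lux's actual interface is not public:

```python
from dataclasses import dataclass

@dataclass
class Action:
    kind: str   # "click" | "scroll" | "type" | "key"
    x: int = 0
    y: int = 0
    text: str = ""

def run_agent(screenshot, propose_action, execute, done, max_steps=50):
    """Generic observe-act loop: look at the current screen, emit one
    low-level action, apply it, and repeat until the goal is reached
    or the step budget runs out."""
    for _ in range(max_steps):
        frame = screenshot()
        if done(frame):
            return True
        execute(propose_action(frame))
    return False
```

The model supplies `propose_action`; everything else is plumbing, which is why a screen-native agent can drive legacy desktop apps that never exposed an API.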
This moves Lux out of the "assistant" category and into infrastructure territory. You can wire it into a remote VM and have it reconcile invoices in a browser, cross-check data in a desktop spreadsheet, then draft follow-up emails in Outlook. For enterprises drowning in brittle RPA scripts and half-finished integrations, a screen-native agent starts to look like a universal adapter.
Benchmark numbers back up the swagger. On Mind2Web, an online benchmark built from more than 300 real-world tasks across live websites, Lux scores 83.6, a massive jump over Google's Gemini at 69.0 and OpenAI's best model at 61.3. Same tasks, same messy web, radically different success rate.
Mind2Web is brutal by design. Agents must navigate login walls, weird layouts, infinite scroll, pop-ups, and inconsistent UI patterns to complete multi-step goals like booking travel, checking order histories, or digging through account settings. Lux's margin on this benchmark suggests it is not just memorizing flows, but actually building a working model of how interfaces behave.
That edge comes from what its creators call agentic active pre-training. Instead of only learning from static logs or synthetic instructions, Lux spends pre-training time acting inside real environments, exploring UIs, failing, and correcting. The model internalizes patterns like "filters hide behind funnel icons" or "confirmation dialogs often invert button colors," which transfer across apps.
You can think of it as the difference between reading a manual and actually driving a car. Traditional LLM agents "read the manual" of web APIs and DOM trees; Lux logs millions of hours behind the wheel of live software. That embodied experience gives it a more intuitive, humanlike grasp of user interfaces, and makes "your new coworker" sound less like hype and more like an imminent product category.
Google's Two-Pronged Attack: Speed and Controversy
Google is not betting everything on Titans' long-term memory. In parallel, the company is pushing a second front: raw distribution and cheap generative media. Internal growth data cited by third-party analytics shows Gemini's monthly active users climbing faster than ChatGPT's, and Google wants matching firepower in images and UI experiments.
Enter Nano Banana 2 Flash, a new image model tuned for cost and speed rather than leaderboard glory. Positioned as a "near-pro" version of Google's flagship image system, it aims to deliver almost Pro-level quality at a fraction of the compute cost. That matters for billions of low-margin image calls in Search, Android, Docs, and ad tooling.
Think of Nano Banana 2 Flash as Google's bulk-ink cartridge for generative art. You do not print museum pieces with it; you flood the web with thumbnails, social cards, stickers, and product mockups. If Google can undercut Midjourney, DALL·E, and Stability on price while keeping quality "good enough," it controls the mass market for AI images.
At the same time, Google quietly ran a very different experiment: AI-rewritten news headlines inside Google Discover. Instead of showing publishers' original titles, an internal model generated new ones on the fly, sometimes reframing stories with stronger emotional hooks or different emphases. Users saw these synthetic headlines without any clear label or opt-out.
Publishers noticed. Reports from Scandinavian and European outlets described headlines that distorted tone or meaning, including crime stories that sounded more sensational and political pieces that downplayed key context. Editors argued that Google's AI effectively became an unaccountable co-author sitting between their newsroom and their audience.
Backlash came fast because it hits a long-simmering fault line. Platforms already control distribution, ad markets, and now increasingly the language that frames journalism. When an AI headline can change how a corruption probe or climate report feels, editorial judgment shifts from newsrooms to ranking systems and model weights.
The Discover test shows how quickly "assistive AI" turns into editorial AI. Titans and Nano Banana 2 Flash chase scale and speed, but the headline controversy exposes the trade: tech platforms want to rewrite not just content, but how the world encounters it.
The Numbers Don't Lie: Gemini's Growth Is Real
Code Red stopped being a metaphor once the download charts arrived. According to SensorTower data cited in recent reports, Gemini's mobile app now ranks among the fastest-growing AI products ever, with monthly active users climbing at a pace that dwarfs ChatGPT's year-over-year gains.
ChatGPT still dominates on raw scale, with hundreds of millions of users and the most recognizable brand in consumer AI. But SensorTower's curves tell a different story about momentum: Gemini's MAUs grow multiple times faster month-to-month, especially in markets where Google can pre-install or aggressively surface the app.
That velocity matters more than bragging rights. Rapid MAU growth feeds a flywheel of:
- More developer interest in Gemini APIs
- More enterprise pilots that want Google-scale reliability
- More consumer trust that this isn't a dead-end experiment
For developers, Gemini's ascent means a credible alternative to OpenAI that plugs directly into Android, Chrome, and Google Cloud. When your target users already live inside Gmail, Docs, and Search, building on Google's stack starts to look less like a risk and more like an inevitability.
Enterprises read the same charts and see negotiating leverage. A fast-growing Gemini gives CIOs cover to demand better pricing, data-residency guarantees, and multi-vendor strategies that pit OpenAI, Google, Microsoft, and Anthropic against each other.
Google, meanwhile, quietly exploits its distribution machine. Gemini suggestions in Android, AI features in Workspace, and Gemini-powered search experiments all funnel ordinary users into Google's ecosystem without requiring a separate "AI app" decision.
That is the real Code Red for OpenAI: not that Gemini has already won, but that Google finally aligned research, product, and distribution. Titans, MIRAS, and the broader Gemini stack now ship into an audience counted in the billions, and every incremental feature update rides that rail. For anyone tracking the technical underpinnings, Google's long-context work sits alongside open implementations in the Google Research GitHub Repository, underscoring how quickly these ideas can propagate.
The New AI Battlefield Is Here
Code Red no longer describes a single company's panic; it describes a new AI battlefield. Titans gives Google a model that can juggle 2-million-plus token contexts with a real long-term memory, updating its memory live instead of pretending every conversation resets to zero. Benchmarks like Needle-in-a-Haystack at >95% accuracy and dominance on bAbI-Long show those gains are not just marketing slides.
Layer MIRAS on top and you get a roadmap, not a one-off model. MIRAS reframes Transformers, Mamba, RWKV, and friends as different answers to four questions about memory shape, storage rules, decay speed, and update dynamics. That turns "bigger context window" into a design space for continuously learning systems.
Meanwhile Lux attacks a different front: control. Lux looks at your actual screen, parses UI elements, and issues clicks, scrolls, and keypresses to complete real tasks across browsers, spreadsheets, and email clients. On the Mind2Web benchmark of 300+ real-world website tasks, it posts around 83.6% success, putting older "agent" demos that rely on fragile APIs to shame.
Distribution pressure comes from Gemini and Nano Banana 2 Flash. SensorTower-style data shows Gemini's monthly active users climbing faster than ChatGPT's, aided by deep Android and Chrome integration. Nano Banana 2 Flash, a cheaper, faster image model that nearly matches its Pro sibling, positions Google to flood mid-range phones and web apps with "good enough" multimodal AI.
Google now fights a multiâfront war:
1. Foundational architecture: Titans and MIRAS redefine how models remember and learn.
2. Practical agency: Lux-style computer-use agents turn LLMs into full desktop operators.
3. Market distribution: Gemini growth, Nano Banana, and AI-tuned headlines push this stack into everyday feeds and devices.
Static, once-trained-then-frozen models look increasingly like last decade's playbook. The next phase centers on agents that remember months of interaction history, adapt policies on the fly, and live inside operating systems, browsers, and productivity suites. All of that lands squarely on OpenAI's doorstep: its next-generation model, Garlic, now has to prove it can match Titans' memory, Lux-level agency, and Gemini-scale reach, or risk watching Google set the rules for AI's second act.
Frequently Asked Questions
What is Google Titans?
Titans is a new AI architecture from Google Research designed to give models a true long-term memory. It separates short-term processing from a long-term memory module that learns and updates continuously during use.
How does Titans' memory work?
Titans decides what to store based on 'surprise.' The more unexpected or novel a piece of information is, the more likely it is to be saved, allowing the AI to build a memory of key facts efficiently.
Is Google Titans better than GPT-4?
On specific long-context benchmarks, which test an AI's ability to recall information from vast amounts of text, the video and related reports claim Titans significantly outperforms models like GPT-4 and Llama 3.1.
What is MIRAS?
MIRAS is a framework introduced alongside Titans. It provides the rules and methods for models to learn continuously from new data without forgetting past knowledge, moving AI closer to a state of perpetual learning.