TL;DR / Key Takeaways
The Wild West of AGI Is Over
The pursuit of Artificial General Intelligence (AGI) drives a fierce, often chaotic, global race among leading AI labs. Billions of dollars and countless hours pour into developing systems capable of human-level cognition, yet the finish line for this monumental endeavor remains undefined. Every major player declares AGI as their ultimate goal, but none agree on what achieving it truly entails, creating a "Wild West" scenario where progress is subjective and often unquantifiable.
Major labs offer starkly different visions for AGI, highlighting the industry's lack of consensus. OpenAI defines it as "a highly autonomous system that outperforms humans at most economically valuable work," emphasizing economic utility.
In contrast, Google DeepMind co-founder Shane Legg describes an AGI as "an artificial agent that can at least do the kinds of cognitive things that people can typically do." François Chollet, creator of the ARC benchmark, frames intelligence around skill-acquisition efficiency: how rapidly a system learns new concepts.
This profound definitional ambiguity renders any objective assessment of AGI progress nearly impossible. Without a shared understanding of the target, the industry defaults to subjective, "vibes-based" evaluations of AI capabilities. These assessments are often driven by impressive but narrow benchmark scores, which frequently suffer from data contamination or memorization, obscuring true generalized intelligence.
The problem becomes stark: how does one accurately measure advancement towards a goal that cannot even be consistently defined? This fundamental challenge has plagued the AI community, creating a speculative environment where genuine breakthroughs are hard to distinguish from mere incremental improvements. Google DeepMind's recent paper directly confronts this measurement vacuum, proposing a radical shift in how we evaluate intelligent systems.
Google's New Rulebook for Intelligence
Quietly, on March 16, 2026, Google DeepMind unveiled a landmark paper poised to redefine the pursuit of Artificial General Intelligence. Titled "Measuring Progress Towards AGI: A Cognitive Framework," this document directly addresses the current AGI "Wild West" by proposing a structured, scientific approach to assessment. It replaces the arbitrary finish lines of existing benchmarks with a comprehensive rulebook for intelligence itself, grounded in decades of human cognitive science.
DeepMind's core proposal advocates a radical shift away from single, gameable benchmark scores that often misrepresent an AI's true capabilities. Instead, the paper posits a need for a full cognitive profile, modeled meticulously on human intelligence. This framework evaluates an AI system's capabilities across 10 distinct cognitive faculties—including perception, reasoning, and social cognition—directly comparing its performance against real human distributions. This ensures a holistic understanding of an AI's intellectual landscape, moving beyond mere task completion to assess genuine intelligence.
Crucially, the framework makes a fundamental distinction: it focuses squarely on *what* a system can accomplish, not *how* it achieves it. Whether an AI utilizes transformer architectures, diffusion models, or entirely novel mechanisms is irrelevant to its evaluation. The paper's emphasis remains solely on observable outcomes and demonstrable intellectual abilities, divorcing assessment from underlying technological implementation. This "black box" approach ensures broad applicability and future-proofs the evaluation as AI technologies continue to evolve.
This initiative represents a pivotal move to inject much-needed scientific rigor into the AGI conversation. By providing a common language and a standardized, multi-dimensional assessment protocol, Google DeepMind aims to unify research efforts across the globe. It seeks to establish a universal yardstick, allowing labs worldwide to measure progress objectively and collaboratively, transforming the AGI race from a chaotic sprint into a transparent, shared scientific endeavor. This framework offers a robust foundation for tracking true advancement toward human-level general intelligence.
Deconstructing the Mind: The 10 Faculties
Google DeepMind's new framework anchors itself in a precise cognitive taxonomy, a structured classification of mental abilities. This isn't an arbitrary list invented for AI; instead, it draws directly from decades of established research across cognitive science, psychology, and neuroscience. The framework deliberately maps onto how human intelligence has been studied, providing a robust, empirically grounded basis for evaluating artificial systems. This foundational choice moves the AGI discussion from philosophical abstraction to measurable, scientific comparison.
Central to this taxonomy are 10 distinct cognitive faculties, identified as the fundamental building blocks of intelligence observed in humans:

- Perception: Extracting and processing sensory information.
- Generation: Producing useful outputs like text, speech, or actions.
- Attention: Focusing cognitive resources on relevant information.
- Learning: Acquiring new knowledge and adapting after deployment.
- Memory: Storing and retrieving information over time, and forgetting outdated data.
- Reasoning: Drawing valid conclusions through various logical inferences.
- Metacognition: Knowledge and monitoring of one's own cognitive processes, including self-awareness of uncertainty.
- Executive Functions: Planning, inhibiting impulses, and switching strategies to achieve goals.
- Problem Solving: Applying multiple faculties to find solutions for novel challenges.
- Social Cognition: Understanding social cues, inferring others' thoughts, and cooperating appropriately.
These ten faculties collectively form a comprehensive profile, designed to assess AI systems against the full spectrum of human cognitive capabilities. Rather than a single, easily gamed "AGI score," Google DeepMind proposes evaluating AI performance across each of these dimensions, directly comparing it to human baselines. This granular approach promises a far more objective and informative assessment of an AI’s true intellectual progress.
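The per-faculty profile the paper describes can be pictured as a simple mapping from each faculty to a score relative to the human distribution. Here is a minimal sketch; the faculty names come from the paper, but the percentile scale, class design, and example scores are illustrative assumptions, not DeepMind's implementation:

```python
from dataclasses import dataclass, field

# The 10 faculties named in DeepMind's taxonomy.
FACULTIES = [
    "perception", "generation", "attention", "learning", "memory",
    "reasoning", "metacognition", "executive_functions",
    "problem_solving", "social_cognition",
]

@dataclass
class CognitiveProfile:
    """An AI system's score on each faculty, expressed here as a human
    percentile (0-100) -- an assumed scale for illustration only."""
    scores: dict = field(default_factory=dict)

    def weakest(self, n=3):
        """Return the n lowest-scoring faculties -- the diagnostic use case."""
        return sorted(self.scores, key=self.scores.get)[:n]

# Hypothetical scores: strong at generation, weak at metacognition.
profile = CognitiveProfile({f: 50.0 for f in FACULTIES})
profile.scores.update({"generation": 95.0, "metacognition": 12.0, "learning": 30.0})
print(profile.weakest(2))  # -> ['metacognition', 'learning']
```

Framing the result as a mapping rather than a single number is exactly what makes the profile hard to game: improving one faculty leaves the others visibly unchanged.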
Significantly, the paper emphasizes evaluating *what* a system can accomplish, not *how* it achieves it. This crucial distinction keeps the framework technology-agnostic, applicable to any AI architecture from transformers to novel designs, without bias towards specific methodologies. For a deeper dive into the framework's specifics, refer to the announcement of "Measuring Progress Towards AGI: A Cognitive Framework" on the Google blog. The accompanying Kaggle hackathon, with its $200,000 prize pool, further underscores Google DeepMind's commitment to collaboratively building robust evaluations, particularly for complex areas like metacognition and social cognition, where the evaluation gap is currently largest. Later sections delve into each of the 10 faculties in detail, exploring the proposed evaluation methods and their implications for AGI development.
Building Blocks of Cognition (Part 1)
Google DeepMind's groundbreaking paper, 'Measuring Progress Towards AGI: A Cognitive Framework,' introduces a rigorous 10-faculty cognitive taxonomy for assessing AI. This detailed framework establishes essential "building blocks" of cognition, beginning with the first five foundational faculties that govern how an intelligent system interacts with and processes its world. These components move beyond simplistic benchmarks to define nuanced capabilities.
Perception stands as the initial faculty, evaluating an AI's ability to interpret sensory data, not just detect it. This includes understanding a complex visual scene, recognizing objects, relationships, and context, or accurately interpreting the subtle meanings within human speech and written text. It measures the system's capacity to extract rich, actionable meaning from raw input.
Next, Generation assesses an AI’s capability to produce useful, coherent, and often novel outputs. This ranges from crafting articulate, contextually relevant text and synthesizing natural-sounding speech, to executing precise computer actions and motor movements in physical or virtual environments. It gauges an AI’s skill in translating internal understanding into tangible, external results.
The third crucial faculty, Attention, scrutinizes an AI's human-like capacity to focus cognitive resources selectively. This means zeroing in on salient information within a vast input while effectively filtering out irrelevant distractions. This faculty is broader than the transformer "attention" mechanism: current models tend to process their entire context at once, whereas attention in this sense signifies a shift towards more efficient, goal-directed processing.
Learning and Memory form the fourth and fifth interconnected pillars. Learning evaluates an AI’s capacity for continual learning, acquiring new knowledge and adapting behaviors in real-time post-deployment, akin to a human mastering a new card game or adjusting to a novel job. Memory complements this, measuring the system’s ability to robustly store and retrieve information over extended periods, and equally important, to intelligently forget outdated or irrelevant data, preventing cognitive overload.
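The store-retrieve-forget cycle that the Memory faculty describes can be sketched as a tiny key-value store that evicts stale entries. The age-based eviction policy below is an illustrative assumption, not a mechanism from the paper:

```python
import time

class ForgettingMemory:
    """A key-value memory that forgets entries unused for longer than
    max_age seconds -- a toy model of 'intelligently forgetting' stale data."""
    def __init__(self, max_age):
        self.max_age = max_age
        self.store = {}  # key -> (value, last_access_time)

    def remember(self, key, value, now=None):
        self.store[key] = (value, now if now is not None else time.time())

    def recall(self, key, now=None):
        now = now if now is not None else time.time()
        self._forget(now)
        if key in self.store:
            value, _ = self.store[key]
            self.store[key] = (value, now)  # refresh: recalled memories persist
            return value
        return None

    def _forget(self, now):
        """Drop every entry whose last access is older than max_age."""
        self.store = {k: v for k, v in self.store.items()
                      if now - v[1] <= self.max_age}

mem = ForgettingMemory(max_age=60)
mem.remember("meeting", "3pm", now=0)
print(mem.recall("meeting", now=30))   # -> 3pm (still fresh)
print(mem.recall("meeting", now=200))  # -> None (forgotten after 60s idle)
```

The refresh-on-recall behavior mirrors the point in the text: memory evaluation is about both durable retrieval and the deliberate discarding of outdated information, not raw storage capacity.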
The Higher Orders of Thought (Part 2)
Beyond foundational sensory and memory functions, Google DeepMind’s framework elevates five complex cognitive faculties, crucial for achieving human-level intelligence. Reasoning forms a critical pillar, enabling systems to draw valid conclusions through various logical forms. This includes deductive reasoning, inductive reasoning, analogical reasoning, and mathematical inference, moving past rote memorization to true understanding.
Perhaps the most significant gap in current AI, Metacognition, assesses an AI's self-awareness and understanding of its own knowledge. Can a system "know what it knows," express calibrated uncertainty, or articulate its limitations when faced with novel queries? Today's models notoriously "confidently give you the wrong answer," lacking this vital ability to monitor their own cognitive processes, though Anthropic's Claude models have begun to exhibit nascent signs of it.
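One concrete way evaluators probe this "know what you know" ability is calibration: does a model's stated confidence match its actual accuracy? A minimal sketch of that idea follows; the ten-bucket scheme and the sample data are illustrative assumptions, not the paper's method:

```python
def calibration_gap(predictions):
    """predictions: list of (stated_confidence, was_correct) pairs.
    Returns the bucket-weighted average of |confidence - accuracy| --
    a rough, expected-calibration-error-style measure (0 = well calibrated)."""
    buckets = {}  # bucket index -> list of (confidence, correctness) pairs
    for conf, correct in predictions:
        b = min(int(conf * 10), 9)  # ten equal-width buckets over [0, 1]
        buckets.setdefault(b, []).append((conf, correct))
    total, n = 0.0, len(predictions)
    for items in buckets.values():
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        total += len(items) / n * abs(avg_conf - accuracy)
    return total

# A model that says "90% sure" but is right only half the time is the
# "confidently wrong" failure mode described above.
overconfident = [(0.9, True), (0.9, False)] * 5
well_calibrated = [(0.9, True)] * 9 + [(0.9, False)]
print(calibration_gap(overconfident) > calibration_gap(well_calibrated))  # -> True
```

A metacognition evaluation would go further (eliciting "I don't know" responses, probing stated limitations), but a calibration gap like this is the simplest quantitative signal that a system is monitoring its own uncertainty.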
Next, Executive Functions govern an AI’s capacity for high-level control and strategic action. These abilities, often likened to the brain's CEO, encompass sophisticated planning, the critical ability to inhibit impulses, and dynamically switching strategies in response to changing conditions. They enable an AI to set a goal and diligently pursue it, adjusting its approach and maintaining focus over extended periods to achieve complex objectives.
Problem Solving synthesizes these diverse cognitive abilities to tackle novel, real-world challenges. This faculty requires an AI to integrate perception, reasoning, planning, and learning, applying them cohesively to find effective solutions in unfamiliar domains. It represents a system's capacity for adaptive intelligence, moving beyond pre-programmed responses to genuinely address new and complex situations that demand creative solutions.
Finally, Social Cognition addresses an AI’s ability to navigate the complexities of human interaction and collaboration. This involves understanding subtle social cues, accurately inferring others' intentions and thoughts, cooperating effectively, negotiating outcomes, and responding appropriately in intricate social situations. It is indispensable for systems operating in human-centric environments, moving beyond isolated tasks to collaborative engagement within complex social dynamics.
This comprehensive taxonomy, introduced in the paper "Measuring Progress Towards AGI: A Cognitive Framework" on March 16, 2026, focuses on *what* a system accomplishes, not *how* it does it. DeepMind’s framework explicitly ignores underlying architectures like transformers or diffusion models, prioritizing observable intelligent behavior. It provides a universal lens to measure progress towards AGI, irrespective of specific technological approaches or internal mechanisms.
The Ultimate Human Showdown
Google DeepMind's framework culminates in a rigorous, three-stage evaluation protocol designed to provide a comprehensive, unbiased assessment of AI intelligence. This systematic approach aims to move beyond anecdotal evidence and single-metric benchmarks, establishing a new standard for tracking progress toward AGI.
First, the cognitive assessment phase involves subjecting the AI to a broad suite of tasks, each meticulously designed to isolate and test a specific cognitive faculty. Crucially, these tasks remain private and held out, independently verified by a third party. This stringent measure directly combats the pervasive issue of data contamination, ensuring the AI hasn't simply memorized answers during training, which would falsely inflate its perceived intelligence.
Next, the framework establishes robust human baselines. Researchers administer the exact same tasks, under identical conditions, to a large, demographically representative sample of human adults, all possessing at least a high school level education. This step generates a genuine distribution of human performance, providing the essential real-world context against which AI capabilities can be accurately measured.
Finally, the process generates detailed cognitive profiles. Developers plot the AI's performance on each of the ten faculties directly against the collected human distribution. The resulting radar chart offers an immediate, intuitive visual representation, highlighting precisely where an AI system excels and where it falls short compared to typical human abilities. For more on the specific traits, see "Google DeepMind Plans to Track AGI Progress With These 10 Traits of General Intelligence."
These profiles can illustrate systems performing below the human median in several areas, or those exceeding it across all ten faculties. Even a system achieving the 99th percentile across the board, matching or surpassing almost every human in the sample on every task, represents a profound milestone, though the paper cautiously notes it wouldn't definitively prove AGI due to the inherent limitations of any finite sample of human capability.
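Stages two and three together reduce to a simple statistical operation: locating the AI's score within the empirical human distribution for each faculty. A sketch using only the standard library; the human sample scores are invented for illustration:

```python
from bisect import bisect_left

def human_percentile(ai_score, human_scores):
    """Percentile rank of ai_score within an empirical human sample:
    the share of sampled humans scoring strictly below the AI."""
    ranked = sorted(human_scores)
    return 100.0 * bisect_left(ranked, ai_score) / len(ranked)

# Invented human baseline for one faculty (e.g. a reasoning task battery).
humans = [42, 55, 58, 61, 63, 67, 70, 74, 79, 88]
print(human_percentile(72, humans))  # -> 70.0 (beats 7 of 10 sampled humans)
print(human_percentile(95, humans))  # -> 100.0 (above every sampled human)
```

Note how the second result illustrates the paper's caveat: a 100th-percentile score within a finite sample shows the AI surpassed everyone *sampled*, not every human, which is why even an across-the-board 99th-percentile profile would not definitively prove AGI.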
What This 'IQ Test' Still Misses
Google DeepMind's "cognitive framework" offers a robust assessment, yet the paper itself candidly acknowledges critical limitations. No single evaluation can capture the full spectrum of intelligence, and this proposed "IQ test" for AI is no exception.
Crucially, the framework exclusively measures cognitive capability, not speed of execution. An AI might demonstrate perfect reasoning, but if it needs minutes to make a decision that must happen in milliseconds, it remains impractical for real-world applications like autonomous vehicles, high-frequency trading, or surgical robotics, where timely response is paramount.
Beyond raw intellect, the framework overlooks an AI's inherent system propensities. It cannot quantify whether an agent is inherently risk-averse, reckless, conservative, or aggressive. Such tendencies are paramount for ethical deployment and alignment with human values, especially in high-stakes scenarios where an AI's operational character matters as much as its competence.
Another significant challenge arises from the "model versus system" problem. Should an AI be evaluated using its full suite of external tools, akin to allowing a calculator during a human IQ test? Google DeepMind proposes assessing the complete system, including access to tools, but on tasks specifically designed so these aids do not trivialize the underlying cognitive challenge being measured.
This nuanced approach aims to prevent an AI from simply offloading complex cognitive tasks to external utilities without demonstrating intrinsic understanding. The goal remains to gauge *intelligence*, not merely efficient tool usage, ensuring the framework differentiates between true cognitive prowess and sophisticated look-up functions.
These acknowledged gaps highlight that even a meticulously designed cognitive "IQ test" for AI systems remains a work in progress. While defining *what* intelligence entails is a monumental step, understanding *how* it manifests in dynamic, value-laden environments will require further evolution of evaluation methodologies.
A $200,000 Hunt for AGI's Weakest Links
Google DeepMind’s framework extends beyond theoretical proposals. To immediately operationalize its ambitious cognitive taxonomy, Google launched a Kaggle hackathon concurrently with the paper's release. This move transformed the academic exercise into a concrete, community-driven initiative.
The hackathon offers a substantial $200,000 prize pool, incentivizing researchers and developers globally. This significant investment aims to crowdsource the creation of actual evaluation tasks, directly addressing the framework's need for novel, unbiased assessments across its ten faculties. Google understands the monumental challenge of building these tests from scratch.
Crucially, the hackathon targets five specific cognitive faculties where current AI evaluation methods are weakest or non-existent:

- Learning
- Metacognition
- Attention
- Executive functions
- Social cognition
These categories represent some of the most complex and human-like aspects of intelligence, presenting a considerable hurdle for robust, un-gameable evaluation. Existing benchmarks often fall short in these nuanced areas.
By engaging the global AI community, Google DeepMind seeks to rapidly develop the sophisticated, targeted tests essential for its three-stage evaluation protocol. This collaborative approach aims to fill the most significant gaps in our collective ability to measure and understand true machine intelligence, transforming an academic paper into a living, evolving standard. The hackathon signifies a commitment to practical implementation, not just conceptualization.
Is This the Only Litmus Test?
Google DeepMind's "Measuring Progress Towards AGI: A Cognitive Framework" establishes a new gold standard for comprehensive AGI evaluation, yet it exists within a broader ecosystem of critical benchmarks. The AI research community leverages diverse assessments, each designed to illuminate distinct facets of machine intelligence. Most prominently, ARC-AGI, the Abstraction and Reasoning Corpus developed by AI researcher François Chollet, offers a starkly contrasting perspective.
Chollet's ARC-AGI presents a profoundly different kind of challenge. Unlike Google DeepMind's expansive cognitive taxonomy, which maps intelligence across 10 distinct faculties, ARC-AGI focuses narrowly on fluid intelligence and the ability to infer rules from minimal examples. It comprises abstract visual puzzles, requiring an agent to observe input-output pairs and then apply the learned transformation to a new, unseen input. The core demand is genuine generalization beyond training data.
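An ARC-style task can be pictured as a few input-output grid pairs plus one test input; the solver must induce the transformation and apply it. The toy sketch below assumes a far simpler rule family (cell-wise color remapping) than real ARC tasks use, purely to make the observe-induce-apply loop concrete:

```python
def induce_color_map(examples):
    """Infer a cell-wise color mapping consistent with all (input, output)
    grid pairs. Returns None if the pairs contradict a pure color remap."""
    mapping = {}
    for inp, out in examples:
        for row_in, row_out in zip(inp, out):
            for a, b in zip(row_in, row_out):
                if mapping.setdefault(a, b) != b:
                    return None  # inconsistent: rule is not a color remap
    return mapping

def apply_map(mapping, grid):
    """Apply the induced rule to an unseen test input."""
    return [[mapping.get(c, c) for c in row] for row in grid]

# Two demonstrations of the hidden rule "1 -> 2, 0 stays 0".
examples = [([[0, 1], [1, 0]], [[0, 2], [2, 0]]),
            ([[1, 1]], [[2, 2]])]
rule = induce_color_map(examples)
print(apply_map(rule, [[1, 0, 1]]))  # -> [[2, 0, 2]]
```

Real ARC puzzles are hard precisely because the space of candidate rules (reflections, object counting, symmetry completion, and more) is open-ended, so no hand-coded rule family like this one suffices; the generalization must come from the solver.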
Critically, current state-of-the-art AI models, despite their impressive feats in language generation, image synthesis, and complex strategic games, achieve scores near zero on ARC-AGI. These models, often trained on vast datasets, excel at pattern recognition within familiar distributions. However, they consistently falter when confronted with the fundamental inductive reasoning and novel problem-solving demanded by Chollet's puzzles, tasks a human child might grasp intuitively.
This stark disparity vividly illustrates the "jagged frontier" of AI progress. Machines now routinely surpass human performance in highly specialized domains like Go, chess, or even advanced code generation. Yet, they simultaneously struggle with what seem like trivially simple tasks for humans, such as understanding basic causal relationships or adapting to entirely new, abstract problem structures without explicit programming. Google DeepMind’s framework aims to map this uneven landscape comprehensively, while ARC-AGI exposes a persistent and critical gap in AI's foundational cognitive abilities. Both types of benchmarks are indispensable for truly understanding and navigating the complex path to AGI.
Goodbye Vibes, Hello Science
Google DeepMind's new framework marks a profound shift, fundamentally redefining the pursuit of Artificial General Intelligence. This isn't merely another benchmark; it establishes a paradigm shift for the entire field, replacing speculative claims with a rigorous, scientific methodology.
Gone are the days of vague pronouncements and cherry-picked demos. Researchers can now move beyond subjective "vibes" and anecdotal evidence, grounding AGI progress in a quantifiable, verifiable standard. The proposed 10 cognitive faculties and three-stage evaluation protocol offer an objective lens to assess capabilities against real human performance.
This granular cognitive taxonomy provides an invaluable diagnostic tool. Developers can now pinpoint specific weaknesses in their models, identifying precisely which faculties—be it metacognition, executive functions, or social cognition—require further development. This cognitive map transforms AGI research from a scattershot effort into a targeted, systematic engineering challenge.
The accompanying $200,000 Kaggle hackathon further underscores Google's commitment to this scientific approach. By inviting the global research community to build evaluations for these specific faculties, Google is actively fostering a collaborative, data-driven path toward AGI, rather than internal, opaque competition.
Ultimately, this framework elevates the AGI conversation. The question is no longer just *if* we can build truly intelligent machines, but *how* we will scientifically measure, verify, and systematically navigate our journey toward them. It ushers in an era of scientific verification for artificial intelligence.
Frequently Asked Questions
What is Google's new AGI framework?
It's a proposal by Google DeepMind to measure progress towards AGI by testing AI systems across 10 core cognitive faculties, comparing their performance directly against human baselines rather than using a single score.
What are the 10 cognitive faculties in the framework?
The 10 faculties are Perception, Generation, Attention, Learning, Memory, Reasoning, Metacognition, Executive Functions, Problem Solving, and Social Cognition.
How is this different from existing AI benchmarks?
Unlike benchmarks that test specific skills like coding or math, this framework provides a holistic cognitive profile. It aims to prevent 'teaching to the test' by using private, third-party verified tasks.
Does this new framework mean AGI is close?
No. The framework itself is a measurement tool, not a claim of achievement. It's designed to provide a clear, scientific roadmap to track progress towards AGI, moving the conversation from speculation to empirical evidence.