Industry Insights

AI Just Hacked Linux's 23-Year-Old Secret

An AI just found a critical Linux bug that hid from experts for over two decades using just a 12-line script. This discovery changes cybersecurity forever, revealing what's lurking in our most trusted code.

Stork.AI

A Ghost in the Kernel

Claude Code just found a remotely exploitable Linux kernel bug, a critical flaw that had lain hidden since March 2003. This vulnerability, lurking for 23 years in one of the world's most scrutinized codebases, was uncovered by the AI in a matter of hours. The speed and precision of this discovery defy conventional expectations for vulnerability research.

Linux, the foundational operating system powering countless servers and devices globally, has undergone relentless auditing by legions of human experts and sophisticated static analysis tools for decades. Yet, this specific heap overflow in the NFSv4.0 LOCK replay cache evaded detection, highlighting a significant blind spot in traditional security methodologies.

This event transcends a simple bug find; it marks a profound paradigm shift in software security. The ability of an AI to autonomously identify such a deeply embedded and complex flaw suggests a new era for cybersecurity, where automated intelligence could fundamentally alter the landscape of code auditing and threat mitigation.

Claude Code, Anthropic's AI coding agent, executed this unprecedented feat. Nicholas Carlini, a research scientist at Anthropic, orchestrated the discovery. He employed a remarkably straightforward setup: a 12-line bash script that iterated through Linux kernel source files, feeding them to Claude with the instruction, "Find vulnerabilities, pretend it's a CTF."

Carlini's minimal approach yielded extraordinary results. The AI didn't just guess; it demonstrated a sophisticated understanding of intricate multi-client NFS behavior, pinpointing the exact conditions for the heap overflow. This required comprehending how two NFS clients could trigger a specific edge case, leading to a server writing over a thousand bytes into a 112-byte buffer.

The identified bug allowed an unauthenticated attacker to read kernel memory over the network if NFS was exposed, posing a severe risk. Carlini confirmed the AI's prolific nature, stating that Claude Code found not only this 23-year-old Linux bug but also "hundreds more that survived decades of audits," signaling a vast untapped potential for AI in uncovering legacy flaws.

The 12-Line Script That Shook Linux

Illustration: The 12-Line Script That Shook Linux

Nicholas Carlini, a research scientist at Anthropic, orchestrated the impossible with a deceptively simple setup: a 12-line bash script. This terse command-line sequence served as the entire operational framework, demonstrating that groundbreaking vulnerability research does not demand an elaborate, custom-built infrastructure. It merely required a direct interface to a powerful AI model.

Carlini's script looped through Linux kernel source files, feeding snippets to Claude Code, specifically Claude Opus 4.6. The accompanying instruction was straightforward and colloquial: "find vulnerabilities, pretend it's a CTF." This unadorned prompt, mimicking a Capture The Flag challenge, unlocked an unprecedented capability in automated security analysis, bypassing the need for complex rule sets or predefined exploit patterns.
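
Carlini's actual script isn't reproduced in the reporting, but its described shape, a loop over kernel source files that hands each one to the model with the CTF prompt, can be sketched as follows. Everything here is illustrative: `ask_model` is a placeholder for the real Claude Code CLI invocation, and the stand-in file tree exists only so the loop has something to walk.

```shell
#!/bin/sh
# Sketch of the loop described in the article -- NOT Carlini's actual script.
# ask_model is a placeholder; a real version would pipe each file, together
# with the prompt below, into the Claude Code CLI.
PROMPT="Find vulnerabilities, pretend it's a CTF."
LOG=/tmp/kernel-sketch/report.log

ask_model() {
  # Placeholder for the model call: just record what would be analyzed.
  echo "queued: $1 (prompt: $PROMPT)" >> "$LOG"
}

# Tiny stand-in source tree so the sketch runs anywhere.
mkdir -p /tmp/kernel-sketch/fs/nfsd
printf '/* stub */\n' > /tmp/kernel-sketch/fs/nfsd/nfs4state.c
: > "$LOG"

# Loop over C source files, handing each one to the model.
find /tmp/kernel-sketch -name '*.c' | while read -r f; do
  ask_model "$f"
done
```

The point of the sketch is how little scaffolding is involved: a file walk, a fixed prompt, and one call per file.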

Despite the experiment's minimalist approach, the AI's discovery was anything but simple. Claude identified a 23-year-old Linux kernel bug, a remotely exploitable heap overflow nestled within the NFSv4.0 LOCK replay cache. Present since March 2003, this critical flaw allowed an unauthenticated attacker to read kernel memory over the network if NFS was exposed, a vulnerability that had eluded decades of human audits and static analysis tools. The vulnerability has since been assigned CVE-2026-31402.

This wasn't an expensive, multi-year research initiative funded by a defense contractor. Instead, it was an accessible experiment, executable by virtually anyone with access to Claude Code and a basic understanding of scripting. The profound results underscore a paradigm shift: sophisticated bug hunting is no longer the exclusive domain of elite, well-resourced teams, but a frontier now open to readily available AI.

Carlini's initial findings extend far beyond this singular discovery. Claude Code didn't just find one old Linux bug; it uncovered "hundreds more that survived decades of audits," with Carlini himself having already identified five distinct Linux kernel vulnerabilities. Hundreds of additional potential crashes await human validation, signaling a vast, untapped reservoir of hidden flaws exposed by this accessible AI methodology.

Anatomy of a 23-Year-Old Flaw

The vulnerability, now formally tracked as CVE-2026-31402, is a critical heap overflow in the Network File System version 4 (NFSv4) lock system. Specifically, the flaw resides in the NFSv4.0 LOCK replay cache, a component handling file-locking requests across networked clients. This isn't a simple oversight; exploiting it demands a deep understanding of multi-client NFS behavior.

At its core, the bug manifests as a severe memory corruption issue. Imagine trying to force 1056 bytes of data into a tiny container only designed to hold 112 bytes. This extreme mismatch describes the core problem. The Linux kernel, under specific conditions, attempts this impossible feat, causing memory to spill out beyond its allocated boundaries.

This precarious edge case requires a precise two-client interaction to trigger. First, Client A initiates a file lock request, providing an unusually long but technically legal 1024-byte owner ID. This extended identifier is crucial to the exploit chain.

Next, Client B attempts to acquire the very same file lock that Client A now holds. The NFS server correctly denies Client B's request. However, in constructing the denial response, the server includes Client A's original 1024-byte owner ID.

Here lies the critical flaw: the server's internal response buffer for this operation measures a mere 112 bytes. When it attempts to embed the 1024-byte owner ID along with other necessary denial message data, the total response size balloons to approximately 1056 bytes. This massive overflow writes well past the buffer's limits, corrupting adjacent kernel memory.
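
The size arithmetic in that sequence can be replayed as a toy calculation. This is pure bookkeeping, no NFS involved: only the 1024-byte owner ID, the 112-byte buffer, and the roughly 1056-byte reply come from the reporting; the 32-byte fixed-overhead figure is an assumption chosen so those totals line up.

```shell
#!/bin/sh
# Size bookkeeping for the two-client LOCK sequence. The 32-byte overhead
# for the denial message's fixed fields is assumed; the other figures are
# the ones reported for CVE-2026-31402.
BUF_LEN=112          # server's cached-reply buffer
OWNER_ID_LEN=1024    # Client A's unusually long but legal owner ID
OVERHEAD=32          # assumed fixed fields in the denial reply

# Client A takes the lock; Client B's request is denied, and the denial
# must echo Client A's owner ID, so the reply grows with OWNER_ID_LEN.
REPLY_LEN=$((OWNER_ID_LEN + OVERHEAD))
SPILL=$((REPLY_LEN - BUF_LEN))

echo "denial reply: ${REPLY_LEN} bytes into a ${BUF_LEN}-byte buffer"
echo "spill past buffer: ${SPILL} bytes of adjacent kernel memory"
```

Under these assumptions the reply comes to 1056 bytes, writing 944 bytes past the end of the 112-byte buffer.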

Exploiting this kernel memory corruption allows an attacker to remotely read sensitive data without requiring any authentication, provided the NFS service is exposed on the target system. This unauthenticated remote memory disclosure makes the bug exceptionally dangerous, revealing why it survived decades of audits and why AI's ability to find it is so significant. For further technical detail, see "Claude Code Finds Long Hidden Linux NFS Vulnerability" (Let's Data Science).

Why Decades of Audits Failed

For over two decades, this critical heap overflow lay dormant within the Linux kernel, evading countless expert eyes and sophisticated automated tools. Its persistence highlights a profound blind spot in traditional cybersecurity audits, which proved incapable of uncovering a bug rooted deep in the Network File System (NFS) v4 lock system. The fact that an AI, Claude Code, unearthed a 23-year-old Linux bug in hours exposes the inherent limitations of past methodologies.

Static analysis tools, designed to scan code for patterns of known vulnerabilities, typically struggle with highly contextual, multi-state flaws. Similarly, fuzzing, a common technique that bombards software with malformed inputs to trigger crashes, often misses edge cases requiring precise, sequential interactions between multiple system components. This specific vulnerability demanded a nuanced understanding of how two distinct NFS clients would interact to trigger the overflow, a scenario difficult to randomly generate or statically identify.

Uncovering CVE-2026-31402 necessitated a deep, contextual understanding of multi-client NFS behavior and the intricate state transitions within its lock system. The bug materialized only when a client acquired an unusually long 1024-byte owner ID, and a second client subsequently triggered a denial, causing the server to write 1056 bytes into a mere 112-byte buffer. Such a specific, state-dependent interaction is where AI models like Claude Code excel, interpreting complex protocol specifications and identifying non-obvious logical flaws.

Researcher Nicholas Carlini himself admitted the difficulty, stating, "If I never found one of these by hand, now I have a bunch." This frank admission underscores the challenge for human experts to find such remotely exploitable heap overflows manually. Carlini's simple 12-line bash script, paired with Claude's analytical prowess, bypassed decades of failed audits and revealed hundreds more potential vulnerabilities that traditional methods overlooked.

The 'Fresh Eyes' Revolution

Illustration: The 'Fresh Eyes' Revolution

The discovery of CVE-2026-31402 heralds a profound shift in cybersecurity: the advent of AI-powered vulnerability research. Unlike human experts, AI approaches legacy codebases with truly "fresh eyes," devoid of the accumulated assumptions and mental models that can blind even the most seasoned auditors. This unburdened perspective proved critical in unearthing a flaw hidden for over two decades.

Human developers and security auditors, despite their expertise, inevitably develop cognitive shortcuts. Over years of working with complex systems like the Linux kernel and protocols such as NFS, they form ingrained understandings of how components *should* interact. These mental frameworks, while efficient, can inadvertently create blind spots, causing subtle deviations from protocol specifications or unexpected edge cases to go unnoticed.

Claude Code, specifically Claude Opus 4.6, operates without these human biases. It analyzes raw code logic and protocol specifications directly, identifying discrepancies and potential overflows from what the code actually does rather than what it is assumed to do. Nicholas Carlini's simple 12-line bash script, instructing Claude to "find vulnerabilities, pretend it's a CTF," leveraged this capability. The AI didn't guess; it understood complex, multi-client NFS behavior and how a 1024-byte owner ID could catastrophically overflow a 112-byte buffer during a denial response.

This breakthrough also underscores the rapid advancement in AI capabilities. Older versions of AI models might have missed the intricate logic leading to CVE-2026-31402. Claude Opus 4.6, however, demonstrated a superior ability to reason through intricate state machines and inter-client interactions, uncovering not just this one 23-year-old Linux bug but also "hundreds more that survived decades of audits." This exponential improvement signals a future where AI routinely uncovers deep-seated flaws that have long eluded human detection.

From AI Slop to Critical Alerts

Linux kernel maintainers have dramatically shifted their stance on AI-generated bug reports. A palpable sense of urgency and respect for AI's capabilities now permeates conversations where deep skepticism and outright dismissal once reigned supreme. This represents a profound change in the notoriously conservative open-source community.

Greg Kroah-Hartman, a venerable figure in the Linux kernel community, noted "the world switched" almost overnight. A sudden influx of "real reports," exhibiting genuine insight and actionable detail, now supplants what was once universally dismissed as useless "AI slop," fundamentally altering their bug triage process.

For years, AI-generated vulnerability submissions were largely ignored. These early attempts, often characterized by nonsensical findings, superficial analyses, or outright fabrications, wasted maintainers' precious time. They consistently lacked the deep contextual understanding necessary to identify genuine, exploitable bugs, leading to a default assumption of low quality.

Willy Tarreau, another influential kernel developer, corroborates this dramatic change. His team now sees an average of 5-10 high-quality bug reports per day, a stark contrast to the previous rate of merely 2-3 reports per week. This dramatic increase reflects a qualitative leap in AI-driven analysis, demonstrating a newfound ability to pinpoint critical flaws.

This new era of actionable intelligence stems directly from sophisticated AI models like Claude Code, which demonstrated a profound understanding of complex systems like NFSv4.0 to uncover CVE-2026-31402. Nicholas Carlini's simple 12-line script proved a potent catalyst for this paradigm shift, proving AI's ability to find vulnerabilities in deeply embedded, decades-old codebases. For more detail on how Claude Code was used to identify this long-standing vulnerability, see "Claude Code Used to Find Remotely Exploitable Linux Kernel Vulnerability Hidden for 23 Years" (InfoQ).

AI is no longer just a tool for generating code or content; it now actively contributes to the core security of foundational software. This shift forces a radical re-evaluation of how the open-source community approaches vulnerability discovery and code auditing, promising a future where hidden flaws become increasingly rare.

The New Human Bottleneck

Vulnerability research has fundamentally shifted, inverting the traditional challenge. AI models like Claude Code now effortlessly generate potential exploits, shifting the bottleneck from the arduous discovery of flaws to their exhaustive, time-consuming human validation. This dramatic change redefines the core problem for cybersecurity.

Nicholas Carlini's straightforward 12-line bash script yielded far more than just CVE-2026-31402, the 23-year-old Linux bug. Claude identified "hundreds more" potential crashes, each a complex vulnerability that survived decades of human and automated audits. These findings now sit in a queue, demanding meticulous review by a finite pool of highly specialized human experts.

This torrent of high-potential leads places an unprecedented burden on security teams globally. Organizations are suddenly awash in promising, AI-generated vulnerability reports, yet they critically lack the sheer manpower and specialized expertise to thoroughly investigate each one. The sheer volume threatens to overwhelm existing incident response, patch development, and software assurance workflows.

The implications extend beyond immediate patching. This introduces the next critical challenge in the age of AI-driven security: how to effectively scale human expertise to match the AI's relentless discovery rate. Legacy systems, once thought thoroughly audited, are now fertile ground for AI, but the speed of human analysis cannot keep pace with AI's ability to probe vast codebases for subtle flaws.

The industry now confronts a future where the ability to quickly and accurately confirm AI-flagged issues becomes the most valuable, and scarcest, resource in cybersecurity. This demands a profound rethinking of security operations, moving toward new triage mechanisms and human-AI collaboration models designed to accelerate validation. The era when finding a critical bug was the hardest part has passed; now, discerning the signal from the noise and acting on it is the paramount concern.

The sheer volume of potential issues means many critical, remotely exploitable vulnerabilities could remain unpatched, simply because human analysts cannot process them all. This creates a new security debt, where potential exploits accumulate, waiting for human intervention that might never come. Addressing this human capacity deficit represents the next frontier in securing our digital infrastructure.

Cybersecurity's Double-Edged Sword

Illustration: Cybersecurity's Double-Edged Sword

The revelation of AI-discovered vulnerabilities like CVE-2026-31402 presents a profound double-edged sword for cybersecurity. Nicholas Carlini’s simple 12-line bash script, which empowered Claude Code to unearth a 23-year-old Linux bug in mere hours, underscores an alarming reality: if researchers can leverage AI with such ease, so too can malicious actors. This capability threatens to dramatically accelerate the digital arms race between attackers and defenders, fundamentally altering the security landscape.

Widely available AI models, capable of uncovering complex zero-day vulnerabilities on demand, introduce an unprecedented threat landscape. Imagine nation-state actors or sophisticated criminal enterprises deploying similar scripts, not for defensive disclosure, but for widespread exploitation against critical infrastructure and corporate networks. The barrier to entry for discovering deeply hidden flaws, previously requiring immense human expertise and time, has significantly lowered, democratizing vulnerability research for both good and ill.

This shift raises urgent questions regarding the ethics of developing and releasing such powerful tools. While Carlini’s work at Anthropic aims to bolster defensive security, the underlying technology’s dual-use nature is undeniable. Should access to these advanced AI models be restricted, or is their widespread availability an inevitable, uncontrollable force?

Responsible disclosure practices, already a complex dance between identifying flaws and coordinating patches, become even more critical and fraught. Researchers face immense pressure to report findings quickly, but the sheer speed at which AI can generate vulnerabilities could overwhelm existing disclosure mechanisms. The volume of potential flaws, like the "hundreds more" Claude Code identified, demands a new paradigm for rapid validation and patching across vast codebases.

Ultimately, AI's prowess in finding vulnerabilities forces a reckoning with our collective digital security posture. The same technology promising to secure our future also hands unprecedented power to those seeking to undermine it. We must confront how to harness this power defensively without inadvertently arming our adversaries. The implications for global stability, critical infrastructure, and individual privacy are staggering.

Your Action Plan: Patch and Prepare

Immediate action for every Linux user, administrator, and CTO is clear: update your kernel without delay. The critical heap overflow vulnerability, tracked as CVE-2026-31402, allowed unauthenticated attackers to read kernel memory over the network. This 23-year-old flaw in the NFSv4.0 lock system, exposed by Claude Code, demands immediate patching across all affected systems.
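
As a starting point, checking the running kernel and then upgrading it looks roughly like this. The upgrade commands are the standard ones for each distribution family and are left commented, so the snippet itself only reads state; adjust them for your distribution.

```shell
#!/bin/sh
# Read-only check of the running kernel version.
uname -r

# Typical upgrade paths (run the pair matching your distro, then reboot):
# Debian/Ubuntu:
#   sudo apt-get update && sudo apt-get dist-upgrade
# RHEL/Fedora:
#   sudo dnf upgrade kernel
```

Confirm after rebooting that `uname -r` reports the patched kernel version published by your distribution's security advisory.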

Beyond patching, critically re-evaluate your network architecture. The NFS bug's severity escalated because it required no authentication if the Network File System was exposed to the internet. Restrict access to internal networks or implement robust VPNs and firewalls. Never expose services like NFS directly to the public internet; this practice creates an open invitation for exploits.
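
A minimal sketch of that restriction, assuming nftables and an example internal range of 10.0.0.0/8 (substitute your own network): the snippet checks for a listener on the standard NFS port and writes an illustrative filter fragment to a temporary file rather than loading it live.

```shell
#!/bin/sh
# Quick exposure check: is anything listening on the NFS port (2049)?
ss -tln 2>/dev/null | grep 2049 || echo "no listener on 2049 found"

# Illustrative nftables fragment restricting NFS to an internal range.
# Written to a temp file for review; load with `nft -f` only after vetting.
cat > /tmp/nfs-filter.nft <<'EOF'
table inet filter {
  chain input {
    type filter hook input priority 0; policy accept;
    tcp dport 2049 ip saddr 10.0.0.0/8 accept
    tcp dport 2049 drop
  }
}
EOF
```

The ordering matters: the accept rule for the internal range must precede the unconditional drop, since nftables evaluates rules top to bottom.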

Adopt a new, proactive security mindset: assume all legacy systems harbor critical, undiscovered vulnerabilities. If an AI can unearth a complex, multi-client interaction bug from 2003 with a 12-line bash script, traditional audit methods have significant blind spots. This paradigm shift means continuously scanning, updating, and validating even long-standing codebases.

Cybersecurity teams now face a dynamic threat landscape where AI significantly accelerates vulnerability discovery. Focus resources not just on finding new bugs, but on rapid validation and deployment of patches. For additional insight into this research and its methodology, see "Linux 7.0: One Bash Script. One Weekend. 23 Years of Kernel Bugs." by Can Artuc (Medium).

This incident underscores the fragility of even widely trusted software. The era of assuming stability for decades-old code is over. Organizations must prioritize continuous security hygiene and integrate advanced AI-powered tools into their defensive strategies, moving past reactive patching to proactive threat anticipation. Prepare for a future where hidden flaws emerge with unprecedented speed.

The Dawn of AI-Driven Security

AI's stunning discovery of CVE-2026-31402, a 23-year-old Linux kernel bug, heralds a new epoch in digital defense. Nicholas Carlini’s simple 12-line bash script demonstrated AI's unprecedented capacity to bypass decades of human and automated audits, unequivocally marking the definitive dawn of AI-driven security. This event is not an isolated anomaly but a profound precursor to a fundamental shift.

AI will soon permeate every stage of the software development lifecycle. Imagine intelligent agents performing continuous, real-time code analysis, identifying subtle flaws as developers write code, and even suggesting or implementing automated patching as new features move through integration pipelines. This proactive approach will dramatically shrink attack surfaces and reduce the window of vulnerability.

Beyond initial development, AI systems will continuously monitor live environments, detecting anomalous network traffic or unusual kernel behavior indicative of a zero-day exploit. They will evolve from merely identifying known signatures to predicting potential vulnerabilities based on vast datasets of historical exploits, architectural patterns, and evolving threat intelligence.

Companies like Better Stack are uniquely positioned to capitalize on these burgeoning capabilities. Integrating advanced AI into their monitoring and observability platforms will transform mountains of raw operational data into actionable, predictive security intelligence. This translates into significantly faster threat identification and more effective, automated incident response.

The ultimate strength of this new paradigm lies in the symbiotic relationship between human experts and artificial intelligence. AI excels at tirelessly sifting through immense codebases and recognizing obscure patterns, while human ingenuity provides critical context, validates complex findings like Carlini's, and strategizes against the most advanced, human-driven adversaries.

This powerful collaboration promises to redefine cybersecurity resilience across the globe. It ensures that the digital world, built on increasingly complex and interconnected code, benefits from an unparalleled layer of defense, securing our shared digital future against threats previously deemed undetectable or too intricate for timely discovery.

Frequently Asked Questions

What specific Linux bug did the Claude AI find?

Claude found a 23-year-old heap overflow vulnerability (CVE-2026-31402) in the Linux kernel's NFSv4 lock system. It allowed an unauthenticated attacker to read kernel memory over the network if NFS was exposed.

How did an AI find a bug that humans and tools missed for decades?

The AI understood the complex, multi-client interaction required to trigger the bug's specific edge case. Unlike static tools, it could reason about the NFS protocol's behavior, a type of contextual understanding that had previously eluded human reviewers.

Who is Nicholas Carlini and what was his method?

Nicholas Carlini is a research scientist at Anthropic. He used a simple 12-line bash script to loop through kernel source files and feed them to the Claude AI with the prompt, 'Find vulnerabilities, pretend it's a CTF.'

Is this Linux bug a current threat?

The specific vulnerability has been patched. However, its discovery proves that other critical, long-standing bugs likely exist in mature software, making it vital for users to keep all systems updated.


Topics Covered

#AI #Linux #Cybersecurity #Claude #KernelBug #Vulnerability