The Dev Stack That Ended AI Lies

AI coding assistants are lying to you with outdated code, costing you hours. Here's the two-tool MCP stack that forces them to tell the truth.

The Hidden Tax on 'Free' AI Code

Free AI coding help comes with a line item most teams never budget for: hours lost untangling code that never had a chance of running. You save 30 seconds generating a React hook, then burn two hours discovering the API changed last year and your “assistant” never got the memo. That gap between confident output and current reality is where the real cost hides.

Developer and toolsmith Robin Ebers has a blunt name for this: outdated code is “very expensive.” When an AI hands you a broken integration for Stripe, Next.js, or AWS, you are not just fixing syntax; you are reverse‑engineering what changed since the model’s training cutoff. Every minute spent diffing docs against hallucinated snippets is productivity you thought you were outsourcing.

Modern LLMs ship with a built‑in handicap: a training cutoff date that freezes their knowledge months or years in the past. Frameworks like Next.js, React, and FastAPI cut new releases every few weeks, and breaking changes ride along with every major version. Cloud APIs from AWS, Google Cloud, and OpenAI evolve even faster, deprecating parameters, renaming methods, and changing auth flows while your model remains stuck in time.

That mismatch turns AI assistants into unreliable narrators for fast‑moving stacks. Ask for a Stripe Checkout example and you might get the 2022 API, complete with deprecated fields. Call into the GitHub REST API and the model might confidently recommend endpoints that no longer exist or require scopes that changed after its cutoff. The code looks plausible, compiles cleanly, and fails silently in production.

Most advice today tries to paper over this with better prompting: “ask it to double‑check,” “tell it to verify against docs,” “be specific about versions.” That shifts the burden back onto developers, who now have to design prompts as carefully as they design schemas. You are still relying on a system that guesses first and only sometimes checks its work.

Ebers’ work points toward a different answer: change the dev stack, not just the prompts, so the model cannot lie about APIs without hitting reality first.

Your AI is Lying. Here's Why.

Hallucination sounds mystical, but for developers it means your assistant confidently returns code that never worked anywhere. A large language model predicts the next token based on patterns in its training data, not on a live compiler or runtime. When that data freezes at a cutoff date, your AI happily fabricates methods, parameters, and config flags that only exist in its imagination.

Ask for a Stripe API integration today and a static model might still suggest calls pinned to API versions deprecated years ago. Request an OpenAI client example and it may hand back pre‑2023 SDK signatures that no longer exist in the current library. The model does not know it is wrong; it optimizes for plausibility, not truth.

Developers often try to patch this with generic web search bolted onto chat. That usually means scraping the same SEO-choked blog posts and 2019 Stack Overflow threads you already stopped trusting. You get jQuery-era React patterns, `componentWillReceiveProps` examples, or Kubernetes YAML that predates your cluster version.

Search engines optimize for clicks, not for ground-truth documentation. They surface content that ranks, not content that is correct for `v4.2.1` of the library you actually use. Your AI then summarizes this mess, compounding stale advice into fresh-looking nonsense.

What you really need is a system that can separate three kinds of information: general background, community examples, and authoritative specs. General context can come from blogs and Q&A. Examples can come from code search. But when signatures, flags, or behaviors matter, the model must hit official docs or typed SDKs.
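As a rough illustration, that separation can be written down as a routing table. The tier names, source lists, and heuristic below are illustrative only, not part of any tool's API:

```ts
// Illustrative only: three tiers of information and the sources each may come from.
type InfoTier = "background" | "communityExample" | "authoritativeSpec";

const allowedSources: Record<InfoTier, string[]> = {
  background: ["blog posts", "Q&A threads"],          // general context
  communityExample: ["code search", "public repos"],  // working examples
  authoritativeSpec: ["official docs", "typed SDKs"], // signatures, flags, behaviors
};

// Anything touching exact signatures, flags, or deprecations must resolve
// from the authoritative tier; everything else can stay cheap.
function requiredTier(question: string): InfoTier {
  return /signature|parameter|flag|deprecat/i.test(question)
    ? "authoritativeSpec"
    : "communityExample";
}
```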

Static LLMs sit on a snapshot of the world, while software moves on a weekly release cadence. React, Next.js, Stripe, OpenAI, AWS, and Kubernetes all shipped breaking changes after most popular models’ training cutoffs. That disconnect guarantees drift between what your AI “knows” and what your toolchain actually does.

Without a way to route questions to up-to-date sources—API references, changelogs, migration guides—you force a probabilistic text generator to act like a live debugger. That is how you end up paying for “free” code with hours of debugging and unexplained 500s.

Stop Prompting, Start Directing

Prompt engineering treated the model like an artsy collaborator. Tool orchestration treats it like an employee following a runbook. You stop begging with clever prompts and start wiring hard rules into the environment the model cannot ignore.

Cursor’s “always apply” rules flip that switch. Instead of ad‑hoc instructions buried in a chat, you define a standing order: every request must pass through a scripted research workflow. The rule injects context, constraints, and a strict tool priority so the model behaves like a deterministic agent, not a moody chatbot.

Robin Ebers’ setup shows how aggressive this can get. His rule forces Cursor to prefer the Exa MCP first for almost everything, because Exa is “good enough” most of the time and “way cheaper” than hammering official docs. Inside Exa, a newer “code context” tool must run before any generic web search.

Only after that chain fails does the stack escalate. Web search unlocks as a secondary option, and the Ref MCP for official documentation comes last, gated by three conditions:

  • When the user explicitly requests Ref
  • When Exa results contradict each other
  • After two failed attempts to fix an external API or library where docs likely changed post‑cutoff

Those conditions are guardrails, not suggestions. The model cannot “vibe” its way to an answer; it must walk the same repeatable research path every time, which slashes hallucinations and keeps cost predictable. You get a workflow you can debug and refine, instead of a black box that sometimes feels smart.
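Conceptually, the rule behaves like the function below, a minimal TypeScript sketch. The three tool wrappers are hypothetical stand‑ins for the underlying Exa and Ref MCP calls, stubbed out for illustration:

```ts
// Minimal sketch of the research chain; the three tool functions are
// hypothetical stand-ins for Exa and Ref MCP calls, stubbed for illustration.
type Result = { source: string; snippets: string[] };

const exaCodeContext = async (q: string): Promise<Result> =>
  ({ source: "exa.code_context", snippets: [] });
const exaWebSearch = async (q: string): Promise<Result> =>
  ({ source: "exa.web_search", snippets: [] });
const refDocs = async (q: string): Promise<Result> =>
  ({ source: "ref.docs", snippets: [] });

async function research(query: string, refUnlocked: boolean): Promise<Result> {
  // 1) Always try Exa's code-context tool first: cheap and usually good enough.
  const code = await exaCodeContext(query);
  if (code.snippets.length > 0) return code;

  // 2) Fall back to Exa's general web search.
  const web = await exaWebSearch(query);
  if (web.snippets.length > 0 || !refUnlocked) return web;

  // 3) Ref (official docs) only runs once a break-glass trigger has fired.
  return refDocs(query);
}
```

The `refUnlocked` flag stands in for the three trigger conditions the rule spells out above.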

Under the hood, this is all powered by the Model Context Protocol (MCP), which cleanly bridges LLMs to external tools like Exa and Ref. MCP standardizes how models discover, call, and chain tools, and the official Model Context Protocol documentation reads less like marketing and more like a spec for turning LLMs into real agents.
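For a sense of what that standardization buys you, here is a hedged sketch of a client discovering and calling tools through the MCP TypeScript SDK. The server package and tool name are placeholders, and exact details may vary between SDK versions:

```ts
// Sketch only: connect to an MCP server over stdio, list its tools, call one.
// The server package and tool name below are placeholders, not real identifiers.
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

async function main() {
  const transport = new StdioClientTransport({
    command: "npx",
    args: ["-y", "some-mcp-server"], // placeholder server package
  });

  const client = new Client(
    { name: "cursor-like-agent", version: "1.0.0" },
    { capabilities: {} }
  );
  await client.connect(transport);

  // Discovery: every MCP server advertises its tools the same way.
  const { tools } = await client.listTools();
  console.log(tools.map((t) => t.name));

  // Invocation: tools are called by name with structured arguments.
  const result = await client.callTool({
    name: "search", // placeholder tool name
    arguments: { query: "stripe checkout session api" },
  });
  console.log(result);
}

main().catch(console.error);
```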

Your New Toolkit: Exa and Ref

Your new stack revolves around two MCP servers wired directly into your editor: Exa MCP and Ref MCP. Instead of begging a model to “please use the docs,” you hard‑code a research strategy that decides which tool runs, when, and why.

Exa sits in the hot path. Robin Ebers configures Cursor so the model always prefers Exa first because it is “good enough” most of the time and “way cheaper” than hammering official docs for every question.

Inside Exa, a newer code context tool does the heavy lifting. The model calls that specific tool before any generic web search, pulling in code‑relevant snippets, examples, and discussions tailored to the libraries and patterns you actually use.

Think of Exa as a savvy junior developer who lives in Stack Overflow, GitHub issues, and blog posts. You ask a question; it comes back with three plausible approaches, recent code samples, and a rough sense of what changed in version 5.2 versus 5.3.

Ref MCP plays the opposite role: slower, pricier, and far more authoritative. Ref connects directly to official, up‑to‑date documentation for APIs and libraries, acting as your ground truth layer when you suspect the model’s training data cutoff is sabotaging you.

Robin’s rule only allows Ref in three cases:

  • When the user explicitly asks for Ref
  • When Exa results contradict each other
  • After two failed attempts to fix an external API or library where docs may have changed

That escalation path turns Ref into the equivalent of cracking open the official API reference when the junior dev’s guesses stop panning out. You do not waste tokens on full‑text docs until you have concrete evidence that something in the real world moved.

Exa plus Ref creates a two‑tier research system that mirrors how experienced engineers actually work. You skim community wisdom for 80% of problems, then drop into canonical docs when version numbers, authentication flows, or breaking changes start to matter.

Instead of a model hallucinating “probably right” code from a 2023 snapshot of npm, you get a directed pipeline. Exa finds cheap, code‑aware context; Ref confirms the exact method names, parameters, and edge cases that decide whether your build passes or burns two more hours.

The 'Exa-First' Priority Rule

Priority in this stack is brutally simple: tell Cursor to hit Exa first, every time. Robin Ebers wires his “always apply” rule so the model follows a hard hierarchy: 1) Exa for code context, 2) Exa for web search, 3) Ref only on specific triggers. The model never freelances its own tool order.

Inside Exa, the star of the show is the newer “code context” tool. Cursor instructs the model to call that before any generic search, so the AI looks at code‑relevant results that match your stack, frameworks, and recent issues. Only when that specialized context fails does it fall back to broader web results.

This priority order is economic as much as technical. Exa is “good enough” most of the time and “way cheaper” than hammering Ref MCP or a generic web search on every question. You pay for network calls and latency, but you pay far more when the model spits out bad code and you burn two hours debugging it.

Starting with a specialized, cheaper tool also clamps down on hallucinations. When Cursor forces Exa’s code context first, the model sees real repositories, recent gists, and concrete usage patterns before it guesses. That alone kills a huge class of “I think this API works like…” fabrications.

Ref MCP sits at the very end of the chain as a high‑cost escalation. Cursor only allows Ref when one of three conditions triggers:

  • The user explicitly requests Ref
  • Exa results contradict each other
  • Two failed attempts to fix an external API or library suggest post‑cutoff doc changes

Those guardrails stop the AI from defaulting to expensive, generic documentation lookups for problems that have a simple, code‑specific answer. If Exa’s code context can tell you how `fetch` behaves in a popular SDK, you do not need a full crawl of vendor docs. You only escalate when reality and the model’s prior collide.

In Cursor, that logic looks like a small, ruthless policy layer. Pseudo‑code for the rule might resemble:

```jsonc
{
  "alwaysApply": true,
  "priority": ["exa.code_context", "exa.web_search", "ref.docs"],
  "usagePolicy": {
    "ref.docs": {
      "allowedWhen": [
        "user_explicitly_requests",
        "exa_results_conflict",
        "after_two_failed_external_api_fixes"
      ]
    }
  }
}
```

The 'Break Glass' Triggers for Official Docs

Ref only comes out when you hit one of three hard “break glass” conditions. Everything else runs on Exa because it’s faster and “way cheaper,” as Robin Ebers stresses. Treat Ref as the emergency line straight to official docs, not another search tab.

First trigger: explicit user request. If a developer types “use ref” or clearly asks for official documentation, the stack must call Ref MCP immediately. That keeps humans in charge of cost and latency, instead of hiding expensive calls behind opaque agent logic.

Second trigger: Exa contradicts itself. If one Exa result says a method is deprecated in v4 and another shows it as the recommended path in v5, the system flags that conflict. At that point, Ref becomes the tie‑breaker, pulling the vendor’s canonical docs so the model stops guessing which answer matches reality.

Third trigger operationalizes model cutoff awareness. When the AI suspects an external API or library issue and has already tried and failed to fix it twice, it assumes the world has changed since training. Only after those two failed attempts does the rule allow a Ref call to fetch current, official docs for that package, SDK, or REST endpoint.

Those three conditions turn “model is probably outdated” from a vague fear into a concrete workflow. The model can’t silently brute‑force random fixes forever; it must either succeed quickly with Exa or escalate via Ref under strict rules. That structure slashes the risk of burning an afternoon on hallucinated migration guides or dead configuration flags.
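Written down, the gate is a single predicate. The field names below are illustrative, since the real Cursor rule is expressed in plain‑language instructions rather than code:

```ts
// Sketch of the three break-glass conditions as one predicate.
// Field names are illustrative; the actual rule is prose inside Cursor.
interface ResearchContext {
  userRequestedRef: boolean;           // trigger 1: developer typed "use ref"
  exaResultsConflict: boolean;         // trigger 2: Exa answers disagree with each other
  failedExternalFixAttempts: number;   // trigger 3: likely post-cutoff API drift
}

function mayCallRef(ctx: ResearchContext): boolean {
  return (
    ctx.userRequestedRef ||
    ctx.exaResultsConflict ||
    ctx.failedExternalFixAttempts >= 2
  );
}
```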

Developers who want to replicate this stack in Cursor wire these triggers into an “always apply” rule that governs tool use across Exa MCP and Ref MCP. Under the hood, it’s just deterministic orchestration on top of the Model Context Protocol. For deeper implementation details, the Model Context Protocol GitHub repository documents how to register tools, enforce priority, and keep Ref as the last‑resort, break‑glass path to official documentation.

Cost Control as a Coding Strategy

Cost talk usually happens after the cloud bill lands, not while you are debugging a broken Stripe integration at 1 a.m. Robin Ebers flips that: Exa is “way cheaper” than Ref, so cost becomes part of how you design the workflow, not an afterthought. The MCP stack bakes that bias in by defaulting to Exa’s code context tool and only escalating when absolutely necessary.

Treat Exa + Ref as a financial control plane, not just an accuracy patch. Every Ref call pulls official docs and burns more tokens, latency, and tool‑usage fees than a quick Exa MCP query against real‑world code. By encoding the priority rule directly into Cursor’s “always apply” rule, Robin effectively hard‑codes a budget policy into the assistant.

Failed attempts are where money quietly evaporates. Each hallucinated fix means:

  • Extra LLM calls to re‑explain the bug
  • More Exa or web searches
  • Potential Ref lookups when you finally suspect cutoff issues

Cut those retries in half and you slash token usage, tool calls, and dev time, all at once. Two failed attempts before touching Ref is not just a quality gate; it is a cost throttle.
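One way to picture that throttle is as a tiny budget tracker. The relative costs below are made‑up placeholders, not real Exa or Ref pricing:

```ts
// Sketch of retries as a cost throttle; the numbers are illustrative, not real pricing.
const RELATIVE_COST = { exaCodeContext: 1, exaWebSearch: 2, refDocs: 10 } as const;

class ResearchBudget {
  private spent = 0;
  private failedFixAttempts = 0;

  recordFailedFix(): void {
    this.failedFixAttempts += 1;
  }

  // Ref stays locked until two cheap attempts have already failed.
  refUnlocked(): boolean {
    return this.failedFixAttempts >= 2;
  }

  charge(tool: keyof typeof RELATIVE_COST): void {
    this.spent += RELATIVE_COST[tool];
  }

  totalSpent(): number {
    return this.spent;
  }
}
```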

Intelligent tool routing becomes a form of rate limiting for your wallet. Route 80–90% of questions through Exa’s code context, fall back to web search rarely, and reserve Ref for the three “break glass” triggers. You end up with a system that behaves like a senior engineer: fast guesses first, expensive research only when contradictions or API drift demand it.

Most AI workflows obsess over model choice and ignore this layer entirely. The MCP stack shows that sustainable AI coding is less about picking GPT‑4 vs. Claude and more about who answers first, how often they retry, and when they are allowed to escalate.

From Theory to Terminal: A Real-World Fix

You’re building a small dashboard in React, following an AI’s suggestion to use `componentWillReceiveProps` to sync props into state. You paste the code, hit save, and your terminal lights up with a bright red warning: this lifecycle method has been deprecated since React 16.3 and has no place in React 18 code. Your “free” AI helper just handed you a landmine.

Under Robin Ebers’ Exa‑first rule, the model’s first move is not to guess. Cursor routes the error through Exa MCP’s code context tool, asking for recent examples of how people handle prop changes in modern React. Exa returns a grab bag of tutorials: one recommends `UNSAFE_componentWillReceiveProps`, another pushes `getDerivedStateFromProps`, a third says “just use hooks.”

The AI picks what looks most popular: a class‑component refactor using `UNSAFE_componentWillReceiveProps`. You run it again. React compiles, but your state updates misfire and a fresh warning tells you this method is legacy and should be avoided. Two attempts down, your terminal is still broken and your trust in AI is dropping.

That’s exactly when the “two unsuccessful attempts” trigger fires. The Cursor rule now allows the expensive Ref MCP path. Instead of scraping more blog posts, the model calls Ref with a very specific query: “current React 18+ official docs for responding to prop changes; class vs function components; recommended APIs.”

Ref hits the official react.dev docs and returns canonical guidance: class lifecycles like `componentWillReceiveProps` and its `UNSAFE_` variant are deprecated; new code should favor function components with `useEffect`. It surfaces a precise snippet from the “Synchronizing with Effects” page, including the latest signatures and caveats around dependency arrays.

Armed with that, the AI rewrites your component as a function:

  • Props flow directly into JSX
  • Local derived state uses `useState`
  • Side effects run in `useEffect` keyed to the relevant prop
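A minimal sketch of that rewrite, with made‑up component and prop names, might look like this:

```tsx
// Illustrative rewrite: function component, useState for derived state,
// useEffect keyed to the prop that used to drive componentWillReceiveProps.
import { useEffect, useState } from "react";

type DashboardProps = { userId: string };

export function Dashboard({ userId }: DashboardProps) {
  const [stats, setStats] = useState<{ visits: number } | null>(null);

  useEffect(() => {
    let cancelled = false;
    // Re-fetch whenever the relevant prop changes; no class lifecycle needed.
    fetch(`/api/stats/${userId}`)
      .then((res) => res.json())
      .then((data) => {
        if (!cancelled) setStats(data);
      });
    return () => {
      cancelled = true; // avoid setting state after unmount or prop change
    };
  }, [userId]);

  // Props flow directly into JSX.
  return <p>{stats ? `${stats.visits} visits` : `Loading data for ${userId}…`}</p>;
}
```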

You paste the new code, run the app, and the warnings vanish. No deprecated calls, no ghost lifecycle methods, no stale patterns from a 2018 blog. Exa handled the cheap, broad search; Ref stepped in once the model had proven it could not resolve a likely post‑cutoff API change on its own.

The Future is Deterministic AI Agents

Deterministic agents are quietly replacing chatty copilots. Instead of a single model guessing its way through your codebase, you now get specialized MCPs wired to specific jobs: search, docs, issues, repos, even deployment.

Robin Ebers’ Exa + Ref stack is just the tip of that architecture. Exa handles 90–95% of search calls, while Ref sits behind a strict “break glass” policy for official documentation, turning what used to be vibes‑based prompting into a predictable research pipeline.

Zoom out and you see the same pattern in his other MCPs. A GitHub MCP doesn’t “talk about” issues; it fetches them, links PRs, and maps failures back to specific commits. A code‑context MCP doesn’t summarize your repo; it loads concrete files, symbols, and call graphs into the model’s working memory.

Instead of one giant chatbot, you get a mesh of narrow, deterministic tools. Each MCP exposes a small, typed surface area—“search this code,” “pull that issue,” “fetch these docs”—and the LLM becomes an orchestrator that chains them together under hard rules and cost ceilings.
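In TypeScript terms, each tool surface can be thought of as a narrow interface. These shapes are illustrative, not actual Exa, Ref, or GitHub MCP signatures (real MCP tools expose JSON schemas rather than TS types):

```ts
// Illustrative interfaces only; not actual Exa, Ref, or GitHub MCP tool signatures.
interface CodeSearchTool {
  searchCode(query: string): Promise<{ repo: string; file: string; snippet: string }[]>;
}

interface DocsTool {
  fetchDocs(library: string, topic: string): Promise<string>;
}

interface IssueTool {
  getIssue(repo: string, id: number): Promise<{ title: string; linkedPRs: number[] }>;
}

// The orchestrator chains narrow tools instead of asking one model to know everything.
async function explainFailure(
  code: CodeSearchTool,
  issues: IssueTool,
  repo: string,
  issueId: number
) {
  const issue = await issues.getIssue(repo, issueId);
  const hits = await code.searchCode(issue.title);
  return { issue, hits };
}
```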

This is a clean break from the “ask anything” assistant model. In Cursor, Robin’s always‑on rule turns the agent into a workflow engine: it must hit Exa code context, then web search, then Ref, in that order, or it is misbehaving. No ad‑hoc browsing, no hallucinated APIs unless every cheaper, grounded path fails.

Viewed that way, the future dev stack looks more like UNIX pipes than ChatGPT. You wire:

  • Exa for code and web
  • Ref for canonical docs
  • GitHub MCP for issues and PRs
  • Repo MCPs for full‑tree context

Each piece stays small, auditable, and replaceable. You can swap search providers, change doc sources, or point the same orchestration rules at a different monorepo without retraining anything. The “intelligence” lives in routing and policy, not in a mythical all‑knowing model.

Documentation like the official Cursor docs now doubles as an API surface for these agents, not just human‑readable help. That shift—from chatbot UX to deterministic, context‑aware MCP networks—is what finally puts AI on the hook for reliable coding, not creative guessing.

Build Your Anti-Hallucination Stack Today

Start with your editor. Install Cursor if you have not already, then open Settings → Rules. Create a new rule and set it to “always apply” for coding sessions that touch external APIs, frameworks, or SDKs.

Next, wire in tools. Install the Exa MCP server following the docs at exa.ai and add it to your Cursor MCP configuration. Do the same for Ref MCP from its repo or marketplace listing, but keep it as a secondary, higher‑cost option.

Now encode the priority order as boilerplate. Use language Cursor can parse as instructions to the model, for example:

  • Always use Exa’s code‑context tool first for any coding or debugging task.
  • If more information is needed, use Exa’s general web search.
  • Only use Ref when explicitly requested, when Exa results contradict, or after two failed attempts to fix an external API or library.

You can drop a template like this into your rule:

“You must follow this tool order: 1) Exa code‑context, 2) Exa web search, 3) Ref for official docs only on these triggers: user explicitly asks; Exa results contradict; two unsuccessful attempts to fix an external API or library where documentation changes after model cutoff are suspected. Prefer cheaper tools when possible.”

Treat this as infrastructure, not a one‑off hack. Save the rule, enable it for all coding workspaces, and version it in your dotfiles so your whole team can share the same guardrails.

Once this stack runs, your AI stops guessing and starts retrieving. You trade two‑hour debugging spirals for minutes‑long, tool‑driven answers, and you ship code that tracks current docs instead of model folklore. The result: less time fighting hallucinations, more time pushing reliable features with an AI assistant you can finally trust.

Frequently Asked Questions

What is the MCP stack described in the article?

It's a two-tool stack using the Exa MCP for general, cost-effective searches and the Ref MCP for fetching official documentation, all managed by a priority rule inside the Cursor editor.

Why does this stack prefer Exa over other tools?

The rule prioritizes Exa because it's significantly cheaper than alternatives and 'good enough' for most coding queries. It starts with Exa's specific 'code context' tool before falling back to web search.

When does the stack use the Ref MCP for official docs?

Ref is used as a last resort in three specific cases: when the user explicitly requests it, when Exa's results are contradictory, or after two failed attempts to fix a library/API where outdated documentation is suspected.

What core problem does this MCP stack solve?

It solves the problem of AI coding assistants generating outdated or incorrect code (hallucinations) due to their training data cutoff, which saves developers significant debugging time and reduces costs.

Tags

#MCP, #Cursor, #AI Development, #Exa, #Developer Tools
