Your AI Coding Workflow Is Wrong
Top AI coders aren't just writing better prompts; they're operating a completely different system. Discover the 'context-first' workflow that separates amateurs from professionals and turns AI into a true co-pilot.
Why Your 'Prompt-and-Pray' Method Is Failing
Most developers hit the same wall: you open Cursor or VS Code, fire up Claude or GPT‑4, type a clever prompt, and watch it spit out 200 lines of code that almost work. The first 10 minutes feel magical. The next 2 hours vanish into fixing off‑by‑one errors, missing imports, and functions that don’t match your actual data model.
That pain comes from a quiet failure mode: context rot. Large models juggle tens or even hundreds of thousands of tokens, but a real project blows past any window fast once you add multiple files, API contracts, and half‑remembered design decisions. By the third or fourth prompt, the AI no longer “remembers” why you chose Postgres over Firebase or why user roles live in a separate service.
Context rot shows up as the model confidently re‑introducing old patterns you already refactored away. You ask it to update the new `BillingClient`, and it resurrects the deprecated `StripeService` from an earlier message. It forgets environment variable names, re‑invents types, and quietly drifts from your real architecture.
That drift has a brutal downstream cost. Instead of reviewing a tight, surgical diff, you get a grab bag of changes: new helper functions scattered across files, duplicated logic, and inconsistent error handling. Each AI pass compounds the mess, so debugging stops being about one bug and starts being about reconciling three slightly different versions of the same feature.
Developers report spending more time auditing AI output than writing code themselves, especially on projects with more than 5–10 files. You see:
- Duplicate DTOs with conflicting shapes
- Divergent naming conventions for the same concept
- Silent breaking changes to public interfaces
Naive prompting also hides complexity behind fake confidence. The model will happily generate a “working” OAuth flow that ignores refresh tokens, PKCE, or proper token storage, leaving you with a security liability that looks fine at a glance. The same thing happens with database migrations, background jobs, and caching layers.
A serious AI coding workflow treats the model as part of a system, not a genie. That means planning, explicit project context, and stack‑aware prompts that keep the AI grounded in your actual codebase instead of your last message.
Shift Your Mindset: Become a Context Engineer
Shift from “prompting a chatbot” to operating a system. That’s the core philosophy Ray Fernando and Cole Medin keep circling back to: serious AI coding means you orchestrate models, tools, and context like an engineer running a pipeline, not a user typing into a box. You don’t just ask for code; you shape what the model knows, sees, and remembers about your project.
Traditional prompt engineering focuses on one message: wordsmithing a query, adding constraints, maybe pasting a snippet of code. That helps, but it caps out fast because every interaction resets the model’s understanding. You get smart answers to narrow questions, not a collaborator who understands the repo, the architecture, and the roadmap.
Context engineering flips that. You design how the AI ingests your stack: docs, schemas, APIs, data flows, and constraints. Instead of a single clever prompt, you build a persistent project brain using tools like repo indexers, project summaries, and structured “system” prompts that travel with every request.
Ray’s workflows treat the context window—whether 8K, 32K, or 200K tokens—as a scarce, high-value resource. He advocates curating high-signal artifacts: a one-page architecture overview, feature specs, data models, and dependency maps. Those live as reusable context blocks you feed into Cursor, Claude, or GPT-style models before any new feature work.
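One lightweight way to make such a block reusable is to keep it in the repo and prepend it to every request. This is a minimal sketch, with an invented file name and wording, not a prescribed format:

```ts
// context/projectBrief.ts - a reusable context block kept in the repo.
// The file name, structure, and wording here are illustrative assumptions.
export const PROJECT_BRIEF = `
# Architecture overview (keep under one page)
- Next.js monolith deployed on Vercel; no microservices
- PostgreSQL via Prisma; Redis only for rate limiting
- Auth handled by Auth0; never generate custom auth code

# Data model (summary)
- User 1:1 Profile, User 1:N Activity

# Conventions
- Functional React components, Vitest for tests, no default exports
`;

// Prepend the brief so every session starts grounded in the same architecture
// instead of in whatever the last chat message happened to mention.
export function withProjectContext(task: string): string {
  return `${PROJECT_BRIEF}\n\n## Task\n${task}`;
}
```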
Think of it as the difference between a senior engineer and a new intern. The senior has read the design docs, understands the trade-offs, and remembers that weird migration from six months ago. The intern only sees the one file you handed them and has no idea why anything is structured the way it is.
Prompt-only workflows turn your AI into that intern: reactive, myopic, constantly surprised by existing patterns. Context-engineered workflows turn it into a senior dev who can reason across modules, catch architectural regressions, and propose consistent abstractions. Same model, radically different behavior.
Operate your AI stack like infrastructure, not a toy. Once you internalize that you are a context engineer, every part of your workflow changes: how you write docs, how you structure repos, and how you talk to the model.
Blueprint Before Build: The Project-First Mandate
Blueprint-first workflows start with something aggressively low-tech: a written plan. Ray Fernando treats this as non-negotiable; he refuses to open Cursor or Claude until he has a project doc that reads like a mini product spec, architecture sketch, and test plan rolled into one.
That document starts with feature requirements in plain language. He writes user stories, explicit “done” criteria, and constraints like performance targets, latency budgets, or browser support, so the model can’t quietly optimize for the wrong goal.
Next comes data flow. Ray maps entities, inputs, outputs, and how data moves across the system: request payloads, DB schemas, cache layers, external APIs, and background jobs. If a feature touches authentication, billing, and notifications, each hop gets named and described before a single line of code exists.
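As one illustration of what that mapping can look like on paper, the data-flow section might pin payloads down as types before any implementation exists. The feature, field names, and file name below are hypothetical:

```ts
// plan/refund-data-flow.ts - hypothetical shapes written during planning for a
// "partial refund" feature, so the model cannot improvise its own payloads later.
export interface RefundRequest {
  orderId: string;
  amountCents: number; // must not exceed the originally captured amount
  reason: "customer_request" | "duplicate" | "fraud";
}

export interface RefundRecord {
  id: string;
  orderId: string;
  amountCents: number;
  status: "pending" | "succeeded" | "failed";
  providerRefundId: string | null; // filled in by the payment provider webhook
  createdAt: Date;
}

// Flow: POST /api/refunds -> enqueue background job -> payment provider API
//       -> provider webhook -> update RefundRecord -> notify billing email
```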
He then locks in tech stack decisions instead of leaving them to the model’s defaults. The plan specifies language, framework, ORM, queue system, testing tools, and deployment target, often down to versions: “Next.js 15, React Server Components, Prisma with PostgreSQL, Redis for rate limiting, Vitest for unit tests, GitHub Actions for CI.”
Ray also forces a pass on edge cases and failure modes. He lists scenarios like “partial payment success,” “webhook retries,” “stale JWT,” or “mobile offline sync,” and calls out how the system should degrade or recover. Those bullets later become test prompts and monitoring checks.
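Those bullets can be captured as test stubs on day one. A minimal sketch in Vitest, with illustrative scenario names, might look like this:

```ts
// billing.edge-cases.test.ts - the plan's edge-case bullets recorded as Vitest
// todos before any implementation exists; scenario names are illustrative.
import { describe, it } from "vitest";

describe("edge cases from the project doc", () => {
  it.todo("handles partial payment success without double-charging");
  it.todo("stays idempotent when the payment provider retries a webhook");
  it.todo("rejects a stale JWT and forces a session refresh");
  it.todo("queues writes locally when a mobile client is offline, then syncs");
});
```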
That initial doc becomes the seed context for the entire AI session. He pastes it into a fresh chat and treats it as the contract: every subsequent prompt references sections of the plan instead of re-explaining the project from scratch.
This step blocks the model from inventing architecture. No more surprise MongoDB when you meant Postgres, no mystery microservices when you wanted a monolith, no auto-generated auth when you already use Auth0. The model can only “be creative” inside the box you drew.
You can watch this discipline play out in his live builds and deep dives on the Ray Fernando YouTube channel (AI Coding & Workflows), where the plan, not the prompt, drives the entire coding session.
Mastering the AI's Memory: Context as a Resource
Context works like RAM for your AI pair programmer: fast, powerful, and absolutely finite. Modern LLMs juggle anywhere from 8,000 to 200,000 tokens, but that window fills up shockingly fast once you throw in framework glue code, third-party libraries, and your own half-documented services. Treat that space as a budget, not a bottomless pit.
Most devs burn this budget by pasting raw files until the model taps out. You get partial understanding, hallucinated interfaces, and “helpful” rewrites of the wrong module. Ray Fernando’s approach flips this: you invest early in a compact, high-signal project summary that travels with every request, so the model orients instantly.
Automated project indexers are the power tool here. Instead of feeding 120 files, you generate a structured “table of contents” that might be 2–5% of the repo size but encodes 80–90% of what the AI needs. That index can include:
- High-level architecture
- Key modules and their responsibilities
- Data models and relationships
- External APIs and integration points
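A minimal indexer can be a short Node script. This sketch assumes a TypeScript repo and invents the output file name, skip list, and summary shape:

```ts
// scripts/index-project.ts - a minimal repo "table of contents" generator.
// The heuristics, skip list, and output format are assumptions, not a standard.
import { readdirSync, readFileSync, statSync, writeFileSync } from "node:fs";
import { join, relative } from "node:path";

const ROOT = process.cwd();
const SKIP = new Set(["node_modules", ".git", "dist", ".next"]);

type ModuleSummary = { path: string; exports: string[]; lines: number };

function walk(dir: string, out: ModuleSummary[] = []): ModuleSummary[] {
  for (const entry of readdirSync(dir)) {
    if (SKIP.has(entry)) continue;
    const full = join(dir, entry);
    if (statSync(full).isDirectory()) {
      walk(full, out);
    } else if (/\.(ts|tsx)$/.test(entry)) {
      const src = readFileSync(full, "utf8");
      // Cheap heuristic: keep exported symbol names instead of whole files.
      const exportPattern =
        /export\s+(?:default\s+)?(?:async\s+)?(?:function|class|const|interface|type)\s+(\w+)/g;
      const exports = [...src.matchAll(exportPattern)].map((m) => m[1]);
      out.push({ path: relative(ROOT, full), exports, lines: src.split("\n").length });
    }
  }
  return out;
}

// Emit a compact index you can pin as standing context in your AI tooling.
writeFileSync("PROJECT_INDEX.json", JSON.stringify(walk(ROOT), null, 2));
```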
Tools like Cursor, Windsurf, and custom CLI scripts can maintain this index as the project evolves. Cursor’s repo-level prompts let you pin that summary as a standing instruction, so every chat, edit, and “fix this” command runs through the same shared mental model. You’re effectively giving the AI a persistent design doc instead of re-explaining your stack from scratch.
System prompts act as the policy layer on top of that context. In Cursor, you can define rules like “never touch auth,” “prefer functional React components,” or “all new code must include Jest tests.” Those guardrails live above individual messages, so the AI respects them even when you’re focused on a tiny diff.
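Cursor applies its rules inside the IDE, but the same idea carries over when you call a model directly: keep the guardrails in a standing system message. Here is a minimal sketch using the OpenAI Node SDK; the model name and rule wording are placeholders, not recommendations:

```ts
// guardrails.ts - policy layer as a standing system message for direct API calls.
// Model choice and rules are placeholder assumptions.
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

const GUARDRAILS = [
  "Never modify anything under src/auth/.",
  "Prefer functional React components.",
  "Every new module must ship with Vitest tests.",
].join("\n");

export async function askWithPolicy(task: string): Promise<string | null> {
  const response = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: GUARDRAILS }, // rides along with every request
      { role: "user", content: task },
    ],
  });
  return response.choices[0].message.content;
}
```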
High-signal context beats raw volume every time. A 1,500-token architecture summary, data schema, and routing map will outperform 20,000 tokens of unfiltered controllers, utils, and dead code. You want the AI reading the map, not bushwhacking through your node_modules.
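If you want a quick sanity check that your curated blocks actually leave room for the conversation, a rough character-based estimate is enough. The four-characters-per-token ratio below is only a heuristic, not a real tokenizer:

```ts
// contextBudget.ts - rough check that curated context fits the window you plan
// to use. The 4-characters-per-token ratio is an approximation, not a tokenizer.
const APPROX_CHARS_PER_TOKEN = 4;

export function estimateTokens(text: string): number {
  return Math.ceil(text.length / APPROX_CHARS_PER_TOKEN);
}

export function fitsBudget(blocks: string[], budgetTokens: number): boolean {
  const used = blocks.reduce((sum, block) => sum + estimateTokens(block), 0);
  return used <= budgetTokens;
}

// Example: reserve ~6,000 tokens of a 32K window for the architecture summary,
// data schema, and routing map, leaving the rest for conversation and replies.
// fitsBudget([architectureSummary, dataSchema, routingMap], 6_000)
```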
Noise kills model performance long before you hit the hard token limit. By curating a project index and repo prompt, you compress your codebase into something a model can reason about consistently. That’s the leap from “paste files until it breaks” to operating a real context-engineered workflow.
Use Your Best AI to Think, Not Just to Type
Most people treat their smartest model like a very fast typist. Workflow engineers like Ray Fernando and Cole Medin treat it like a chief architect. You don’t pay Claude 3 Opus or GPT-4 Turbo to grind out for-loops; you pay them to decide what should exist in the first place.
Use your top-tier model for high-compression thinking: system design, trade-off analysis, and failure modes. Ask it to propose multiple architectures, compare them, and justify choices around data models, API boundaries, and deployment. Have it output a written technical spec: components, responsibilities, interfaces, and a stepwise implementation plan.
That “architect pass” might cost a few dollars of API calls, but it front-loads the hard reasoning. You get a plan with explicit assumptions, constraints, and risks that you can audit like a design doc from a senior engineer. Once you approve it, you freeze that spec as the single source of truth for everything that follows.
Then you switch to cheaper, faster models—Claude 3 Haiku, GPT-4o mini, or your AI IDE’s built-in assistant—to execute the plan. Feed them only the relevant slice of the spec plus the local files, and ask for small, shippable diffs: a single module, a test suite, or a migration script. Review, run tests, and iterate at high speed without burning premium tokens.
A typical stack might look like:
- Claude 3 Opus / GPT-4 Turbo: architecture, specs, risk analysis
- Claude 3 Sonnet / GPT-4o: feature-level code, refactors, docs
- Claude 3 Haiku / GPT-4o mini: boilerplate, tests, minor edits
This “architect then execute” pattern maximizes quality and controls cost. You concentrate expensive reasoning into a few high-signal passes, then let commodity models handle repetitive coding. You also reduce variance: when every change traces back to a vetted plan, you fight fewer mysterious regressions and context-induced hallucinations.
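Here is a stripped-down version of the pattern, sketched with the Anthropic SDK; the model IDs, prompts, and plan format are illustrative assumptions rather than a fixed recipe:

```ts
// architectThenExecute.ts - two-tier pattern: one expensive spec pass, then many
// cheap implementation passes. Model IDs and prompt wording are assumptions.
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function ask(model: string, prompt: string): Promise<string> {
  const msg = await client.messages.create({
    model,
    max_tokens: 2000,
    messages: [{ role: "user", content: prompt }],
  });
  const block = msg.content[0];
  return block.type === "text" ? block.text : "";
}

// Expensive pass: one high-reasoning call that produces the written spec.
export function architect(requirements: string): Promise<string> {
  return ask(
    "claude-3-opus-20240229",
    `Write a technical spec (components, interfaces, stepwise plan) for:\n${requirements}`
  );
}

// Cheap pass: small calls that each implement a single step of the frozen spec.
export function implementStep(spec: string, step: string, files: string): Promise<string> {
  return ask(
    "claude-3-haiku-20240307",
    `Spec (source of truth):\n${spec}\n\nRelevant files:\n${files}\n\nImplement only: ${step}. Return a minimal diff.`
  );
}
```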
Over time, you can even regenerate the spec when requirements shift, then re-run only the affected implementation steps. The result feels less like chatting with a robot and more like operating a build system for software ideas.
Divide and Conquer: Spawning Sub-Agents for Clarity
Modern AI coding workflows increasingly look less like a single chat window and more like a small studio of specialist agents. Each chat session becomes a scoped workbench for one concern, with its own context, history, and constraints, instead of a chaotic all‑purpose thread that tries to remember everything.
Picture a full‑stack feature: user profiles with avatars and activity feeds. One agent focuses purely on the database schema and data model, another handles the React UI, and a third manages integration wiring and API contracts.
In the database thread, you park everything related to tables, indexes, and migrations. You ask it to iterate on a PostgreSQL schema, generate Prisma models, and reason about foreign keys and performance, without React components or CSS ever polluting that context.
Meanwhile, a separate React thread owns layout, state management, and component structure. It can stay deep in the weeds of hooks, props, and Tailwind classes, referencing only the API shapes you paste in, not the entire backend codebase.
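The glue between those threads can be as small as a shared contract file that both sessions reference. The names below are invented for the hypothetical profiles-and-activity feature:

```ts
// contracts/profile.ts - the API shapes the UI and integration threads share,
// so neither chat needs to see the other's codebase. Field names are illustrative.
export interface ProfileDTO {
  userId: string;
  displayName: string;
  avatarUrl: string | null;
}

export interface ActivityItemDTO {
  id: string;
  userId: string;
  kind: "post" | "comment" | "follow";
  createdAt: string; // ISO 8601 timestamp
}

export interface ActivityFeedResponse {
  items: ActivityItemDTO[];
  nextCursor: string | null; // cursor-based pagination
}
```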
This mirrors how AI IDEs like Cursor, Replit, and GitHub Copilot Workspace push you toward multi‑agent thinking. They encourage:
- One “architect” chat for high‑level design
- Localized “file” or “diff” chats for specific changes
- Background indexers that surface only relevant code
Ray Fernando’s own systems formalize this pattern with ruthless context separation. His “My Pro Claude Code Workflow” walkthrough shows how he spins up fresh Claude sessions for schema design, API contracts, and UI flows, then stitches them together through a master project brief.
This approach directly combats context pollution, where a single overstuffed chat derails into half‑remembered decisions and contradictory instructions. By isolating concerns, you keep each model instance “mentally” narrow and high‑signal, which reduces hallucinations and conflicting suggestions.
Your main, high‑level conversation stays clean: goals, constraints, milestones, and tradeoffs. Sub‑agents handle “how,” but the primary thread guards the “why,” acting as the project’s source of truth.
When you need to adjust direction, you update the master plan first, then propagate changes down into the specialized chats. That top‑down flow turns a messy prompt pile into an explicit, multi‑agent workflow you can reason about, debug, and improve.
Shipping Code, Not Chaos: The Stacked Diffs Revolution
Massive AI-generated pull requests feel impressive until someone tries to review them. You get 800-line diffs touching 14 files, mixing refactors, new features, and drive‑by fixes. No human reviewer, and no code-owner bot, can reliably validate that kind of change, so teams either rubber‑stamp it or block it forever.
Modern AI workflows counter that chaos with stacked diffs. Instead of one mega‑commit, you ship a vertical slice of work as a sequence of small, logically isolated changes: a type tweak here, a new helper there, then the feature wiring, then tests. Each step compiles, runs, and can ship independently.
Practically, you guide the model to operate at the diff level, not the repo level. Tell it: “Propose a single, minimal commit that only adds the data model for X. No controllers, no UI, no tests yet.” Then you paste the current git diff, ask it to refine only that patch, and stop as soon as the change feels reviewable.
Good stacked workflows turn into explicit micro‑prompts, for example:
- “Step 1: add interfaces and types only.”
- “Step 2: add pure functions that use those types.”
- “Step 3: integrate into existing endpoints.”
- “Step 4: add tests and docs for the new behavior.”
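To make the granularity concrete, here is what the first two steps might produce for a hypothetical signup-validation feature; each labeled block would land as its own small commit:

```ts
// validation.ts - output of steps 1 and 2 for a hypothetical signup feature.
// In a real stack each labeled section would be its own reviewable commit.

// --- Step 1: interfaces and types only ---
export interface SignupInput {
  email: string;
  password: string;
}

export type ValidationError = { field: keyof SignupInput; message: string };

// --- Step 2: pure functions that use those types (no endpoint wiring yet) ---
export function validateSignup(input: SignupInput): ValidationError[] {
  const errors: ValidationError[] = [];
  if (!/^[^@\s]+@[^@\s]+\.[^@\s]+$/.test(input.email)) {
    errors.push({ field: "email", message: "Invalid email address" });
  }
  if (input.password.length < 12) {
    errors.push({ field: "password", message: "Password must be at least 12 characters" });
  }
  return errors;
}
```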
Each step becomes a separate branch or commit that tools like GitHub, GitLab, or Phabricator can show as its own diff stack. Reviewers see a 40‑line change that “adds validation helpers,” not a 400‑line surprise that silently rewrites auth. You keep context tight for the AI and for humans.
Stacked diffs turn AI from a code hose into a controlled change generator. You get smaller blast radiuses, easier rollbacks, cleaner git history, and realistic code review, which makes AI‑written code actually safe to merge into a production team’s main branch.
The End Goal: Use AI to Need AI Less
Counterintuitive reality of good AI workflows: they make you need the model less over time. If your setup makes you more dependent on autocomplete and chat threads every week, you’re not coding with AI — you’re outsourcing your brain to it.
Treat the model as a Socratic partner, not a vending machine. After each significant change, ask it to explain what the code does, why this design over alternatives, and where it will likely break. Force it to argue with itself: “List 3 weaknesses in this approach and propose safer patterns.”
Push the model to generate artifacts you’d normally skip when rushing: diagrams, docs, and narratives. Have it produce:
- A sequence diagram for your request pipeline
- A data-flow map for your key entities
- A one-page architecture overview in plain English
Use those outputs as study material, not decoration. Read the architecture docs it generates and then ask follow-ups like, “Redraw this diagram assuming we shard this service,” or “Explain this to a junior dev in 5 bullet points.” You’re training your own mental model of the system while the AI handles the rote drawing and formatting.
Offload memorization aggressively. Instead of trying to remember every helper function or config flag, have the AI maintain a living index: “Summarize the responsibilities of each module in 1–2 lines.” That turns your limited working memory into a cache of concepts, not line numbers.
Over weeks, this shifts the balance of power. You start using AI for scaffolding — boilerplate, migrations, test harnesses — while you own the architecture, invariants, and failure modes. The model becomes a fast pair programmer, not the lead engineer.
Mature workflows that Ray Fernando and Cole Medin advocate push toward eventual mastery. You should be able to sketch a feature, implement 80% of it unaided, then bring in AI for targeted refactors, tests, and edge cases. The endgame: AI accelerates you, but your understanding ships the product.
Your New Daily Standup with Your AI Co-Pilot
Start your day by writing a 3–5 sentence brief in plain language: what you’re building, why, and how you’ll know it works. This is your seed context. Include constraints: target API, performance expectations, and any “must not change” areas of the codebase.
Next, turn that blurb into a micro‑spec. Add a checklist of 3–7 concrete outcomes, like “add pagination to /users endpoint” or “log all failed auth attempts.” You now have a tiny, auditable contract between you and your AI co‑pilot.
Now open a fresh AI session. Paste the plan, then attach or paste your project’s index files: route maps, main entrypoints, schema files, and config. You want the model to see the skeleton of the app, not every random utility.
If your IDE supports repo context (Cursor, GitHub Copilot Workspace, Codeium), point it at the repo but still foreground the plan. Ask the model to restate its understanding of the task in 5–10 bullet points. Correct anything off by even 10–20%, because that error will echo through every suggestion.
With alignment locked, tell the AI explicitly: “Implement this as stacked diffs, one logical change at a time.” Your standup now becomes a loop of:
- Propose the next smallest diff
- Show the files it will touch
- Generate the patch
- You review and run tests
Insist on patches that fit on a single screen or two. If a diff touches more than ~5 files or 150–200 lines, send it back: “Split this into smaller, reviewable steps.” Ray Fernando’s “Stop Pushing AI Code Straight to Main” walkthrough breaks down exactly why this discipline saves you from merge hell.
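If you want that rule enforced by something other than willpower, a few lines of Node can gate oversized patches before review. The thresholds and file name here are assumptions; the script relies only on standard `git diff --numstat` output:

```ts
// scripts/diff-budget.ts - rejects working-tree diffs that are too big to review
// well. Thresholds are illustrative; adjust them to your team's tolerance.
import { execSync } from "node:child_process";

const MAX_FILES = 5;
const MAX_LINES = 200;

// --numstat prints "added<TAB>deleted<TAB>path" for each changed file.
const rows = execSync("git diff --numstat HEAD", { encoding: "utf8" })
  .trim()
  .split("\n")
  .filter(Boolean);

const files = rows.length;
const lines = rows.reduce((sum, row) => {
  const [added, deleted] = row.split("\t");
  // Binary files report "-" for counts; treat them as zero here.
  return sum + (Number(added) || 0) + (Number(deleted) || 0);
}, 0);

if (files > MAX_FILES || lines > MAX_LINES) {
  console.error(
    `Diff too large to review well: ${files} files, ${lines} changed lines. ` +
      "Ask the model to split this into smaller stacked steps."
  );
  process.exit(1);
}

console.log(`Diff looks reviewable: ${files} files, ${lines} changed lines.`);
```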
After each accepted diff, ask for a one‑paragraph summary and a short risk note: migrations, performance changes, or API surface tweaks. Paste that summary into your commit message. Your git history becomes an auto‑generated changelog instead of “wip again.”
End the session by turning the model into a documentation assistant. Tell it which docs exist—README, API reference, ADRs—and ask for concrete edits: new sections, updated examples, deprecation notes. Copy those changes into your docs repo as a final diff and ship it with the code.
The Future Is Here: You're a Conductor Now
You are no longer a typist feeding lines into a prediction engine; you are a conductor running an orchestra of models, tools, and automations. The job stops being “write code faster” and becomes “design a system that reliably ships working software.” Keystrokes matter less than how you shape context, decompose work, and route the right task to the right AI at the right time.
Modern AI coding looks like operating a mini platform: a high‑level planner model, repo‑aware assistants, test generators, and refactoring bots, all wired into your editor and CI. The strongest engineers already think in workflows: spin up a “requirements” agent, a “design” agent, then a “diff” agent that only touches a single feature branch. You don’t ask one model to do everything; you orchestrate a pipeline.
The developers who thrive will be the ones who treat AI as infrastructure, not as a toy autocomplete. They will own:
- A documented project blueprint before any major feature
- A stable of specialized AI sessions for architecture, implementation, and review
- A stacked‑diffs discipline that keeps every change reviewable and reversible
AI IDEs are racing to make this orchestration native. Tools like Cursor, GitHub Copilot, and Replit are already bundling repo indexing, test‑aware refactors, and multi‑file edits into a single flow. Expect first‑class concepts like “feature plan,” “context profile,” and “review stack” to sit beside “Run” and “Debug” in your editor within the next 12–24 months.
The gap will not be model access; everyone will have roughly similar LLMs. The gap will be who can design robust AI‑powered systems that survive real‑world complexity, version control, and team workflows. That gap is already visible on teams where one engineer quietly ships 3–5x more reviewed, production‑ready code by operating a disciplined AI pipeline.
Old habits—prompt‑and‑pray, 1,000‑line diffs, zero planning—do not just waste time; they actively sabotage you. Start acting like a conductor: blueprint first, engineer context, spawn sub‑agents, stack your diffs, and use AI to make yourself need it less. The future of software is not “AI writes code for you”; it is you, running the orchestra that makes AI worth using.
Frequently Asked Questions
What is 'context engineering' in AI coding?
It's the practice of structuring and feeding project information (plans, diagrams, summaries) to an AI model to ensure it has a high-signal, accurate understanding of the codebase before generating code.
Why is a 'project-first, not prompt-first' approach better?
It forces high-level planning and documentation first, which leads to more coherent, accurate, and maintainable AI-generated code, reducing rework and debugging.
What tools are best for this advanced AI workflow?
AI-native IDEs like Cursor are ideal. They integrate features like project indexing, multi-agent conversations, and stacked diffs that directly support context-first workflows.
How does this workflow help you depend less on AI over time?
By using AI to document and understand your own stack, you build a strong mental model of the codebase. This empowers you to implement simpler features yourself, using AI only for more complex, novel tasks.