The AI Quality Paradox: Why Great Models Give Bad Results
Modern AI models like Claude Opus 4.6 and GPT-5.4 represent a pinnacle of computational intelligence. Anthropic's Opus 4.6, released in February 2026, boasts a 1 million token context window and excels at complex agentic tasks, demonstrating sophisticated reasoning. These are not mere incremental upgrades; they are robust, highly capable systems designed for sophisticated problem-solving and long-horizon work. Yet, a perplexing paradox plagues many users: top-tier models frequently deliver frustratingly mediocre results, often wasting significant computational resources.
The issue rarely lies with the foundational model itself. As AI expert Ras Mic emphasizes, current models are "exceptionally good," capable of discerning complex patterns and executing intricate instructions. The critical differentiator, then, becomes the harness and context users construct around them. This surrounding architecture dictates whether the model yields quality output or mere "slop," transforming powerful AI into an expensive, underperforming tool.
This disconnect fuels widespread user frustration, leading to substantial financial waste. Developers and everyday users alike invest in cutting-edge AI, only to encounter agents that produce generic, incorrect, or woefully inefficient outputs. Common culprits include verbose `agent.md` or `claude.md` files, which get loaded into context on every turn, burning thousands of tokens and degrading performance as the context window fills. The promise of intelligent automation gives way to a cycle of over-prompting, escalating costs, and diminishing returns.
Moving beyond this inefficiency demands a fundamental shift in strategy. Instead of brute-force instructions and token-bloated `agent.md` files, which cost 944+ tokens per turn and which Ras Mic argues 95% of users can skip entirely, the focus must pivot to elegant, efficient workflows. This involves understanding intricate context window mechanics and leveraging advanced techniques like custom Skills, which cost roughly 53 tokens per turn, for targeted, token-efficient interaction. That shift stops the cycle of wasted tokens and unlocks genuine productivity.
Inside the AI's Brain: Deconstructing the Context Window
An AI agent's effectiveness hinges on its context window, essentially the model's short-term memory for any given task. This critical component defines the scope of information the AI can access and process to execute an action.
This window isn't empty; it's a dynamic stack of information. It comprises several elements loaded into the model's active memory:

- The foundational system prompt, guiding the AI's core behavior.
- Agent files, such as `agent.md` or `claude.md`, intended to provide specific instructions.
- Custom skills, designed for specialized workflows.
- Integrated tools and the relevant codebase.
- The ongoing user conversation, including all previous turns.
Ras Mic, an expert in AI agent mechanics, argues that `agent.md` files often prove redundant for 95% of users. These files consume significant tokens, loading on every turn and degrading performance as the window fills unnecessarily.
Every piece of information, from a single character to an entire codebase, translates into tokens, the fundamental units of data an AI processes. Models like Claude Opus 4.6 and GPT-5.4 boast impressive context windows, ranging from a few hundred thousand tokens up to a million. However, this capacity has a hard limit.
Once an agent reaches its token limit, it resorts to compaction, summarizing older information to make room for new data. This process inevitably leads to a sharp decline in performance and output quality, akin to a human struggling to recall details from a heavily summarized memory.
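The mechanics can be sketched in a few lines of Python. Everything here is an assumption for illustration, not any vendor's actual algorithm: the 250k budget, the flat 50-token summary cost, and the rolling-summary strategy are all invented.

```python
# Illustrative sketch of context compaction. The budget, the flat
# 50-token summary cost, and the rolling-summary strategy are assumptions.

def compact(turns, limit):
    """turns: list of (text, token_count) pairs, oldest first.
    Fold the oldest turns into one cheap rolling summary until the
    conversation fits the token budget."""
    turns = list(turns)
    summary_tokens = 0
    while turns and summary_tokens + sum(t for _, t in turns) > limit:
        turns.pop(0)          # the oldest turn loses its detail...
        summary_tokens = 50   # ...absorbed into one flat-cost summary
    if summary_tokens:
        turns.insert(0, ("[rolling summary of earlier turns]", summary_tokens))
    return turns

# Three 100k-token turns against a 250k budget: the oldest turn is
# compacted, and its detail is gone for good.
fitted = compact(
    [("turn 1", 100_000), ("turn 2", 100_000), ("turn 3", 100_000)],
    limit=250_000,
)
```

The point of the sketch is the one-way door: once a turn is folded into the summary, the model can no longer recover its specifics, which is exactly why quality drops after compaction.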
Mastering agent performance and optimizing token spend requires a deep understanding of this context window anatomy. Strategically managing what enters this memory, particularly by leveraging progressive disclosure through custom skills (which cost roughly 53 tokens per turn versus 944+ for equivalent `agent.md` files), becomes paramount for consistent, high-quality AI output.
The 'agent.md' File Is a Trap (And You Fell For It)
Conventional wisdom dictates crafting extensive `agent.md` or `claude.md` files, believing these detailed instructions are crucial for an agent’s performance. This common practice, however, often proves counterproductive, needlessly consuming resources and hindering efficiency. Ras Mic, an expert in AI agent optimization, challenges this notion, asserting that 95% of users can — and should — abandon these large contextual files entirely.
Modern large language models like Claude Opus 4.6 and GPT-5.4 are exceptionally capable; they infer context directly from the codebase and ongoing conversation. Telling an agent a project uses React becomes redundant when the model already has the React files within its context window. It possesses the inherent intelligence to understand the development environment without explicit, repeated instruction. This allows for a "super, super minimal" approach to context building, dramatically simplifying agent setup. For deeper insights into Anthropic's advanced models and their capabilities, including Claude Opus, refer to their official announcement: Introducing Claude 3: Opus, Sonnet, Haiku.
The primary pitfall of the oversized `agent.md` lies in its loading mechanism. Agents load these entire files into their context window on every single turn, burning thousands of tokens unnecessarily. A custom skill, by contrast, costs roughly 53 tokens per turn, while an equivalent `agent.md` file can consume upwards of 944 tokens for the same interaction. This leads to significant token waste and degraded performance as the context window rapidly fills.
So, when are these files appropriate? The remaining 5% of use cases involve highly specific, proprietary company methodologies or unique workflows that an agent cannot infer from code or conversation alone. These scenarios demand constant, non-negotiable instructions, such as adhering to complex internal compliance protocols or specialized data handling procedures. In these instances, a compact, precisely defined `.md` file can still serve a vital purpose. Otherwise, trust the model's intelligence and strip away the superfluous.
The Secret Weapon: 'Progressive Disclosure' with Skills
Abandoning those bloated `agent.md` files unveils a superior alternative: Skills. These specialized, modular instruction sets dramatically optimize how your AI agent operates, transforming token management from a liability into a strategic advantage. Skills represent a fundamental paradigm shift in agent design, moving away from static, always-on directives that choke the context window. They empower agents to access vast capabilities without the constant overhead.
Core to Skills' efficiency is the principle of progressive disclosure. Instead of embedding entire instruction manuals into every turn of the conversation, only a skill's succinct name and a brief, high-level description reside in the agent's active context window. For instance, a skill might be described as "analyze financial reports" or "generate marketing copy for social media," offering just enough information for the agent to understand its purpose. This tiny token footprint keeps the working memory lean and focused.
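Anthropic's Agent Skills follow exactly this shape: a `SKILL.md` file whose YAML frontmatter carries the name and description, with the full instructions below loading only when the skill fires. A minimal hypothetical sketch (the skill name, description, and steps are invented):

```markdown
---
name: social-media-copy
description: Generate on-brand marketing copy for social media posts.
---

# Social media copy

1. Ask for the product, audience, and platform if they were not provided.
2. Draft three variants, each under 280 characters, in the brand voice.
3. End with a one-line rationale for the strongest variant.
```

Only the two frontmatter lines cost tokens every turn; the numbered steps stay on disk until the agent actually needs them.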
Here is how the workflow unfolds: the AI agent, whether powered by Claude Opus or GPT-5.4, first scans the list of available skill names and descriptions. It leverages its advanced reasoning capabilities to determine if a particular skill is relevant to the immediate task at hand. For a marketing agent, if a user requests a social media post, the "generate marketing copy" skill becomes immediately salient. Only upon identifying a clear need does the agent dynamically load the full, detailed instructions for that specific skill into its context, executing the required actions.
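That two-stage lookup can be sketched in Python. The skill names, descriptions, and bodies here are invented stand-ins; in a real agent the bodies would be read from skill files on disk and the selection made by the model itself.

```python
# Illustrative sketch of progressive disclosure. Skill names,
# descriptions, and bodies are hypothetical stand-ins.

SKILLS = {
    "generate-marketing-copy": {
        "description": "Draft social media posts in the brand voice.",
        "body": "Full multi-step instructions for tone, length, hashtags...",
    },
    "analyze-financial-reports": {
        "description": "Summarize revenue and expense trends from a report.",
        "body": "Full instructions for parsing, metrics, output format...",
    },
}

def context_header():
    """The only skill text present on every turn: name plus description."""
    return "\n".join(f"- {n}: {s['description']}" for n, s in SKILLS.items())

def expand(skill_name):
    """Loaded into context only after the model selects a relevant skill."""
    return SKILLS[skill_name]["body"]
```

The header is a handful of short lines no matter how large the skill library grows; the expensive bodies are paid for only on the turns that use them.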
Consider the stark contrast in token consumption, a critical factor in both cost and performance. A typical, well-crafted skill, with its name and description, occupies a mere 53 tokens within the context window for each turn. This minimal investment allows for a vast library of potential actions to be "on deck." An equivalent `agent.md` file, however, packed with general instructions, conditional logic for multiple scenarios, and tool definitions, devours upwards of 944 tokens per turn. This staggering difference means thousands of tokens saved over the course of an extended conversation or complex, multi-step task.
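The arithmetic is easy to check. A quick sketch using the per-turn figures quoted above, with an assumed 50-turn session length:

```python
# Back-of-envelope math with the figures quoted in the text:
# ~53 tokens for a skill's name + description vs ~944 tokens for an
# equivalent always-loaded agent.md. The 50-turn session is an assumption.

SKILL_STUB_TOKENS = 53
AGENT_MD_TOKENS = 944

def session_overhead(turns, per_turn_cost):
    """Context tokens burned just carrying the instructions each turn."""
    return turns * per_turn_cost

turns = 50
saved = (session_overhead(turns, AGENT_MD_TOKENS)
         - session_overhead(turns, SKILL_STUB_TOKENS))
# 50 * (944 - 53) = 44,550 tokens that never enter the context window
```

Over a long session, that difference is not just cost: it is tens of thousands of tokens of context window that stay free for code and conversation instead of boilerplate.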
This token-efficient approach not only slashes operational costs but also significantly enhances agent performance and reliability. By preventing the context window from prematurely filling with irrelevant information, agents maintain higher fidelity reasoning and reduce the likelihood of "context compaction," where older, potentially crucial information is summarized or discarded. Progressive disclosure with Skills ensures your agent operates with maximum precision, accessing specialized knowledge only when truly necessary, delivering precise results without the exorbitant token tax.
The Co-Pilot Method: Build Skills *With* Your Agent, Not For It
Many users, eager to leverage advanced AI capabilities, instinctively identify a complex workflow and immediately attempt to write a comprehensive skill file for it from scratch. This conventional approach, reminiscent of pre-programming a rigid script, often leads to an inefficient trial-and-error loop, burning valuable tokens and generating inconsistent results because theoretical instructions inevitably miss nuances of real-world execution. Such upfront authoring presumes perfect foresight, a flaw that quickly becomes evident when the agent encounters unforeseen edge cases.
Ras Mic, an expert in agentic AI, champions a radically different strategy: the Co-Pilot Method. This iterative, hands-on methodology transforms skill development from a solitary coding task into a collaborative learning experience with the AI itself. Instead of dictating instructions, you guide the agent through a process, allowing it to learn and then document its own successful journey.
Mic's methodology provides a five-step blueprint for building robust, practical skills:

- First, identify the specific workflow the agent needs to master, whether screening sponsor emails or generating analytics reports.
- Next, execute the entire workflow manually, step by step, *with* the agent, treating it as a highly capable but untrained apprentice.
- Crucially, actively correct any errors, refine prompts, and guide the agent through successful micro-actions in real time.
- Only after achieving a complete, flawless run of the entire workflow does the pivotal final step occur.
- Command the agent to **create the skill based on that successful interaction context**, effectively self-documenting its own proven process.
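The final step can be a single instruction at the end of the successful session. The wording and the skill name here are hypothetical, not a prescribed template:

```text
We just completed the sponsor-screening workflow successfully.
Create a new skill named "screen-sponsor-emails" that captures
exactly the steps we followed in this conversation: the research
step, the accept/reject criteria we settled on, and the output
format of the final recommendation. Keep the description to one
sentence so it stays cheap to carry in context.
```

Because the agent writes the skill from a conversation it has already executed correctly, the instructions describe a proven path rather than a guessed one.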
Consider training a new human employee: you wouldn't simply hand them a dense, theoretical manual and expect immediate, perfect execution. Instead, you'd sit beside them, guiding them through tasks, offering immediate feedback, and letting them learn by doing. Only once they've demonstrated proficiency would you then document the refined, proven process for future reference. This human-centric approach is precisely what the Co-Pilot Method applies to AI agents, fostering organic learning before formalizing the knowledge.
This iterative, "learn-by-doing" approach ensures that agent skills are not abstract, theoretical constructs, but rather robust instructions built upon proven, real-world execution. Such skills are inherently more resilient to edge cases and dramatically more token-efficient because they capture the precise sequence of successful actions and decisions. By building skills *with* your agent rather than *for* it, you move beyond mere instruction and towards genuine, contextually aware competence, directly addressing the token waste inherent in speculative `agent.md` files.
Case Study: From Email Chaos to Automated Insight
Ras Mic, a leading voice in AI agent development, encountered a familiar problem when building an agent to screen sponsor emails. His initial attempt, armed with a vague prompt, resulted in an agent that approved every single incoming sponsor. The core issue was a fundamental lack of defined rejection criteria within the agent's context, leading to indiscriminate acceptance.
Without explicit instructions on what constituted an unsuitable partner or how to evaluate potential conflicts of interest, the agent defaulted to a positive bias. This common pitfall underscores how even powerful models like Claude Opus 4.6 or GPT-5.4 require precise guardrails and negative constraints to perform effectively and avoid "slop" output.
Mic then applied the Co-Pilot Method, abandoning the traditional approach of pre-writing a complex, static skill file. Instead, he interactively guided the agent through the sponsor screening process step by step. This collaborative, iterative approach allowed the agent to learn directly from his real-world workflow, capturing nuanced decision-making.
He began by having the agent thoroughly research a hypothetical sponsor, instructing it to pull relevant data from various external sources. Next, he worked with the agent to define granular criteria for both desirable and undesirable partners, articulating specific data points, red flags, and brand alignment considerations. Finally, they established a clear, standardized output format for its recommendations, ensuring consistency. For more on structuring agent tasks, particularly with advanced functionalities, refer to Tool use for Claude.
This collaborative process culminated in a highly reliable skill that could autonomously vet incoming sponsor emails. Mic further refined this skill through recursive feedback, treating every misclassification or edge case as an opportunity. He fed failures back to the agent, prompting it to update the skill file and learn from its mistakes.
After several iterations of this refinement loop, the agent now operates with remarkable accuracy, autonomously handling a task that previously consumed hours of manual work. The final skill effectively transformed a time-consuming, error-prone manual process into an automated insight generator, demonstrating the profound efficiency gains possible when agents are trained interactively to build robust skills.
Turn Failures into Features: The Recursive Refinement Loop
Even the most meticulously crafted skills, designed to optimize AI agent performance and token efficiency, will inevitably encounter edge cases. New data formats, unexpected user inputs, or unforeseen workflow complexities can cause an agent to stumble, leading to errors or suboptimal outputs. These aren't just bugs; they represent critical, real-world learning opportunities.
Enter the Recursive Refinement Loop, a powerful methodology that transforms agent failures into robust, self-improving features. This process treats every misstep not as a defect to be patched externally, but as invaluable feedback the agent leverages to enhance its own capabilities. It instills a continuous improvement cycle, fundamentally altering how resilient AI systems are built.
This iterative refinement follows a precise three-step sequence, putting the agent in the driver's seat of its own evolution:

- First, identify the specific error or deviation from the desired outcome. Pinpoint the exact moment and reason for the failure, providing concrete context.
- Second, prompt the agent to analyze its own failure. Instruct it to explain *why* it failed and, crucially, propose a logical fix or additional instruction to prevent a recurrence of that specific mistake.
- Third, command the agent to update its own skill file directly with the newly proposed logic. This direct modification hard-codes the lesson learned into its operational guidelines, making the agent profoundly self-correcting and adaptive.
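In practice, the second and third steps can be a single message. This wording, and the failure it describes, are hypothetical examples:

```text
The report you just generated dropped the "Shorts" traffic source,
so the totals were wrong. Explain why the skill's instructions led
you to miss it, then update the skill file with an instruction that
prevents this specific failure in future runs.
```

The key is that the fix lands in the skill file itself, not in the chat history, so the lesson survives the next compaction and the next session.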
Ras Mic vividly demonstrated this principle with his YouTube analytics report generator. Initially, the agent struggled with the inherent variability of diverse data inputs and report formats, frequently producing inconsistent or incomplete results. Through five rigorous iterations of the Recursive Refinement Loop, he systematically fed each unique failure back into the agent's learning process.
Each time, the agent meticulously diagnosed its shortcomings, formulated precise solutions, and updated its internal instructions within the skill file. This disciplined, iterative approach transformed a previously failure-prone system into a flawless data aggregator. Now, the agent executes complex reports across eight distinct data sources in approximately ten minutes, consistently delivering accurate and comprehensive insights without human intervention.
Productivity Over Pizazz: Scaling Agents the Smart Way
Developers frequently rush to deploy elaborate multi-agent systems from day one, seduced by the allure of intricate architectures. This common misstep prioritizes perceived sophistication over tangible output, often leading to token bloat and inefficient workflows before any real value is generated. Ras Mic, however, champions a more pragmatic approach, emphasizing a foundational strategy that prioritizes efficiency.
Instead of immediate architectural complexity, Ras Mic advocates for beginning with a single, powerful generalist agent. This core agent handles a wide array of tasks—from comprehensive email screening to detailed spreadsheet analysis and in-depth research—without the unnecessary overhead of specialized, premature counterparts. The objective remains establishing a robust, highly capable core before considering any expansion.
Focus efforts on meticulously building a comprehensive library of robust, reliable skills for this primary agent. Each skill, refined through iterative "recursive refinement loops" as detailed previously, becomes a precise, token-efficient tool, honed to perfection. This strategy ensures the generalist agent masters its core workflows, consistently delivering high-quality, predictable results that minimize token waste and maximize accuracy.
Scaling occurs only after the generalist agent's foundational workflows are perfected and its skill library is mature. Introduce specialized sub-agents—for distinct areas like marketing, business development, or personal tasks—strategically, when specific, complex needs arise. This measured, productivity-driven expansion avoids the pitfalls of premature complexity, ensuring every new component serves a proven, efficient purpose rather than merely contributing to a cool-looking, yet underperforming, system. Prioritize genuine utility over architectural pizazz.
The Agentic Future is Here, If You Build It Right
Agentic AI is not a distant promise; it is the immediate reality with models like Claude Opus 4.6 and GPT-5.4. These advanced systems demonstrate unprecedented autonomy and reasoning, moving beyond simple prompt-response to genuinely orchestrate complex tasks. Their power, however, remains contingent on the quality of their operational framework.
A meticulously curated skill library becomes the indispensable foundation for leveraging these autonomous models. Instead of attempting to cram every potential instruction into a single, monolithic context file, this modular approach provides agents with a precise, on-demand toolkit. It allows the AI to dynamically access specialized capabilities, significantly enhancing efficiency and reducing the token waste associated with bloated `agent.md` files.
Insights from incidents like the Claude Code leak further underscore this necessity, revealing the profound, underlying complexity of professional-grade agent orchestration. These leaked system prompts demonstrated how even leading AI developers rely on highly structured, modular components to guide their agents effectively. For a deeper understanding of these developments, explore Claude 3 Opus and the frontier of AI agents.
Developing a robust skill-building methodology, rooted in progressive disclosure and recursive refinement, is therefore not merely a temporary hack. This is a fundamental discipline for anyone serious about working with AI in the coming years. Mastering this approach ensures agents can scale for true productivity, rather than collapsing under the weight of poorly managed context.
Your Action Plan for Agent Mastery
Your AI agent’s true potential isn't unlocked by massive `agent.md` files or complex multi-agent setups from day one. Instead, it lies in a disciplined approach to context management and skill development. Mastering this methodology transforms AI from a token-wasting novelty into a productivity powerhouse.
Take these concrete steps to revolutionize your agent workflow:
1. Optimize Context: Abandon the token-bloating `agent.md` files. Leverage the inherent intelligence of models like Claude Opus 4.6 and GPT-5.4, trusting them to infer context from the codebase and conversation.
2. Employ Progressive Disclosure: Utilize Skills as your primary method for extending agent capabilities. Only the skill's name and description reside in the active context; full instructions load only when needed, drastically reducing token consumption.
3. Build Skills Co-Pilot Style: Do not attempt to write skill files from scratch. Instead, identify a repetitive task and perform it step by step with your agent. Once successful, instruct the agent to encapsulate that workflow into a new skill.
4. Refine Recursively: Treat every agent failure as a feature opportunity. Feed the error back to the agent, letting it diagnose the issue and update the skill file for future resilience. This recursive refinement loop continuously hardens your agent's capabilities.
5. Scale for Productivity: Resist the urge to build sprawling multi-agent systems immediately. Begin with one agent, focusing on a robust library of highly effective skills for its core tasks. Expand only after achieving consistent, reliable performance.
This week, identify one repetitive workflow in your professional or personal life. It could be drafting routine emails, summarizing meeting notes, or organizing data. Apply the Co-Pilot Method: work through that task with your agent in a live conversation, documenting each step. Once complete, have the agent write the skill for you.
This hands-on exercise will not only yield your first custom skill but also embed the foundational principles of efficient agentic AI. By mastering this lean, iterative approach, you move beyond mere interaction to unlock the profound productivity gains that the agentic future, powered by models like Claude Opus 4.6 and GPT-5.4, truly promises.
Frequently Asked Questions
What is the main problem with how people use AI agents today?
Most users overload the AI's context window with unnecessary information, like lengthy `agent.md` files. This wastes tokens, degrades performance, and leads to poor results.
What are AI 'skills' and why are they more efficient?
Skills are self-contained instructions for an agent. They use 'progressive disclosure,' meaning only the name and description sit in the context window until the skill is needed, saving thousands of tokens over a session compared to always-loaded instruction files.
What is the best way to create a new AI skill?
Instead of writing a skill from scratch, you should first walk through the task step-by-step with the AI agent. Once you achieve a successful outcome, have the agent write the skill based on that proven conversation.
Do I need to use `agent.md` or `claude.md` files?
According to expert Ras Mic, 95% of users do not need these files. They should only be used for proprietary information that must be referenced in every single interaction with the agent.