AI's Next Tsunami: Experts Reveal 2026 Game Plan
Top minds from Google DeepMind, Groq, and Emergence Capital just mapped out AI's next 12 months on a live stream. Here’s the alpha on the hardware, agents, and startups that will define the new AI economy.
The New AI Economy Is Already Here
Forget the hype cycle graphs—Matthew Berman opens his Forward Future Live panel by declaring that the AI gold rush has quietly changed phases. Raw model capability still matters, but the real battle now centers on efficiency, latency, and whether anyone can turn foundation models into durable, profitable products. The race is no longer “Who has the biggest model?” but “Who can deliver intelligence at the lowest cost per query, with the least friction, at global scale?”
The conversation on his stream no longer orbits abstract AGI timelines. It drills into the build-out of AI factories: hyperscale data centers, on-prem clusters, and edge deployments tuned for nonstop inference. Berman and his guests talk like operators, not futurists, obsessing over utilization rates, throughput per watt, and how to wire models into existing workflows without blowing up compliance or budgets.
AI factories, as framed here, mirror industrial plants more than research labs. They require predictable supply chains for chips and power, standardized “assembly lines” for data and fine‑tuning, and SLAs measured in milliseconds and nines of uptime. Enterprises now ask about RPO, SOC 2, and vendor lock‑in before they ask about model parameter counts.
Berman’s panel lines up four pillars of this new AI economy. From hardware, Groq’s Sunny Madra pushes ultra‑low‑latency LPUs; from models, Google DeepMind’s Logan Kilpatrick represents Gemini and Google’s agent roadmap; from capital, Emergence Capital’s Joseph Floyd speaks for growth‑stage SaaS; from agents, Augment’s Guy Gur‑Ari focuses on workflow automation in real companies. Together they map the stack from silicon to user interface.
Each pillar solves a different piece of the same equation. Hardware must crush inference cost, models must stay state‑of‑the‑art yet controllable, capital must fund business models that survive post‑novelty churn, and agents must turn API calls into measurable productivity. None of that works in isolation.
A central tension runs through the whole conversation: viral demos versus systems that actually scale. Berman keeps returning to a simple filter for 2026: can these four layers integrate tightly enough that a CFO signs a multi-year contract, not just a pilot? That, more than benchmark scores, defines the new AI economy already taking shape.
Your LLM is Too Slow. Speed is the New Moat.
Latency, not model size, will decide who wins the AI platform war. Sunny Madra from Groq argues that by 2026, most AI dollars will flow into inference, not training, because that is where users actually feel the product. A 300-billion-parameter model that takes three seconds to respond loses to a smaller, cheaper model that answers in 100 milliseconds.
Traditional GPU stacks behave like sports cars stuck in rush-hour traffic: fast on paper, unpredictable in practice. GPUs juggle thousands of threads, context switches, and mixed workloads, so token times fluctuate wildly—50 ms one moment, 400 ms the next. That jitter kills experiences like live voice agents, where humans detect delays above roughly 200 ms.
Groq’s LPU architecture flips the script by going deterministic. Instead of general-purpose compute, LPUs run token-by-token pipelines with fixed execution paths, so you can literally quote latency per token—e.g., ~10 ms/token—under load. Developers can design products around guarantees, not averages.
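To see what deterministic per-token timing buys, run the back-of-the-envelope math. The sketch below uses the ~10 ms/token figure from above; the 40 ms time-to-first-token and the response lengths are illustrative assumptions, not Groq specs.

```python
# Back-of-the-envelope latency budget for a voice agent.
# Assumes a fixed decode time per token (illustrative, per the ~10 ms/token figure above).
MS_PER_TOKEN = 10            # deterministic per-token decode time (assumption)
TIME_TO_FIRST_TOKEN_MS = 40  # prompt-processing overhead (assumption)

def response_latency_ms(output_tokens: int) -> int:
    """Estimated end-to-end latency for a reply of the given length."""
    return TIME_TO_FIRST_TOKEN_MS + output_tokens * MS_PER_TOKEN

for tokens in (10, 30, 100):
    print(f"{tokens:>4} tokens -> ~{response_latency_ms(tokens)} ms")
# A short 10-token acknowledgement lands around 140 ms, under the ~200 ms
# threshold where humans notice conversational lag; longer replies lean on
# streaming, where the time to the first token is what users actually feel.
```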
When cost-per-token collapses by 10x–100x and latency becomes boringly predictable, entire product categories unlock. Real-time agents that listen, reason, and talk back in under 150 ms suddenly feel like human conversation, not a call center IVR. Complex chain-of-thought reasoning—hundreds or thousands of tokens of internal deliberation—stops being prohibitively slow and expensive.
Economic gravity then shifts. If a support agent costs $0.10 per interaction instead of $3, companies can route nearly every touchpoint through an AI layer. If a coding assistant can run multi-step refactors locally in under a second, developers stop context-switching and start relying on AI for continuous pair programming.
The future “winning stack” looks less like “best model wins” and more like “good-enough model on blisteringly fast, cheap inference fabric.” That stack pairs:
- A strong but not necessarily frontier LLM
- Specialized inference hardware like LPUs or optimized ASICs
- Aggressive compilation, caching, and batching at the infra layer
Users will not ask which model powers their assistant; they will only feel whether it talks back instantly and costs almost nothing. Speed becomes the moat, and inference silicon becomes the battlefield.
Google’s Plan for a 'Personal Intelligence' Agent
Google DeepMind’s Logan Kilpatrick frames Gemini’s roadmap as a shift from general-purpose model to persistent “personal intelligence” that actually lives in your life. Not just answering prompts, but sitting across Gmail, Calendar, Drive, Docs, and Chrome, constantly ingesting signals to anticipate what you need before you ask.
That means Gemini as a standing agent that knows your travel patterns, recurring meetings, and unread threads, then proactively drafts replies, reschedules conflicts, and surfaces relevant files. Think “auto-briefing” every morning: key emails, meetings, documents, and news tailored to your current projects, not a generic feed.
To get there, Google has to wire Gemini directly into user context at massive scale. Kilpatrick’s vision implies APIs that expose structured slices of your personal graph—messages, events, tasks, browsing—while enforcing strict scoping, revocation, and auditability for every app that touches it.
Developers will demand three core primitives (a minimal code sketch follows the list):
- Fine-grained, per-dataset permissions (e.g., “read calendar only, no email”)
- Verifiable logs of agent actions and data access
- Sandboxed execution so agents cannot exfiltrate or leak user data
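Google has not published an agent-permission API in this shape; purely as a hedged illustration of what those primitives could look like, here is a minimal sketch of a scoped, time-bounded grant with a built-in audit trail. Every name in it is hypothetical.

```python
# Hypothetical sketch of a scoped, time-bounded agent grant -- not a real Google API.
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

@dataclass
class AgentGrant:
    agent_id: str
    scopes: frozenset[str]                  # per-dataset permissions, e.g. "calendar:read"
    expires_at: datetime                    # time-bounded token
    audit_log: list[dict] = field(default_factory=list)

    def check(self, scope: str) -> bool:
        """Default-deny: every access is checked against scope and expiry, then logged."""
        allowed = scope in self.scopes and datetime.now(timezone.utc) < self.expires_at
        self.audit_log.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "scope": scope,
            "allowed": allowed,
        })
        return allowed

grant = AgentGrant(
    agent_id="briefing-agent",
    scopes=frozenset({"calendar:read", "drive:read"}),
    expires_at=datetime.now(timezone.utc) + timedelta(hours=1),
)
assert grant.check("calendar:read")          # allowed and recorded
assert not grant.check("gmail:read")         # "read calendar only, no email"
```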
Google also needs policy-level guardrails that encode safety, not just rely on model behavior. Expect default-deny access, time-bounded tokens, on-device filters, and “view-only” modes, plus enterprise controls that let admins centrally govern which Gemini-powered agents can run inside a company’s stack.
Privacy and trust sit at the center of this strategy. Chrome’s 3+ billion user footprint, Gmail’s 1.8+ billion accounts, and Android’s 3+ billion active devices give Google an unprecedented context surface—but also a giant liability if any agent misbehaves or over-collects data.
Sundar Pichai has already flagged the collision course between powerful agents and the existing web ecosystem. If Gemini summarizes everything, publishers lose pageviews, ad impressions, and direct relationships, especially as agents answer in-place inside Search, Android, and Chrome.
To avoid detonating the open web, Google must treat publishers as first-class participants in the agent economy. That could mean structured “agent feeds,” revenue-sharing on AI answers, and explicit integration hooks, similar to how hardware players like Groq expose low-latency inference platforms through their official developer sites while still depending on a healthy ecosystem of apps and content.
The VC Filter: Separating Real Value from AI Hype
The honeymoon for “AI-powered” pitch decks ended months ago. Joseph Floyd, a partner at Emergence Capital, describes a market where investors now ask a blunt question first: does this actually move a P&L line item, or is it just a slick demo wrapped around OpenAI’s API?
VCs are tearing into unit economics. Founders must show how AI changes customer acquisition cost, gross margin, or expansion revenue, not just feature checklists or vague productivity claims.
CAC becomes the first stress test. If a startup adds an AI co-pilot to sales outreach, Floyd wants proof that outbound conversion rates rise 20–30%, or that reps handle 2–3x more accounts without burning out or churning.
Margins sit under equal pressure. A team claiming “AI automation” must show fewer support tickets per customer, shorter resolution times, or a measurable reduction in headcount per $1 million of ARR, not just higher cloud bills from inference.
Defensibility has quietly become the new moat obsession. With model commoditization—Gemini, GPT-4.1, Claude, open weights—Floyd argues that raw model access no longer differentiates; everyone can call the same APIs for a few cents per thousand tokens.
Real moats form around three assets:
- Proprietary or hard-to-replicate data
- Unique, high-friction workflows deeply wired into operations
- Distribution advantages like embedded partnerships or existing SaaS footprints
Proprietary data means more than an S3 bucket of logs. Emergence-backed founders talk about labeled workflows, outcome data, and customer-specific ontologies that let their models learn patterns no public model sees, creating compounding performance gaps.
Workflow depth matters just as much. An AI product that lives only as a Chrome extension or chat sidebar looks fragile; one that rewires how invoices get approved, code gets shipped, or deals get forecast becomes impossible to rip out without breaking the business.
From Emergence’s B2B SaaS lens, the strongest AI startups look less like tools and more like systems of record with an embedded brain. They sit on top of core data, orchestrate actions across apps, and become the default place where work starts and gets measured.
ROI becomes the final arbiter. Floyd pushes teams to quantify time-to-value in weeks, not quarters, and to prove retention with cohort data: if AI truly changes workflows, net dollar retention should climb above 120%, and expansion should feel inevitable, not optional.
Rise of the AI Workforce: How Agents Will Change Your Job
Forget sci-fi agents that run your life; Guy Gur-Ari is busy building ones that quietly run your inbox. At Augment, his team wires LLM-powered agents directly into the tools that define modern knowledge work: Gmail, Salesforce, Jira, Notion, and a forest of internal dashboards. The mandate: shave minutes off thousands of tiny tasks until entire roles look different.
Augment’s customers don’t start with moonshots; they start with email triage. Agents read inbound threads, classify intent, draft responses, and route messages to the right human or system. For sales teams, another agent updates CRMs automatically—logging calls, syncing notes, closing opportunities—so reps stop spending 30–40% of their day on data entry.
Instead of one godlike “super agent,” Gur-Ari argues for a swarm of narrow, dependable workers. One agent specializes in weekly pipeline reports; another compiles customer health scores; a third reconciles billing discrepancies. Each might save only 5–10 minutes per user per day, but across 5,000 employees that compounds into millions of dollars in annual productivity.
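The compounding claim is easy to sanity-check. The 5–10 minutes per user and 5,000 employees come from the discussion above; the workdays and loaded hourly cost below are illustrative assumptions.

```python
# Rough annual value of small per-employee time savings (illustrative assumptions).
employees = 5_000
minutes_saved_per_day = 7.5    # midpoint of the 5-10 minute range above
workdays_per_year = 230        # assumption
loaded_hourly_cost = 60.0      # USD per hour, assumption

hours_saved = employees * minutes_saved_per_day / 60 * workdays_per_year
annual_value = hours_saved * loaded_hourly_cost
print(f"~{hours_saved:,.0f} hours/year -> ~${annual_value:,.0f}/year")
# ~143,750 hours/year -> ~$8,625,000/year
```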
This modular approach also lets enterprises phase adoption. A company might roll out three agents first:
- Email triage for support queues
- Automatic CRM hygiene
- Standard report generation for finance and ops
Once those prove reliable—single-digit error rates, measurable time savings—teams expand to more complex workflows. Gur-Ari frames it as building an AI workforce, not a single assistant: you hire agents, give them a job description, and watch their metrics.
Model capability rarely blocks deployment anymore. GPT-4-class systems already write solid emails, SQL queries, and summaries. The real walls are reliability, security, and auditability: can you trust an agent with customer data, and can you see exactly what it did at 3:17 p.m. last Tuesday?
Augment solves this with strict scopes and full action logs. Agents operate under least-privilege access, every API call gets recorded, and humans can replay decisions step by step. For regulated industries—finance, healthcare, big SaaS—no audit trail means no deployment, no matter how smart the model looks in a demo.
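Augment has not published its internals, so treat the following as a generic sketch of the pattern rather than its implementation: every tool call passes through a wrapper that enforces a scope and appends a replayable record.

```python
# Generic sketch of least-privilege tool calls with a replayable action log.
# Illustrative only; not Augment's actual code.
from datetime import datetime, timezone

ACTION_LOG: list[dict] = []    # production systems would use append-only, tamper-evident storage

def call_tool(agent_id: str, granted_scopes: set[str], tool: str,
              required_scope: str, payload: dict) -> dict:
    allowed = required_scope in granted_scopes
    ACTION_LOG.append({
        "ts": datetime.now(timezone.utc).isoformat(),   # answers "what did it do at 3:17 p.m.?"
        "agent": agent_id,
        "tool": tool,
        "scope": required_scope,
        "allowed": allowed,
        "payload": payload,                             # recorded so humans can replay the decision
    })
    if not allowed:
        raise PermissionError(f"{agent_id} lacks scope {required_scope}")
    return {"status": "ok"}    # the real tool invocation would happen here

call_tool("crm-hygiene-bot", {"crm:write"}, "salesforce.update_opportunity",
          "crm:write", {"opportunity_id": "006XX", "stage": "Closed Won"})
print(ACTION_LOG[-1]["allowed"])    # True, with a full trace of who did what and when
```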
The 'AI Factories' Powering This Revolution
AI now runs on concrete, copper, and cooling towers. Hyperscalers are racing to stand up dedicated AI factories—single campuses drawing 500+ megawatts—just to keep up with model upgrades and the explosion in inference demand that Sunny Madra argues will dominate spend.
Microsoft, Google, Amazon, and Meta have quietly shifted from “add GPUs to existing regions” to designing AI-only data centers with custom power infrastructure, including dedicated on-site substations. Microsoft has reportedly committed over $100 billion to new AI infrastructure, while Google and Amazon track close behind with multi-year capex plans in the tens of billions.
Inside those buildings, NVIDIA still rules, but not alone. Hyperscalers now juggle a zoo of accelerators: NVIDIA H100/B100, AMD Instinct parts, and homegrown chips like AWS Trainium and Inferentia, Google’s TPU v5p, and Meta’s MTIA, each tuned for different model sizes and workloads.
That mix creates a brutal optimization puzzle. Cloud teams now decide not just “how many GPUs?” but which silicon, which interconnect, and which region can even deliver the required megawatts without tripping local grid limits or regulatory alarms.
Berman has hammered on this in his newsletter: data centers already account for an estimated 2–3% of global electricity use, and some projections push AI-related demand toward 4–6% by 2030. Local utilities in Northern Virginia, Dublin, and parts of Oregon have started delaying or throttling new data center hookups because grids cannot expand fast enough.
Backlash is building. Community groups push moratoriums, regulators scrutinize water usage for cooling, and governments ask why AI chatbots should compete with housing and transit for scarce electrons. That political pressure collides directly with the hyperscaler arms race.
Against that backdrop, performance-per-watt stops being a nice-to-have and becomes survival math. Groq’s LPU pitch—lower latency, higher tokens-per-second, and better efficiency per watt—suddenly aligns with grid constraints, ESG mandates, and enterprise cost models.
Even Google’s own push toward more efficient Gemini deployments and agentic workloads, which Logan Kilpatrick hints at, shows up in its infrastructure strategy and in Google DeepMind’s public materials.
Beyond the API Wrapper: What VCs Actually Fund Now
Joseph Floyd does not mince words: the “API wrapper” era is dead. Emergence Capital now screens AI pitches on one brutal axis—would this product have any reason to exist without machine intelligence at its core, or is AI just a shiny feature stapled onto SaaS?
For Floyd, an AI-native workflow rewires how work happens, not just how fast a button clicks. A sales platform that auto-writes emails is incremental; a system that continuously monitors pipeline, drafts outreach, reprioritizes accounts, and executes campaigns across channels with minimal human input is a new workflow entirely.
True AI-native products embed models in the feedback loop of the job itself. They watch actions, learn preferences, and then start taking initiative—flagging anomalies in finance, proposing code changes, or routing support tickets without being explicitly told every rule.
That creates a go-to-market problem most founders underestimate. You are not selling static software; you are selling a tool that behaves differently on day 1, day 30, and day 365 because it keeps learning from usage and data.
Floyd pushes teams to design a GTM playbook that explains that evolution up front. Early adopters get a clear narrative: baseline value on week one, visible improvement by week four, and compounding automation by quarter two as models fine-tune on customer data.
Successful AI-native GTM often leans on land-and-expand motions tied to measurable lift. Investors want to see metrics like 30–50% cycle-time reduction, 10–20% revenue lift, or headcount-neutral scaling, not vanity “prompts per day” charts.
Emerging winners follow two patterns. Either they own a defensible slice of the infrastructure stack—think Groq’s LPU hardware or specialized vector databases—or they dominate a vertical with a tight data flywheel and hard-to-replicate corpus.
Vertical leaders look more like infrastructure than apps over time. A legal AI that ingests millions of contracts, annotations, and outcomes, or a healthcare assistant tuned on clinical notes plus outcomes data, accumulates proprietary signal that a generic LLM API customer never sees.
Data flywheels separate toys from platforms. The more customers use the product, the more labeled interactions, corrections, and edge cases it captures, which directly improve model performance and deepen lock-in.
Floyd’s filter is simple and ruthless: if switching to another model provider would erase most of your advantage, you do not have a company, you have a feature. Founders who understand that are the ones still getting term sheets in 2026.
Can We Trust Our Digital Co-Pilots?
Can you trust a bot to poke around your HR system, inbox, and CRM while you sleep? Guy Gur-Ari argues that until enterprises can answer that with a confident yes, agents stay on a tight leash. The next wave of AI isn’t about smarter chat—it’s about operational control and traceability.
Reliability now means more than “usually gives the right answer.” Enterprises want a Git history for agents: a tamper-proof log of every action, input, tool call, and decision path. If an AI misfires on payroll or discounts, teams need one-click rollback that restores previous state across SaaS tools and internal systems.
That’s pushing vendors to build full auditability stacks: time-stamped traces, structured reasoning logs, and replayable sessions. Think Datadog or Splunk, but for agent cognition and workflows. If a co-pilot changes 1,000 Salesforce records, security teams expect to see who authorized it, what prompt triggered it, and which policy allowed it.
Security and data privacy sit even higher on the checklist. Agents want to sit across email, HR platforms, and CRMs, but CISOs see an expanding blast radius: one compromised agent key, and suddenly it can read executive email and pull HR comp bands. Zero-trust isn’t optional; it’s the design constraint.
Modern agent stacks increasingly mirror human access models. Enterprises demand:
- Per-user OAuth and SSO, not shared service accounts
- Fine-grained scopes per tool (“read-only calendar,” “no attachments”)
- On-the-fly redaction and data loss prevention before prompts hit the model (a minimal sketch follows)
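The redaction piece is the least familiar of the three, so here is a minimal sketch of prompt-side data loss prevention. The regex patterns are illustrative; real deployments use proper classifiers and policy engines.

```python
# Minimal sketch of prompt-side redaction (DLP) before text reaches a model.
import re

REDACTIONS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN":   re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace sensitive substrings with labeled placeholders before prompting."""
    for label, pattern in REDACTIONS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Summarize this thread from jane.doe@example.com about SSN 123-45-6789."
print(redact(prompt))
# -> "Summarize this thread from [EMAIL] about SSN [SSN]."
```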
The path to real deployment looks aggressively incremental. Gur-Ari and others see companies start with low-risk, high-frequency tasks: drafting status emails, summarizing tickets, updating non-critical CRM fields. These jobs touch real data but can’t tank a quarter if something goes sideways.
Once agents prove they can run thousands of these micro-workflows with 99%+ success and clean audit trails, enterprises widen the lane. Only then do they let AI touch revenue operations, procurement approvals, or HR workflows—where a single hallucinated action can trigger legal review, not just an eye-roll in Slack.
The Battleground Shifts from Models to Ecosystems
Benchmarks made sense when GPT-3 versus PaLM looked like a horse race. Now, with GPT-4.1, Claude 3.5 Sonnet, and Gemini 1.5 Pro all “good enough” for most tasks, raw model scores feel like arguing over supercar lap times in a city full of traffic. Power shifts from single models to ecosystems that bind silicon, software, and distribution into one compounding loop.
Hardware sits at the base of that stack. NVIDIA still owns most training, but inference is fragmenting fast: Groq’s LPU architecture posts sub-50 ms end-to-end responses on 70B-parameter models, while GPU clusters often struggle to stay reliably under 300 ms at scale. That latency gap doesn’t just feel nicer; it decides whether an AI co-pilot can live inline in your IDE, inbox, or CRM without driving users back to keyboard shortcuts.
On top of that silicon, model intelligence becomes a feature, not the product. Open-source models like Llama 3.1 and Phi-3 close capability gaps monthly, especially when tuned on proprietary data. Sunny Madra’s point lands hard: whoever runs those models fastest, cheapest, and most predictably wins the right to sit in every workflow.
Google’s answer leans on distribution gravity. Gemini wired into Search, Android, and Workspace gives Logan Kilpatrick’s “personal intelligence” agent instant reach to billions of users and petabytes of behavioral data. Every doc edit, Meet transcript, and Gmail thread becomes training signal for better suggestions, summarization, and autonomous actions.
Groq plays the opposite card: own the inference layer, then let open-source models and independent developers swarm on top. That strategy treats models as interchangeable cartridges, with Groq hardware and tooling as the persistent platform. Low-latency APIs plus transparent pricing invite SaaS founders and enterprises to standardize on Groq for production workloads.
Investors like Joseph Floyd see this as a four-pillar game: hardware, models, developer tools, and distribution. Emergence Capital’s thesis spells out that defensible AI-native companies tie all four into a flywheel:
- Faster, cheaper hardware unlocks new real-time applications
- New apps generate proprietary workflows and data
- Better data improves models and agents
- Superior products attract more users, revenue, and capital
Whoever closes that loop fastest sets the rules for AI’s 2026 economy.
Your Action Plan for the Agentic Age
AI’s agentic wave will not wait for perfect strategy decks. Over the next 18–24 months, winners will be the people who treat agents like a new runtime for work: fast, observable, and wired directly into high-frequency workflows, not just chat windows.
Builders and developers should obsess over latency. Users bounce when responses cross 1–2 seconds; by 10 seconds, engagement falls off a cliff. That puts inference front and center: experiment with Groq-style LPUs, NVIDIA GPU variants, and emerging specialized hardware APIs from AWS, Google Cloud, and Azure to benchmark cost per 1,000 tokens and real-world response times.
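Provider SDKs differ, so the harness below hides the actual call behind a placeholder: `call_model` is a hypothetical stand-in for whichever API you are testing, and the price is an input you supply, not a quoted rate.

```python
# Hedged benchmarking sketch: time a model call and normalize cost per 1,000 output tokens.
import time
from statistics import median

def benchmark(call_model, prompt: str, price_per_million_output_tokens: float, runs: int = 5) -> dict:
    latencies, token_counts = [], []
    for _ in range(runs):
        start = time.perf_counter()
        token_counts.append(call_model(prompt))   # call_model must return the output token count
        latencies.append(time.perf_counter() - start)
    return {
        "median_latency_s": median(latencies),
        "median_output_tokens": median(token_counts),
        "cost_per_1k_output_tokens": price_per_million_output_tokens / 1_000,
    }

# Stand-in model so the sketch runs on its own; swap in a real SDK call to benchmark a provider.
fake_model = lambda prompt: len(prompt.split()) * 3
print(benchmark(fake_model, "Summarize our Q3 pipeline in three bullets.",
                price_per_million_output_tokens=0.60))
```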
Focus product bets on one painful, repeatable job. Think “triage every inbound support email,” “prepare sales briefings from CRM + email,” or “close monthly books from ERP exports.” Design an agentic workflow that owns the loop: observe tools, decide, act, then summarize for a human, with strong guardrails and replayable logs.
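That “own the loop” framing reduces to a small control loop: observe through tools, decide with a model, act through scoped calls, then summarize for a human. The helpers below are hypothetical placeholders, not a specific agent framework.

```python
# Minimal observe -> decide -> act -> summarize loop (hypothetical helpers throughout).
def run_agent_step(observe, decide, act, summarize, max_actions: int = 5):
    context = observe()                  # e.g. pull new support emails or CRM changes
    plan = decide(context)               # model proposes a bounded list of actions
    results = [act(action) for action in plan[:max_actions]]   # guardrail: cap actions per step
    return summarize(context, results)   # human-readable recap for review and replay

# Wiring with trivial stand-ins so the sketch runs end to end:
report = run_agent_step(
    observe=lambda: ["ticket #41: refund request"],
    decide=lambda ctx: [{"tool": "crm.tag", "args": {"ticket": 41, "tag": "refund"}}],
    act=lambda action: {"ok": True, **action},
    summarize=lambda ctx, res: f"Handled {len(res)} action(s) across {len(ctx)} new item(s).",
)
print(report)    # -> "Handled 1 action(s) across 1 new item(s)."
```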
Investors should assume foundation models commoditize. Gross margins collapse if a startup cannot drive down inference costs or negotiate better infra. Press teams on:
- Unit economics per task, not per seat
- Proprietary data advantages
- Workflow lock-in and switching costs
Look for products where usage grows with data and process depth, not just user count. A defensible moat in 2026 looks like a proprietary ontology of a domain, embedded across thousands of customer workflows, continuously fine-tuned on real outcomes.
Business leaders need a sandbox, not a moonshot. Start with internal, low-risk agents: knowledge search across docs, meeting summarization, ticket triage, or expense classification. Use these pilots to build an institutional playbook for security, privacy, and auditing before agents touch customers or money.
Codify rules for:
- Data access and retention
- Human-in-the-loop approval thresholds
- Incident response when agents misbehave
Frequently Asked Questions
What is the main argument for specialized AI hardware like Groq's LPU?
Specialized hardware like LPUs dramatically lowers latency and cost-per-token for AI inference. This makes real-time, conversational AI experiences feasible and affordable at a massive scale, shifting the competitive focus from model training to model serving.
How are AI agents evolving beyond simple chatbots?
They are becoming 'personal intelligence' systems that understand user context and can orchestrate complex actions across multiple applications (email, CRM, docs). The goal is to create proactive assistants that automate entire workflows, not just answer questions.
What are venture capitalists looking for in AI startups now?
VCs are moving past the initial hype, prioritizing startups with AI-native workflows, proprietary data moats, and a clear ROI for customers. They are scrutinizing unit economics and defensibility against commoditized foundation models.
What is an 'agentic workflow'?
An agentic workflow is a process where an AI agent automates a series of interconnected tasks across different software tools to achieve a complex goal. For example, an agent could monitor a sales CRM, generate a performance report, and then draft a summary email to the team.