Beyond Pilot Purgatory

The hard truth about AI scaling is that for most organizations, it isn’t happening. Despite billions in investment, a 2025 report from the MIT NANDA initiative reveals that 95% of enterprise generative AI pilots fail to deliver measurable business impact. This isn’t a technology problem; it’s an organizational design problem.

The reason for this systemic failure is surprisingly consistent: Organizations isolate their AI expertise. This isolation creates two predictable patterns of dysfunction. In one model, expertise is centralized into a dedicated team—often called a Center of Excellence (CoE). While intended to accelerate adoption, this structure invariably becomes a bottleneck, creating a fragile “ivory tower” disconnected from the business realities where value is actually created. Business units wait months for resources, incentives become misaligned, and the organization’s overall AI literacy fails to develop.

In the opposite model, expertise is so distributed that chaos ensues. Autonomous business units build redundant infrastructure, hoard knowledge, and operate without coordinated governance. Costs spiral, incompatible technology stacks proliferate, and the organization as a whole becomes less intelligent than its individual parts.

Both approaches fail for the same underlying reason: They treat AI development as a separate activity from the core business.

The numbers confirm this struggle. Gartner predicts that 30% of GenAI projects will be abandoned after proof of concept by 2025 due to poor data quality, inadequate risk controls, and escalating costs. McKinsey’s State of AI in 2025 report reveals that while adoption is high, only one-third of organizations have scaled AI enterprise-wide. Even fewer—just 5%, according to BCG—have built the capabilities to generate significant value at scale.

The organizations that have successfully scaled AI beyond this “pilot purgatory”—companies like JPMorganChase, Walmart, and Uber—didn’t choose between these broken models. They built a third way, discovering under pressure from reality that the only approach that works is an outcome-oriented hybrid architecture. This model combines centralized enablement with distributed execution, aggressive governance with operational autonomy, and technical excellence with a relentless focus on business value.

This isn’t abstract theory. The characteristics of these successful architectures are becoming clear enough to articulate—and specific enough to implement. Here is what actually works.

What Actually Works: Outcome-Oriented Hybrid Architecture

The organizations that have successfully scaled AI share surprising structural similarities—not because they all studied the same framework, but because they independently discovered the same operating model through trial and error.

This model has several key characteristics:

Platform teams with product thinking, not project thinking

Rather than treating central AI infrastructure as a cost center or a research lab, successful organizations build it as an internal product with defined customers (the business units), success metrics, and a roadmap.

Airbnb’s “Bighead” platform exemplifies this. The team didn’t just build ML infrastructure; they built a product that product teams could consume. Standardized feature engineering, model training, and deployment pipelines reduced development time from months to weeks. The platform team measured success not by research excellence but by adoption rates and time-to-market reductions for dependent teams.

Uber’s Michelangelo platform followed a similar pattern: develop shared ML infrastructure, price it internally to make resource allocation explicit, measure platform adoption and the business impact of applications built on it, and evolve the platform based on actual usage patterns.
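
To make “platform as product” concrete, here is a minimal sketch of what a consumable internal interface might look like. The names (MLPlatformClient, get_features, train, deploy) are hypothetical illustrations, not Bighead’s or Michelangelo’s actual APIs; the point is that product teams call standardized pipelines while the platform team instruments every call so adoption can be measured.

```python
# Hypothetical sketch of a "platform as product" interface; names are
# illustrative, not any company's real API. Product teams consume
# standardized pipelines, and every call doubles as adoption telemetry.
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MLPlatformClient:
    team: str                                  # the internal "customer"
    events: list = field(default_factory=list)

    def _track(self, action: str) -> None:
        # Platform success is judged by adoption, so usage is recorded.
        self.events.append({"team": self.team, "action": action,
                            "at": datetime.utcnow().isoformat()})

    def get_features(self, entity: str, names: list[str]) -> dict:
        self._track("get_features")
        return {n: None for n in names}        # stand-in for a shared feature store

    def train(self, model_name: str, features: dict) -> str:
        self._track("train")
        return f"{model_name}:v1"              # stand-in for a managed training pipeline

    def deploy(self, model_version: str) -> str:
        self._track("deploy")
        return f"https://models.internal/{model_version}"   # standardized serving path


# A product team consumes the platform instead of building its own stack.
client = MLPlatformClient(team="pricing")
version = client.train("eta_model", client.get_features("trip", ["distance_km", "hour"]))
endpoint = client.deploy(version)
```

The design choice that matters is ownership: the platform team owns the pipelines and is judged on adoption and time-to-market, not on research output.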

Implementation reality: Platform teams need authority to make technical decisions while remaining accountable for business adoption. They require sustained funding separate from individual project budgeting. They need internal customers who participate in roadmap planning. Most organizations struggle with this because platform thinking requires executives to invest in capability that won’t generate revenue for 18+ months.

Outcome-driven embedded specialists, not isolated teams

Successful organizations don’t ask centralized AI teams to deliver solutions. They embed AI specialists directly into business value streams where they co-own business outcomes.

A telecommunications company we studied restructured its 50-person AI CoE by embedding team members into four core business units. Instead of business units requesting AI solutions, they now had dedicated specialists sitting in weekly operations meetings, understanding real problems, building real solutions, and feeling the pressure of business metrics. The result? Deployment speed increased 60% and adoption tripled.

The model works because:

  • Embedded specialists develop tacit knowledge about business constraints and operational realities that remote teams can never have.
  • They face direct accountability for outcomes, aligning incentives.
  • They become translators between technical and business languages.

Implementation reality: Embedding requires letting go of centralized command-and-control. The embedded specialists report to central leadership via a dotted line, but their primary accountability is to business unit leadership. This creates tension. Managing that tension (not eliminating it) is essential. Organizations that try to eliminate tension by recentralizing authority lose the benefits of embedding.

Dynamic governance, not static policies

Traditional governance models assume relatively stable, predictable environments where you can write policies in advance and enforce them. AI systems exhibit emergent behavior that governance can’t predict. You need frameworks that adapt as you learn.

JPMorganChase demonstrates this through its multilayered governance approach:

  • A centralized model risk team reviews all AI systems before production deployment using consistent technical standards.
  • Domain-specific oversight committees in lending, trading, and compliance understand business context and risk appetite.
  • Ongoing monitoring systems track model performance, drift, and unintended consequences.
  • Clear escalation protocols activate when algorithmic decisions fall outside acceptable parameters.
  • Continuous improvement mechanisms incorporate lessons from deployed systems back into policies.
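
The monitoring and escalation layers lend themselves to a simple sketch. Everything below (the drift measure, the thresholds, the escalate hook) is an illustrative assumption, not JPMorganChase’s actual controls; the pattern is continuous measurement against an approved operating range, with an explicit escalation path when a model falls outside it.

```python
# Illustrative dynamic-governance check. Thresholds, metric names, and the
# escalation hook are assumptions, not any bank's real system.
from statistics import mean


def drift_score(reference: list[float], current: list[float]) -> float:
    # Crude drift signal: relative shift in mean score between the
    # approval-time reference window and the latest production window.
    ref_mean = mean(reference)
    return abs(mean(current) - ref_mean) / (abs(ref_mean) + 1e-9)


def escalate(model_id: str, issues: list[str]) -> None:
    # Placeholder for the escalation protocol: notify the model owner and the
    # domain oversight committee, and pause automated decisions if required.
    print(f"[ESCALATION] {model_id}: {', '.join(issues)}")


def governance_check(model_id: str, reference: list[float], current: list[float],
                     live_accuracy: float, drift_limit: float = 0.15,
                     accuracy_floor: float = 0.80) -> None:
    issues = []
    if drift_score(reference, current) > drift_limit:
        issues.append("score drift beyond approved range")
    if live_accuracy < accuracy_floor:
        issues.append(f"accuracy {live_accuracy:.2f} below floor {accuracy_floor:.2f}")
    if issues:
        escalate(model_id, issues)
    # Either way, the observation feeds the next revision of the policy itself.


governance_check("credit_limit_v3",
                 reference=[0.42, 0.45, 0.40, 0.44],
                 current=[0.61, 0.58, 0.63, 0.60],
                 live_accuracy=0.77)
```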

Implementation reality: Dynamic governance requires specialists who combine technical AI expertise with organizational knowledge and the authority to make decisions. These are expensive, scarce roles. Most organizations underinvest because governance doesn’t show up as a direct revenue driver, so it gets underfunded relative to its importance.

Capability building, not just capability buying

Organizations that scale AI sustainably invest heavily in building organizational AI literacy across multiple levels:

  • Frontline workers need basic understanding of how to use AI tools and when to trust them.
  • Team leads and domain experts need to understand what AI can and can’t do in their domain, how to formulate problems for AI, and how to evaluate solutions.
  • Technical specialists need deep expertise in algorithm selection, model validation, and system integration.
  • Executives and boards need enough understanding to ask intelligent questions and make strategic decisions about AI investment.

Implementation reality: Capability building is a multiyear investment. It requires systematic training programs, rotation opportunities, and senior engineers willing to mentor junior people. It requires tolerance for people operating at reduced productivity while they’re developing new capabilities.

Measuring What Matters

Organizations caught in pilot purgatory often measure the wrong things. They track model accuracy, deployment cycles, or adoption rates. These vanity metrics look good in board presentations but don’t correlate with business value. Successful organizations understand AI is a means to an end and measure its impact on the business relentlessly.

Business outcomes: Track AI’s direct impact on primary financial and customer metrics.

  • Revenue growth: Does AI increase cross-sell and upsell opportunities through hyperpersonalization? Does it improve customer retention and Net Promoter Score (NPS)?
  • Cost and efficiency: Does AI increase throughput, lower operational cycle times, or improve first-contact resolution rates in customer service?
  • Risk reduction: Does AI reduce financial losses through better fraud detection? Does it lower operational risk by automating controls or reducing error rates?

Operational velocity: This measures time-to-market. How quickly can your organization move from identifying a business problem to deploying a working AI solution? Successful organizations measure this in weeks, not months. This requires a holistic view of the entire system—from data availability and infrastructure provisioning to governance approvals and change management.

Value-realization velocity: How long after deployment does it take to achieve a positive ROI? Organizations that track this discover that technical integration and user adoption are often the biggest delays. Measuring this forces a focus not just on building the model, but on ensuring it’s used effectively.
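
Both velocity metrics reduce to tracking a handful of timestamps and running totals per initiative. A minimal sketch, with hypothetical field names and data, of how the two measures might be computed:

```python
# Minimal sketch of the two velocity metrics. Field names, dates, and the
# break-even definition are illustrative assumptions.
from datetime import date


def operational_velocity_days(problem_identified: date, deployed: date) -> int:
    # Time from "we agreed this is a business problem" to "a solution is live."
    return (deployed - problem_identified).days


def value_realization_days(cumulative_value: list[float],
                           cumulative_cost: list[float]) -> int | None:
    # Days after deployment until cumulative value first covers cumulative cost.
    for day, (value, cost) in enumerate(zip(cumulative_value, cumulative_cost)):
        if value >= cost:
            return day
    return None   # not yet at break-even


print(operational_velocity_days(date(2025, 1, 6), date(2025, 3, 3)))   # 56 days
print(value_realization_days(cumulative_value=[0, 10, 25, 45, 80],
                             cumulative_cost=[60, 62, 64, 66, 68]))    # day 4
```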

System resilience: When individual components fail—a key person leaves, a data source becomes unavailable, or a model drifts—does your AI capability degrade gracefully or collapse? Resilience comes from modular architectures, shared knowledge, and having no single points of failure. Organizations optimized purely for efficiency are often fragile.
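
Graceful degradation is easier to achieve when the fallback path is explicit in the architecture rather than improvised during an outage. A sketch, with hypothetical components, of a prediction path that falls back to a simple rule when the model or its data source is unavailable:

```python
# Hypothetical sketch of graceful degradation: if the primary model or its
# feature source fails, fall back to a conservative rule instead of failing
# the business process outright. Component names are illustrative.
def predict_with_fallback(request: dict, model, feature_store) -> dict:
    try:
        features = feature_store.lookup(request["customer_id"])
        return {"score": model.score(features), "source": "model"}
    except Exception as exc:               # data source down, model unavailable, etc.
        log_degradation(exc)
        score = 0.5 if request.get("existing_customer") else 0.2
        return {"score": score, "source": "fallback_rule"}


def log_degradation(exc: Exception) -> None:
    # Degradations are recorded so resilience can be measured, not just hoped for.
    print(f"[DEGRADED] falling back to rules: {exc!r}")


class _DownFeatureStore:
    def lookup(self, _):
        raise ConnectionError("feature store unreachable")


print(predict_with_fallback({"customer_id": 7, "existing_customer": True},
                            model=None, feature_store=_DownFeatureStore()))
```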

Governance effectiveness: Is your organization proactively catching bias, drift, and unintended consequences, or are problems only discovered when customers complain or regulators intervene? Effective governance is measured by the ability to detect and correct issues automatically through robust monitoring, clear incident response procedures, and continuous learning mechanisms.

The Implementation Reality

None of this is particularly new or revolutionary. JPMorganChase, Walmart, Uber, and other successfully scaling organizations aren’t doing secret magic. They’re executing disciplined organizational design:

Start with the business, not the technology. Identify the key business drivers and metrics you already measure, look at balance-sheet levers, and see where AI can unlock value. Don’t build impressive systems for nonproblems.

Address technical debt first. You can’t deploy AI efficiently on fragile infrastructure. Many organizations waste 60%–80% of AI development capacity fighting integration problems that wouldn’t exist with better foundations. This doesn’t mean sacrificing speed; it means adopting a balanced infrastructure approach with clear integration points.

Design human-AI decision patterns intentionally. The most successful AI implementations don’t try to create fully autonomous systems. Instead, they create hybrid systems where algorithms handle speed and scale while humans maintain meaningful control. Commerzbank’s approach to automating client call documentation exemplifies this: Rather than replacing advisors, the system freed them from tedious manual data entry so they could focus on relationship-building and advice.

The pattern: AI proposes; rules constrain; humans approve; every step is logged. This requires API-level integration between algorithmic and rule-based processing, clear definitions of what gets automated versus what requires human review, and monitoring systems that track override patterns to identify when the algorithm is missing something important.
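
A minimal sketch of that pattern follows; the thresholds, field names, and review queue are assumptions for illustration, not Commerzbank’s or any vendor’s actual implementation. The model proposes, hard business rules constrain, anything outside the auto-decision band goes to a human, and every step is logged so override patterns can be monitored.

```python
# Sketch of "AI proposes; rules constrain; humans approve; every step is
# logged." Thresholds, field names, and the review queue are assumptions.
import json
from datetime import datetime

AUDIT_LOG = []       # in practice, an append-only audit store
REVIEW_QUEUE = []    # items awaiting human approval or override


def log(step: str, payload: dict) -> None:
    AUDIT_LOG.append({"step": step, "at": datetime.utcnow().isoformat(), **payload})


def decide(case: dict, model_score: float) -> dict:
    log("ai_proposed", {"case": case["id"], "score": model_score})

    # Hard business rules constrain the proposal regardless of the score.
    if case["amount"] > 50_000:
        decision = {"case": case["id"], "outcome": "human_review", "reason": "amount_limit"}
    elif model_score >= 0.90:
        decision = {"case": case["id"], "outcome": "auto_approved"}
    elif model_score <= 0.10:
        decision = {"case": case["id"], "outcome": "auto_rejected"}
    else:
        decision = {"case": case["id"], "outcome": "human_review", "reason": "uncertain_score"}

    if decision["outcome"] == "human_review":
        REVIEW_QUEUE.append(decision)      # a person approves, rejects, or overrides
    log("decision_recorded", decision)
    return decision


decide({"id": "C-1042", "amount": 72_000}, model_score=0.95)
print(json.dumps(AUDIT_LOG, indent=2))
```

Tracking the share of cases routed to review, and how often humans override the proposal, is what surfaces the cases the algorithm is missing.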

Invest heavily in governance before scaling. Don’t treat it as an afterthought. Organizations that build governance structures first scale much faster because they don’t have to retrofit controls later.

Embed AI expertise into business units but provide platform support. Neither pure centralization nor pure distribution works. The hybrid model requires constant attention to balance autonomy with coordination.

Accept that 18–24 months is a realistic timeline for meaningful scale. Organizations expecting faster transformations are usually the ones that end up with integration debt and abandoned projects.

Build organizational capability, not just buy external talent. The organizations that sustain AI advantage are those that develop deep organizational knowledge, not those that cycle through external consultants.

Why This Still Matters

The reason organizations struggle with AI scaling isn’t that the technology is immature. Modern AI systems are demonstrably capable. The reason is that enterprise scaling challenges are fundamentally organizational problems. Scale requires moving AI from skunkworks (where brilliant people build brilliant systems) to operations (where average people operate systems reliably, safely, and profitably).

That’s not a technology problem. That’s an operating-model problem. And operating-model problems require organizational design, not algorithm innovation.

The organizations that figure out how to design operating models for AI will capture enormous competitive advantages. The organizations that continue bolting AI onto 1980s organizational structures will keep funding pilot purgatory.

The choice is structural. And structure is something leadership can control.
