This Week in AI: Production Viability – O’Reilly

On this week’s episode, host Andreas Welsch, founder of AI advisory firm Intelligence Briefing, brought together Maya Mikhailov, cofounder and CEO of Savvi AI, and Doug Shannon, generative AI and intelligent automation leader, to cover a handful of interconnected topics that practitioners are navigating right now: OpenAI’s push into personal finance, the role of metacognition in AI-assisted technical work, the growing backlash against token-based productivity metrics, and the emerging role of the forward-deployed engineer. Together, these stories sketch a picture of an industry that’s good at generating output but is still figuring out what that output is worth.

Why OpenAI wants your bank account data

When OpenAI announced it was analyzing users’ transaction data in partnership with financial institutions, the coverage focused on the consumer benefit: a smarter way to track spending, comparable to what Credit Karma or Mint offered but with a more conversational interface.

But that’s not all the company’s interested in, or even the main thing. Maya reframed the stakes: “What OpenAI wants to do is figure out consumer intent.” Being able to access users’ financial data is less about helping people manage their money and more about completing a profile the company can then monetize. OpenAI already builds a surprisingly accurate picture of users from their chat histories. Add transaction data and you get specifics that weren’t there before: what someone is saving for, what they’re anxious about, where their money is actually going. That’s a data asset worth a great deal to advertisers.

We’ve seen this pattern before. As Andreas noted, companies have long held (and used) potentially invasive data to recommend products. The Target pregnancy prediction story is now more than a decade old, but it’s still being taught in business school, including by Andreas, precisely because it illustrates how behavioral data can be combined to infer things people haven’t explicitly disclosed. It also spotlights the fine line between effective recommendations and those that feel too personalized, reminding consumers just how much information companies have on them. Companies’ profile-building capability hasn’t changed, but AI chat adds a new wrinkle, said Maya. A conversational interface makes disclosure feel natural, so the knowledge graph built from your chat history is very powerful. These tools are also better positioned than traditional channels to deliver recommendations. “By having this style that is agreeable, that is engaging,” Maya explained, “those recommendations are going to be a lot stickier than what a fragment of a sentence I type into a regular search engine.”

Metacognition as a professional skill

When you delegate thinking to a system that averages across a massive range of inputs to produce an answer, you need to know when that answer is good enough and when it isn’t.

“We’re essentially being averaged out,” Doug said. The model is doing many things behind the scenes to find a mean response. The human’s job is to ask questions about the questions, to push past the first answer, and to know whether their own judgment is still in the loop. That’s why Doug’s been pushing for a renewed interest in metacognition, or “thinking about thinking.” Offloading cognitive load that’s peripheral to your work is fine, Doug and Maya agreed. Offloading the reasoning that’s central to your job’s value—what Doug called cognitive surrender—is where organizations get into trouble.

The future advantage won’t come from access to AI. Everyone will have some kind of access to it. The advantage will come from knowing what to offload, what to question, and what should never leave human judgment. This is a skill-development question as much as a philosophical one. The people who’ll be most effective with AI tools aren’t the ones who use them most; they’re the ones who understand what to hand off and what to keep. That requires domain knowledge, judgment about when a model’s answer is plausible but wrong, and enough fluency with how these systems work to recognize when you’re being handed an average instead of an answer.

Tokenmaxxing and the wrong incentive

The tokenmaxxing debate seems to be coming to a head. Amazon abolished its AI productivity leaderboard after employees started gaming it by writing inefficient code to rack up token usage. And one company reportedly burned through $500M in Anthropic tokens in a single month after failing to set limits. The companies encouraging tokenmaxxing are rewarding the wrong metric, Maya argued. It’s like determining which bakery is best by the amount of flour it uses. The right question is “Are we making a quality product?”
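To make the flour-versus-bread distinction concrete, here is a minimal sketch of how a leaderboard ranked by raw token usage can invert once usage is divided by an outcome. Everything in it is an assumption for illustration: the team names, the numbers, and the choice of "accepted changes" as the quality signal are hypothetical and do not come from the episode.

```python
from dataclasses import dataclass


@dataclass
class TeamUsage:
    name: str
    tokens_used: int       # hypothetical monthly token consumption
    changes_accepted: int  # hypothetical count of changes that passed review and shipped


def rank_by_tokens(teams):
    """Leaderboard that rewards consumption: most tokens first."""
    return [t.name for t in sorted(teams, key=lambda t: t.tokens_used, reverse=True)]


def tokens_per_accepted_change(team):
    """Tokens spent per shipped change; infinite if nothing shipped."""
    if team.changes_accepted == 0:
        return float("inf")
    return team.tokens_used / team.changes_accepted


def rank_by_efficiency(teams):
    """Leaderboard that rewards outcomes: fewest tokens per shipped change first."""
    return [t.name for t in sorted(teams, key=tokens_per_accepted_change)]


# Illustrative data only.
teams = [
    TeamUsage("team_a", 9_000_000, 12),
    TeamUsage("team_b", 2_000_000, 20),
    TeamUsage("team_c", 5_000_000, 0),
]

print(rank_by_tokens(teams))      # ['team_a', 'team_c', 'team_b']
print(rank_by_efficiency(teams))  # ['team_b', 'team_a', 'team_c']
```

In this toy data the team that uses the fewest tokens ships the most, so it sits last on the consumption leaderboard and first on the outcome one. A count of accepted changes is itself a gameable proxy; the sketch only shows that the choice of denominator changes who looks productive.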

Andreas shared his own vibe coding experience as an example of how token consumption and technical debt compound in practice. A developer starts with a modest plan and burns through their quota running agents in half an hour. They upgrade to a higher tier, paying five times more, but now the sunk-cost logic kicks in. As Andreas pointed out, now they feel like they “should also be getting five times more the value out of [their subscription],” so scope expands from a single tool into a unified business operating system. Three weeks later, the accumulated complexity has outpaced the ability to evaluate it: Repeated security audits keep surfacing new issues, each pass generating recommendations that require cybersecurity expertise most vibe coders don’t have. Here’s where Doug’s point about metacognition applies: The more a builder stays actively involved in understanding what the system is actually doing, the better their judgment about whether it is working. For less engaged users, the risk is accepting the output, shipping the debt, and discovering the consequences later.

Most of the misalignment originates in the gap between what executives expect from AI and what practitioners deal with day-to-day. Executives see a capability that could change the slope of productivity, Maya explained. Engineers and analysts live with the technical debt, the version control problems, and the regulatory constraints that don’t disappear because you have a better code completion tool. The leaderboard problem is a symptom of that disconnect.

GitHub’s recent shift from unlimited to usage-based pricing for Copilot is likely to realign these incentives faster than any internal policy change would. When more CFOs start seeing the actual bills, the leaderboards will all come down.

Doug identified a related problem that comes with “cognitive surrender” to LLMs. When organizations encourage employees to pipe internal processes, proprietary logic, and institutional knowledge into foundation models without governance, they’re not just running up token bills. They’re giving away the operational knowledge that differentiates them. Process documentation, workflow logic, and institutional memory about why certain decisions were made are all forms of intellectual property, and once they’re encoded into a general-purpose model, the organization’s advantage from them diminishes.

Forward-deployed engineers aren’t enough on their own

Is the answer to these challenges to put a skilled engineer directly inside the customer environment to translate between what a model produces and what an organization actually needs? That’s the promise of the forward-deployed engineer (FDE) approach popularized by AI firms. Doug and Maya both had some criticisms of the model.

Maya’s objection was structural. Enterprise AI deployment isn’t a matter of adding capability on top of existing infrastructure. Organizations arrive with siloed data, legacy systems, and regulatory constraints that no forward-deployed engineer can resolve on technical skill alone. You can’t “just sprinkle some AI on it, and it’ll work just by a package of tokens,” she said. Engineers have to know the context behind why certain data can’t be used or why a particular model can’t be deployed in a regulated context. FDEs coming into an organization fresh don’t have this understanding and, as a result, may undo decisions that were made carefully and for reasons that aren’t written down anywhere obvious.

Doug’s concern was about communication. FDEs, in his experience, tend to arrive with strong technical instincts and limited organizational context. They get into the work quickly but struggle to communicate across the full range of stakeholders involved. That’s why business analysts exist: to understand the customer’s problem and what the process actually is before engineers try to address it. Skip that step and you get technically correct output that solves the wrong problem.

What both Maya and Doug were underscoring is that AI deployment at the enterprise level is fundamentally a context problem. The models are capable. What’s hard is knowing which capability to apply, where to do it, and with what constraints in place. That knowledge doesn’t live in the model; it lives in the people who’ve worked inside the organization long enough to know why things are the way they are.

The measurement problem

All the topics in this episode circle back to the same question: What are we actually measuring, and what incentives are we creating with those measurements? Token counts and lines of code don’t always correlate with the outcomes companies want. You need human expertise and contextual knowledge of the business to figure out what goals you want to achieve and what to measure to ensure you get there.

On next Monday’s episode of This Week in AI, RecoMind founder Miguel Fierro joins host Christina Stathopoulos to discuss responsible AI, multimodal content creation, and more on how LLMs are changing personalization and user understanding. Miguel will also lead a live demo that offers a glimpse of the next generation of recommendation experiences—register here.

We’ll continue to publish our takeaways here on Radar each Friday and share full episodes on YouTube, Spotify, Apple, or wherever you get your podcasts.