
Personal AI doesn’t have to run your life to change it. It just has to see you clearly and feed your behavior back to you in a way you can’t dodge. Once you look at AI as feedback loops instead of little butlers, the whole “agent” conversation starts to feel upside down.
We’ve overrotated on agents that act and massively underinvested in systems that watch, interpret, and train, for humans and for models.
Stop shipping little butlers
Most personal AI demos orbit the same fantasy: inbox‑zero sidekicks, calendar‑tuning bots, or agents that “just handle it” so you can “focus on what matters.” They’re great on stage but terrible as a risk posture.
The butler model hides a simple asymmetry. A read‑only system that misinterprets you is mildly annoying; you ignore a bad suggestion. A write‑enabled system that misfires in your inbox or CRM is career-limiting. One error is a shrug, the other is an incident report.
That’s the asymmetric agent in one line: Read is cheap; write is expensive. Read can be broad, but write should be narrow, rare, and very hard‑earned. The first, highest‑leverage thing you can build is a mirror: an AI that reads your digital exhaust, synthesizes what it sees, and reflects it back, without ever touching the systems that move money, time, or relationships. Šimon Podhajský’s talk, “Cognitive Exhaust Fumes, or: Read‑Only AI Is Underrated,” is a great example of this pattern in the wild.
This isn’t a temporary sandbox before “real agents.” Treating read‑only as a stepping stone and write as the prize is how you hand a chainsaw to a toddler because they’ve proven they can hold a spoon.
Cognitive exhaust is the real dataset
Your day produces a ridiculous amount of cognitive exhaust: emails half‑written, tabs abandoned, tasks snoozed, articles skimmed, and notes forgotten. Any one stream is noisy. The value appears when you correlate across all of them.
A serious personal AI can sit over multiple sources—mail, calendar, notes, browser history, docs, and CRM—and build a cross‑cutting view of what you do versus what you say you care about. You want it a bit judgmental. You want it to surface things like:
- Intention–action gaps: projects you “prioritize” but never touch
- Attention drift: where your time really went
- Relationship decay: people you insist are key but haven’t contacted in months
Podhajský’s system does exactly this, using a read‑only agent that writes only into its own Obsidian vault—no edits to the original systems, no auto‑emails, just brutally honest reflections and suggested experiments.
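To make that concrete, here is a minimal sketch of one cross‑source check a mirror might run (relationship decay), assuming you’ve already exported your stated key contacts and your sent‑mail history to local files. The file names and schemas are illustrative, not tied to any particular tool:

```python
# Illustrative only: surface "relationship decay" by comparing the people you
# call key contacts against when you actually last emailed them. File names
# and schemas are assumptions, not part of any particular tool.
import csv
import json
from datetime import datetime, timedelta

def stale_contacts(priorities_path: str, sent_mail_path: str, days: int = 90) -> list[str]:
    # priorities.json: {"key_contacts": ["alice@example.com", ...]}
    with open(priorities_path) as f:
        key_contacts = set(json.load(f)["key_contacts"])

    # sent_mail.csv: one row per sent message, with "to" and "date" (ISO 8601) columns
    last_touch: dict[str, datetime] = {}
    with open(sent_mail_path, newline="") as f:
        for row in csv.DictReader(f):
            sent = datetime.fromisoformat(row["date"])
            addr = row["to"].lower()
            last_touch[addr] = max(last_touch.get(addr, sent), sent)

    cutoff = datetime.now() - timedelta(days=days)
    # "Key" people you haven't written to inside the window, or ever
    return sorted(c for c in key_contacts
                  if last_touch.get(c.lower(), datetime.min) < cutoff)
```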
Here’s the trap: Your agent must only observe. The moment an agent writes back into the systems it’s monitoring, you’ve poisoned the well. You’re not observing your behavior anymore; you’re observing an AI‑amplified feedback loop. You’ve built an observability rig that forges its own logs. The data stops being “you” and becomes “you plus a stochastic autocomplete with opinions.”
For personal AI, that’s existential. If the whole point is to help you see yourself more clearly, having the same system both author and interpret the traces destroys the value proposition. The mirror starts painting your reflection.
Feedback loops, not party tricks
Seen as feedback loops, the symmetry becomes obvious.
A mirror is a loop targeting your nervous system. The “model” being updated is the human. The exhaust is your digital activity. The environment is your toolchain. The reward shows up as shame, insight, or resolve when you see your week laid bare.
A gym is a loop targeting model weights. The model acts in a world, receives rewards or penalties, and updates its policy. The exhaust is trajectories of prompts, actions, outcomes. The environment is a task harness. The reward is a verifiable signal.
Two different learners, same structure:
- In the mirror, the user is the learner and the agent is a silent observer.
- In the gym, the model is the learner and the environment is the judge.
In practice, both get broken the same way: We obsess over agents doing flashy things and neglect the quality of the signal that trains the system—human or model. We ship chatty butlers and call it “intelligence” instead of asking, “How clean is the feedback?”
Environments are the new unit of deployment
On the model side, we’re still trying to prompt‑engineer our way into reliability. That’s cute for prototypes but reckless for systems you depend on.
We spent 20 years perfecting CI/CD for deterministic code—version control, reproducible builds, test harnesses, staging, blue‑green deploys—all so we could ship with confidence. Meanwhile, we vibe‑check stochastic agents into production with a handful of prompts and a cherry‑picked demo.
A more sensible default is to treat the environment definition—the code and configuration that specify the world the model lives in—as the unit of deployment. Libraries like Verifiers make this concrete by packaging environments for LLMs with tools, datasets, parsing logic, rewards, and rollout policies in one place.
To make that definition precise, you need four anchors:
- State schema: The shape of the world the environment exposes to the model at each step (fields, types, invariants)
- Action interface: The tools or functions the model is allowed to call, with their inputs and outputs
- Reward spec: The checks you run to score behavior (correct/incorrect, passed/failed, right tool, right schema)
- Rollout policy: How you exercise the environment (single‑turn versus multi‑turn, maximum steps, termination conditions)
You’re not “deploying state” in the sense of a frozen snapshot of production. You’re deploying the rules of the game: what the model can see, what it can do, how you score it, and how you run episodes. Any candidate model you plug into that environment is evaluated and constrained the same way. You then treat that environment definition like a test suite plus staging cluster, comparing models on behavior that matters for your workflow, training smaller, specialized models using verifiable rewards instead of vibes, and detecting regressions when either models or tools change.
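As a sketch of what that artifact might look like (illustrative structure, not the Verifiers API), the four anchors can live in one versioned object:

```python
# Illustrative structure, not the Verifiers API: the four anchors bundled into
# one versioned artifact you can test, promote, and diff like any other code.
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class EnvironmentPackage:
    name: str
    version: str
    # State schema: what the model sees at each step (fields, types, invariants)
    state_schema: dict[str, type]
    # Action interface: the tools the model is allowed to call, by name
    tools: dict[str, Callable[..., Any]]
    # Reward spec: named checks that score a finished rollout between 0.0 and 1.0
    rewards: dict[str, Callable[[dict], float]]
    # Rollout policy: how episodes run and when they stop
    max_turns: int = 1
    stop_signals: tuple[str, ...] = ("done",)

# Hypothetical claims-adjudication environment, versioned like a service.
claims_env = EnvironmentPackage(
    name="claims-adjudication",
    version="0.3.1",
    state_schema={"claim_id": str, "amount": float, "policy_active": bool},
    tools={"lookup_policy": lambda claim_id: {"policy_active": True}},  # read-only stub
    rewards={
        "decision_matches_label": lambda r: float(r["decision"] == r["label"]),
        "looked_up_policy_first": lambda r: float("lookup_policy" in r["tool_calls"]),
    },
    max_turns=4,
)
```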
For enterprises, this means you don’t “deploy an LLM” with some prompts. You ship an environment package: code, config, and test data that define the world, plus metrics and logging. The model is a plug‑in you can swap or retrain based on how it behaves inside that package, not in an ad hoc prompt sandbox.
Observers, gyms, and asymmetric agents
Mirrors and gyms are both environments built around feedback loops. The difference is who’s allowed to touch reality.
- Mirrors watch you. The AI reads broadly, writes only to its own notes, and hands you structured feedback. You learn; you act.
- Gyms watch the model. The AI acts inside a sandbox, collects rewards, updates its weights. The model learns; the environment constrains.
Agents—the things that take actions in live systems—should sit downstream of both. They should be asymmetric by design:
- In production, agents default to read‑only or read‑mostly. Write access is narrow, logged, reviewable, and easy to kill.
- In training and evaluation, agents can be fully read‑write but only inside deliberately engineered environments.
Anything else is YOLO alignment: You train in production, corrupt your own telemetry, and then argue with the logs when something goes wrong.
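Here is a minimal sketch of that asymmetry as code: reads pass through freely, writes must be explicitly allowlisted, every write is logged, and one flag disables all writes. The tool names and audit‑log path are assumptions for illustration:

```python
# A sketch of the asymmetry as a tool gateway: reads pass through, writes must
# be explicitly allowlisted, every write is logged, and one flag disables all
# writes. Tool names and the audit-log path are illustrative assumptions.
import json
import time

class ToolGateway:
    def __init__(self, read_tools, write_tools, audit_log="agent_writes.log"):
        self.read_tools = dict(read_tools)    # broad by default
        self.write_tools = dict(write_tools)  # narrow, rare, hard-earned
        self.audit_log = audit_log
        self.writes_enabled = True            # the kill switch

    def call(self, name, **kwargs):
        if name in self.read_tools:
            return self.read_tools[name](**kwargs)
        if name in self.write_tools:
            if not self.writes_enabled:
                raise PermissionError(f"writes disabled, refusing {name}")
            with open(self.audit_log, "a") as log:  # reviewable trail
                log.write(json.dumps({"t": time.time(), "tool": name, "args": kwargs}) + "\n")
            return self.write_tools[name](**kwargs)
        raise PermissionError(f"{name} is not an allowlisted tool")

# Everything may read the CRM; exactly one narrow field update may write to it.
gateway = ToolGateway(
    read_tools={"crm_lookup": lambda account_id: {"account_id": account_id}},
    write_tools={"crm_update_note": lambda account_id, note: True},
)
```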
Think of it as risk management for agents. Every new write permission expands the blast radius. If you haven’t instrumented the read path, you’re taking on unpriced risk. Gyms for them, mirrors for us, asymmetric agents at the edges—that’s a risk posture you can explain to an auditor.
Butler agents are security theater
Now add security to the mix. Simon Willison’s “lethal trifecta” of agent risk is simple: private data, untrusted inputs, and external communications. Get all three in one agent and you’ve basically handed an attacker a loaded gun.
Most “do‑everything” butler agents proudly hit the trifecta: They ingest piles of sensitive internal data, they cheerfully process whatever the internet throws at them, and they’re allowed to send emails, modify records, or call external APIs. You’ve built a hyper‑efficient exfiltration and amplification engine.
Observer AI pulls in the opposite direction. It can still see private data but uses it only to generate internal reflections or drafts. It treats untrusted inputs as something to analyze, not something to obey. And it doesn’t touch external channels; you stay in the loop.
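If you want that posture as a check rather than a vibe, a toy audit function captures it; the three capability flags are an assumption about how you might describe an agent in its deployment config:

```python
# A toy audit check for the trifecta. The three capability flags are an
# assumption about how you might describe an agent in its deployment config.
def trifecta_risk(reads_private_data: bool,
                  ingests_untrusted_input: bool,
                  communicates_externally: bool) -> str:
    hits = sum([reads_private_data, ingests_untrusted_input, communicates_externally])
    if hits == 3:
        return "lethal trifecta: block or redesign before this ships"
    if hits == 2:
        return "two of three: decide which capability you can drop"
    return "lower risk: still log and review the read path"

print(trifecta_risk(True, True, True))   # the do-everything butler
print(trifecta_risk(True, False, False)) # the observer: private data only
```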
Butler agents give executives the feeling that “AI is doing work for us” while dramatically increasing the blast radius of prompt injection, model hallucinations, or compromised keys. Observers are actual governance: They help humans see, reason, and decide before anything gets written where it counts.
In the enterprise, “agentic workflows” without observer environments are just shadow IT with better branding. If you can’t instrument and audit what the system reads, you have no business trusting what it writes.
Boots on the ground: The friction is real
This isn’t just a whiteboard problem. In big‑bank reality, the conversation often goes like this:
Client: “We want an AI assistant that updates customer records, sends follow‑ups, and opens tickets automatically.”
Me: “Great. Show me your observability. How do you know what it’s reading today and how those reads map to actions?”
Client: “…we have logs?”
Say, “No, your shiny new bot should not have direct write access to the CRM,” and the first reaction is disbelief. Then come the workarounds: “What if it drafts and auto‑sends unless someone clicks reject?” “What if it only updates ‘safe’ fields?” “What if the human is technically in the loop but the default is accept?” All of them duck the hard work of building the mirror and the gym first.
In a post‑GDPR, post‑breach world, an observer that doesn’t push data is a compliance gift. A write‑enabled agent is a data‑deletion nightmare and a discovery headache. We’re desperate to give agents hands before we’ve given ourselves eyes. Until you can trace the read path—what’s accessed, why, and with what downstream effect—every new write permission is architectural debt with a ticking clock.
A simple playbook
If you’re trying to bring order to this chaos, here’s a blunt playbook.
Build observers first
Aggregate your cognitive exhaust—or the org’s. Start with a read‑only layer across mail, tickets, docs, code, CRM, usage logs. Have it produce structured reflections: where work happens, where intent and action diverge, and where relationships or processes are decaying. Let it write only into its own vault.
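A minimal sketch of the “write only into its own vault” rule, with placeholder paths and a stubbed summarize() standing in for the model call:

```python
# A sketch of the "writes only into its own vault" rule: the observer reads
# exported sources and writes one dated reflection note into a directory it
# owns. Paths are placeholders; summarize() stands in for a real model call.
from datetime import date
from pathlib import Path

VAULT = Path("~/ObserverVault/reflections").expanduser()

def summarize(text: str) -> str:
    # Placeholder: in practice, a read-only model call that returns a short summary
    return text[:200]

def write_reflection(sources: dict[str, str]) -> Path:
    VAULT.mkdir(parents=True, exist_ok=True)
    note = VAULT / f"{date.today().isoformat()}-reflection.md"
    lines = [f"# Reflection for {date.today().isoformat()}", ""]
    for name, text in sources.items():
        lines += [f"## {name}", summarize(text), ""]
    note.write_text("\n".join(lines))
    return note  # the only artifact this system ever writes
```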
Encode scary workflows as environments
Pick high‑risk, high‑value flows: claims adjudication, payment routing, change approval, remediation—anything with money, legal exposure, or brand risk. For each, define an environment with clear state schema, action interface, reward spec, and rollout policy. Use frameworks like Verifiers to make these reusable instead of bespoke scripts.
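For instance, a reward spec for a change‑approval environment might be a handful of verifiable checks over a finished rollout; the rollout fields here are assumptions about what your harness records:

```python
# An illustrative reward spec for a change-approval environment: verifiable
# checks over a finished rollout instead of vibes. The rollout fields are
# assumptions about what your harness records.
import json

def approved_only_with_ticket(rollout: dict) -> float:
    # Any approval must reference a change ticket; rejections pass by default
    if rollout["decision"] != "approve":
        return 1.0
    return float(bool(rollout.get("ticket_id")))

def inspected_diff_first(rollout: dict) -> float:
    # The agent has to look at the diff before deciding anything
    calls = rollout.get("tool_calls", [])
    return float(bool(calls) and calls[0] == "get_change_diff")

def output_matches_schema(rollout: dict) -> float:
    # The final answer must parse as JSON with the fields reviewers expect
    try:
        final = json.loads(rollout["final_answer"])
        return float({"decision", "risk_level", "rationale"} <= set(final))
    except (KeyError, TypeError, json.JSONDecodeError):
        return 0.0

def score(rollout: dict) -> float:
    checks = [approved_only_with_ticket, inspected_diff_first, output_matches_schema]
    return sum(check(rollout) for check in checks) / len(checks)
```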
Treat environments as deployable artifacts
Think of an environment as a repo you can clone—not a frozen copy of production but the minimum code, configuration, and sample data needed to exercise a workflow reproducibly. You version, test, and promote that environment package the way you do services. When APIs, schemas, or policies change, you update the package and rerun the suites. You don’t “prompt harder” in production and hope.
Only then, grant narrow write access
Once mirrors and gyms are in place, start handing out tightly scoped write capabilities—one surface at a time, with metrics and rollback. And have your observers watching both human and agent behavior for drift. This is slower. It’s also professional.
Rethinking “personal” and “agentic”
Reframing AI around feedback loops does odd things to our buzzwords. “Personal AI” stops being “a bot that talks like you and acts for you” and becomes “an observability layer on your own cognition.” It’s closer to therapy than outsourcing. Therapy doesn’t send emails for you; it changes how you write them.
“Agentic AI” stops being “a thing that chains tools together” and becomes “a thing that lives inside an environment with explicit constraints and signed‑off rewards.” The swagger moves from the model to the environment. The question shifts from “How smart is your agent?” to “How well‑designed is the world you’re letting it inhabit?”
Gyms for them, mirrors for us. Agents only where the feedback loops are strong enough to justify the risk. Less demo‑friendly than a bot that spams your calendar, sure. But a lot closer to something you can live with—in your personal life, and in a production architecture that must survive contact with reality.