
Another day, another example of an AI agent “running rogue” and doing something the human operator didn’t want it to do. The tl;dr is that Jeremy (Jer) Crane, founder of PocketOS, was using Claude to perform some routine DB maintenance. Claude then proceeded to delete the production database and all backups hosted at their cloud provider, Railway. To their credit, Railway managed to recover the lost data. The initial deletion took less than 10 seconds; I’m sure the recovery took much longer. Let’s look at what we can learn from what happened, and why AI is really just an amplifier of existing issues, rather than the cause itself.
We know about the incident because Jer wrote about it after it happened. First, taking time to reflect after something goes wrong is important; it’s how we learn. Sharing your mistakes with the world can be difficult, but it creates chances for us all to learn from each other. Second, I’ve seen a lot of people publicly dunking on both PocketOS and Railway. I would guess that none of those people have ever experienced the sheer terror and panic that happens during an incident like this. The feeling that you just want the ground to open and swallow you whole. It’s a feeling I’ve only experienced once or twice before, and it’s not an experience I’m keen to repeat.
One point to Railway’s credit is that they got PocketOS’s data back. If you called for a deletion via the APIs on AWS, Azure, Google Cloud or whatever, using a valid credential, that data is gone, unless you have your own backups of course. AWS et al. aren’t maintaining backups of customer data to hedge against customer mistakes. This is your yearly reminder to look into the 3-2-1 backup strategy.
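For anyone who hasn’t met it, the 3-2-1 rule is usually stated as: at least three copies of your data, on at least two different kinds of media, with at least one copy offsite. Here is a minimal sketch of that rule as a check. The `BackupCopy` type, its fields, and the location names are all made up for illustration, and treating two copies on the same volume as a single location is my assumption, not part of any formal standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    """One copy of the data. All names and values here are illustrative."""
    location: str  # e.g. "primary-volume", "provider-object-storage"
    media: str     # e.g. "block-volume", "object-store"
    offsite: bool  # True if held outside the primary provider or site


def satisfies_3_2_1(copies):
    """True for 3+ distinct locations, 2+ media types, and 1+ offsite copy."""
    distinct_locations = {copy.location for copy in copies}
    distinct_media = {copy.media for copy in copies}
    has_offsite = any(copy.offsite for copy in copies)
    return len(distinct_locations) >= 3 and len(distinct_media) >= 2 and has_offsite


# Database and backup on the same volume: one deletion removes both.
same_volume = [
    BackupCopy("primary-volume", "block-volume", offsite=False),
    BackupCopy("primary-volume", "block-volume", offsite=False),
]
print(satisfies_3_2_1(same_volume))
```

A layout where the backups live on the same volume as the database fails this check on every count, which is roughly the situation described in this incident.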
What can we learn from what happened? Well, for all the discussion around how this is AI’s fault, what we have here is a much simpler example of common system weaknesses being exploited both accidentally and at speed.
What Did Claude Do?
Claude had been asked to carry out a task against PocketOS’s staging environment. The agent hit an issue, searched out and found a long-lived API token which gave access to production, and then proceeded to delete the production volume that contained both the production databases and the backups.
Claude’s reaction, when asked what had happened, was objectively funny. It seemed to be totally aware of what went wrong, and what it should have done instead. This implies a line of reasoning that was not evident during the actual operation itself. I do wonder whether recent attempts to reduce how much reasoning Claude does in certain modes, made to cut token use and Anthropic’s operating costs, might partly be to blame.
Breaking it all down, there seem to be a couple of fairly straightforward issues at play that at first glance have very little to do with AI itself.
First, the token Claude had access to was overly broad. It’s common for cloud-based infrastructure providers like AWS or Azure to allow you to create tokens that are limited in what they can do. This helps implement the principle of least privilege. The idea is that an actor in a system should be given access to what they need, and no more. The principle of least privilege reduces the impact if an inappropriate party gains access to the actor’s credentials, or if the actor themselves goes rogue. Consider what happens if someone steals your hotel room key. They can get into your hotel room, which isn’t great, but they can’t get into anyone else’s. It seems that Railway does not let you limit the scope of its auth tokens.
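To make the hotel key analogy concrete, here is a minimal sketch of what a scope check looks like in principle. Everything in it is hypothetical: the `Token` type, the scope format, and the action names are invented for illustration and do not represent Railway’s API or any other provider’s.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """Hypothetical credential: each scope is an (environment, action) pair."""
    name: str
    scopes: frozenset


def is_allowed(token, environment, action):
    """Allow a request only if the token was explicitly granted that pair."""
    return (environment, action) in token.scopes


# A token minted for an agent that only needs to work on staging.
staging_token = Token(
    name="agent-staging",
    scopes=frozenset({("staging", "read"), ("staging", "write")}),
)

print(is_allowed(staging_token, "staging", "write"))
print(is_allowed(staging_token, "production", "delete_volume"))
```

The point of the design is that access is denied by default. The staging token never mentions production, so a request against production fails regardless of who, or what, is holding the token.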
The second problem was that the credentials were stored on disk and had not expired. This makes the impact of the broadly scoped auth token much worse. Credentials should be time limited, so that if they are found later they cannot be used. If tokens are generated on demand, which could have been done in this specific case, then this particular issue could have been mitigated. Claude would have had to ask for a human to provide a credential—at which point, hopefully, the operator would have had a chance to work out what was going on.
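As a rough sketch of the time-limited idea, the snippet below issues a token with an expiry and refuses it once that time has passed. The `ShortLivedToken` type, the function names, and the 15-minute lifetime are assumptions for illustration; a real provider would enforce expiry on the server side rather than in client code.

```python
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ShortLivedToken:
    """Hypothetical credential that carries its own expiry time."""
    value: str
    expires_at: float  # seconds since the epoch


def issue_token(ttl_seconds: float, now: Optional[float] = None) -> ShortLivedToken:
    """Mint a random token that stops being valid after ttl_seconds."""
    issued_at = time.time() if now is None else now
    return ShortLivedToken(secrets.token_urlsafe(32), issued_at + ttl_seconds)


def is_valid(token: ShortLivedToken, now: Optional[float] = None) -> bool:
    """A token is valid only strictly before its expiry time."""
    current = time.time() if now is None else now
    return current < token.expires_at


# Issued on demand for one maintenance session (15 minutes is an arbitrary choice).
session_token = issue_token(ttl_seconds=15 * 60)
print(is_valid(session_token))
```

A token like this, found on disk a week later, is just a string of random characters.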
I take minor issue with Jer’s assertion that Railway’s GraphQL API should have required a confirmation before deletion. This, to me, is a fundamental misunderstanding of what cloud APIs are for. APIs are there for automation; if you want a human-in-the-loop confirmation model, you have to build that yourself. This has always been the case. However, in the aftermath of an incident like this, we should give Jer a lot of leeway around his view of the problems, and some of his requests for how Railway should change appear to be very sensible (e.g. clearer SLAs, tokens that are easier to scope).
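Building that confirmation yourself does not have to be complicated. The sketch below wraps whatever function actually calls the API and insists on a yes from a human before anything destructive goes through. The action names, `guarded_call`, and the `execute` and `confirm` callbacks are all hypothetical; none of them come from Railway’s API.

```python
# Hypothetical names for operations that should never run unattended.
DESTRUCTIVE_ACTIONS = {"delete_volume", "delete_database"}


def guarded_call(action, target, execute, confirm):
    """Run execute(action, target), asking a human first if it is destructive.

    execute: callable that performs the real API call (supplied by you).
    confirm: callable that shows a prompt to a human and returns True or False.
    """
    if action in DESTRUCTIVE_ACTIONS:
        prompt = f"About to run {action} on {target}. Proceed?"
        if not confirm(prompt):
            raise PermissionError(f"{action} on {target} was not confirmed")
    return execute(action, target)


def ask_on_terminal(prompt):
    """Simple confirm callback: only an explicit 'yes' counts as approval."""
    return input(f"{prompt} [yes/no] ").strip().lower() == "yes"
```

The agent would be handed `guarded_call` rather than the raw client, with `ask_on_terminal` or something like a chat approval as the `confirm` callback. This only helps if the wrapper is the agent’s sole route to the API. If the agent can find a raw token and call the API directly, the wrapper is bypassed entirely.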
How Could These Issues Be Mitigated?
One obvious takeaway is that access tokens should expire more aggressively and be more limited in scope. This reduces the chance of Claude accessing something it shouldn’t. This would need to be solved on the Railway side, as they generate the token in the first place.
Unfortunately, having a more limited token for Claude isn’t a total fix for this scenario. Claude was given a token that limited its behavior, and went looking for a better token—and found it. This is not the first time I’ve heard of this happening; the same thing happened to a client of mine recently.
As our agents become more sophisticated, it seems that some sort of sandboxing is key. The production token was viewable by Claude, so it was used. Running agents in a restricted sandbox where they are only able to see parts of your filesystem would help greatly. However, that also limits their usefulness.
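To give a flavor of what “only able to see parts of your filesystem” means, here is a minimal sketch of a path check that a file-reading tool could apply before handing anything to an agent. The function name and the example paths are hypothetical. An in-process check like this is not a real sandbox, which would normally rely on OS-level isolation such as containers or separate user accounts, but the principle is the same.

```python
from pathlib import Path


def is_inside_sandbox(requested: str, sandbox_root: str) -> bool:
    """True only if the requested path resolves to somewhere under sandbox_root."""
    root = Path(sandbox_root).resolve()
    # Joining with an absolute path discards the root, and resolve() collapses
    # any ".." segments, so both kinds of escape end up outside the sandbox.
    target = (root / requested).resolve()
    return target == root or root in target.parents


# Hypothetical workspace the agent is allowed to see.
WORKSPACE = "/srv/agent-workspace"

print(is_inside_sandbox("project/app.py", WORKSPACE))
print(is_inside_sandbox("../.config/token", WORKSPACE))
```

The first request stays inside the workspace. The second climbs out of it towards the kind of place a long-lived token might be stored, and is rejected.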
Another option would be for the agent to ask for confirmation before it does something like delete data. It seems conceivable that a human-in-the-loop model, triggered whenever the agent has to escalate privileges, could help. But again, if it gets hold of an access token with broad scope, it won’t need to ask a human.
Finally, I’ve seen a lot of discussion about how the agent should “know” that deleting the data was bad, and that it should have checked first. This is a fundamental limitation of an LLM-based agent. It has no concept of causality. It cannot predict what will happen. There is a field of AI study known as world models, which could allow these agents to make more informed decisions. For example, a world model that understands physics would be able to predict that an egg would likely break if it were pushed from a table onto the concrete floor below. World models are used a lot in video generation and autonomous driving (where prediction of motion is key), but are sparsely used elsewhere.
AI Not To Blame?
I said just a moment ago that these issues seem to have little to do with AI. That isn’t entirely true.
In the recent DORA report on the state of AI-assisted Software Development, the authors noted that AI seems to be an amplifier: that AI-assisted software development tends to help good teams go faster, and slow teams go slower. Bad practices get encoded and repeated more often. In the PocketOS and Railway situation, we have credentials that were overly broad, long-lived, and stored on disk, combined with an apologetic AI agent doing something other than what was expected of it. If a human had made the same mistakes, they would have made them much more slowly, and may well have had the chance to work out their mistake part way through. AI works so fast that it can go more quickly in the wrong direction.
More importantly, unlike LLM-based AI, a human being has the chance to learn from experience, and for that learning to be rooted in a very specific, emotional response. When I first heard about the PocketOS story, I was brought back to a dim echo of that same horrific feeling I had in the midst of a major production issue that I had contributed to. Those feelings don’t leave you—those lessons don’t leave you. Every time I touched a production system, those memories were with me, and helped guide me towards more sensible working practices.