
Starting with this issue of Trends, we’ve moved from simply reporting on news that has caught our eye and instead have worked with Claude to look at the various news items we’ve collected and to reflect on what they tell us about the direction and magnitude of change. William Gibson famously wrote, “The future is here. It’s just not evenly distributed yet.” In the language of scenario planning, what we’re looking for is “news from the future” that will confirm or challenge our assumptions about the present.
AI has moved from a capability added to existing tools to an infrastructure layer present at every level of the computing stack. Models are now embedded in IDEs and tools for code review; tools that don’t embed AI directly are being reshaped to accommodate it. Agents are becoming managed infrastructure.
At the same time, two forces are reshaping the economics of AI. The cost of capable AI is falling. Laptop-class models now match last year’s cloud frontiers, and the break-even point against cloud API costs is measured in weeks. The competitive map has also fractured. What was a contest between a few Western labs is now a broad ecosystem of open source models, Chinese competitors, local deployments, and a growing set of forks and distributions. (Just look at the news that Cursor is fronting Kimi K2.5.) No single vendor or architecture is dominant, and that mix will drive both innovation and instability.
Security is a thread running through every section of this report. Each new AI capability reshapes the attack surface. AI tools can be poisoned, APIs repurposed, images forged, identities broken, and anonymous authors identified at scale. At the same time, foundational infrastructure faces threats that have nothing to do with AI: A researcher has come within striking distance of breaking SHA-256, the hashing algorithm underlying much of the web’s security. Organizations should audit both their AI-related exposures and the assumptions baked into the cryptographic infrastructure they depend on.
The technical transitions are easy to talk about. The human transitions are slower and harder to see. They include workforce restructuring, cognitive overload, and the erosion of collaborative work patterns. The job market data is beginning to come into focus: Product management is up, AI roles are hot, and software engineering demand is recovering. The picture is more nuanced than either the optimists or the pessimists predicted.
AI models
The model market is moving fast enough that architectural and vendor commitments made today may not look right in six months. Capable models are now available from open source projects and a widening set of international competitors. The field is also starting to ask deeper questions. Predicting tokens may not be the only path to capable AI; the arrival of the first stable JEPA model suggests that alternative architectures are becoming real contenders. NVIDIA’s new model, which combines Mamba and Transformer layers, points in the same direction.
- Yann LeCun and his team have created LeWorldModel, the first model using his Joint Embedding Predictive Architecture (JEPA) that trains stably. Their goal is to produce models that do more than predict words; they understand the world and how it works.
- NVIDIA has released Nemotron 3 Super, its latest open weights model. It’s a mixture of experts model with 120B parameters, 12B of which are active at any time. What’s more interesting is its design: It combines both Mamba and Transformer layers.
- Gemini 3.1 Flash Live is a new speech model that’s designed to support real-time conversation. When generating output, it avoids gaps and uses human-like cadences.
- Cursor has released Composer 2, the next-generation version of its IDE. Composer 2 apparently incorporates the Kimi K2.5 model; it reportedly beats Anthropic’s Opus 4.6 on some major coding benchmarks and is significantly less expensive.
- Mistral has released Forge, a system that enables organizations to build “frontier-grade” models based on their proprietary data. Forge supports pretraining, posttraining, and reinforcement learning.
- Mistral has also released Mistral Small 4, its new flagship multimodal model. Small 4 is a 119B mixture of experts model that uses 6B parameters for each token. It’s fully open source, has a 256K context window, and is optimized to minimize latency and maximize throughput.
- NVIDIA announced its own OpenClaw distribution, NemoClaw, which integrates OpenClaw into NVIDIA’s stack. Of course it claims to have improved security. And of course it does inference in the NVIDIA cloud.
- It’s not just OpenClaw; there’s also NanoClaw, Klaus, PiClaw, Kimi Claw, and others. Some are clones, some are OpenClaw distros, and some are cloud services that run OpenClaw. Almost all of them claim improved security.
- Anthropic has announced that 1-million token context windows have reached general availability in Claude Opus 4.6 and Sonnet 4.6. There’s no additional charge for using a large window.
- Microsoft has released Phi-4-reasoning-vision-15B, a small open-weight model that combines reasoning with multimodal capabilities. Microsoft believes the industry is trending toward smaller, faster models that can run locally.
- Tomasz Tunguz writes that Qwen3.5-9B can run on a laptop and has benchmark results comparable to December 2025’s frontier models. Compared to the cost of running frontier models in the cloud, a laptop running models locally will pay for itself in under a month.
- OpenAI has released GPT 5.4, which merges the Codex augmented coding model back into the product’s mainstream. It also incorporates a 1M token context window, computer use, and the ability to publish a plan that can be altered midcourse before taking action.
- TweetyBERT is a language model for birds. It breaks bird songs (the researchers use canaries) into syllables without human annotation. The same technique may eventually help us understand how humans learn language.
- Vera is a new programming language that’s designed for AI to write. Unlike languages that are designed to be easy for humans, Vera is designed to help AI with aspects of programming that AIs find hard. Everything is explicit, state changes are declared, and every function has a contract.
- The Potato Prompt is a technique for getting GPT models to act as critics rather than sycophants. The idea is to create a custom instruction that tells GPT to be harshly critical whenever the word “potato” appears in the prompt. The technique would probably work with other models.
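The Potato Prompt’s custom instruction might read something like this (our wording, not the original post’s):

```
When any prompt contains the word "potato", drop the agreeable tone.
Act as a harsh critic: point out weaknesses, missing evidence, and
counterarguments before saying anything positive.
```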
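Tunguz’s laptop-versus-cloud break-even claim is easy to sanity-check with back-of-the-envelope arithmetic. Every number below is an illustrative assumption, not a figure from his post:

```python
# Break-even for local vs. cloud inference. All prices and volumes
# are made-up illustrations; substitute your own.
laptop_cost = 2500.00           # one-time hardware cost (USD)
api_price_per_mtok = 10.00      # blended cloud price per million tokens (USD)
tokens_per_day = 12_000_000     # a heavy individual user or a small team

daily_cloud_cost = tokens_per_day / 1_000_000 * api_price_per_mtok
break_even_days = laptop_cost / daily_cloud_cost

print(f"Cloud spend per day: ${daily_cloud_cost:.2f}")
print(f"Laptop pays for itself in {break_even_days:.0f} days")
```

At these (invented) numbers the laptop pays for itself in about three weeks; whether your own deployment gets under a month depends entirely on the assumptions.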
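Vera’s contract idea can be sketched in any language. Here’s a minimal Python version using a decorator; the decorator and the example function are our illustration of the concept, not Vera syntax:

```python
# Design-by-contract sketch: every function declares a precondition and
# a postcondition, both checked at call time.
def contract(pre, post):
    def wrap(fn):
        def inner(*args):
            assert pre(*args), f"precondition failed for {fn.__name__}"
            result = fn(*args)
            assert post(result, *args), f"postcondition failed for {fn.__name__}"
            return result
        return inner
    return wrap

@contract(pre=lambda xs: len(xs) > 0,                 # input must be nonempty
          post=lambda r, xs: min(xs) <= r <= max(xs)) # result must be in range
def mean(xs):
    return sum(xs) / len(xs)

print(mean([2, 4, 6]))  # contracts hold; prints 4.0
```

A language built for AI authors would make these annotations mandatory and machine-checkable rather than opt-in.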
Software development
The tools arriving in early 2026 point toward a deep reorganization of the role of software developers. Writing code is becoming less important, while reviewing, directing, and taking accountability for AI-generated code is becoming more so. How to write good specifications, how to evaluate AI output, and how to preserve the context of a coding session for later audit are all skills teams will need. The ecosystem around the development toolchain is also shifting: OpenAI’s acquisition of Astral, the company behind the Python package manager uv, signals that AI labs are moving to control developer infrastructure, not just models.
- OpenAI has added Plugins to its coding agent Codex. Plugins “bundle skills, app integrations, and MCP servers into reusable workflows”; conceptually, they’re similar to Claude Skills.
- Stripe Projects gives you the ability to build and manage an AI stack from the command line. This includes setting up accounts, billing, managing keys, and many other details.
- Fyn is a fork of the widely used Python manager uv. It no doubt exists as a reaction to OpenAI’s acquisition of Astral, the company that developed and supports uv.
- Anthropic has announced Claude Code Channels, an experimental feature that allows users to communicate with Claude using Telegram or Discord. Channels is seen as a way to compete with OpenClaw.
- Claude Cowork Dispatch allows you to control Cowork from your phone. Claude runs on your computer, but you can assign it tasks from anywhere and receive a text notification when it’s done.
- Opencode is an open source AI coding agent. It can make use of most models, including free and local models; it can be used in a terminal, as a desktop application, or as an IDE extension; it can run multiple agents in parallel; and it can be used in privacy-sensitive environments.
- Testing is changing, and for the better. AI can automate the repetitive parts, and humans can spend more time thinking about what quality really means. Read both parts of this two-part series.
- Claude Review does a code review on every pull request that Claude Code makes. Review is currently in research preview for Claude Teams and Claude Enterprise.
- Andrej Karpathy’s Autoresearch “automates the scientific method with AI agents.” He’s used it to run hundreds of machine learning experiments per night: running an experiment, getting the results, and modifying the code to create another experiment in a loop.
- Plumb is a new tool for keeping specifications, tests, and code in sync. It’s in its very early stages; it could be one of the most important tools in the spec-driven development tool chest.
- “How I Use AI Before the First Line of Code”: Prior to code generation, use AI to suggest and test ideas. It’s a tremendous help in the planning stage.
- Git has been around for 20 years. Is it the final word on version control, or are there better ways to think about software repositories? Manyana is an attempt to rethink version control, based on CRDTs (conflict-free replicated data types).
- Just committing code isn’t enough. When using AI, the session used to generate code should be part of the commit. git-memento is a Git extension that saves coding sessions as Markdown and commits them.
- sem is a set of tools for semantic versioning that integrates with Git. When you’re doing a diff, you don’t really want to know which lines changed; you want to know what functions changed, and how.
- Claude can now create interactive charts and diagrams.
- Clearance is an open source Markdown editor for macOS. Given the importance of Markdown files for working with Claude and other language models, a good editor is a welcome tool.
- The Google Workspace CLI provides a single command line interface for working with Google Workspace applications (including Google Docs, Sheets, Gmail, and of course Gemini). It’s currently experimental and unsupported.
- At the end of February, Anthropic announced a program that grants open source developers six months of Claude Max usage. Not to be left out, OpenAI has launched a program that gives open source developers six months of API credits for ChatGPT Pro with Codex.
- Here’s a Claude Code cheatsheet!
- Claude’s “import memory” feature allows you to move easily between different language models: You can pack up another model’s memory and import it into Claude.
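The run-measure-revise loop behind tools like Karpathy’s Autoresearch is simple to sketch generically. `run_experiment` and `propose_change` below are placeholders for your own harness and agent, not Autoresearch’s API:

```python
# A generic automate-the-scientific-method loop: run an experiment,
# score the result, let an agent modify the code, keep improvements.
def research_loop(code, run_experiment, propose_change, budget=100):
    best_code, best_score = code, run_experiment(code)
    for _ in range(budget):
        candidate = propose_change(best_code, best_score)  # agent edits the code
        score = run_experiment(candidate)                   # measure the result
        if score > best_score:                              # hill-climb on the metric
            best_code, best_score = candidate, score
    return best_code, best_score
```

With a real agent, `propose_change` would be an LLM call and `run_experiment` a training run; the loop itself stays this small.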
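The function-level diff that sem describes can be approximated in a few lines with Python’s standard `ast` module. This is a toy illustration of the idea, not sem itself:

```python
import ast

def function_bodies(source):
    """Map each function name to a dump of its AST."""
    tree = ast.parse(source)
    return {node.name: ast.dump(node)
            for node in ast.walk(tree)
            if isinstance(node, ast.FunctionDef)}

def semantic_diff(old_src, new_src):
    """Report functions added, removed, or changed between two versions."""
    old, new = function_bodies(old_src), function_bodies(new_src)
    return {
        "added":   sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(n for n in old.keys() & new.keys() if old[n] != new[n]),
    }

old = "def f(x):\n    return x + 1\n\ndef g(x):\n    return x\n"
new = "def f(x):\n    return x + 2\n\ndef h(x):\n    return x\n"
print(semantic_diff(old, new))
# → {'added': ['h'], 'removed': ['g'], 'changed': ['f']}
```

Because `ast.dump` ignores line numbers by default, moving a function around the file doesn’t count as a change; only its structure does.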
Infrastructure and operations
Organizations should be thinking about agent governance now, before deployments reach a scale where the lack of governance becomes a problem. The AI landscape is moving from “Can we build this?” to “How do we run this reliably and safely?” The questions that defined the last year (Which model? Which framework?) are giving way to operational ones: How do we contain agents that behave unexpectedly? Where do we store their memory? How do we coordinate agents from multiple vendors? And when does it make sense to run them locally rather than in the cloud? Agents are also acquiring the ability to operate desktop applications directly, blurring the line between automation and user.
- Anthropic has extended its “computer use” feature so that it can control applications on users’ desktops (currently macOS only). It can open applications, use the mouse and keyboard, and complete partially done tasks.
- OpenAI has released Frontier, a platform for managing agents. Agents can come from any vendor. The goal is to allow businesses to organize and coordinate their AI efforts without siloing them by vendor.
- Most agents assume that memory looks like a filesystem. Mikiko Bazeley argues that filesystems aren’t the best option; they lack the indexes that databases have, which can be a performance penalty.
- Qwen-3-coder, Ollama, and Goose could replace agentic orchestration tools that use cloud-based models (Claude, GPT, Gemini) with a stack that runs locally.
- KubeVirt packages virtual machines as Kubernetes objects so that they can be managed together with containers.
- db9 is a command-line-oriented Postgres that’s designed for talking to agents. In addition to working with database tables, it has features for job scheduling and for using regular files.
- NanoClaw can now be installed inside Docker sandboxes with a single command. Running NanoClaw inside a container with its own VM makes it harder for the agent to escape and run malicious commands.
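Bazeley’s point about indexes is easy to demonstrate: filesystem-style memory means scanning every record, while a database answers the same lookup from an index. A minimal SQLite sketch (the schema and data are invented):

```python
import sqlite3

# "Filesystem" memory: a flat pile of records that must be scanned linearly.
notes = [{"agent": f"agent-{i % 50}", "text": f"note {i}"} for i in range(10_000)]
scan_hits = [n for n in notes if n["agent"] == "agent-7"]   # O(n) on every query

# Database memory: the same records behind an index.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE memory (agent TEXT, text TEXT)")
db.execute("CREATE INDEX idx_agent ON memory (agent)")      # the part filesystems lack
db.executemany("INSERT INTO memory VALUES (?, ?)",
               [(n["agent"], n["text"]) for n in notes])
index_hits = db.execute(
    "SELECT text FROM memory WHERE agent = ?", ("agent-7",)).fetchall()

assert len(scan_hits) == len(index_hits)  # same answer, very different cost
```

The gap widens as memory grows: the scan cost scales with total records, while the indexed lookup scales with matching records.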
Security
This issue has an unusually heavy security section, and not only because AI keeps expanding the attack surface. A researcher has come close to breaking SHA-256, the hashing algorithm that underpins SSL, Bitcoin, and much of the web’s security infrastructure. If hash collisions become possible in the coming months as predicted, the implications will reach every organization that relies on the internet. At the same time, AI systems are now capable of gaming their own benchmarks, and the pace of new attack techniques is outrunning the pace of security review.
- A researcher has come close to breaking the SHA-256 hashing algorithm. While it’s not yet possible to generate hash collisions, he expects that capability is only a few months away. SHA-256 is critical to web security (SSL), cryptocurrency (Bitcoin), and many other applications.
- When running the BrowseComp benchmark, Claude hypothesized that it was being tested, found the benchmark’s encrypted answer key on GitHub, decrypted the answers, and used them.
- Anthropic has added auto mode to Claude, a safer alternative to the “dangerously skip permissions” option. Auto mode uses a classifier to determine whether actions are safe before executing them and allows the user to switch between different sets of permissions.
- In an interview, Linux kernel maintainer Greg Kroah-Hartman said that the quality of bug and security reports for the Linux kernel has suddenly improved. It’s likely that improved AI tools for analyzing code are responsible.
- A new kind of supply chain attack is infecting GitHub repositories and others. It uses Unicode characters that don’t have a visual representation but are still meaningful to compilers and interpreters.
- AirSnitch is a new attack against WiFi. It uses layers 1 and 2 of the protocol stack to bypass encryption rather than breaking it.
- Anthropic’s red team worked with Mozilla to discover and fix 22 security-related bugs and 90 other bugs in Firefox.
- Microsoft has coined the term “AI recommendation poisoning” to refer to a common attack in which a “Summarize with AI” button attempts to add commands to the model’s persistent memory. Those commands will cause it to recommend the company’s products in the future.
- Deepfakes are now being used to attack identity systems.
- LLMs can do an excellent job of de-anonymization, figuring out who wrote anonymous posts. And they can do it at scale. Are we surprised?
- It used to be safe to expose Google API keys for services like Maps in code. But with AI in the picture, these keys are no longer safe; they can be used as credentials for Google’s AI assistant, letting bad actors use Gemini to steal private data.
- With AI, it’s easy to create fake satellite images, images that could be designed to affect military operations.
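Some context for the SHA-256 item: a collision means two different inputs producing the same digest, exactly what the algorithm is designed to make infeasible. Python’s standard library shows the property in action:

```python
import hashlib

# Two nearly identical messages...
a = hashlib.sha256(b"transfer $100 to alice").hexdigest()
b = hashlib.sha256(b"transfer $900 to alice").hexdigest()
print(a)
print(b)

# ...produce unrelated 256-bit digests (the avalanche effect).
# A practical collision attack would let an attacker craft two
# different messages for which these digests come out equal.
assert a != b
```

Everything that treats a SHA-256 digest as a unique fingerprint (certificate signatures, content-addressed storage, blockchain proofs) depends on that inequality holding.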
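The invisible-character supply chain attack suggests an easy first line of defense: scan incoming code for Unicode “format” characters, which render as nothing but still reach compilers and interpreters. A minimal sketch (the poisoned string is our example):

```python
import unicodedata

def invisible_chars(text):
    """Return (position, codepoint) for characters with no visual glyph
    that can still change how a compiler or interpreter reads the code."""
    return [(i, f"U+{ord(ch):04X}")
            for i, ch in enumerate(text)
            if unicodedata.category(ch) == "Cf"]   # Unicode "format" category

# A zero-width space (U+200B) hidden inside what looks like one identifier.
poisoned = "user\u200bname = 'admin'"
print(invisible_chars(poisoned))  # → [(4, 'U+200B')]
```

The “Cf” category also catches bidirectional-override characters; a real linter would additionally check for confusable homoglyphs, which are visible but misleading.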
People and organizations
The workforce implications of AI are more complicated than either the optimistic or pessimistic predictions suggest. The cognitive load on individuals is increasing, and the collaborative habits that distribute that load across a team are eroding. Managers should track not just velocity but sustainability. The skills that AI cannot replace, including judgment, communication, and the ability to ask the right question before writing a single line of code, are becoming more valuable. And the volume of AI-generated content is now large enough that organizations built around reviewing submissions, including app stores, publications, and academic journals, are struggling to keep up with it.
- Lenny Rachitsky’s report on the job market goes against this era’s received wisdom. Product manager positions are at the highest level in years. Demand for software engineers cratered in 2022 but has been rising steadily since. Recruiters are heavily in demand, and AI jobs are on fire.
- Apple’s app store, along with many other app stores and publications of all sorts, is fighting a “war on slop”: deluges of AI-generated submissions that swamp their ability to review.
- Teams of software developers can be smaller and work faster because AI reduces the need for human coordination and communication. The question becomes “How many agents can one developer manage?” But also be aware of burnout and the AI vampire.
- Brandon Lepine, Juho Kim, Pamela Mishkin, and Matthew Beane measure cognitive overload, which develops from the interaction between a model and its user. Prompts are imprecise by nature; the LLM produces output that reflects the prompt but may not be what the user really wanted; and getting back on track is difficult.
- A study claims that the use of GitHub Copilot is correlated with less time spent on management activities, less time spent on collaboration, and more time on individual coding. It’s unclear how this generalizes to tools like Claude Code.
Web
- The 49MB Web Page documents the way many websites—particularly news sites—make user experience miserable. It’s a microscopic view of enshittification.
- Simon Willison has created a tool that writes a profile of Hacker News users based on their comments, all of which are publicly available through the Hacker News API. It is, as he says, “a little creepy.”
- A personal digital twin is an excellent way to augment your abilities. Tom’s Guide shows you how to make one.
- It’s been a long time since we’ve pointed to a masterpiece of web play. Here’s Ball Pool: interactive, with realistic physics and lighting. It will waste your time (but probably not too much of it).
- Want interactive XKCD? You’ve got it.