The Clique: Issue 003

16th April 2026

Apr 16, 2026

Everyone’s getting in on the AI action now. Even my AI-ophobe Grandad has started sending me AI-generated pictures of his own model aircraft put into a real-world scene. If he can do it, so can you. Welcome back to the Clique.

This week:

One Thing I’ve Been Thinking About: the YouTubification of software
Three Stories: Stanford’s annual AI reckoning, how researchers are mapping the ways agents can be misled online, and agent infrastructure as a managed service
One Thing Worth Trying: Gemini’s interactive visualisations
In Other News: Shopify resets the hiring bar, Perplexity goes financial, Meta ships its first new model, and much more…

1. One Thing I’ve Been Thinking About

We may be witnessing the “Youtubification” of software, and it’s opening the door to a fascinating new era. What follows is my own speculation.

Historically, software creation was an institutional, gate-kept process. If you wanted to build something substantial, you needed a massive team, a large budget, and significant technical overhead. Now, AI is doing for code exactly what YouTube did for video: democratizing creation. What a single developer can build in an afternoon may previously have required a dozen people and months of work. The nature of software shifts from a “static, long-term subscription” to something more modular and commodity-like.

We may be moving away from “Software as a Service” models and toward potentially something like TikTok, but for apps. This isn’t necessarily a loss for the industry but a potential next step. We are entering a period of incredible competition where software shelf life will shrink, and the focus will shift from the act of writing code to the art of orchestrating it. The winners in this new world will be the marketplace platforms that make it simple to discover, deploy, and maintain these bite-sized, AI-generated tools.

We’ve finally reached a point where the bottleneck is no longer the ability to build, but the ability to decide what is actually worth building. It’s an exciting time to be an architect of solutions, regardless of how much code you write yourself.

2. Three Stories That Actually Matter

i. Stanford’s annual AI check-in: experts are optimistic, the rest of us are not

Stanford’s AI Index is the closest thing we have to a yearly ground truth on where AI actually stands. The 2026 edition was published this week, ran to almost five hundred pages, and has a lot of numbers in it. The headline is that capability is moving faster than almost any benchmark predicted twelve months ago.

AI agents achieved a 66% success rate on OSWorld, a benchmark that tests autonomous computer use, up from 12% a year ago. On SWE-bench (which tests coding and bug-fixing), scores jumped from 60% to near 100% in the same window. However, the models that won gold at the International Mathematical Olympiad this year correctly read an analogue clock only 50.1% of the time. Capability gains remain uneven in ways that are hard to predict from the outside.

The part of the report that deserves more attention is the gap between how AI experts and the general public feel about all of this. 73% of experts expect AI to have a positive impact on how people do their jobs. Among the general public, that figure is 23% - a 50-point gap. On healthcare, 84% of experts said AI would largely benefit medical care over the next 20 years; only 44% of the US public agreed. 64% of Americans believe AI will lead to fewer jobs over the same period. Among Gen Z, the generation most likely to be building careers alongside these tools, the share who describe themselves as excited about AI fell from 36% in 2025 to 22% this year, while the share feeling angry rose from 22% to 31%.

The accountability data makes this harder to ignore. Documented AI incidents rose from 233 in 2024 to 362 in 2025. The average score on the Foundation Model Transparency Index (which measures how openly companies disclose training data, compute, known risks, and policies) dropped from 58 to 40 out of 100. Expert confidence and capability are accelerating together. Public trust is not keeping up, and the transparency numbers suggest the gap is being made wider, not narrower.

Sources: Stanford HAI · TechCrunch · The Decoder · KQED

ii. Google DeepMind has started mapping the ways agents get misled on the open web

As AI agents are given broader access to the internet, researchers are starting to document the ways they can be led astray. A paper from Google DeepMind, published this week, identifies six categories of what the authors call “agent traps”. These are patterns in web content that can cause agents to behave in ways their users didn’t intend.

The most interesting examples are the ones that exploit the gap between what a human sees and what an agent processes. A browser renders a page visually. An agent reads the underlying structure, including HTML comments, image metadata, and accessibility tags that never appear on screen. Instructions embedded in those invisible layers are readable to an agent but hidden from the person using it.

Agents with persistent memory face a compounding version of this problem. If misleading content makes it into an agent’s long-term knowledge base, it can shape how that agent responds to all future queries. The paper also documents a real-world case involving Microsoft’s M365 Copilot, where a manipulated email caused it to handle information in unintended ways.

The researchers’ proposed mitigations are practical rather than anything exotic. Tighter permission scopes, monitoring what agents output before they act on it, and scanning content at the point where agents interact with external sources.

The paper reads less as an alarm and more as an attempt to formalise what the field already knows is a risk, so that it can be designed against. For anyone using agents on a daily basis, it’s more important than ever to maintain oversight of your agents, control what they have access to, and what they have permission to do.

Sources: The Decoder · SecurityWeek · SSRN (paper)

iii. Anthropic turns the hard part of deploying AI agents into a managed service

Building an AI agent that works in a demo is one kind of problem. Getting it to run reliably in production is another.

Imagine an AI that handles a support ticket from start to finish. It receives the message, looks up the customer’s account, checks the relevant policy, drafts a response, and updates the record. Every step in that chain needs to stay connected; the agent needs to track its position if one step takes time, and the whole thing needs to keep working when something fails partway through. Multiply that across hundreds of tickets a day, and you start to see why teams previously spent months building infrastructure before it could be shipped.

Claude Managed Agents is that infrastructure, provided as a service. It handles session persistence, failure recovery, security, permissions, and audit logging. Companies already shipping on top of it include Notion, Rakuten, Asana, and Sentry. Vibecode, a development platform built on it, reports users deploying applications at least ten times faster than before. It’s in public beta now on the Claude Platform.

Sources: Anthropic Blog

3. From the Blog

How to Rot your brain with AI - Anthropic’s research shows the most experienced AI users refuse to let it replace their critical thinking. It can be easy to let AI erode the very skills we rely on it to replace. Of course, if that’s your goal, this guide is for you!

4. One Thing Worth Trying

Gemini’s interactive visualisations - gemini.google.com

Most AI tools return text. Occasionally, a static image. Gemini Pro can now produce interactive visualisations, physics simulations, and 3D models that live inside the chat window and respond to input in real time.

Google’s own examples give a sense of what this looks like in practice. Ask how a stable orbit forms, and Gemini generates a simulation where you can adjust gravity and initial velocity with sliders and watch what happens to the orbit. Ask about fractals, and you can explore the structure directly, zooming in, rotating, and varying the parameters.

There’s a difference between knowing a fact and understanding a concept. Reading that increasing gravity pulls an orbit tighter is fine, but having a system you can play around with is a much more effective way to build a mental model of the system and deepen your understanding.

The feature is available to Gemini Pro users. But for the rest of us, there’s no reason we can’t achieve something similar with a prompt-based workaround.

5. In Other News

And in no particular order…

Shopify’s CEO sent a company-wide memo requiring teams to demonstrate that a task cannot be done by AI before any new headcount request will be approved - the first major company to make AI capability assessment a formal precondition to hiring.
Meta Superintelligence Labs shipped its first model, Muse Spark, a multimodal reasoning model with tool use, visual chain of thought, and a specialised health capability developed with input from over a thousand physicians. Available now at meta.ai.
Notion custom agents for teams, with four types covering Q&A, task routing, reporting, and fully custom workflows, designed to run continuously and handle repeating work without human hand-offs.
Claude Cowork is now generally available on all Claude paid plans, with enterprise controls including role-based access, per-team spend limits, usage analytics, and a new Zoom connector that pulls meeting summaries directly into Cowork workflows.
Perplexity has integrated with Plaid to give its AI a real-time view of your personal finances, allowing users to query their actual spending and balances in the chat interface. This is one of the first AI assistants with live access to financial data rather than summaries you paste in.
ChatGPT has announced a partnership with Upwork, embedding AI into the freelance hiring platform in a way that is expected to change how clients scope work and evaluate proposals.
Anthropic’s Claude is being integrated into Microsoft Word, with early adoption focused on legal teams using it to draft, review, and compare documents inside the tools they already work in.
OpenAI published a child safety blueprint outlining its framework for preventing the generation of harmful content involving minors across its products and the broader AI ecosystem.
Google AI Overviews may be hallucinating tens of millions of times per day - the individual error rate sounds manageable until you factor in the volume of searches it handles, at which point the scale of the problem looks different.
Gemini’s new Notebooks feature connects the Gemini app with NotebookLM as a shared project base, letting you move between AI-assisted research and chat without losing context between sessions.
Spotify’s prompted playlist feature now works for podcasts, letting users describe what they want to listen to in plain language and have a playlist generated.
Google quietly released an offline AI dictation app for iOS powered by Gemma models, with voice-to-text transcription running entirely on-device. No internet connection required.
Gemini AI skills are now available in Google Chrome, adding AI-assisted actions directly in the browser for summarising pages, drafting replies, and interacting with on-screen content.
Claude Code’s desktop interface has been redesigned, with a new layout that makes the agent’s ongoing work easier to follow during long sessions.
Willow Voice is an AI voice dictation tool for Mac, Windows, and iPhone that adapts to your writing style and context as you speak. Worth knowing about if you find typing a bottleneck in your day.
Clicky is a Mac AI companion that sits next to your cursor, sees what’s on your screen, and can be asked questions or given tasks via keyboard shortcut. It’s designed for in-context help across any application.

The Stanford data on public trust is really worth thinking about. The difference between how experts feel about AI and how everyone else feels about it seems like a reflection of how these tools are perceived in our daily lives. It’s a communication problem that these companies are going to have to put some work into addressing.

Most people are not yet in a position to judge AI on their own terms, which is exactly when anxiety tends to fill the blanks. The research in this week’s blog post is a useful reminder that the answer to that is not to hand more over to the AI, but to stay in the loop with it. The people getting the most out of these tools are the ones who keep showing up.

If you made it this far, I appreciate you!
Stay curious,

James

Enjoyed this issue? Consider forwarding to a friend or colleague!

Hey, look, there’s even a little button for it and everything 👇

James Wilkins

Discussion about this post

Ready for more?