The Clique: Issue 005

30th April 2026

James Wilkins

There are a lot of new faces here this week, so a warm welcome if this is your first issue. To everyone else, welcome back to The Clique.

This week:

One Thing I’ve Been Thinking About: using AI for a colour analysis, and what it shows about where image generation has actually got to
Three Stories: Claude’s new everyday connectors, a new approach to making AI more honest about what it doesn’t know, and a busy week for the big three
One Thing Worth Trying: a technique called self-consistency
In Other News: Copilot in Word and Excel, Deezer’s AI music numbers, the Claude Mythos story, and more

Feel free to skip around to whatever piques your interest.

1. One Thing I’ve Been Thinking About

This was all over my feeds last week - a colour analysis using ChatGPT’s new image generation model. You upload a photo of yourself and ask the model to work out what colours and temperatures suit your complexion, then visualise the results as an image.

Here’s the prompt I used, but feel free to adapt it.

Conduct a colour analysis of me. I've attached a few different photos in different lighting and different angles. I want to know what colours and temperatures will work best for my complexion. Visualise the results as an image.

I tried it a few times, and the results were mixed but fairly impressive overall. My initial attempt used only a single uploaded photo, but after trying again with multiple reference images, I think I prefer the second one. See for yourself:

Single Photo Upload

5 input photos

I also tried out a hairstyle analysis to see what it would give me.

Now, I don’t know how useful this really is. That’s probably up to you to decide. Nevertheless, I do find it interesting for two reasons:

a) Those photos are clearly supposed to be me, but something’s very slightly off, and I can’t quite put my finger on it. Image generation tools have improved astronomically, but we’re still not quite out of the uncanny valley, particularly when it comes to reproducing a specific person’s face. Obviously, a model with far more training data on one person can do a better job (look no further than the deepfake), but for everyday image generation tools, it’s still subtly off.

b) The text. These outputs are perfect infographics with a clear layout, legible and accurately rendered text, and a nice design/composition. Compare that to a (deliberately) AI-generated infographic I used for an article in January 2025, about 16 months ago:

The older one got the title roughly right, and at the time, I remember thinking that was impressive. Side by side with what the same kind of prompt produces now, you would not believe they came from the same technology. In-image text has come a long way in a short time, and it does not get talked about nearly as much as it deserves.

2. Three Stories That Actually Matter

i. With Spotify, AllTrails and Uber now in the mix, Claude’s connector directory has crossed into territory that affects daily life

The directory of apps and services that connect to Claude, which launched last summer, focused on work tools. It has since expanded to cover the apps most people actually use outside the workplace.

The new additions include AllTrails, Spotify, Uber, Booking.com, and more. The directory as a whole has grown to over 200 connectors. The way it works has also changed. Claude now suggests the relevant connector for whatever you are doing, without you having to navigate a menu. Ask about a weekend hike, and AllTrails surfaces nearby options matching your preferences. Refine by distance, difficulty or whether you are bringing your dog, all in the same conversation.

Spotify is a particularly interesting one to explore right now. The platform recently launched guided fitness content for Premium subscribers, including Peloton classes (no equipment required) alongside content from a range of wellness creators. With the Spotify connector active, asking Claude to help plan a workout week can surface relevant classes directly in the conversation.

It is worth having a browse of what is available at claude.ai/directory/connectors. Once a service is connected, Claude suggests it when relevant in future conversations. Connecting a service gives Claude access on your behalf, without your data being used for training and without the connected app seeing your other conversations. Connectors are available on all plans, with mobile access currently in beta.

Sources: Anthropic · Spotify Newsroom

ii. Researchers have found a way to train AI models to express genuine uncertainty, reducing overconfidence by up to 90%

This is not a new problem. AI models have always tended to sound confident whether they are right or essentially guessing, giving users no reliable indication of when they should seek a second opinion. The AI sycophancy story I reported on in issue 001 is another presentation of the same problem. Stanford researchers recently found that 11 major models agree with users 49% more often than humans would, even on harmful or illegal behaviour.

In both cases, it’s largely down to training incentives. Current training methods reward getting the right answer. If you guess on a true/false question, you’ll get it correct 50% of the time on average. If you say ‘I don’t know’, you’ll always be marked wrong. There’s nothing there to teach a model when it should not appear confident.
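For anyone who likes to see the arithmetic, here is a minimal Python sketch of that incentive. It assumes a simple scoring scheme of 1 point for a right answer and 0 for anything else, including ‘I don’t know’. The function name and the scoring values are illustrative assumptions, not taken from any particular training setup.

```python
def expected_score(p_correct: float,
                   reward_right: float = 1.0,
                   reward_wrong: float = 0.0) -> float:
    """Average score for answering when the answer is right with probability p_correct."""
    return p_correct * reward_right + (1 - p_correct) * reward_wrong


# A coin-flip guess on a true/false question is right half the time.
guess = expected_score(0.5)

# Under this assumed scheme, "I don't know" is scored like a wrong answer.
abstain = 0.0

print(f"guessing: {guess}, abstaining: {abstain}")
```

Under accuracy-only scoring, guessing never does worse than abstaining, so a model has no reason to learn to say it is unsure.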

Researchers at MIT’s computer science lab have published a method that targets this directly. Their training approach adds a term to the reward function that penalises the gap between how confident a model sounds and how accurate it actually is. If a model expresses near-certainty and turns out to be right only half the time, it gets penalised. It is also penalised for being unnecessarily hesitant about things it does actually know.
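To make the idea concrete, here is a minimal sketch of what a calibration-aware reward could look like. It is an illustration only, not the researchers’ actual formula: the function names, the squared-gap penalty and the weight of 1.0 are all assumptions chosen to show the behaviour described above.

```python
def calibrated_reward(correct: bool, confidence: float, weight: float = 1.0) -> float:
    """Correctness reward minus a penalty for the gap between confidence and outcome.

    Hypothetical form: the squared gap is one common way to score stated
    probabilities, used here purely for illustration.
    """
    outcome = 1.0 if correct else 0.0
    return outcome - weight * (confidence - outcome) ** 2


def average_reward(accuracy: float, confidence: float) -> float:
    """Expected reward for a model that is right `accuracy` of the time
    and always states the same `confidence`."""
    return (accuracy * calibrated_reward(True, confidence)
            + (1 - accuracy) * calibrated_reward(False, confidence))


# Right only half the time: sounding near-certain scores worse than saying 50%.
overconfident = average_reward(accuracy=0.5, confidence=0.95)
honest = average_reward(accuracy=0.5, confidence=0.5)

# Always right: hedging at 50% scores worse than expressing high confidence.
hesitant = average_reward(accuracy=1.0, confidence=0.5)
assured = average_reward(accuracy=1.0, confidence=0.95)

print(overconfident < honest, hesitant < assured)
```

In this sketch the best score always comes from stating a confidence that matches the real hit rate, which is the behaviour the training method is trying to encourage.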

Across a range of tasks, including several the model had never seen during training, the method reduced the gap between stated confidence and actual accuracy by up to 90%, with no loss in performance.

Models start to learn to match what they say to what they genuinely ‘believe’.

I’m looking forward to seeing the results of this implemented in frontier models. Although they have improved, it’s definitely an avenue in which they still have room to grow. In the meantime, see the practical takeaway below for some ways we can combat inaccurate results.

Sources: MIT CSAIL · Stanford News

iii. A busy week for big releases: GPT-5.5, two new agent platforms, and a pace from the major AI labs that shows no sign of slowing

GPT-5.5, OpenAI’s latest model, became available this week. It handles multi-step and longer-form tasks more reliably than its predecessor, matches it on speed, and completes equivalent jobs using fewer computational steps. It is available now across OpenAI’s products and via the API.

The second release from OpenAI is arguably more interesting for teams. Workspace Agents let organisations build AI assistants that handle specific recurring tasks automatically, running in the background in ChatGPT and Slack. You describe a job in plain language, and the agent handles it on a schedule without further prompting. Teams build once, share across the organisation, and agents can be improved as they are used. It is in research preview for business, enterprise and education plans.

Google released Gemini Enterprise Agent Platform, infrastructure for organisations that want to build and manage fleets of AI agents across a business. Where OpenAI’s tool is aimed at teams setting up specific workflows, Google’s is aimed at technical teams who need oversight across many agents at once, with built-in tools for access control, security monitoring, performance tracking and testing before deployment. Both products are aimed at organisations, rather than individual users.

What stands out, looking at the week as a whole, is the timing. GPT-5.5, GPT Image 2.0, Workspace Agents and the Gemini Enterprise Agent Platform all arrived within a few days. Just last week, we had Claude Opus 4.7 and Claude Design, and Claude Managed Agents weren’t long before that. The pace of meaningful releases from the major AI companies has accelerated noticeably this year and shows no sign of slowing.

Sources: OpenAI (GPT-5.5) · OpenAI (Workspace Agents) · Google Cloud

3. From the Blog

Quite a busy one this week, so nothing new from me.

4. One Thing Worth Trying

Self-consistency: ask the question more than once

Self-consistency is a prompting method for improving the reliability of AI answers, particularly when accuracy matters. Instead of asking a question once, you ask it several times (and ideally from slightly different angles) and look at where the responses agree. AI models give probabilistic answers, meaning the same question can produce different responses on different runs. Where answers converge across multiple attempts, you have more reason to trust them. Where they diverge, that is an indication to dig further or verify independently.
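If you are comfortable with a little Python, the same idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: `fake_model` is a hypothetical stand-in that returns canned answers (swap in a call to whichever AI tool you use), and `self_consistency` and `normalise` are names made up for this example. It only works as written for short answers that can be compared directly.

```python
from collections import Counter
from typing import Callable, List, Tuple


def normalise(answer: str) -> str:
    """Tidy an answer so trivially different wordings count as the same."""
    return answer.strip().lower().rstrip(".")


def self_consistency(ask: Callable[[str], str], prompts: List[str]) -> Tuple[str, float]:
    """Ask every prompt, then return the most common answer and its share of the votes."""
    answers = [normalise(ask(prompt)) for prompt in prompts]
    top_answer, votes = Counter(answers).most_common(1)[0]
    return top_answer, votes / len(answers)


# Hypothetical stand-in for a real model: replays five canned responses.
canned = iter(["Canberra", "canberra.", "Sydney", " Canberra ", "Canberra"])


def fake_model(prompt: str) -> str:
    return next(canned)


# The same question asked from slightly different angles.
prompts = [
    "What is the capital of Australia?",
    "Which city is Australia's capital?",
    "Name the capital city of Australia.",
    "Australia's seat of government is in which city?",
    "Which Australian city is the national capital?",
]

answer, agreement = self_consistency(fake_model, prompts)
print(answer, agreement)  # canberra 0.8
```

High agreement is a reason for more trust, not proof. A low share of votes is the signal to dig further or verify independently.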

I cover this alongside several other techniques that consistently outperform the common advice to “just ask it to think step by step” in my article on how to actually improve the accuracy of AI answers. A deeper dive into self-consistency specifically is coming soon.

5. In Other News

There’s one piece of research this week that didn’t make the cut for the top three stories, but I think it deserves some recognition, as I found it particularly interesting.

Research into AI self-preferencing in automated hiring found that AI-written CVs are rated more favourably by AI screening tools than human-written ones, and that models consistently preferred CVs they themselves had generated. Candidates who used the same model as the one doing the screening were 23% to 60% more likely to be shortlisted than equally qualified applicants with human-written CVs.

The rest, however, are in no particular order...

Copilot’s agentic features in Word, Excel and PowerPoint are now the default experience for Microsoft 365 subscribers, allowing multi-step in-app actions across all three tools; Microsoft reports engagement in Excel up 67% and in Word up 52% since the rollout.
Google launched a set of workplace AI tools built directly into Chrome for enterprise users, including a browser-integrated assistant and tools for managing context across open tabs.
44% of all new music uploaded to Deezer is now AI-generated, according to figures published by the platform, a share that has risen sharply over the past year.
YouTube has expanded its likeness detection tools to the entertainment industry, making it easier for artists and performers to flag AI-generated content that uses their face or voice without consent.
Anthropic confirmed it is investigating an unauthorised access claim related to its Claude Mythos model, which is not publicly available; a community-built Mythos access tracker appeared independently around the same time.
OpenAI launched a dedicated ChatGPT plan for healthcare professionals, with features built around clinical documentation and patient communication.
Taylor Swift has filed to trademark her voice and likeness, apparently as a pre-emptive measure against AI-generated content using her identity without authorisation.
DeepSeek published V4 Pro, an open-weight model positioned to compete with frontier proprietary models on reasoning and coding tasks, available to run without API access.
OpenAI launched Privacy Filter, an open-source, on-device model that strips personal data from enterprise datasets before processing, aimed at organisations with data compliance requirements.
OpenAI published an updated set of company principles, drawing considerable commentary given its timing relative to other news about the company this week.
Wan 3.0 is a new open-source video generation model that produces high-quality output and can be run without API access, adding to a growing set of capable open-weight video tools.
Kimi K2.6, the latest open-source model from Moonshot AI, posted competitive results on coding benchmarks, continuing the pattern of open-weight models closing the gap on proprietary alternatives.
Claude’s shared workspace, Cowork, can now generate interactive charts and diagrams directly in conversations, a capability that previously required a separate tool.
Instagram Instants is a new feature that lets users share real-time, unfiltered photos without the usual editing options, part of a broader pattern of platforms nudging users back towards less curated content.
Sam Altman posted about changes to OpenAI’s rate limit reset scheme, giving heavy ChatGPT users more flexibility in how their usage is distributed across the day.
Anthropic published a postmortem after reports of quality drops in Claude Code and separately launched Ultrareview, an automated bug-finding tool built into Claude Code for developers.
The system prompts used with Claude Opus 4.7 are now publicly documented, useful reference material for anyone building applications with the model.

If you made it this far, I appreciate you!
Stay curious,

James

Enjoyed this issue? Consider forwarding to a friend or colleague!

Hey, look, there’s even a little button for it and everything 👇
