AI Pulse

Coding · Best right now

Claude Opus 4.7

Anthropic

Default to Claude Opus 4.7 inside Claude Code for repo-scale edits. Codex is the strongest first-party OpenAI alternative for repo-scale work; pick Cursor instead when you need multi-model routing in the same workflow.

Image / Video · Best right now

Sora 2

OpenAI

Use Sora 2 for video and Midjourney v7 for still images. Gemini 3 Pro, which routes to Imagen 4 for images and Veo 3 for video, is the best one-surface answer when you want both from a single account.

Writing / Agents · Best right now

Claude Opus 4.7

Anthropic

Claude Opus 4.7 leads for long-form editorial and stable agent loops. GPT-5.5 wins when you need broader tool integrations, especially around Microsoft and OpenAI surfaces.

Latest releases

10 entries · newest first

12 May · Capability

Claude Code adds long-running background agents

Anthropic enables background agent tasks up to 60 minutes inside Claude Code.

Anthropic · Claude Opus 4.7

9 May · Capability

GPT-5.5 picks up custom tool registries

OpenAI exposes shareable tool registries for the Responses API.

OpenAI · GPT-5.5

6 May · Version bump

Gemini 3 Pro routes to Imagen 4 by default

Image generation inside Gemini 3 switches to Imagen 4 with stricter prompt adherence.

Google DeepMind · Gemini 3 Pro

3 May · Capability

DeepSeek R2 adds parallel tool calls

R2 now batches independent tool calls in a single step.

DeepSeek · DeepSeek R2

28 Apr · New model

xAI releases Grok 4 mini

Smaller Grok 4 variant aimed at sub-second response latency.

xAI · Grok 4

22 Apr · New model

Gemini 3 Pro ships

Google replaces Gemini 2.5 Pro with Gemini 3 Pro across API and Vertex.

Google DeepMind · Gemini 3 Pro

15 Apr · New model

DeepSeek R2 launches

Reasoning successor to R1, priced 60% below the GPT-5.5 reasoning tier.

DeepSeek · DeepSeek R2

10 Apr · New model

Z.ai opens GLM-5 weights

GLM-5 ships under a permissive commercial license.

Z.ai · GLM-5

2 Apr · New model

OpenAI ships GPT-5.5

GPT-5.5 replaces GPT-5 as the default model on chatgpt.com and the API.

OpenAI · GPT-5.5

30 Mar · New model

Microsoft rebrands its assistant as Copilot Spark

Spark consolidates the Microsoft 365 and Windows assistant into one surface.

Microsoft · Copilot Spark

Comparison matrix

Sort and filter the table below. Scores are comparable only within a bucket, not across buckets.

Model	Provider	Code	Image / Video	Writing / Agents	Released
Claude Opus 4.7: Default for coding and agent loops in Claude Code.	Anthropic	92: Leads SWE-Bench Verified and holds the longest reliable agent loops in coding.	—	90: Top-tier long-form prose, strongest tool-use compliance across published evals.	Mar 2026
GPT-5.5: Strong all-rounder; image generation via an integrated DALL-E successor.	OpenAI	87: Competitive on SWE-Bench, weaker on multi-file refactors than Claude Opus 4.7.	82: Best general-purpose image generation; video remains on the Sora 2 surface.	86: Reliable agent runner, strong function calling, slightly looser editorial voice.	Apr 2026
Sora 2: Video-first; coherent 60-second clips with an audio bed.	OpenAI	—	91: Best long-form video coherence and prompt adherence in published comparisons.	—	Feb 2026
Gemini 3 Pro: Strong multimodal context, integrated Imagen 4 and Veo 3 routing.	Google DeepMind	84: Good repo-scale reasoning, behind Claude on agent loop stability.	88: Imagen 4 + Veo 3 combination is the most consistent image-plus-video pair.	84: Long-context wins on research-style tasks; weaker tool-call discipline.	Apr 2026
Copilot Spark: Wraps GPT-5.5 with Microsoft IDE and Office tooling.	Microsoft	83: Best IDE-integrated experience; raw model behind Claude on hard tasks.	70: Uses GPT image generation under the hood; trails dedicated image leaders.	80: Solid for Office-tethered workflows; less flexible as a standalone agent.	Mar 2026
Grok 4: Real-time X integration, looser safety posture.	xAI	78: Improving fast; SWE-Bench still trails the top three.	74: Aurora image generation is competent; no first-party long-form video.	76: Strong with real-time data, weaker on stable long-form structure.	Jan 2026
Llama 4 405B: Open-weights flagship; the practical pick for self-hosting.	Meta	80: Best open-weights coder; closes the gap to GPT-5.5 on SWE-Bench Lite.	68: Companion Emu 3 model lags Imagen 4 and Midjourney v7.	78: Long-form is verbose but stable; tool use is recent and improving.	Feb 2026
DeepSeek R2: Cheapest top-tier reasoning model in the matrix.	DeepSeek	85: Excellent on competitive programming and SWE-Bench Lite; weaker on long agent loops.	—	80: Strong reasoning, terser prose; cost-effective for batch agents.	Apr 2026
Mistral Large 3: Strong European hosting and data-residency story.	Mistral	76: Improved on Large 2; still behind the top three on agent loops.	—	79: Tight, neutral prose; tool use is reliable but not best-in-class.	Mar 2026
GLM-5: Open-weights, strong on Chinese-language tasks.	Z.ai	74: Competitive on HumanEval; weaker on English-language repo edits.	—	72: Solid agent runner; English long-form trails the top tier.	Apr 2026
Kimi K2: Long-context specialist; up to 2M tokens.	Moonshot	73: Long-context wins on repo-wide reading; raw editing trails the leaders.	—	78: Excellent at synthesising large corpora; loose prose voice.	Mar 2026
Midjourney v7: Image only; no chat or tool use.	Midjourney	—	90: Best aesthetic image fidelity; weaker on strict prompt adherence.	—	Feb 2026
Runway Gen-4: Video editing primitives plus generation.	Runway	—	86: Strong on directed edits and motion control; behind Sora 2 on raw coherence.	—	Mar 2026

Coding agents

IDE wrappers and CLI agents. SWE-Bench Verified, higher is better.

1. Codex

OpenAI · default model: GPT-5.5 / GPT-5-Codex

OpenAI's first-party coding agent for CLI, IDE, cloud, and GitHub workflows.

75 SWE-Bench Verified

2. Claude Code

Anthropic · default model: Claude Opus 4.7

Runs Claude Opus 4.7, which holds the top Code score in the comparison matrix.

71 SWE-Bench Verified

3. Cursor

Anysphere · default model: GPT-5.5 / Claude Opus 4.7

Best multi-model IDE; lets you swap providers per task.

64 SWE-Bench Verified

4. GitHub Copilot

GitHub · default model: GPT-5.5

Tight VS Code and PR-review integration.

58 SWE-Bench Verified

5. Aider

Open source · default model: Pluggable

Best terminal-native option; strongest cost control.

55 SWE-Bench Verified

6. Cline

Open source · default model: Pluggable

VS Code extension with explicit plan/act split.

52 SWE-Bench Verified

7. Roo Code

Open source · default model: Pluggable

Cline fork with multi-mode workflow.

50 SWE-Bench Verified

Methodology

Directional ratings on a 0–100 scale, curated from public benchmarks, model cards, and hands-on use, and refreshed after any model release.

Scores are within a bucket. A 90 in code is not comparable to a 90 in image and video.
Each rating links to the primary evidence. Where no link exists, treat the score as editorial judgment.
This section is AI-assisted editorial. It is decision support, not an authoritative leaderboard.
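
The within-bucket rule can be illustrated with a short Python sketch. This is only an illustration, not how the page itself is built: the row layout, the bucket keys (`code`, `image_video`, `writing_agents`), and the function name `rank_within_bucket` are all assumptions made for the example. The scores are copied from a few rows of the comparison matrix, and `None` stands for a bucket where a model has no score.

```python
# Hypothetical row shape: (model name, {bucket key: score or None}).
# Scores are copied from the comparison matrix; None marks an unscored bucket.
ROWS = [
    ("Claude Opus 4.7", {"code": 92, "image_video": None, "writing_agents": 90}),
    ("GPT-5.5", {"code": 87, "image_video": 82, "writing_agents": 86}),
    ("Sora 2", {"code": None, "image_video": 91, "writing_agents": None}),
    ("Gemini 3 Pro", {"code": 84, "image_video": 88, "writing_agents": 84}),
    ("Midjourney v7", {"code": None, "image_video": 90, "writing_agents": None}),
]


def rank_within_bucket(rows, bucket):
    """Return (model, score) pairs for one bucket, highest score first.

    Models without a score in that bucket are left out rather than
    treated as zero, and scores from other buckets are never mixed in.
    """
    scored = [
        (name, scores[bucket])
        for name, scores in rows
        if scores.get(bucket) is not None
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for bucket in ("code", "image_video", "writing_agents"):
        print(bucket, rank_within_bucket(ROWS, bucket))
```

Ranking each bucket separately keeps a 90 in code from ever being lined up against a 90 in image and video, which is the comparison the methodology warns against.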