AI Pulse

Directional ratings for the AI models people are actually using right now. Three buckets — code, image and video, writing and agents — refreshed after every release.

Updated 19 May 2026 · 13 models tracked
Coding · Best right now

Claude Opus 4.7

Anthropic

Default to Claude Opus 4.7 inside Claude Code for repo-scale edits. Codex is the strongest first-party OpenAI alternative for repo-scale work; pick Cursor instead when you need multi-model routing in the same workflow.

Runner-up: GPT-5.5

Image / Video · Best right now

Sora 2

OpenAI

Sora 2 for video, Midjourney v7 for still images. Imagen 4 inside Gemini 3 is the best one-surface answer when you want both from a single account.

Runner-up: Midjourney v7

Writing / Agents · Best right now

Claude Opus 4.7

Anthropic

Claude Opus 4.7 leads for long-form editorial and stable agent loops. GPT-5.5 wins when you need broader tool integrations, especially around Microsoft and OpenAI surfaces.

Runner-up: GPT-5.5

Latest releases

10 entries · newest first
  1. Capability

    Claude Code adds long-running background agents

    Anthropic enables background agent tasks up to 60 minutes inside Claude Code.

    Anthropic · Claude Opus 4.7

  2. Capability

    GPT-5.5 picks up custom tool registries

    OpenAI exposes shareable tool registries for the Responses API.

    OpenAI · GPT-5.5

  3. Version bump

    Gemini 3 Pro routes to Imagen 4 by default

    Image generation inside Gemini 3 switches to Imagen 4 with stricter prompt adherence.

    Google DeepMind · Gemini 3 Pro

  4. Capability

    DeepSeek R2 adds parallel tool calls

    R2 now batches independent tool calls in a single step.

    DeepSeek · DeepSeek R2

  5. New model

    xAI releases Grok 4 mini

    Smaller Grok 4 variant aimed at sub-second response latency.

    xAI · Grok 4

  6. New model

    Gemini 3 Pro ships

    Google replaces Gemini 2.5 Pro with Gemini 3 Pro across API and Vertex.

    Google DeepMind · Gemini 3 Pro

  7. New model

    DeepSeek R2 launches

    Reasoning successor to R1, priced 60% below the GPT-5.5 reasoning tier.

    DeepSeek · DeepSeek R2

  8. New model

    Z.ai opens GLM-5 weights

    GLM-5 ships under a permissive commercial license.

    Z.ai · GLM-5

  9. New model

    OpenAI ships GPT-5.5

    GPT-5.5 replaces GPT-5 as the default model on chatgpt.com and the API.

    OpenAI · GPT-5.5

  10. New model

    Microsoft rebrands its assistant as Copilot Spark

    Spark consolidates the Microsoft 365 and Windows assistant into one surface.

    Microsoft · Copilot Spark
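
Entry 4 describes batching independent tool calls in a single step. The general idea, independent of any vendor API, can be sketched in Python with `asyncio.gather`: calls that do not depend on each other's results are awaited together rather than one after another. The tool functions, the `TOOLS` table, and the `run_step` helper are hypothetical and do not reflect DeepSeek's actual interface.

```python
import asyncio
from typing import Any, Awaitable, Callable


# Hypothetical tools for illustration; a real agent would call external services.
async def get_weather(city: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return f"weather:{city}"


async def get_time(city: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for network latency
    return f"time:{city}"


TOOLS: dict[str, Callable[..., Awaitable[Any]]] = {
    "get_weather": get_weather,
    "get_time": get_time,
}


async def run_step(calls: list[tuple[str, dict[str, Any]]]) -> list[Any]:
    """Run one step's independent tool calls concurrently.

    Results come back in the same order as `calls`, which asyncio.gather guarantees.
    """
    pending = [TOOLS[name](**kwargs) for name, kwargs in calls]
    return await asyncio.gather(*pending)


if __name__ == "__main__":
    step = [("get_weather", {"city": "Oslo"}), ("get_time", {"city": "Oslo"})]
    print(asyncio.run(run_step(step)))
```

Only calls with no data dependency between them belong in the same batch; a call that needs another call's output has to wait for a later step.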

Comparison matrix

Scores are comparable within a bucket, not across buckets.
Model · Provider · Code · Image / Video · Writing / Agents · Released

Claude Opus 4.7 · Anthropic
    Default for coding and agent loops in Claude Code.
    Code: 92. Leads SWE-Bench Verified and holds the longest reliable agent loops in coding.
    Writing / Agents: 90. Top-tier long-form prose, strongest tool-use compliance across published evals.

GPT-5.5 · OpenAI
    Strong all-rounder; image generation via integrated DALL-E successor.
    Code: 87. Competitive on SWE-Bench, weaker on multi-file refactors than Claude Opus 4.7.
    Image / Video: 82. Best general-purpose image generation; video remains Sora-2 surface.
    Writing / Agents: 86. Reliable agent runner, strong function calling, slightly looser editorial voice.

Sora 2 · OpenAI
    Video-first; 60s coherent clips, audio bed.
    Image / Video: 91. Best long-form video coherence and prompt adherence in published comparisons.

Gemini 3 Pro · Google DeepMind
    Strong multimodal context, integrated Imagen 4 and Veo 3 routing.
    Code: 84. Good repo-scale reasoning, behind Claude on agent loop stability.
    Image / Video: 88. Imagen 4 + Veo 3 combination is the most consistent image-plus-video pair.
    Writing / Agents: 84. Long-context wins on research-style tasks; weaker tool-call discipline.

Copilot Spark · Microsoft
    Wraps GPT-5.5 with Microsoft IDE and Office tooling.
    Code: 83. Best IDE-integrated experience; raw model behind Claude on hard tasks.
    Image / Video: 70. Uses GPT image generation under the hood; trails dedicated image leaders.
    Writing / Agents: 80. Solid for Office-tethered workflows; less flexible as a standalone agent.

Grok 4 · xAI
    Real-time X integration, looser safety posture.
    Code: 78. Improving fast; SWE-Bench still trails the top three.
    Image / Video: 74. Aurora image gen is competent; no first-party long-form video.
    Writing / Agents: 76. Strong with real-time data, weaker on stable long-form structure.

Llama 4 405B · Meta
    Open-weights flagship; the practical pick for self-hosting.
    Code: 80. Best open-weights coder; closes the gap to GPT-5.5 on SWE-Bench Lite.
    Image / Video: 68. Companion Emu 3 model lags Imagen 4 and Midjourney 7.
    Writing / Agents: 78. Long-form is verbose but stable; tool-use is recent and improving.

DeepSeek R2 · DeepSeek
    Cheapest top-tier reasoning model in the matrix.
    Code: 85. Excellent on competitive-programming and SWE-Bench Lite; weaker on long agent loops.
    Writing / Agents: 80. Strong reasoning, terser prose; cost-effective for batch agents.

Mistral Large 3 · Mistral
    Strong European hosting and data-residency story.
    Code: 76. Improved on Large 2; still behind the top three on agent loops.
    Writing / Agents: 79. Tight, neutral prose; tool use is reliable but not best-in-class.

GLM-5 · Z.ai
    Open-weights, strong on Chinese-language tasks.
    Code: 74. Competitive on HumanEval; weaker on English-language repo edits.
    Writing / Agents: 72. Solid agent runner; English long-form trails the top tier.

Kimi K2 · Moonshot
    Long-context specialist; up to 2M tokens.
    Code: 73. Long-context wins on repo-wide reading; raw editing trails the leaders.
    Writing / Agents: 78. Excellent at synthesising large corpora; loose prose voice.

Midjourney v7 · Midjourney
    Image only; no chat or tool use.
    Image / Video: 90. Best aesthetic image fidelity; weaker on strict prompt adherence.

Runway Gen-4 · Runway
    Video editing primitives plus generation.
    Image / Video: 86. Strong on directed edits and motion control; behind Sora 2 on raw coherence.
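
Because scores are only meaningful within a bucket, a ranking is best computed one bucket at a time, skipping models that have no rating in that bucket. A minimal Python sketch, using a handful of rows from the matrix above; the `ROWS` layout, the bucket keys, and the `rank_bucket` helper are illustrative assumptions, not part of AI Pulse:

```python
from typing import Optional

# (model, code, image_video, writing_agents); None means "not rated in this bucket".
# Scores are copied from the comparison matrix; the tuple layout is an assumption.
ROWS: list[tuple[str, Optional[int], Optional[int], Optional[int]]] = [
    ("Claude Opus 4.7", 92, None, 90),
    ("GPT-5.5", 87, 82, 86),
    ("Sora 2", None, 91, None),
    ("Gemini 3 Pro", 84, 88, 84),
    ("DeepSeek R2", 85, None, 80),
    ("Midjourney v7", None, 90, None),
]

# Hypothetical bucket keys mapped to their column index in ROWS.
BUCKETS = {"code": 1, "image_video": 2, "writing_agents": 3}


def rank_bucket(bucket: str) -> list[tuple[str, int]]:
    """Return (model, score) pairs for one bucket, highest score first."""
    col = BUCKETS[bucket]
    rated = [(row[0], row[col]) for row in ROWS if row[col] is not None]
    return sorted(rated, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for bucket in BUCKETS:
        print(bucket, rank_bucket(bucket))
```

Keeping unrated buckets as `None` rather than 0 stops an image-only model such as Midjourney v7 from showing up at the bottom of the code ranking.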

Coding agents

IDE wrappers and CLI agents, ranked by SWE-Bench Verified score; higher is better.
  1. Codex

    OpenAI · default model: GPT-5.5 / GPT-5-Codex

    OpenAI's first-party coding agent for CLI, IDE, cloud, and GitHub workflows.

    SWE-Bench Verified: 75

  2. Claude Code

    Anthropic · default model: Claude Opus 4.7

    Strongest agentic SWE-Bench results in the matrix.

    SWE-Bench Verified: 71

  3. Cursor

    Anysphere · default model: GPT-5.5 / Claude Opus 4.7

    Best multi-model IDE; lets you swap providers per task.

    SWE-Bench Verified: 64

  4. GitHub Copilot

    GitHub · default model: GPT-5.5

    Tight VS Code and PR-review integration.

    SWE-Bench Verified: 58

  5. Aider

    Open source · default model: Pluggable

    Best terminal-native option; strongest cost control.

    SWE-Bench Verified: 55

  6. Cline

    Open source · default model: Pluggable

    VS Code extension with explicit plan/act split.

    SWE-Bench Verified: 52

  7. Roo Code

    Open source · default model: Pluggable

    Cline fork with multi-mode workflow.

    SWE-Bench Verified: 50

Methodology

Directional ratings curated from public benchmarks, model cards, and hands-on use. Scores run 0–100 and are refreshed after any model release.

  • Scores are within a bucket. A 90 in code is not comparable to a 90 in image and video.
  • Each rating links to the primary evidence. Where no link exists, treat the score as editorial judgment.
  • This section is AI-assisted editorial. It is decision support, not an authoritative leaderboard.
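
The first two rules above can be made mechanical. The following Python sketch is one possible encoding, not how AI Pulse stores its data: the `Rating` class, its `evidence_url` field, and the `lead` helper are hypothetical names introduced for illustration. A rating with no evidence link is labelled editorial judgment, and comparing ratings from different buckets is refused outright.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rating:
    model: str
    bucket: str  # e.g. "code", "image_video", "writing_agents" (assumed keys)
    score: int  # 0-100, meaningful only within `bucket`
    evidence_url: Optional[str] = None  # hypothetical field; None means no primary link

    def __post_init__(self) -> None:
        if not 0 <= self.score <= 100:
            raise ValueError(f"score out of range: {self.score}")

    @property
    def basis(self) -> str:
        """Label the rating by whether primary evidence is linked."""
        return "primary evidence" if self.evidence_url else "editorial judgment"


def lead(a: Rating, b: Rating) -> int:
    """Score gap of `a` over `b`; refuses to compare across buckets."""
    if a.bucket != b.bucket:
        raise ValueError(f"cannot compare {a.bucket!r} with {b.bucket!r}")
    return a.score - b.score


if __name__ == "__main__":
    opus = Rating("Claude Opus 4.7", "code", 92)
    gpt = Rating("GPT-5.5", "code", 87)
    print(lead(opus, gpt), opus.basis)
```

Raising an error on a cross-bucket comparison is a deliberate choice here: it turns the "a 90 in code is not a 90 in image and video" rule into something a script cannot silently ignore.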
