🚨 Breaking: April 4, 2026

AI Model Replacement Guide

Prepared by Dwight · April 4, 2026 · Anthropic killed OpenClaw OAuth today

Table of Contents

  1. The Situation
  2. Sonnet Replacement
  3. Opus Replacement
  4. Full Model Rankings (April 2026)
  5. Subscription OAuth Options
  6. Can GPT-5.4 Do What Sonnet Does?
  7. Transition Plan (3 Options)
  8. Local Model Option (Gemma 4)
  9. Final Recommendation

The Situation

Anthropic killed third-party OAuth today (April 4, 2026 at 12pm PT). You can no longer use a Claude subscription (Pro/Max) to power OpenClaw. Two options remain for Claude:

  • Anthropic API key: pay-per-token ($3/$15 per MTok input/output for Sonnet, $5/$25 for Opus)
  • Switch to a different provider entirely
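
To make the pay-per-token prices concrete, here is a minimal sketch of the arithmetic. The per-MTok prices come from the bullet above; the monthly token volumes are hypothetical and only illustrate how the bill scales.

```python
def api_cost_usd(input_mtok, output_mtok, input_price, output_price):
    """Monthly API cost in USD, with token volumes given in millions of tokens."""
    return input_mtok * input_price + output_mtok * output_price

# Prices from this guide (USD per million tokens, input/output).
SONNET = (3.00, 15.00)
OPUS = (5.00, 25.00)

# Hypothetical month: 20 MTok of input and 2 MTok of output.
sonnet_cost = api_cost_usd(20, 2, *SONNET)  # 20*3 + 2*15
opus_cost = api_cost_usd(20, 2, *OPUS)      # 20*5 + 2*25
print(f"Sonnet: ${sonnet_cost:.2f}  Opus: ${opus_cost:.2f}")
```

Swap in your own volumes from the provider's usage dashboard to get a realistic figure.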

This guide answers: What should you actually do?

Section 1: The Sonnet Replacement

Current role: Fergus, the main agent. Covers everyday conversation, task management, delegation, Discord responses, voice message handling, tool calls, persona consistency, and multi-agent orchestration.

🥇 GPT-5.4 via OpenAI Codex OAuth: CAN REPLACE SONNET

This is the move. Here's why:

  • Quality: #1 on AI Top 40 with a score of 100.0 (Sonnet ranks in the #5-6 range). Terminal-Bench 2.0: 75.1% vs Sonnet's 59.1%.
  • Tool calling: Has "Tool Search" for dedicated multi-tool orchestration. "Slight edge in complex orchestration where orchestrator must track multiple simultaneous sub-task states." This is exactly what Fergus does.
  • Session memory: ZooClaw benchmark rated GPT-5.4 "Excellent" for memory reliability after compaction; Sonnet rated "Good".
  • OpenClaw community: "5.4 is more than capable of running main agent 95% of the time." Named ZooClaw overall winner above Sonnet AND Opus.
  • Access: Codex OAuth via ChatGPT Plus ($20/mo) or Pro ($200/mo). The ONLY subscription-based OAuth that still works in OpenClaw for a top-tier model.

Where GPT-5.4 is WORSE than Sonnet

  • Speed: 20-30 tok/s vs Sonnet's 44-63 tok/s. Slower Discord responses are the biggest real-world tradeoff.
  • Persona compliance: GPT-5.4 wants to DO work. Sonnet follows "don't do work, delegate" rules more naturally. Needs stronger guardrails.
  • Discord formatting: Occasionally generates markdown tables (broken in Discord). Add explicit rules to SOUL.md.
  • Verbosity: Tends toward longer responses. Add "be concise" rules.
Verdict: GPT-5.4 CAN replace Sonnet. Stronger on tool calling, session memory, and multi-step orchestration. Weaker on speed, persona compliance, and formatting; the latter two are fixable with prompt engineering. Transition will require 1-2 weeks of SOUL.md tuning.
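
To get a feel for the speed tradeoff, here is a rough sketch of how long a reply takes to generate at the quoted throughput ranges. The 300-token reply length is a hypothetical figure, and the estimate ignores time-to-first-token and network latency.

```python
def reply_seconds(tokens, tok_per_s):
    """Seconds needed to generate a reply of `tokens` tokens at a given throughput."""
    return tokens / tok_per_s

REPLY_TOKENS = 300  # hypothetical length of a typical Discord reply

# (best case, worst case) using the tok/s ranges quoted in this guide.
gpt54 = (reply_seconds(REPLY_TOKENS, 30), reply_seconds(REPLY_TOKENS, 20))
sonnet = (reply_seconds(REPLY_TOKENS, 63), reply_seconds(REPLY_TOKENS, 44))

print(f"GPT-5.4: {gpt54[0]:.1f}-{gpt54[1]:.1f} s")
print(f"Sonnet:  {sonnet[0]:.1f}-{sonnet[1]:.1f} s")
```

Under these assumptions a reply takes roughly twice as long on GPT-5.4, which is why tighter "be concise" rules help on both formatting and perceived speed.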
🥈 Claude Sonnet 4.6 via Anthropic API Key

Keep Sonnet by moving to API billing. Identical quality. Cost: $3/$15 per MTok (~$50-150/mo with 90% prompt caching discount for heavy daily use).

Verdict: Obviously works, since it's the same model. The question is whether $50-150/mo variable beats $20/mo flat for GPT-5.4 on ChatGPT Plus, which is arguably the better model.
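
The effect of the 90% prompt caching discount can be sketched as follows. This assumes the discount applies only to the cached share of input tokens and ignores any separate charge for writing to the cache; the token volumes and cache hit rate are hypothetical.

```python
def sonnet_monthly_usd(input_mtok, output_mtok, cache_hit_rate,
                       input_price=3.0, output_price=15.0, cache_discount=0.90):
    """Estimated monthly Sonnet API cost in USD with prompt caching.

    cache_hit_rate is the fraction of input tokens served from the cache (0.0-1.0).
    """
    cached = input_mtok * cache_hit_rate
    uncached = input_mtok - cached
    input_cost = uncached * input_price + cached * input_price * (1 - cache_discount)
    return input_cost + output_mtok * output_price

# Hypothetical heavy month: 100 MTok in, 3 MTok out.
with_cache = sonnet_monthly_usd(100, 3, cache_hit_rate=0.8)
without_cache = sonnet_monthly_usd(100, 3, cache_hit_rate=0.0)
print(f"with caching: ${with_cache:.0f}  without: ${without_cache:.0f}")
```

With these made-up volumes the cached bill lands inside the $50-150 range quoted above, while the uncached bill is well outside it, so the estimate depends heavily on how much of each prompt is reused.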
🥉 Gemini 3.1 Pro via Google API Key: CANNOT FULLY REPLACE

Strong benchmarks (GPQA 94.3%, SWE-bench 80.6%) but no subscription OAuth in OpenClaw, Google is suspending accounts that use their API heavily with OpenClaw agents, and persona consistency is "underwhelming out of the box."

Verdict: CANNOT replace Sonnet for this use case. Too risky, no subscription, weak persona handling.
Sonnet Replacement Recommendation:
  • Primary: GPT-5.4 via Codex OAuth (ChatGPT Plus $20/mo or Pro $200/mo)
  • Fallback: Keep Sonnet via Anthropic API key ($50-150/mo variable)

Section 2: The Opus Replacement

Current role: Dwight. Covers deep analysis, strategy, research, complex reasoning, comparison tasks, financial analysis, and product research.

🥇 GPT-5.4 via OpenAI Codex OAuth: CAN REPLACE OPUS (for 80% of tasks)
  • AI Top 40 #1 (100.0) vs Opus at #2 (93.2). Leads on 8 of 10 benchmarks.
  • Matches Opus on: multi-step reasoning, knowledge work, structured analysis, research synthesis.
  • Falls short: Opus has a unique quality for nuanced, creative reasoning, often described as "thinking around corners." GPT-5.4 is more systematic, Opus more intuitive. For the hardest 20% of Dwight tasks, Opus still has an edge.
Verdict: GPT-5.4 CAN replace Opus for most Dwight tasks. For the hardest 20%, you lose some nuance.
🥈 Claude Opus 4.6 via Anthropic API Key: STAYS AS OPUS

Arena #1 (Elo 1,504). Arena Code #1 (Elo 1,548). The acknowledged quality ceiling. Cost: $5/$25 per MTok. Dwight is used selectively, so a realistic monthly cost is $30-80.

Verdict: Opus via API CAN stay as Opus. $30-80/mo for the best reasoning model is reasonable for selective use.
Opus Replacement Recommendation:
  • Primary: Keep Opus via Anthropic API key ($30-80/mo), cheap enough for selective Dwight use
  • Alternative: Use GPT-5.4 for everything (accepting some quality loss on the hardest 20% of reasoning tasks)

Section 3: Full Model Rankings (April 2026)

Ranked by overall capability. Pricing, access method, and value rating for each.

Tier S: Frontier (Best Models Alive)

#1 GPT-5.4 (OpenAI)
AI Top 40: 100.0 · Terminal-Bench: 75.1% · SWE-bench: ~80%
API: $2.50/$15 per MTok · ✅ Codex OAuth subscription ($20-200/mo)
Best all-around agentic model. Tool calling. Computer use. Multi-step execution.
Value: 🔵 $20/mo

#2 Claude Opus 4.6 (Anthropic)
Arena: #1 (1,504) · Arena Code: #1 (1,548) · SWE-bench: 79.6%
API: $5/$25 per MTok · ⚠️ API key only (OAuth killed April 4)
Best human-preference quality. Best coding. Best nuanced reasoning. Best persona adherence.
Value: 🟡 Fair

#3 Grok 4 (xAI)
AI Top 40: 86.6 · HLE: 50.7% (#1) · Strong reasoning
API: $3/$15 per MTok · ✅ API key only. No OAuth.
Strongest on Humanity's Last Exam. Good deep reasoning.
Value: 🟡 Fair

#4 Gemini 3.1 Pro (Google)
GPQA: 94.3% (#1) · SWE-bench: 80.6% · ARC-AGI-2: 77.1% (#1)
API: $2/$12 per MTok · ✅ API key. ❌ No OAuth subscription.
⚠️ Risk: Google is suspending accounts that use the API with OpenClaw agents.
Value: 🟢 Great value

Tier A: Near-Frontier

#5 Claude Sonnet 4.6 (Anthropic)
SWE-bench: 79.6% · 44-63 tok/s · Excellent instruction following
API: $3/$15 per MTok (90% cache discount) · ⚠️ API key only (OAuth killed April 4)
Fastest frontier-class model. Best instruction following. Best persona consistency. Best Discord formatting.
Value: 🟢 Excellent

#6 GPT-5.4 Pro (OpenAI)
Enhanced reasoning version
API: $30/$180 per MTok (12x standard) · ✅ API key. Not on Codex OAuth.
Maximum reasoning when cost is no object.
Value: 🔴 Poor value

#7 GPT-5.2 (OpenAI)
Strong general-purpose. "Most reliable instruction following" per OpenAI.
API: $1.75/$14 per MTok · ✅ API key.
Value: 🟢 Excellent

#8 Qwen3-Max (Alibaba)
API: $1.20/$6 per MTok · Or Alibaba Coding Plan: $10-50/mo flat
✅ API key + ✅ Alibaba Coding Plan OAuth
⚠️ Quality: Significantly below Sonnet/GPT-5.4. "TV robot" persona compliance.
Value: 🟢 Budget

Tier B: Strong Performers

#9 Grok 4.2 (xAI)
Fast variant. API: $2/$6 per MTok · ✅ API key. 2M context window.
Value: 🟢 Excellent

#10 GPT-5.3-Codex (OpenAI)
400K context. Strong coding specialist. API: $1.75/$14 per MTok
✅ Codex OAuth (subscription), included in Plus/Pro. This is what Cody/Bomb use.
Value: 🔵 Included

#11 GPT-5.3-Codex-Spark (OpenAI)
1,000+ tok/s on Cerebras. Ultra-fast coding. Pro subscription only.
✅ Codex OAuth · Separate usage pool from GPT-5.4 (different hardware)
Pro users get GPT-5.4 quota PLUS Spark quota independently, which means more total headroom.
Value: Pro only

#12 DeepSeek V3.2 (DeepSeek)
API: $0.28/$0.42 per MTok (roughly 18x cheaper than Opus on input and 60x cheaper on output)
⚠️ Privacy: Chinese company. Data handling concerns for business use.
Value: 🟢 Cheapest

#13 Mistral Large 3 (Mistral)
API: $0.50/$1.50 per MTok · European data handling.
Value: 🟢 Excellent

Tier C: Budget / Specialist

#15 Gemma 4 31B (Google, open)
AA Intelligence Index: #2 among open models (score 39). MMLU-Pro 85.2%. Native function calling.
✅ Local via Ollama · Free if you have hardware. Apache 2.0 license.
⚠️ Gap: Sonnet has a 43-point advantage on AA Intelligence Index. Good for local fallback, not primary agent.
Value: Free local

#16 Llama 4 Maverick (Meta, open)
AI Top 40: #31. 400B total params, MoE. Multimodal.
Free locally · API via OpenRouter/Groq · CANNOT replace Sonnet.
Value: Fair

Section 4: Subscription OAuth in OpenClaw (Full List)

Which providers let you pay a flat monthly rate and use it through OpenClaw?

✅ Confirmed Working Subscription OAuth

Provider | Subscription | Price | Models | Quality for Main Agent?
OpenAI Codex | ChatGPT Plus | $20/mo | GPT-5.4, GPT-5.3-Codex, GPT-5.4-mini | YES, top tier
OpenAI Codex | ChatGPT Pro | $200/mo | All Plus + Spark, 6x limits | YES, top tier + headroom
Alibaba | Coding Plan Lite | $10/mo ($3 first mo) | Qwen3.5+, Kimi K2.5, GLM-5, MiniMax M2.5 | No, budget tier
Alibaba | Coding Plan Pro | $50/mo ($15 first mo) | Same models, 5x more requests | No, budget tier
MiniMax | Coding Plan OAuth | ~$10-20/mo | MiniMax models | No, budget tier
Z.AI / GLM | Coding Plan OAuth | ~$10-20/mo | GLM models | No, budget tier

❌ No Subscription OAuth

The reality: OpenAI is the ONLY provider offering top-tier subscription OAuth in OpenClaw.

Section 5: Can GPT-5.4 Do What Brent Does With Sonnet?

✅ Long complex sessions without degrading
ZooClaw benchmark: "Excellent" for memory reliability after compaction, the same as Opus and BETTER than Sonnet ("Good").

⚠️ Follow SOUL.md / AGENTS.md persona instructions
YES, WITH WORK. More opinionated than Sonnet. Needs stronger delegation rules, explicit "no markdown tables," explicit "be concise." Set personalityOverlay: "off".

✅ Handle tool calls reliably
BETTER than Sonnet. Terminal-Bench 75.1% vs 59.1%. Designed for agentic tool use.

✅ Multi-agent orchestration
"Instruction-following precision gives it a slight edge in complex orchestration where orchestrator must track multiple simultaneous sub-task states."

⚠️ Format Discord messages properly
YES, WITH RULES. Occasionally generates markdown tables, tends toward longer responses. Fixable with explicit SOUL.md rules.

✅ Flat-rate subscription
The ONLY top-tier model with subscription OAuth in OpenClaw right now.

Bottom line: GPT-5.4 CAN do everything Brent does with Sonnet. It's actually BETTER at tool calling and session memory. It is worse at delegation compliance, formatting, and speed; the first two are fixable with prompt engineering. Will take 1-2 weeks of SOUL.md tuning.

Section 6: Transition Plan

Option A: All-In OpenAI (Simplest, Cheapest)
$200/mo (Pro) or $20/mo (Plus)
  • Main Agent (Fergus): openai-codex/gpt-5.4
  • Sub-Agent Dwight: openai-codex/gpt-5.4
  • Sub-Agent Scout: openai-codex/gpt-5.4-mini
  • Coding (Cody/Bomb): openai-codex/gpt-5.4 (already on this)

Pros

  • One provider, one bill, flat rate
  • Generous limits

Cons

  • No Opus for hardest reasoning
  • Slower speed
  • Prompt tuning needed

SOUL.md additions needed:

## 🚨 GPT-5.4 SPECIFIC RULES
- NEVER use markdown tables. Use bullet lists instead. Always.
- Keep responses concise. Don't over-explain.
- You are a DISPATCHER. Do NOT do work yourself. This model
  has a tendency to try to do things. Resist it. Delegate EVERYTHING.
- When formatting for Discord: use bold headers + bullet lists.
  No tables. No code blocks for non-code content.
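
Prompt rules can slip, so it may be worth checking outgoing messages as well. The sketch below is a hypothetical helper (not part of OpenClaw) that flags text containing a markdown table by looking for a header row followed by a separator row such as `|---|---|`. It only catches tables with two or more columns.

```python
import re

# Separator row of a markdown table, e.g. "|---|---|" or ":--- | ---:".
_SEPARATOR = re.compile(r"^\s*\|?\s*:?-{3,}:?\s*(\|\s*:?-{3,}:?\s*)+\|?\s*$")

def contains_markdown_table(text: str) -> bool:
    """Return True if a pipe-delimited row is directly followed by a separator row."""
    lines = text.splitlines()
    for header, candidate in zip(lines, lines[1:]):
        if "|" in header and _SEPARATOR.match(candidate):
            return True
    return False

draft = "| Model | Price |\n|---|---|\n| GPT-5.4 | $20/mo |"
if contains_markdown_table(draft):
    print("Table detected: ask the model to reformat as bullets before posting.")
```

A check like this could sit in whatever layer posts to Discord; where exactly that hook lives depends on your setup.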

openclaw.json config:

{
  "plugins": {
    "entries": {
      "openai": {
        "config": {
          "personalityOverlay": "off"
        }
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai-codex/gpt-5.4"
      }
    }
  }
}
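
Before restarting the agent, a quick sanity check of the file can catch typos. This is a minimal sketch that only verifies the two settings shown in this guide; it is not OpenClaw's own validation, and the inline `CONFIG_TEXT` stands in for reading your real openclaw.json.

```python
import json

# Stand-in for the contents of openclaw.json (same keys as the config in this guide).
CONFIG_TEXT = """
{
  "plugins": {"entries": {"openai": {"config": {"personalityOverlay": "off"}}}},
  "agents": {"defaults": {"model": {"primary": "openai-codex/gpt-5.4"}}}
}
"""

def dig(cfg, *keys):
    """Walk nested dicts, returning None if any key is missing."""
    for key in keys:
        if not isinstance(cfg, dict) or key not in cfg:
            return None
        cfg = cfg[key]
    return cfg

def check_config(cfg):
    """Return a list of human-readable problems; an empty list means both settings match."""
    problems = []
    if dig(cfg, "plugins", "entries", "openai", "config", "personalityOverlay") != "off":
        problems.append('personalityOverlay is not "off"')
    if dig(cfg, "agents", "defaults", "model", "primary") != "openai-codex/gpt-5.4":
        problems.append("primary model is not openai-codex/gpt-5.4")
    return problems

print(check_config(json.loads(CONFIG_TEXT)) or "config looks as expected")
```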
Option B: Hybrid (Best Quality, Higher Cost)
~$280-430/mo total (Pro $200 + Opus API $30-80 + Sonnet fallback API $50-150)
  • Main Agent (Fergus): openai-codex/gpt-5.4 ($200/mo Pro or $20/mo Plus)
  • Sub-Agent Dwight: anthropic/claude-opus-4-6 (API key, $30-80/mo)
  • Sub-Agent Scout: openai-codex/gpt-5.4-mini (included)
  • Coding (Cody/Bomb): openai-codex/gpt-5.4 (included)
  • Fallback: anthropic/claude-sonnet-4-6 (API key, if GPT-5.4 hits limits)

Pros

  • Best of both worlds
  • Opus for hard problems
  • GPT-5.4 for everything else

Cons

  • Two providers, two billing methods
  • More complexity
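
The fallback idea in Option B boils down to "try the primary model, and move to the next one if it hits its limit." The sketch below illustrates that logic only. `RateLimited` and the `call` function are hypothetical placeholders, and OpenClaw may well handle fallback through its own configuration instead.

```python
from typing import Callable, Sequence, Tuple

class RateLimited(Exception):
    """Hypothetical error raised when a provider reports that a usage limit was hit."""

def complete_with_fallback(prompt: str, models: Sequence[str],
                           call: Callable[[str, str], str]) -> Tuple[str, str]:
    """Try each model in order; return (model_used, reply) from the first that succeeds."""
    last_error = None
    for model in models:
        try:
            return model, call(model, prompt)
        except RateLimited as err:
            last_error = err  # remember the failure and try the next model
    raise RuntimeError("all configured models are rate limited") from last_error

# Demo with a fake provider call in which the Codex subscription is out of quota.
def fake_call(model: str, prompt: str) -> str:
    if model.startswith("openai-codex/"):
        raise RateLimited("subscription limit reached")
    return f"[{model}] reply to: {prompt}"

ORDER = ["openai-codex/gpt-5.4", "anthropic/claude-sonnet-4-6"]
print(complete_with_fallback("status report", ORDER, fake_call))
```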
Option C: Stay on Anthropic API (Minimal Change)
~$100-250/mo variable
  • Main Agent (Fergus): anthropic/claude-sonnet-4-6 (API key)
  • Sub-Agent Dwight: anthropic/claude-opus-4-6 (API key)
  • Coding (Cody/Bomb): openai-codex/gpt-5.4 (Codex OAuth, Plus $20/mo)

Just swap the Anthropic auth from OAuth token to API key. Same SOUL.md works. Monthly cost is variable and could spike.
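
The three option totals can be reproduced from the component estimates quoted earlier in this guide. The sketch below shows one reading that matches the stated figures: Option B on Pro with the Sonnet fallback counted at its full estimated range, and Option C including the Plus subscription for coding.

```python
PLUS, PRO = 20, 200        # ChatGPT subscription prices, USD/mo
SONNET_API = (50, 150)     # estimated Sonnet API spend, USD/mo
OPUS_API = (30, 80)        # estimated selective Opus API spend, USD/mo

def total(*parts):
    """Add up (low, high) ranges; a flat price counts as (price, price)."""
    ranges = [p if isinstance(p, tuple) else (p, p) for p in parts]
    return sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges)

option_a = (PLUS, PRO)                          # Plus or Pro, nothing else
option_b = total(PRO, OPUS_API, SONNET_API)     # Pro + Opus + Sonnet fallback
option_b_no_fallback = total(PRO, OPUS_API)     # if the fallback is never used
option_c = total(SONNET_API, OPUS_API, PLUS)    # all Anthropic API + Plus for coding

for name, (lo, hi) in [("A", option_a), ("B", option_b), ("C", option_c)]:
    print(f"Option {name}: ${lo}-{hi}/mo")
```

If the Sonnet fallback rarely fires, Option B sits closer to the lower `option_b_no_fallback` range.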

Section 7: Local Model Option (Gemma 4)

Can Gemma 4 replace Sonnet if run locally?

Short answer: No.

  • AA Intelligence Index: Gemma 4 scores 39 vs Sonnet's 82+ (43-point gap)
  • Gemma 4 is the BEST open/local model, but "best local" is still a tier below frontier cloud models.

Hardware Requirements

Best Local Models for OpenClaw (April 2026)

  1. Gemma 4 31B: Best overall. Native function calling. Apache 2.0. Needs 32GB+ RAM.
  2. Qwen3.5 27B: "Matches GPT-5 Mini." Good tool calling. Needs 32GB+ RAM.
  3. Gemma 4 26B MoE: Good for its effective size (3.8B active). Best "bang for buck"; can squeeze into 16GB.
  4. DeepSeek-Coder small variants: Good for local coding agents.
Recommendation: Don't buy hardware for local models as a Sonnet replacement. The quality gap is too large. Local models are great as emergency fallback, privacy-sensitive tasks, or saving money on low-priority Scout-level work. But for the main Fergus agent? Cloud models are the answer.

Final Recommendation: What To Do Right Now

Immediate (Today)

  1. Set up an Anthropic API key in the OpenClaw config. This keeps everything working while you transition. Cost: ~$50-150/mo for Sonnet, $30-80/mo for selective Opus.
  2. Keep Codex OAuth for Cody/Bomb (already working on Plus $20/mo).

This Week

  1. Test GPT-5.4 as main agent on a non-critical channel. Try it in #fergus for a day.
  2. Tune SOUL.md for GPT-5.4 quirks (anti-table rules, stronger delegation language, conciseness).

Within 2 Weeks

  1. If GPT-5.4 works → upgrade to Pro ($200/mo) and go all-in (Option A or B).
  2. If GPT-5.4 doesn't match Sonnet's persona → stay on Anthropic API (Option C), accept variable cost.

The Money

  • Cheapest good option: GPT-5.4 Pro all-in = $200/mo flat
  • Best quality option: GPT-5.4 Pro + Opus API for Dwight = $230-280/mo
  • Status quo: Anthropic API for everything = $80-230/mo variable + Plus $20/mo for Codex ($100-250/mo total)
The bottom line: GPT-5.4 is the replacement. It's ranked #1 overall. The OpenClaw community endorses it. It's the only top-tier model with subscription access. The only question is whether you can tune your prompts to match Sonnet's persona compliance, and based on everything reviewed here, you can. It just takes a week or two of iteration.

AI Model Replacement Guide · Prepared by Dwight · April 4, 2026 · For Brent's OpenClaw stack