AI Model Replacement Guide

The Situation
Sonnet Replacement
Opus Replacement
Full Model Rankings — April 2026
Subscription OAuth Options
Can GPT-5.4 Do What Sonnet Does?
Transition Plan (3 Options)
Local Model Option (Gemma 4)
Final Recommendation

The Situation

Anthropic killed third-party OAuth today (April 4, 2026 at 12pm PT). You can no longer use a Claude subscription (Pro/Max) to power OpenClaw. Two options remain:

Anthropic API key: pay-per-token ($3/$15 per MTok input/output for Sonnet, $5/$25 for Opus)
Switch to a different provider entirely

This guide answers: What should you actually do?

Section 1: The Sonnet Replacement

Current role: Fergus's main agent — everyday conversation, task management, delegation, Discord responses, voice message handling, tool calls, persona consistency, multi-agent orchestration.

🥇GPT-5.4 via OpenAI Codex OAuth — CAN REPLACE SONNET

This is the move. Here's why:

Quality: AI Top 40 score of 100.0, ranked #1 (Sonnet sits in the #5-6 range). Terminal-Bench 2.0: 75.1% vs Sonnet's 59.1%.
Tool calling: Has "Tool Search" — dedicated multi-tool orchestration. "Slight edge in complex orchestration where orchestrator must track multiple simultaneous sub-task states." This is exactly what Fergus does.
Session memory: ZooClaw benchmark rated GPT-5.4 "Excellent" for memory reliability after compaction — Sonnet rated "Good".
OpenClaw community: "5.4 is more than capable of running main agent 95% of the time." Named ZooClaw overall winner above Sonnet AND Opus.
Access: Codex OAuth via ChatGPT Plus ($20/mo) or Pro ($200/mo). The ONLY subscription-based OAuth that still works in OpenClaw for a top-tier model.

Where GPT-5.4 is WORSE than Sonnet

Speed: 20-30 tok/s vs Sonnet's 44-63 tok/s. Slower Discord responses — biggest real-world tradeoff.
Persona compliance: GPT-5.4 wants to DO work. Sonnet follows "don't do work, delegate" rules more naturally. Needs stronger guardrails.
Discord formatting: Occasionally generates markdown tables (broken in Discord). Add explicit rules to SOUL.md.
Verbosity: Tends toward longer responses. Add "be concise" rules.

Verdict: GPT-5.4 CAN replace Sonnet. Stronger on tool calling, session memory, and multi-step orchestration. Weaker on speed, persona compliance, and formatting — all fixable with prompt engineering. Transition will require 1-2 weeks of SOUL.md tuning.
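
To put the speed tradeoff in concrete terms, here is a rough sketch that converts the throughput ranges quoted above into streaming time for one reply. The 300-token reply length is a hypothetical figure chosen for illustration, and the calculation ignores time-to-first-token and network latency.

```python
def reply_seconds(tokens: int, tok_per_s: float) -> float:
    """Time to stream `tokens` tokens at a steady decode rate."""
    return tokens / tok_per_s


REPLY_TOKENS = 300  # hypothetical Discord reply length, not a measured value

# Throughput ranges quoted in this guide, in tokens per second (slow, fast).
RATES = {"gpt-5.4": (20, 30), "sonnet-4.6": (44, 63)}

for model, (slow, fast) in RATES.items():
    best = reply_seconds(REPLY_TOKENS, fast)
    worst = reply_seconds(REPLY_TOKENS, slow)
    print(f"{model}: {best:.1f}-{worst:.1f} s for {REPLY_TOKENS} tokens")
```

Under these assumptions a reply that streams in roughly 5-7 seconds on Sonnet takes roughly 10-15 seconds on GPT-5.4, which is why the conciseness rules matter twice: shorter replies also arrive sooner.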

🥈Claude Sonnet 4.6 via Anthropic API Key

Keep Sonnet by moving to API billing. Identical quality. Cost: $3/$15 per MTok (~$50-150/mo with 90% prompt caching discount for heavy daily use).

Verdict: Obviously works, since it's the same model. The question is whether $50-150/mo variable beats $20/mo flat for GPT-5.4 on Plus, which is arguably the better model.
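
A minimal sketch of how a monthly API bill could be estimated from the per-MTok prices above. The token volumes and the 80% cached fraction are hypothetical, and the model assumes cached input is billed at 10% of the normal input price (the "90% discount") while ignoring any surcharge for writing to the cache.

```python
def monthly_cost(input_mtok, output_mtok, in_price, out_price,
                 cached_fraction=0.0, cache_discount=0.90):
    """Estimated monthly cost in dollars.

    input_mtok / output_mtok: millions of tokens per month.
    cached_fraction: share of input tokens served from the prompt cache.
    cache_discount: discount applied to cached input (assumed 90%).
    """
    cached = input_mtok * cached_fraction
    fresh = input_mtok - cached
    input_cost = fresh * in_price + cached * in_price * (1 - cache_discount)
    return input_cost + output_mtok * out_price


# Hypothetical heavy month on Sonnet at $3/$15 per MTok.
with_cache = monthly_cost(60, 4, 3, 15, cached_fraction=0.8)
no_cache = monthly_cost(60, 4, 3, 15)
print(f"with caching: ${with_cache:.2f}, without: ${no_cache:.2f}")
```

With these made-up volumes the cached bill lands inside the $50-150 range while the uncached one does not, so the estimate is only as good as the cache hit rate you actually achieve.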

🥉Gemini 3.1 Pro via Google API Key — CANNOT FULLY REPLACE

Strong benchmarks (GPQA 94.3%, SWE-bench 80.6%), but three problems: there is no subscription OAuth in OpenClaw, Google is suspending accounts that use its API heavily with OpenClaw agents, and persona consistency is "underwhelming out of the box."

Verdict: CANNOT replace Sonnet for this use case. Too risky, no subscription, weak persona handling.

Sonnet Replacement Recommendation:

Primary: GPT-5.4 via Codex OAuth (ChatGPT Plus $20/mo or Pro $200/mo)
Fallback: Keep Sonnet via Anthropic API key ($50-150/mo variable)

Section 2: The Opus Replacement

Current role: Dwight — deep analysis, strategy, research, complex reasoning, comparison tasks, financial analysis, product research.

🥇GPT-5.4 via OpenAI Codex OAuth — CAN REPLACE OPUS (for 80% of tasks)
AI Top 40 #1 (100.0) vs Opus at #2 (93.2). Leads on 8 of 10 benchmarks.
Matches Opus on: multi-step reasoning, knowledge work, structured analysis, research synthesis.
Falls short: Opus has a unique quality for nuanced, creative reasoning — "thinking around corners." GPT-5.4 is more systematic, Opus more intuitive. For the hardest 20% of Dwight tasks, Opus still has an edge.
Verdict: GPT-5.4 CAN replace Opus for most Dwight tasks. For the hardest 20%, you lose some nuance.

🥈Claude Opus 4.6 via Anthropic API Key — STAYS AS OPUS

Arena #1 (Elo 1,504). Arena Code #1 (Elo 1,548). The acknowledged quality ceiling. Cost: $5/$25 per MTok. Dwight is used selectively so realistic monthly: $30-80.

Verdict: Opus via API CAN stay as Opus. $30-80/mo for the best reasoning model is reasonable for selective use.

Opus Replacement Recommendation:

Primary: Keep Opus via Anthropic API key ($30-80/mo) — cheap enough for selective Dwight use
Alternative: Use GPT-5.4 for everything (accepting some loss of nuance on the hardest 20% of reasoning tasks)

Section 3: Full Model Rankings — April 2026

Ranked by overall capability. Pricing, access method, and value rating for each.

Tier S — Frontier (Best Models Alive)

GPT-5.4 OpenAI

AI Top 40: 100.0 · Terminal-Bench: 75.1% · SWE-bench: ~80%

API: $2.50/$15 per MTok · ✅ Codex OAuth subscription ($20-200/mo)

Best all-around agentic model. Tool calling. Computer use. Multi-step execution.

🔵 $20/mo

Claude Opus 4.6 Anthropic

Arena: #1 (1,504) · Arena Code: #1 (1,548) · SWE-bench: 79.6%

API: $5/$25 per MTok · ⚠️ API key only (OAuth killed April 4)

Best human-preference quality. Best coding. Best nuanced reasoning. Best persona adherence.

🟡 Fair

Grok 4 xAI

AI Top 40: 86.6 · HLE: 50.7% (#1) · Strong reasoning

API: $3/$15 per MTok · ✅ API key only. No OAuth.

Strongest on Humanity's Last Exam. Good deep reasoning.

🟡 Fair

Gemini 3.1 Pro Google

GPQA: 94.3% (#1) · SWE-bench: 80.6% · ARC-AGI-2: 77.1% (#1)

API: $2/$12 per MTok · ✅ API key. ❌ No OAuth subscription.

⚠️ Risk: Google is suspending accounts that use the API heavily with OpenClaw agents.

🟢 Great value

Tier A — Near-Frontier

Claude Sonnet 4.6 Anthropic

SWE-bench: 79.6% · 44-63 tok/s · Excellent instruction following

API: $3/$15 per MTok (90% cache discount) · ⚠️ API key only (OAuth killed April 4)

Fastest frontier-class model. Best instruction following. Best persona consistency. Best Discord formatting.

🟢 Excellent

GPT-5.4 Pro OpenAI

Enhanced reasoning version

API: $30/$180 per MTok (12x standard) · ✅ API key. Not on Codex OAuth.

Maximum reasoning when cost is no object.

🔴 Poor value

GPT-5.2 OpenAI

Strong general-purpose. "Most reliable instruction following" per OpenAI.

API: $1.75/$14 per MTok · ✅ API key.

🟢 Excellent

Qwen3-Max Alibaba

API: $1.20/$6 per MTok · Or Alibaba Coding Plan: $10-50/mo flat

✅ API key + ✅ Alibaba Coding Plan OAuth

⚠️ Quality: Significantly below Sonnet/GPT-5.4. "TV robot" persona compliance.

🟢 Budget

Tier B — Strong Performers

Grok 4.2 xAI

Fast variant. API: $2/$6 per MTok · ✅ API key. 2M context window.

🟢 Excellent

#10 GPT-5.3-Codex OpenAI

400K context. Strong coding specialist. API: $1.75/$14 per MTok

✅ Codex OAuth (subscription) — included in Plus/Pro. This is what Cody/Bomb use.

🔵 Included

#11 GPT-5.3-Codex-Spark OpenAI

1,000+ tok/s on Cerebras. Ultra-fast coding. Pro subscription only.

✅ Codex OAuth · Separate usage pool from GPT-5.4 (different hardware)

Pro users get GPT-5.4 quota PLUS Spark quota independently — more total headroom.

Pro only

#12 DeepSeek V3.2 DeepSeek

API: $0.28/$0.42 per MTok (27x cheaper than Opus)

⚠️ Privacy: Chinese company. Data handling concerns for business use.

🟢 Cheapest

#13 Mistral Large 3 Mistral

API: $0.50/$1.50 per MTok · European data handling.

🟢 Excellent

Tier C — Budget / Specialist

#15 Gemma 4 31B Google Open

AA Intelligence Index: #2 among open models (score 39). MMLU-Pro 85.2%. Native function calling.

✅ Local via Ollama · Free if you have hardware. Apache 2.0 license.

⚠️ Gap: Sonnet has a 43-point advantage on AA Intelligence Index. Good for local fallback, not primary agent.

Free local

#16 Llama 4 Maverick Meta Open

AI Top 40: #31. 400B total params, MoE. Multimodal.

Free locally · API via OpenRouter/Groq · CANNOT replace Sonnet.

Fair

Section 4: Subscription OAuth in OpenClaw — Full List

Which providers let you pay a flat monthly rate and use it through OpenClaw?

✅ Confirmed Working Subscription OAuth

Provider	Subscription	Price	Models	Quality for Main Agent?
OpenAI Codex	ChatGPT Plus	$20/mo	GPT-5.4, GPT-5.3-Codex, GPT-5.4-mini	YES — Top tier
OpenAI Codex	ChatGPT Pro	$200/mo	All Plus + Spark, 6x limits	YES — Top tier + headroom
Alibaba Coding Plan	Lite	$10/mo ($3 first mo)	Qwen3.5+, Kimi K2.5, GLM-5, MiniMax M2.5	No — budget tier
Alibaba Coding Plan	Pro	$50/mo ($15 first mo)	Same models, 5x more requests	No — budget tier
MiniMax Coding Plan	OAuth	~$10-20/mo	MiniMax models	No — budget tier
Z.AI / GLM Coding Plan	OAuth	~$10-20/mo	GLM models	No — budget tier

❌ No Subscription OAuth

Anthropic — KILLED TODAY (April 4, 2026). API key only going forward.
Google Gemini — Feature request filed, not implemented. API key only.
xAI Grok — API key only. No OAuth in OpenClaw.
Mistral — API key only.
DeepSeek — API key only.
Meta Llama — Open weights only (local or hosted).

The reality: OpenAI is the ONLY provider offering top-tier subscription OAuth in OpenClaw.

Section 5: Can GPT-5.4 Do What Brent Does With Sonnet?

✅ Long complex sessions without degrading
ZooClaw benchmark: "Excellent" for memory reliability after compaction, the same as Opus and BETTER than Sonnet ("Good").

⚠️ Follow SOUL.md / AGENTS.md persona instructions
YES, WITH WORK. More opinionated than Sonnet. Needs stronger delegation rules, explicit "no markdown tables," explicit "be concise." Set personalityOverlay: "off".

✅ Handle tool calls reliably
BETTER than Sonnet. Terminal-Bench 75.1% vs 59.1%. Designed for agentic tool use.

✅ Multi-agent orchestration
"Instruction-following precision gives it a slight edge in complex orchestration where orchestrator must track multiple simultaneous sub-task states."

⚠️ Format Discord messages properly
YES, WITH RULES. Occasionally generates markdown tables, tends toward longer responses. Fixable with explicit SOUL.md rules.

✅ Flat-rate subscription
The ONLY top-tier model with subscription OAuth in OpenClaw right now.

Bottom line: GPT-5.4 CAN do everything Brent does with Sonnet. It's actually BETTER at tool calling and session memory. Worse at delegation compliance, formatting, and speed — all fixable with prompt engineering. Will take 1-2 weeks of SOUL.md tuning.

Section 6: Transition Plan

Option A: All-In OpenAI (Simplest, Cheapest)

$200/mo (Pro) or $20/mo (Plus)

Main Agent (Fergus): openai-codex/gpt-5.4
Sub-Agent Dwight: openai-codex/gpt-5.4
Sub-Agent Scout: openai-codex/gpt-5.4-mini
Coding (Cody/Bomb): openai-codex/gpt-5.4 (already on this)

Pros

One provider, one bill, flat rate
Generous limits

Cons

No Opus for hardest reasoning
Slower speed
Prompt tuning needed

SOUL.md additions needed:

## 🚨 GPT-5.4 SPECIFIC RULES
- NEVER use markdown tables. Use bullet lists instead. Always.
- Keep responses concise. Don't over-explain.
- You are a DISPATCHER. Do NOT do work yourself. This model
  has a tendency to try to do things — resist it. Delegate EVERYTHING.
- When formatting for Discord: use bold headers + bullet lists.
  No tables. No code blocks for non-code content.
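
Prompt rules like these are easier to tune when violations can be counted. The sketch below is a hypothetical helper, not part of OpenClaw: a heuristic that flags a reply containing a markdown table by looking for a pipe-delimited row followed by a separator row such as `| --- | --- |`. It could be run over logged replies during the tuning period.

```python
import re

# Separator row of a markdown table, e.g. "| --- | :---: |" or "--- | ---".
# Requires at least one pipe so a plain "---" horizontal rule does not match.
_TABLE_SEPARATOR = re.compile(
    r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)+\|?\s*$"
)


def contains_markdown_table(text: str) -> bool:
    """Heuristic: a line containing '|' directly followed by a separator row."""
    lines = text.splitlines()
    for header, candidate in zip(lines, lines[1:]):
        if "|" in header and _TABLE_SEPARATOR.match(candidate):
            return True
    return False


reply = "| Model | Price |\n| --- | --- |\n| GPT-5.4 | $20 |"
print(contains_markdown_table(reply))
```

Being a heuristic, it will miss tables wrapped in code fences or malformed tables, but it is enough to track whether the "no tables" rule is taking hold from one SOUL.md revision to the next.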

openclaw.json config:

{
  "plugins": {
    "entries": {
      "openai": {
        "config": {
          "personalityOverlay": "off"
        }
      }
    }
  },
  "agents": {
    "defaults": {
      "model": {
        "primary": "openai-codex/gpt-5.4"
      }
    }
  }
}
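
Before restarting the agent, it may help to confirm that the two settings above actually made it into the file. The following is a small sketch that checks a parsed config for them; the key paths are taken from the JSON above, while the helper names (`dig`, `check_config`) are made up for this example and the config text is embedded inline rather than read from disk.

```python
import json

# Same settings as the openclaw.json fragment above, embedded for the example.
CONFIG_TEXT = """
{
  "plugins": {"entries": {"openai": {"config": {"personalityOverlay": "off"}}}},
  "agents": {"defaults": {"model": {"primary": "openai-codex/gpt-5.4"}}}
}
"""


def dig(cfg, *keys):
    """Walk nested dicts, returning None if any key is missing."""
    node = cfg
    for key in keys:
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def check_config(cfg):
    """Return a list of human-readable problems (empty if none)."""
    problems = []
    overlay = dig(cfg, "plugins", "entries", "openai", "config",
                  "personalityOverlay")
    if overlay != "off":
        problems.append('personalityOverlay should be "off"')
    primary = dig(cfg, "agents", "defaults", "model", "primary")
    if primary != "openai-codex/gpt-5.4":
        problems.append("primary model should be openai-codex/gpt-5.4")
    return problems


print(check_config(json.loads(CONFIG_TEXT)) or "config looks as intended")
```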

Option B: Hybrid (Best Quality, Higher Cost)

~$280-430/mo total

Main Agent (Fergus): openai-codex/gpt-5.4 ($200/mo Pro or $20/mo Plus)
Sub-Agent Dwight: anthropic/claude-opus-4-6 (API key, $30-80/mo)
Sub-Agent Scout: openai-codex/gpt-5.4-mini (included)
Coding (Cody/Bomb): openai-codex/gpt-5.4 (included)
Fallback: anthropic/claude-sonnet-4-6 (API key, if GPT-5.4 hits limits)

Pros

Best of both worlds
Opus for hard problems
GPT-5.4 for everything else

Cons

Two providers, two billing methods
More complexity
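
The hybrid layout can be pictured as a routing table with one optional fallback per agent. The sketch below illustrates that idea only; it is not how OpenClaw implements fallback. The model identifiers come from the list above, while `Route`, `ROUTES`, and `pick_model` are hypothetical names.

```python
from collections.abc import Collection
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    primary: str
    fallback: Optional[str] = None


# Agent -> model assignment for Option B.
ROUTES = {
    "fergus": Route("openai-codex/gpt-5.4", "anthropic/claude-sonnet-4-6"),
    "dwight": Route("anthropic/claude-opus-4-6"),
    "scout": Route("openai-codex/gpt-5.4-mini"),
    "coding": Route("openai-codex/gpt-5.4"),
}


def pick_model(agent: str, rate_limited: Collection[str] = ()) -> str:
    """Use the primary model unless it is rate limited and a fallback exists."""
    route = ROUTES[agent]
    if route.primary in rate_limited and route.fallback is not None:
        return route.fallback
    return route.primary


print(pick_model("fergus"))
print(pick_model("fergus", rate_limited={"openai-codex/gpt-5.4"}))
```

Note that only Fergus has a fallback in this layout, so a GPT-5.4 rate limit would still stall coding work unless a second fallback is added.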

Option C: Stay on Anthropic API (Minimal Change)

~$100-250/mo variable

Main Agent (Fergus): anthropic/claude-sonnet-4-6 (API key)
Sub-Agent Dwight: anthropic/claude-opus-4-6 (API key)
Coding (Cody/Bomb): openai-codex/gpt-5.4 (Codex OAuth, Plus $20/mo)

Just swap the Anthropic auth from OAuth token to API key. Same SOUL.md works. Variable monthly cost — could spike.

Section 7: Local Model Option — Gemma 4

Can Gemma 4 replace Sonnet if run locally?

Short answer: No.

AA Intelligence Index: Gemma 4 scores 39 vs Sonnet's 82+ (43-point gap)
Gemma 4 is the BEST open/local model — but "best local" is still a tier below frontier cloud models.

Hardware Requirements

Current Mac Mini M4 16GB: ❌ Can't run 31B. Gemma 4 26B MoE is tight. Not recommended for production.
Mac Mini M4 Pro 48GB (~$1,800-2,000): ✅ Runs 31B comfortably. Sweet spot.
Mac Studio M4 Max 64GB+ (~$2,700+): ✅ Can run 70B models at usable speed.
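
The hardware verdicts above follow from simple arithmetic: weight size is roughly parameters times bits per weight divided by 8. The sketch below applies that under loudly stated assumptions that are not from this guide: 4-bit quantization, a 1.3x overhead factor for KV cache and runtime, and about 75% of unified memory being usable by the model.

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def fits(params_billion: float, bits_per_weight: float, ram_gb: float,
         overhead: float = 1.3, usable: float = 0.75) -> bool:
    """Rough fit check. `overhead` and `usable` are assumed, not measured."""
    needed = weight_gb(params_billion, bits_per_weight) * overhead
    return needed <= ram_gb * usable


for label, params, ram in [
    ("31B on 16GB Mac Mini", 31, 16),
    ("31B on 48GB Mac Mini M4 Pro", 31, 48),
    ("70B on 64GB Mac Studio", 70, 64),
]:
    print(f"{label}: {'fits' if fits(params, 4, ram) else 'does not fit'}")
```

Longer contexts and higher-precision quantization push the requirement up quickly, so treat the output as a first-pass filter rather than a guarantee.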

Best Local Models for OpenClaw — April 2026

Gemma 4 31B — Best overall. Native function calling. Apache 2.0. Needs 32GB+ RAM.
Qwen3.5 27B — "Matches GPT-5 Mini." Good tool calling. Needs 32GB+ RAM.
Gemma 4 26B MoE — Good for its effective size (3.8B active). Best "bang for buck" — can squeeze into 16GB.
DeepSeek-Coder small variants — Good for local coding agents.

Recommendation: Don't buy hardware for local models as a Sonnet replacement. The quality gap is too large. Local models are great as emergency fallback, privacy-sensitive tasks, or saving money on low-priority Scout-level work. But for the main Fergus agent? Cloud models are the answer.

Final Recommendation — What To Do Right Now

Immediate (Today)

Set up Anthropic API key in OpenClaw config — keeps everything working while you transition. Cost: ~$50-150/mo for Sonnet, $30-80/mo for selective Opus.
Keep Codex OAuth for Cody/Bomb (already working on Plus $20/mo).

This Week

Test GPT-5.4 as main agent on a non-critical channel. Try it in #fergus for a day.
Tune SOUL.md for GPT-5.4 quirks (anti-table rules, stronger delegation language, conciseness).

Within 2 Weeks

If GPT-5.4 works → upgrade to Pro ($200/mo) and go all-in (Option A or B).
If GPT-5.4 doesn't match Sonnet's persona → stay on Anthropic API (Option C), accept variable cost.

The Money

Cheapest good option: GPT-5.4 Pro all-in = $200/mo flat
Best quality option: GPT-5.4 Pro + Opus API for Dwight = $230-280/mo
Status quo: Anthropic API for everything = $80-230/mo variable, plus Plus at $20/mo for Codex ($100-250/mo total)
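
The totals can be rebuilt by summing the per-component estimates quoted earlier in this guide ($200 Pro, $20 Plus, $50-150 Sonnet API, $30-80 selective Opus API). This sketch does only that; the component ranges are the guide's estimates, not measured bills, and the option labels are shorthand.

```python
# (low, high) monthly cost in dollars for each component.
COMPONENTS = {
    "ChatGPT Pro": (200, 200),
    "ChatGPT Plus": (20, 20),
    "Sonnet API": (50, 150),
    "Opus API (selective)": (30, 80),
}

OPTIONS = {
    "All-in OpenAI (Pro)": ["ChatGPT Pro"],
    "Pro + Opus API for Dwight": ["ChatGPT Pro", "Opus API (selective)"],
    "Anthropic API + Plus for Codex": [
        "Sonnet API", "Opus API (selective)", "ChatGPT Plus",
    ],
}


def monthly_range(parts):
    """Sum the low and high ends of each component's range."""
    low = sum(COMPONENTS[p][0] for p in parts)
    high = sum(COMPONENTS[p][1] for p in parts)
    return low, high


for name, parts in OPTIONS.items():
    low, high = monthly_range(parts)
    cost = f"${low}" if low == high else f"${low}-{high}"
    print(f"{name}: {cost}/mo")
```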

The bottom line: GPT-5.4 is the replacement. It's ranked #1 overall. The OpenClaw community endorses it. It's the only top-tier model with subscription access. The only question is whether you can tune your prompts to match Sonnet's persona compliance — and based on everything reviewed here, you can. It just takes a week or two of iteration.

AI Model Replacement Guide · Prepared by Dwight · April 4, 2026 · For Brent's OpenClaw stack
