STET
Current cleaned leaderboard

Zod (TypeScript)

Zod (TypeScript) is deadlocked at a 44% pass rate, but quality separates the tied models: GPT-5.5 leads equivalence at 67%.

Updated May 01, 2026 · 168 runs · 2 repos · Judge: GPT-5.4
27 tasks per model · Week 4

The test-based pass/fail bar: 44.4% (3 models tied)
Match the intended fix? 66.7% (GPT-5.5)
Would a reviewer approve? 51.9% (GPT-5.5)
Surgical or over-edited? 21.0% (Claude Opus 4.7 · lowest)

Model                          Pass/fail bar  Interval (n=27)  Matches intended fix  Reviewer approves  Surgical or over-edited
Claude Opus 4.7 (claude code)  44.4%          25.9% to 63.0%   40.7%                 22.2%              21.0%
GPT-5.5 (codex cli)            44.4%          25.9% to 63.0%   66.7%                 51.9%              30.7%
GPT-5.4 (codex cli)            33.3%          14.8% to 51.9%   66.7%                 37.0%              34.8%

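
With 27 tasks per model, most of the leaderboard percentages can be read back as whole-task counts. The sketch below shows that arithmetic in Python. It assumes each rate is a plain fraction k/27 rounded to one decimal place, which the page does not state; `implied_count` is a hypothetical helper, not part of any STET tooling.

```python
from typing import Optional


def implied_count(rate_pct: float, n: int, decimals: int = 1) -> Optional[int]:
    """Return k such that k/n, as a percentage rounded to `decimals`,
    equals `rate_pct`; return None when no whole count matches.

    Assumption: the published rate is k/n rounded to one decimal place.
    """
    target = round(rate_pct, decimals)
    for k in range(n + 1):
        if round(100 * k / n, decimals) == target:
            return k
    return None


N_TASKS = 27  # "27 tasks per model" on the leaderboard

# Pass rates and interval bounds taken from the leaderboard rows.
print(implied_count(44.4, N_TASKS))  # 12 of 27 tasks
print(implied_count(33.3, N_TASKS))  # 9 of 27 tasks
print(implied_count(25.9, N_TASKS))  # 7 (lower interval bound)
print(implied_count(63.0, N_TASKS))  # 17 (upper interval bound)

# Footprint values do not map to a whole count, which suggests
# (but does not prove) that footprint is a continuous score.
print(implied_count(21.0, N_TASKS))  # None
```

Under that assumption, the 44.4% tie corresponds to 12 of 27 tasks and GPT-5.4's 33.3% to 9 of 27. This means a single task moves a rate by about 3.7 points.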
Snapshot studies

Opus 4.7 vs Opus 4.6 on Zod

Historical Zod slice · 28 tasks · Claude Code only

Not leaderboard-ranked: this study uses a historical 28-task Zod slice, and its results should only be compared within this study.

Model                     Gate   Equiv  Review  Footprint  Cost
Claude Opus 4.6 (Mar 19)  42.9%  39.3%  39.3%   0.210      $8.93
Claude Opus 4.6 (Apr 16)  42.9%  32.1%  25.0%   0.221      $6.65
Claude Opus 4.7           42.9%  46.4%  25.0%   0.090      $8.11