STET

flux-pr-5409

Zod (TypeScript) · W2 · GPT-5.3 Codex

fail_high_conf

Tests failed. 1/2 commands passed. Strength: weak.

69.2% run pass rate
Tier 1
primary equivalence · failed · needs generated tests · weak signal risk · command source drift · non equivalent · fail
pnpm build
gold pass · agent pass
npx vitest run packages/zod/src/v4/core/tests/locales/he.test.ts -t "string\ type\ \(feminine\ \-\ צריכה\)|number\ type\ \(masculine\ \-\ צריך\)|array\ type\ \(masculine\ \-\ צריך\)|set\ type\ \(feminine\ \-\ צריכה\)|array\ max|string\ expected\ \(feminine\),\ number\ received|number\ expected\ \(masculine\),\ string\ received|boolean\ expected\ \(masculine\),\ null\ received|array\ expected\ \(masculine\),\ object\ received|object\ expected\ \(masculine\),\ array\ received|function\ expected\ \(feminine\),\ string\ received|feminine\ types\ use\ צריכה|masculine\ types\ use\ צריך|single\ value|two\ values|multiple\ values|not_multiple_of|unrecognized_keys\ \-\ single\ key|unrecognized_keys\ \-\ multiple\ keys|invalid_union|invalid_key\ in\ object|startsWith|endsWith|includes|regex|email|url|uuid|invalid\ element\ type\ in\ tuple\ shows\ full\ error\ message|inclusive\ minimum\ \(>=\)|exclusive\ minimum\ \(>\)|inclusive\ maximum\ \(<=\)|exclusive\ maximum\ \(<\)|verifies\ all\ type\ translations\ are\ correct"
gold pass · agent fail

Partial score: 1/2

Publishable: no · Weak signal risk: yes · Cache: miss

Trajectory

unknown · partial order only

Canonical trajectory missing; showing coarse derived order only.

patch written
Patch captured
#1

Stet captured agent.patch for this trial.

validation
Tests failed
#2
equivalence
Equivalence judgment
#3

non_equivalent

code review
Code review judgment
#4

fail

decision
Final decision
#5

fail_high_conf

Quality

equivalence
non_equivalent
82% confidence
code review
fail
6 findings
footprint
low (0.16)
behavioral
50.0%
cost
$1.05 · 220K

Equivalence Reasoning

behavioral

The patch improves many messages but misses the core intent of fully localized, natural Hebrew in key and element context messages: `invalid_key` and `invalid_element` still fall back to raw origin strings via `ב${origin}` (potentially producing `בarray` or `בobject`), which breaks the requested localized type labels and definite-article consistency.
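The fallback problem described above can be illustrated with a minimal sketch. Everything here is hypothetical: the helper names, the lookup table, and the Hebrew labels are assumptions for illustration and are not taken from the actual `he.ts` patch or the gold solution.

```typescript
// Hypothetical lookup of Hebrew context phrases; labels are illustrative only.
// Each value fuses the preposition "ב" with the definite article.
const ORIGIN_CONTEXT = new Map<string, string>([
  ["array", "במערך"],
  ["object", "באובייקט"],
  ["set", "בקבוצה"],
  ["map", "במפה"],
]);

// Pattern flagged in the review: an unmapped origin is interpolated raw,
// so English leaks into the Hebrew sentence.
function originContextRaw(origin: string): string {
  return ORIGIN_CONTEXT.get(origin) ?? `ב${origin}`;
}

// One possible alternative (an assumption, not the expected fix):
// fall back to a neutral Hebrew phrase instead of the raw origin.
function originContextLocalized(origin: string): string {
  return ORIGIN_CONTEXT.get(origin) ?? "בקלט";
}
```

With this sketch, any origin missing from the table yields a mixed-script string from `originContextRaw`, while `originContextLocalized` always returns Hebrew text.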

Code Review

correctness: 1/4 · introduced bug risk: 1/4 · edge case handling: 1/4 · maintainability idioms: 3/4

The patch is structurally thoughtful but likely does not satisfy the intended change set because multiple high-visibility locale strings differ from expected Hebrew phrasing contracts and key tested outputs.

6 findings
`invalid_union` message changed beyond expected contract
major

The patch returns an expanded sentence for `invalid_union` instead of the simpler expected output, which is likely to fail message-exact locale tests.

packages/zod/src/v4/locales/he.ts:178
`invalid_key` wording does not match intended localized phrasing
major

Expected behavior for key-related errors is a specific object-field phrasing, but the patch emits `מפתח לא תקין ...` with origin-dependent context, changing both noun choice and structure.

packages/zod/src/v4/locales/he.ts:176
`invalid_element` uses raw origin context instead of localized definite type label
major

The patch builds `invalid_element` with `originContext(issue.origin)`, which can produce non-localized forms (e.g., generic `ב${origin}`) instead of consistent Hebrew definite-article type labels.

packages/zod/src/v4/locales/he.ts:58
Size error phrasing shifts from required צריך/צריכה patterns
major

For `too_small`/`too_big`, the patch uniformly uses `חייב/חייבת` and min/max prose helpers, diverging from expected gender-aware `צריך/צריכה` and task-specific wording consistency.

packages/zod/src/v4/locales/he.ts:142
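The gender-aware `צריך/צריכה` pattern this finding refers to could look roughly like the sketch below. The genders follow the test names in the vitest command (string and set feminine; number and array masculine); the type names, Hebrew labels, and the masculine default for unknown types are assumptions, and the full expected message text is not shown in this report.

```typescript
type Gender = "m" | "f";

interface TypeLabel {
  label: string;
  gender: Gender;
}

// Hypothetical label table; the nouns are illustrative, not copied from he.ts.
const TYPE_LABELS = new Map<string, TypeLabel>([
  ["string", { label: "מחרוזת", gender: "f" }],
  ["number", { label: "מספר", gender: "m" }],
  ["array", { label: "מערך", gender: "m" }],
  ["set", { label: "קבוצה", gender: "f" }],
]);

// Pick the verb form that agrees with the noun's grammatical gender.
function shouldVerb(gender: Gender): string {
  return gender === "f" ? "צריכה" : "צריך";
}

// Build the subject-plus-verb prefix of a size message.
function subjectWithVerb(type: string): string {
  const entry = TYPE_LABELS.get(type);
  // Unknown types default to a masculine generic noun (assumption of this sketch).
  if (!entry) return `ערך ${shouldVerb("m")}`;
  return `${entry.label} ${shouldVerb(entry.gender)}`;
}
```

Keeping gender next to the label in one table means the verb cannot drift out of agreement when a new type is added.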
Generic `invalid_format` message form is inconsistent with expected adjective pattern
major

The patch switches to `אינו/אינה תקין/ה` with definite nouns, which can mismatch expected locale strings for formats like email/url/uuid that rely on stable wording.

packages/zod/src/v4/locales/he.ts:168
`unrecognized_keys` text changed to different terminology
major

The output changed from `לא מזוהה/ים` style to `לא מוכר/ים`, which is likely to break exact-match tests for single/plural key errors.

packages/zod/src/v4/locales/he.ts:174
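A minimal sketch of the singular/plural `לא מזוהה/ים` wording that this finding says was replaced. The function name, the quoting of keys, and the separator are assumptions; the exact strings the tests expect are not reproduced in this report.

```typescript
// Hypothetical formatter: agrees noun and adjective with the number of keys.
function unrecognizedKeysMessage(keys: string[]): string {
  const plural = keys.length > 1;
  const noun = plural ? "מפתחות" : "מפתח";
  const adjective = plural ? "לא מזוהים" : "לא מזוהה";
  // Quote each key and join with a comma (formatting is an assumption).
  const list = keys.map((key) => `"${key}"`).join(", ");
  return `${noun} ${adjective}: ${list}`;
}
```

Swapping `מזוהה` for `מוכר` in a formatter like this changes every emitted string, which is why exact-match tests for both the single-key and multiple-key cases would be affected.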