STET

flux-pr-5409

Zod (TypeScript) · W2 · GPT-5.1 Codex Mini

fail_high_conf

Tests failed. 1/2 commands passed. Strength: weak.

61.5% run pass rate
Tier 1
primary equivalence · failed · needs generated tests · weak signal risk · command source drift · non equivalent · fail
pnpm build
gold pass · agent pass
npx vitest run packages/zod/src/v4/core/tests/locales/he.test.ts -t "string\ type\ \(feminine\ \-\ צריכה\)|number\ type\ \(masculine\ \-\ צריך\)|array\ type\ \(masculine\ \-\ צריך\)|set\ type\ \(feminine\ \-\ צריכה\)|array\ max|string\ expected\ \(feminine\),\ number\ received|number\ expected\ \(masculine\),\ string\ received|boolean\ expected\ \(masculine\),\ null\ received|array\ expected\ \(masculine\),\ object\ received|object\ expected\ \(masculine\),\ array\ received|function\ expected\ \(feminine\),\ string\ received|feminine\ types\ use\ צריכה|masculine\ types\ use\ צריך|single\ value|two\ values|multiple\ values|not_multiple_of|unrecognized_keys\ \-\ single\ key|unrecognized_keys\ \-\ multiple\ keys|invalid_union|invalid_key\ in\ object|startsWith|endsWith|includes|regex|email|url|uuid|invalid\ element\ type\ in\ tuple\ shows\ full\ error\ message|inclusive\ minimum\ \(>=\)|exclusive\ minimum\ \(>\)|inclusive\ maximum\ \(<=\)|exclusive\ maximum\ \(<\)|verifies\ all\ type\ translations\ are\ correct"
gold pass · agent fail

Partial score: 1/2

Publishable: no · Weak signal risk: yes · Cache: miss

Trajectory

unknown · partial order only

Canonical trajectory missing; showing coarse derived order only.

patch written
Patch captured
#1

Stet captured agent.patch for this trial.

validation
Tests failed
#2
equivalence
Equivalence judgment
#3

non_equivalent

code review
Code review judgment
#4

fail

decision
Final decision
#5

fail_high_conf

Quality

equivalence
non_equivalent
88% confidence
code review
fail
4 findings
footprint
low (0.15)
behavioral
50.0%
cost
$0.32 · 523K

Equivalence Reasoning

behavioral

The patch improves several messages but misses the core intent: consistent Hebrew grammar and localization. In particular, the `invalid_format` fallback always uses the masculine form (`לא תקין`) and can produce ungrammatical output such as `הכתובת אימייל לא תקין` instead of gender-aware phrasing. It may also leave some key-related and container labels unlocalized by interpolating raw `issue.origin` values, so the “localized type labels + consistent gender-aware wording” requirement is not fully satisfied.
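The gender-agreement point can be illustrated with a minimal sketch. It assumes a hypothetical lookup table that pairs each format with a Hebrew noun and its grammatical gender. The names `FormatNoun`, `formatNouns`, and `invalidFormatMessage` are illustrative and do not come from the patch or from Zod, and the resulting strings are not claimed to match what the tests expect.

```typescript
type Gender = "m" | "f";

interface FormatNoun {
  label: string;
  gender: Gender;
}

// Illustrative entries only; a real locale file would cover every format.
const formatNouns = new Map<string, FormatNoun>([
  ["email", { label: "כתובת אימייל", gender: "f" }],
  ["date", { label: "תאריך", gender: "m" }],
]);

function invalidFormatMessage(format: string): string {
  // Assumption: unknown formats fall back to the raw name with masculine agreement.
  const fallback: FormatNoun = { label: format, gender: "m" };
  const noun = formatNouns.get(format) ?? fallback;
  // The adjective agrees with the noun's gender instead of always being masculine.
  const adjective = noun.gender === "f" ? "לא תקינה" : "לא תקין";
  return `${noun.label} ${adjective}`;
}
```

With this shape, `invalidFormatMessage("email")` ends in the feminine `לא תקינה`, while unknown formats keep the masculine fallback.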

Code Review

correctness: 1/4 · introduced bug risk: 1/4 · edge case handling: 1/4 · maintainability idioms: 2/4

The patch makes substantial Hebrew text changes but likely does not satisfy the intended locale behavior: multiple canonical messages were changed beyond requested refinement, and at least one new edge-case fragility was introduced.

4 findings
invalid_type message format diverges from expected locale contract
major

The patch changes `invalid_type` to `הסוג הצפוי הוא ...` instead of the expected `קלט לא תקין: צריך להיות ..., התקבל ...`, which is likely to fail existing locale tests and breaks consistency with other issue messages.

packages/zod/src/v4/locales/he.ts:113
Core issue codes return different semantics than intended
major

`invalid_union`, `invalid_key`, and `invalid_element` were rewritten to new meanings (including container/key-specific text), while the task expected specific Hebrew wording refinements for existing codes. This is a behavioral change beyond phrasing and likely mismatches tests.

packages/zod/src/v4/locales/he.ts:161
invalid_element assumes presence of issue.key
major

The new `invalid_element` path always interpolates `issue.key` (`util.stringifyPrimitive(issue.key)`), which can produce awkward/incorrect output if the key is absent or not meaningful for the origin.

packages/zod/src/v4/locales/he.ts:166
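One way to avoid the fragility described in the `invalid_element` finding is to append key text only when a key is actually present. The sketch below uses a simplified, hypothetical `ElementIssue` shape rather than Zod's real issue type, and its wording is illustrative, not the string the tests expect. Localizing `origin` is deliberately left out of scope.

```typescript
// Hypothetical, simplified issue shape; not Zod's actual type definition.
interface ElementIssue {
  origin: string;
  key?: unknown;
}

function invalidElementMessage(issue: ElementIssue): string {
  // Base message that never depends on the key.
  const base = `ערך לא תקין ב${issue.origin}`;
  // Skip the key clause entirely when no key is available.
  if (issue.key === undefined || issue.key === null) return base;
  // Quote string keys; other values are converted with String().
  const key =
    typeof issue.key === "string" ? JSON.stringify(issue.key) : String(issue.key);
  return `${base} (מפתח: ${key})`;
}
```

The guard keeps the message well-formed for origins where a key is absent, at the cost of two slightly different message shapes.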
invalid_value multi-option phrasing remains unnatural and inconsistent
major

For multiple values, the patch still joins the options with `|` via `util.joinValues`, rather than producing the requested natural Hebrew list of alternatives joined with `או`.

packages/zod/src/v4/locales/he.ts:118
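A natural "or" list for the `invalid_value` finding could be sketched as follows. `joinWithOr` is a hypothetical helper, not part of Zod's `util` module, and the comma-plus-`או` layout is an assumption about the intended format rather than the exact string the tests check.

```typescript
// Hypothetical helper: joins already-stringified options as "a, b או c".
function joinWithOr(values: readonly string[]): string {
  const head = values.slice(0, -1);
  const last = values.slice(-1).join("");
  // Zero or one value: nothing to join.
  if (head.length === 0) return last;
  // Commas between the leading options, "או" before the final one.
  return `${head.join(", ")} או ${last}`;
}
```

A single value is returned unchanged, two values become `a או b`, and longer lists place `או` only before the final option.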