STET

flux-commit-0064304a

Zod (TypeScript) · W2 · GPT-5.3 Codex

fail_high_conf

Tests failed. 1/9 commands passed. Strength: strong.

69.2% run pass rate
Tier 1
primary tests · failed · command source drift · time budget exhausted · non equivalent · fail
yarn test -- --runInBand
gold pass · agent pass
pytest -q tests/behavior/test_index_exports_lowercase_infer_alias_only.py
gold pass · agent fail
pytest -q tests/behavior/test_readme_parse_deep_clone_wording.py
gold pass · agent fail
pytest -q tests/behavior/test_readme_prefers_lowercase_infer_everywhere.py
gold pass · agent fail
pytest -q tests/behavior/test_index_exports_lowercase_infer_alias.py
gold pass · agent fail
pytest -q tests/behavior/test_readme_prefers_lowercase_infer.py
gold pass · agent fail
pytest -q tests/behavior/test_readme_records_section.py
gold pass · agent unknown
pytest -q tests/behavior/test_readme_parse_deep_clone.py
gold unknown · agent
pytest -q tests/behavior/test_readme_records_with_infer.py
gold unknown · agent

Partial score: 1/6

Publishable: yes · Cache: miss

Trajectory

unknown · partial order only

Canonical trajectory missing; showing coarse derived order only.

patch written
Patch captured
#1

Stet captured agent.patch for this trial.

validation
Tests failed
#2
equivalence
Equivalence judgment
#3

non_equivalent

code review
Code review judgment
#4

fail

decision
Final decision
#5

fail_high_conf

Quality

equivalence
non_equivalent
92% confidence
code review
fail
3 findings
footprint
high (1.00)
behavioral
16.7%
cost
$1.62 · 600K

Equivalence Reasoning

behavioral

The patch appears to add only the README table-of-contents link for `Records` (without the required parsing deep-clone clarification, record-vs-object documentation, and widespread `z.infer` doc updates) and adds many unrelated `lib/` build artifacts. While it does include an `infer` export in generated typings, it does not clearly implement the intended source-level/documentation changes end-to-end.

Code Review

correctness: 1/4 · introduced bug risk: 0/4 · edge case handling: 1/4 · maintainability idioms: 0/4

The agent patch likely does not satisfy the intended change: it appears incomplete on required README/source updates and is dominated by unrelated generated `lib` file additions, so resolution probability is low.

3 findings
README task scope is largely unimplemented
major

The visible README diff only adds a Records link in the table of contents; it does not show the required parse deep-clone wording, records section content, or broad infer-reference replacements.

README.md:25
Changes target compiled lib outputs instead of source files
major

The patch adds `lib/src/index.*` files (including lowercase `infer` alias there) rather than demonstrating the corresponding required source-level update in `src/index.ts`, so expected behavior/tests may remain unmet.

lib/src/index.d.ts:41
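
For illustration, a source-level lowercase `infer` alias might look roughly like the sketch below. This is not the gold patch and not Zod's actual `src/index.ts`: the `Schema` interface, the `stringSchema` helper, and the `_type` phantom field are hypothetical stand-ins, used only to show an alias declared in source rather than in compiled output.

```typescript
// Hypothetical minimal schema shape; a stand-in, not Zod's real classes.
interface Schema<T> {
  // Phantom field: carries the output type for inference, never read at runtime.
  readonly _type: T;
  parse(value: unknown): T;
}

// Hypothetical helper that builds a string schema.
function stringSchema(): Schema<string> {
  return {
    _type: undefined as unknown as string,
    parse(value: unknown): string {
      if (typeof value !== "string") throw new Error("expected string");
      return value;
    },
  };
}

// The alias is declared in source, so the compiler regenerates the
// matching lib typings instead of the typings being edited by hand.
export type TypeOf<S extends Schema<unknown>> = S["_type"];
export type { TypeOf as infer };

const name = stringSchema();
type Name = TypeOf<typeof name>; // resolves to string
const parsed: Name = name.parse("zod");
```

A consumer importing the module as a namespace could then refer to the lowercase alias, while the compiled `lib` typings would follow from the build rather than being committed separately.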
Patch includes large unrelated generated artifacts
major

Multiple new generated files (errors/helpers/maps) are introduced under `lib/src`, creating noise and potential drift from source without clear relation to the requested documentation/export task.

lib/src/ZodError.js:1
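
As a rough sketch of how this kind of footprint noise could be flagged before review, the helper below splits a patch's changed paths into generated and source files. The function name and the `lib/` prefix default are assumptions drawn from the paths cited in the findings above; this is not part of Stet or of the repository.

```typescript
// Hypothetical helper: partition changed paths by whether they sit under
// a generated-output directory (assumed here to be "lib/").
function partitionChangedPaths(
  paths: string[],
  generatedPrefix: string = "lib/",
): { generated: string[]; source: string[] } {
  const generated = paths.filter((p) => p.startsWith(generatedPrefix));
  const source = paths.filter((p) => !p.startsWith(generatedPrefix));
  return { generated, source };
}

// Paths taken from the finding locations cited in this report.
const changed = ["README.md", "lib/src/index.d.ts", "lib/src/ZodError.js"];
const { generated, source } = partitionChangedPaths(changed);
```

With the three paths cited in this report, two fall under `lib/` and only `README.md` counts as a source-level change.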