flux-pr-1169

graphql-go-tools (Go) · W1 · GPT-5.1 Codex Mini

fail_guardrail

Tests: unknown. 0/1 commands passed. Strength: missing.

57.1% run pass rate

Tier 1

guardrail preflight failed · primary equivalence · unknown · non equivalent · needs generated tests · weak signal risk · all commands ignored · unknown no gold pass commands

go test -C v2 ./... -count=1 -timeout=300s

gold unknown · agent —

Partial score: 0/0

Publishable: yes · Weak signal risk: yes

Trajectory

codex · partial order only

provider-native trajectory captured; validation and decision steps are appended with coarse ordering only

session start

Session started

results

tool call

Command started

shell command

results

tool result

Command finished

shell command exit code 0

results

tool call

Command started

shell command

results

tool result

Command finished

shell command exit code 0

results

tool call

Command started

shell command

results

tool result

Command finished

shell command exit code 1

results

tool call

Command started

shell command

results

tool result

Command finished

shell command exit code 1

results

tool call

Command started

#10

shell command

results

tool result

Command finished

#11

shell command exit code 0

results

tool call

Command started

#12

shell command

results

tool result

Command finished

#13

shell command exit code 0

results

tool call

Command started

#14

shell command

results

tool result

Command finished

#15

shell command exit code 0

results

tool call

Command started

#16

shell command

results

tool result

Command finished

#17

shell command exit code 0

results

tool call

Command started

#18

shell command

results

tool result

Command finished

#19

shell command exit code 0

results

tool call

Command started

#20

shell command

results

tool result

Command finished

#21

shell command exit code 0

results

tool call

Command started

#22

shell command

results

tool result

Command finished

#23

shell command exit code 0

results

tool call

Command started

#24

shell command

results

tool result

Command finished

#25

shell command exit code 0

results

tool call

Command started

#26

shell command

results

tool result

Command finished

#27

shell command exit code 0

results

tool call

Command started

#28

shell command

results

tool result

Command finished

#29

shell command exit code 0

results

tool call

Command started

#30

shell command

results

tool result

Command finished

#31

shell command exit code 0

results

tool call

Command started

#32

shell command

results

tool result

Command finished

#33

shell command exit code 0

results

tool call

Command started

#34

shell command

results

tool result

Command finished

#35

shell command exit code 0

results

tool call

Command started

#36

shell command

results

tool result

Command finished

#37

shell command exit code 0

results

tool call

Command started

#38

shell command

results

tool result

Command finished

#39

shell command exit code 0

results

tool call

Command started

#40

shell command

results

tool result

Command finished

#41

shell command exit code 0

results

tool call

Command started

#42

shell command

results

tool result

Command finished

#43

shell command exit code 0

results

tool call

Command started

#44

shell command

results

tool result

Command finished

#45

shell command exit code 0

results

tool call

Command started

#46

shell command

results

tool result

Command finished

#47

shell command exit code 1

results

tool call

Command started

#48

shell command

results

patch written

Patch captured

#49

Flux captured agent.patch for this trial

agent.patch

validation

Validation recorded

#50

validation

equivalence

Equivalence judgment

#51

non_equivalent

validation

code review

Code review judgment

#52

fail

task detail

decision

Final decision

#53

fail_guardrail

task detail

Quality

equivalence

non_equivalent

82% confidence

code review

fail · 30/100

3 findings

footprint

low (0.17)

behavioral

—

cost

$2.67 · 9.2M

Equivalence Reasoning

behavioral

Code Review

correctness: 1/4 · edge case handling: 1/4 · introduced bug risk: 1/4 · maintainability idioms: 2/4

The agent patch partially updates message text and locations, but it likely does not satisfy the intended PR: key spec-compliant field-selection behaviors and the new centralized Apollo validation-failed mechanism are not correctly implemented.

3 findings

Enum leaf fields with sub-selections are still accepted

major

Validation only rejects sub-selections for scalar fields (`NodeKindScalarTypeDefinition`) and does not reject enum fields with selections, which should also be treated as leaf types.

v2/pkg/astvalidation/operation_rule_validate_field_selections.go:58
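
A minimal sketch of the leaf-type check this finding asks for, assuming a simplified model of the schema. `TypeKind`, `isLeaf`, and `checkSelection` are hypothetical stand-ins rather than the library's API, and the error wording is illustrative only. The point is that scalars and enums are both leaf types, so a sub-selection on either should be rejected.

```go
package main

import "fmt"

// TypeKind is a hypothetical stand-in for the node kinds used by the validator.
type TypeKind int

const (
	KindScalar TypeKind = iota
	KindEnum
	KindObject
	KindInterface
	KindUnion
)

// isLeaf reports whether a field's named return type is a leaf type.
// In GraphQL, both scalars and enums are leaf types.
func isLeaf(k TypeKind) bool {
	return k == KindScalar || k == KindEnum
}

// checkSelection returns an error when a leaf field carries a sub-selection
// or when a composite field lacks one. The messages are illustrative.
func checkSelection(fieldName string, k TypeKind, hasSelections bool) error {
	switch {
	case isLeaf(k) && hasSelections:
		return fmt.Errorf("field %q is a leaf type and must not have a selection", fieldName)
	case !isLeaf(k) && !hasSelections:
		return fmt.Errorf("field %q must have a selection of subfields", fieldName)
	}
	return nil
}

func main() {
	// An enum field with a sub-selection is rejected, same as a scalar.
	fmt.Println(checkSelection("status", KindEnum, true))
	// An object field with a sub-selection is accepted.
	fmt.Println(checkSelection("user", KindObject, true))
}
```

Checking only the scalar kind, as the reviewed patch reportedly does, would let the first call in `main` pass without an error.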

Missing-subfield error uses wrong type context

major

The call passes `typeName` (enclosing type) into `ErrMissingFieldSelectionOnNonScalar`, but the spec-style message expects the field's return type (e.g., `User`, `User!`, `[User]`). This can produce incorrect error text.

v2/pkg/astvalidation/operation_rule_validate_field_selections.go:61
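
To illustrate the difference between the enclosing type and the field's return type, here is a small sketch that renders a return type with its list and non-null wrappers (`User`, `User!`, `[User]`). `TypeRef` and `missingSelectionMessage` are hypothetical names introduced for this example, and the message wording is an assumption, not the text produced by `ErrMissingFieldSelectionOnNonScalar`.

```go
package main

import "fmt"

// TypeRef is a hypothetical, simplified type reference.
// Kind is one of "named", "list", or "nonnull".
type TypeRef struct {
	Kind   string
	Name   string   // set when Kind == "named"
	OfType *TypeRef // set when Kind is "list" or "nonnull"
}

// String renders the type in GraphQL notation, including wrappers.
func (t *TypeRef) String() string {
	switch t.Kind {
	case "list":
		return "[" + t.OfType.String() + "]"
	case "nonnull":
		return t.OfType.String() + "!"
	default:
		return t.Name
	}
}

// missingSelectionMessage builds an illustrative error message from the
// field's return type, not from the type that encloses the field.
func missingSelectionMessage(fieldName string, returnType *TypeRef) string {
	return fmt.Sprintf("Field %q of type %q must have a selection of subfields.",
		fieldName, returnType.String())
}

func main() {
	user := &TypeRef{Kind: "named", Name: "User"}
	listOfUser := &TypeRef{Kind: "list", OfType: user}

	// The enclosing type (for example Query) never appears in the message.
	fmt.Println(missingSelectionMessage("me", user))
	fmt.Println(missingSelectionMessage("friends", listOfUser))
}
```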

Centralized validation-failed mechanism not implemented as intended

major

Status/extension codes are applied in `Validate` only when `ReplaceUndefinedOpFieldError` is set, rather than via a centralized walker-level external-error hook with the new dedicated compatibility flag. This keeps old flag semantics and misses the intended architecture change.

v2/pkg/astvalidation/operation_validation.go:90
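
One way to picture a centralized hook is sketched below. Every name here (`Walker`, `AddExternalError`, `Options.ApolloValidationFailed`) is hypothetical and not taken from the repository or the gold patch. The status code 400 and extension code `GRAPHQL_VALIDATION_FAILED` follow Apollo Server's convention and are assumed, not confirmed by this report. The idea is that the walker decorates each external error in one place, controlled by a dedicated flag, so individual rules and `Validate` do not need to set the codes themselves.

```go
package main

import "fmt"

// ExternalError is a hypothetical, simplified validation error.
type ExternalError struct {
	Message       string
	StatusCode    int
	ExtensionCode string
}

// Options holds a hypothetical dedicated compatibility flag.
type Options struct {
	ApolloValidationFailed bool
}

// Walker is a hypothetical stand-in for the AST walker. All rules report
// errors through AddExternalError, so the hook runs for every error.
type Walker struct {
	Errors          []ExternalError
	onExternalError func(*ExternalError)
}

// NewWalker installs the decoration hook only when the flag is set.
func NewWalker(opts Options) *Walker {
	w := &Walker{}
	if opts.ApolloValidationFailed {
		w.onExternalError = func(e *ExternalError) {
			e.StatusCode = 400 // assumed, per Apollo Server convention
			e.ExtensionCode = "GRAPHQL_VALIDATION_FAILED"
		}
	}
	return w
}

// AddExternalError records an error, applying the hook first if present.
func (w *Walker) AddExternalError(e ExternalError) {
	if w.onExternalError != nil {
		w.onExternalError(&e)
	}
	w.Errors = append(w.Errors, e)
}

func main() {
	w := NewWalker(Options{ApolloValidationFailed: true})
	w.AddExternalError(ExternalError{Message: "some rule failed"})
	fmt.Printf("%+v\n", w.Errors[0])
}
```

Compared with setting the codes in `Validate` behind an unrelated flag, this keeps the decoration logic in a single code path and ties it to one purpose-specific option.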

Evidence

task_detail (—) · trajectory (15.0 KB) · validation (115.5 KB) · results (58.5 KB) · run_metadata (1.6 KB) · agent_patch (12.3 KB) · summary (94.5 KB) · manifest (495 B)