• …rd, coverage reconciliation, testdb halt)
    
    Post-implementation multi-agent review (6 dims + per-finding adversarial verify) found the control-flow sound and the goal mostly-achieved; this lands the three deterministic, low-risk fixes among the confirmed findings.
    
    - build-failed short-circuit (must-fix): behaviorSubGate now validates the LLM's classification before green-by-skip — requires non-empty rootCausePath AND no interaction/sentinel hard issues riding along; a "dirty" build-failed goes through adjudicate(allowContinue:false) retry/halt instead of silently approving. The skeleton (lazy router + FeStub) makes legit sibling-unimpl build breaks rare, so a build-failed is more likely a real in-FE shared-code regression — the boundary comment §107-108 claimed load-bearing but had zero JS enforcement.
    
    - coverage reconciliation backstop (§3.6): empty-coverage only guarded ==0; now 0<routesReached<routesPlanned with the shortfall unexplained by route-level coverageGaps -> adjudicate(allowContinue:false). Closes the partial-coverage false-green. Counts route-level reasons only; over-counting can only suppress the gate, never false-halt.
    
    - test-DB guard direct-halt now implemented: TESTDB_GUARD_MARK in envError.detail + behaviorTestDbGuardTripped -> immediate HALT on the first result, honoring step2's "no retry, no adjudication" promise (previously prompt-only; a guard trip wrongly entered the generic stack-not-ready retry path, ~5 redundant setup-test-db.mjs runs).
    
    Left as accepted design trade-offs (documented in design §12): self-attested whitelist/route scope, soft-by-source non-data text, disabled-control should-work recovery limited to submit buttons.
    
    Verified: SYNTAX_OK (wrapped check), 87/87 lib tests pass, no time/random builtins. v2 design doc updated (§3/5/6.2/6.3 + new §12).
    zichun authored
     
    Browse Code »
  • Replaces the phase-level read-only behavior-gate with a per-FE acceptance dimension: each FE is approved only when the code-reviewer approves AND runtime behavior verification is green. Behavior defects (dead control / sentinel text mismatch) become fixable must-fix that drive verify->fix->re-verify, not halts.
    
    - reviewWithFixLoop (frontend only, via if(fe)): at the approve gate, behaviorSubGate boots this FE's full stack + seeds sentinels, enumerates this FE's routes, two-tier asserts. Hard issues with a locator -> fixPrompt -> functional reverify -> next behaviorRound; soft text (i18n/literal/semantic) -> adjudicate(continue); behaviorRound bounded by BEHAVIOR_FE_MAX=3, env race by BEHAVIOR_ATTEMPT_MAX=2. Backend featureLoop branch unchanged.
    
    - New runFrontendSkeleton stage (before featureLoop(frontend)): App shell + full lazy router + FeStub placeholders + shared nav, so the app is buildable at every mid-phase point; tdd swaps FeStub->real component per FE. Idempotent via fe-skeleton-done tag.
    
    - BEHAVIOR_GATE_SCHEMA gains build-failed envError kind (sibling-FE-unimpl short-circuit, not a bug) + locator-not-resolvable coverage reason; deriveSpec emits a per-FE route-scope section, reviewer validates it.
    
    - Removed phase-level runBehaviorGate + 'Behavior' phase; kept phase-level testGate (regression). REVIEW_HARD_ROUNDS 8->10.
    
    - Safety: test-DB naming guard pushed into scripts-setup-test-db.mjs template (fail-closed unless name contains test/_dev/_local or ALLOW_NONTEST_DROP=1) + 3 tests.
    
    - agentType stays erp-workflow:code-reviewer. v1 design doc marked SUPERSEDED; v2 design at docs/design/2026-06-02-frontend-behavior-in-review-loop.md.
    
    Verified: wrapped syntax check SYNTAX_OK, 87/87 lib tests pass, no orphan refs, no time/random builtins, top-level return intact. Not yet run end-to-end against a real ERP project.
    zichun authored
     
    Browse File »
  • New 'Behavior' stage between Gate and Milestone, frontend-phase only, after testGate green and before report/milestone. Verifies every interactive control actually works and every text region shows the right content, independent of the tests the tdd agent wrote.
    
    - BEHAVIOR_GATE_SCHEMA / behaviorGateContract() / behaviorGatePrompt() / runBehaviorGate(); reportPrompt now gates the milestone on behavior-gate evidence (frontend-phase-behavior-gate-r*.md, last attempt must be non-RED).
    
    - Two-tier failure: interaction defects (incl. binding-garbage) flake-retry once then hard-halt via adjudicate(allowContinue:false); text issues split by source (sentinel=objective -> no continue; i18n/literal/semantic=adjudicable). Convergence loop re-runs the env + interaction hard gates after any text-layer retry so a refreshed result can't slip a green past the hard gates.
    
    - Full-stack seeded run: test-DB name guard (deterministic halt, not adjudicated), strict 4-phase ordering (empty DB -> boot backend so Flyway builds schema -> seed -> boot frontend), auth bootstrap via storageState, router-config-driven route discovery, ephemeral .tmp/behavior-gate runner with finally teardown, type-legal per-field-unique sentinels.
    
    - agentType uses the plugin-namespaced 'erp-workflow:code-reviewer' (a bare 'code-reviewer' is ambiguous with feature-dev:code-reviewer); README + agents/code-reviewer.md aligned (frontmatter name: stays bare).
    
    - Design: docs/design/2026-06-02-frontend-behavior-gate.md. README + coding-start banner updated.
    zichun authored
     
    Browse File »