diff --git a/design/feature/recovery-pipeline/design.md b/design/feature/recovery-pipeline/design.md
new file mode 100644
index 0000000..e17d501
--- /dev/null
+++ b/design/feature/recovery-pipeline/design.md
@@ -0,0 +1,66 @@
+# Feature Design Map: Recovery Pipeline
+
+## Bounded Contexts
+
+- `ingest-snapshot`: owns deterministic upstream bundle ingest, segment boundaries, canonical source projection, and run manifests.
+- `dependency-recovery`: owns vendored package identification, dependency decisions, externalization, and bundled fallback preservation.
+- `static-context-evidence`: owns deterministic context packets, binding graphs, and usage evidence for downstream consumers.
+- `snapshot-lineage`: owns adjacent-run matching, durable lineage, change classification, relabel eligibility, and upstream summary facts.
+- `iterative-naming`: owns relabel queue planning, batch execution handoff, wave reconciliation, safe rename acceptance, and naming memory updates.
+- `codebase-regularization`: owns deterministic file placement, structural splitting, import/export reconstruction, and canonical editable tree emission.
+- `maintained-transform-replay`: owns replay of long-lived maintained transforms and replay conflict reporting.
+- `release-packaging`: owns release artifact assembly, provenance manifests, and publication-ready outputs.
+
+## Feature Step to Workflow Slice Map
+
+| Feature Step | Bounded Context | Workflow Slice | Notes |
+| :----------- | :-------------- | :------------- | :---- |
+| Ingest upstream bundle snapshot into deterministic recovery artifacts | `ingest-snapshot` | `deterministic-bundle-ingest` | Produces the canonical per-run source of truth used by all later slices. |
+| Identify vendored package boundaries and confidence decisions | `dependency-recovery` | `identify-vendored-packages` | Consumes ingest artifacts and records accepted, rejected, and unresolved dependency decisions. |
+| Replace accepted vendored packages with external dependencies while keeping fallbacks | `dependency-recovery` | `externalize-accepted-dependencies` | Depends on identified package decisions; unresolved packages stay bundled. |
+| Extract deterministic context packets for each segment | `static-context-evidence` | `extract-segment-context` | Consumes ingest output after dependency treatment to emit machine-readable evidence. |
+| Compare adjacent runs and classify lineage-aware changes | `snapshot-lineage` | `diff-adjacent-runs` | Consumes current and previous run manifests plus Phase 3 context. |
+| Rank relabel candidates into deterministic queue packets | `iterative-naming` | `plan-relabel-queue` | Uses only new and modified segments from snapshot-lineage. |
+| Execute queued relabel batches against the model provider in waves | `iterative-naming` | `execute-wave-batches` | Owns outbound API execution only; no naming decisions are applied here. |
+| Evaluate responses, accept safe names, and update queue state | `iterative-naming` | `evaluate-and-apply-renames` | Reconciles at wave boundary and updates naming memory. |
+| Emit the canonical editable recovered tree | `codebase-regularization` | `regularize-editable-tree` | Must preserve build-first while improving navigability. |
+| Replay long-lived maintained transforms onto the regularized tree | `maintained-transform-replay` | `replay-maintained-transforms` | Carries durable local changes across upgrades. |
+| Build release artifacts and publication metadata | `release-packaging` | `build-and-publish-artifacts` | Packages processed and unmodified artifacts for traceable release output. |
+
+## Cross-Context Handoffs
+
+- `ingest-snapshot` -> `dependency-recovery` via run manifest, segments, and canonical projection because vendored matching starts from deterministic ingest evidence.
+- `ingest-snapshot` -> `static-context-evidence` via stable segment records because context extraction depends on canonical segment boundaries.
+- `dependency-recovery` -> `static-context-evidence` via accepted externalization decisions and preserved fallbacks because context packets must describe the post-decision code surface.
+- `static-context-evidence` -> `snapshot-lineage` via deterministic context packets because fuzzy matching and summary facts need machine-readable evidence.
+- `snapshot-lineage` -> `iterative-naming` via relabel-eligible changed/new segments and ambiguity reports because only safe changed material should enter naming work.
+- `iterative-naming` -> `codebase-regularization` via safely renamed generated source and naming memory because regularization should operate on the best accepted recovered names.
+- `codebase-regularization` -> `maintained-transform-replay` via canonical editable tree and placement mappings because replay targets the regularized tree, not the pre-regularized source.
+- `maintained-transform-replay` -> `release-packaging` via replay outcomes and transformed tree state because releases must reflect which maintained transforms were applied, skipped, or conflicted.
+
+## Recommended Slice Order
+
+1. `ingest-snapshot/deterministic-bundle-ingest`: all later slices depend on deterministic ingest artifacts and canonical segment boundaries.
+2. `dependency-recovery/identify-vendored-packages`: shrinks the app-authored surface before later evidence and naming work.
+3. `dependency-recovery/externalize-accepted-dependencies`: completes dependency treatment before downstream evidence extraction.
+4. `static-context-evidence/extract-segment-context`: provides deterministic evidence used by diffing, summaries, and transform anchoring.
+5. `snapshot-lineage/diff-adjacent-runs`: identifies changed/new material and durable lineage needed for iterative naming.
+6. `iterative-naming/plan-relabel-queue`: transforms changed material into deterministic naming work packets.
+7. `iterative-naming/execute-wave-batches`: executes persisted batches without applying names yet.
+8. `iterative-naming/evaluate-and-apply-renames`: applies only accepted names after wave reconciliation.
+9. `codebase-regularization/regularize-editable-tree`: emits the canonical browsable tree once safe names are available.
+10. `maintained-transform-replay/replay-maintained-transforms`: reapplies durable local changes onto the regularized tree.
+11. `release-packaging/build-and-publish-artifacts`: packages the final tree and release metadata last.
+
+## Orchestration Notes
+
+- The feature-level pipeline is linear by default, but review-needed findings do not automatically halt later safe slices in MVP.
+- `iterative-naming` contains three slices inside one bounded context; only wave orchestration crosses those slice boundaries.
+- Cross-context decisions stay at handoff seams: each slice makes decisions only over state owned by its context.
+- Build-first remains the feature-level acceptance rule, especially across `codebase-regularization`, `maintained-transform-replay`, and `release-packaging`.
+
+## Open Questions
+
+- `static-context-evidence` will consume post-externalization source as its canonical input; if pre-externalization review becomes necessary later, treat it as a secondary review artifact rather than the main slice input.
+- The release docs imply publication is optional; the exact publication handoff seam inside `release-packaging` is still open.
+- Build verification is a hard invariant, but the repository-wide command set for that verification is not yet frozen in the design artifacts.
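Reviewer note on `design.md`: the Cross-Context Handoffs and Recommended Slice Order sections can be checked against each other mechanically. The sketch below is only an illustration, not part of the proposed change. The context names, handoff edges, and slice order are copied from the design map, while `HANDOFFS`, `SLICE_ORDER`, and `order_violations` are hypothetical names introduced here. It assumes a handoff is satisfied when every slice of the producing context is scheduled before the first slice of the consuming context.

```python
# Hypothetical consistency check between the handoff list and the slice order.
# Edges and slice ids are copied from design.md; all identifiers are illustrative.

HANDOFFS = [
    ("ingest-snapshot", "dependency-recovery"),
    ("ingest-snapshot", "static-context-evidence"),
    ("dependency-recovery", "static-context-evidence"),
    ("static-context-evidence", "snapshot-lineage"),
    ("snapshot-lineage", "iterative-naming"),
    ("iterative-naming", "codebase-regularization"),
    ("codebase-regularization", "maintained-transform-replay"),
    ("maintained-transform-replay", "release-packaging"),
]

SLICE_ORDER = [
    "ingest-snapshot/deterministic-bundle-ingest",
    "dependency-recovery/identify-vendored-packages",
    "dependency-recovery/externalize-accepted-dependencies",
    "static-context-evidence/extract-segment-context",
    "snapshot-lineage/diff-adjacent-runs",
    "iterative-naming/plan-relabel-queue",
    "iterative-naming/execute-wave-batches",
    "iterative-naming/evaluate-and-apply-renames",
    "codebase-regularization/regularize-editable-tree",
    "maintained-transform-replay/replay-maintained-transforms",
    "release-packaging/build-and-publish-artifacts",
]


def order_violations(slice_order, handoffs):
    """Return handoffs whose consumer context starts before its producer has finished."""
    first_seen = {}
    last_seen = {}
    for position, slice_id in enumerate(slice_order):
        context = slice_id.split("/", 1)[0]
        first_seen.setdefault(context, position)  # first slice of this context
        last_seen[context] = position             # last slice of this context
    violations = []
    for producer, consumer in handoffs:
        if producer not in last_seen or consumer not in first_seen:
            # A context named in a handoff has no slice in the order at all.
            violations.append((producer, consumer))
        elif last_seen[producer] >= first_seen[consumer]:
            violations.append((producer, consumer))
    return violations


if __name__ == "__main__":
    print(order_violations(SLICE_ORDER, HANDOFFS))
```

Under that assumption the recommended order yields no violations. Moving `static-context-evidence/extract-segment-context` ahead of ingest would flag both of its inbound handoffs.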
diff --git a/design/feature/recovery-pipeline/discovery.md b/design/feature/recovery-pipeline/discovery.md
new file mode 100644
index 0000000..792ae83
--- /dev/null
+++ b/design/feature/recovery-pipeline/discovery.md
@@ -0,0 +1,95 @@
+# Feature Discovery: Recovery Pipeline
+
+## 1. Commands (User Intents)
+
+- Pipeline operator wants to ingest an upstream bundle snapshot because they need a deterministic base for recovery work.
+- Pipeline operator wants to identify and externalize vendored dependencies because they want to shrink the app-authored surface that later phases must understand.
+- Pipeline operator wants to extract deterministic context because later phases need machine-readable evidence without relying on an LLM as the source of truth.
+- Pipeline operator wants to diff the current snapshot against the previous snapshot because they want durable lineage, compact upstream summaries, and to avoid resending unchanged material for naming.
+- Pipeline operator wants to iteratively relabel changed and new code because they want a more browsable recovered tree with readable names across modules, functions, locals, and parameters.
+- Pipeline operator wants to regularize recovered output into a canonical editable tree because they care most about a browsable codebase.
+- Pipeline operator wants the recovered tree to build because buildability is the current hard success invariant.
+- Pipeline operator wants uncertain areas surfaced in manifests and reports because uncertainty should not block MVP progress.
+- Pipeline operator wants manual runtime rescue patches captured as formal maintained transforms because repeated upgrades should become replayable.
+- Pipeline operator wants to publish processed and unmodified artifacts with provenance because releases should remain traceable to the upstream snapshot.
+
+## 2. Events (Domain Facts)
+
+- Upstream snapshot ingested (payload: run ID, upstream snapshot identity, emitted manifest, emitted segments).
+- Dependency candidate identified (payload: candidate package, evidence, recovered segment boundary).
+- Dependency decision recorded (payload: accepted|rejected|unresolved, confidence, rationale, fallback reference).
+- Context packet extracted (payload: segment ID, bindings, links, evidence, heuristics).
+- Run diff completed (payload: unchanged|modified|new|deleted|split|merged|ambiguous classifications, lineage updates).
+- Relabel candidate queued (payload: candidate ID, pass kind, evidence score, difficulty score, priority score).
+- Batch wave executed (payload: wave ID, batch IDs, model/config, execution outcomes).
+- Rename proposal evaluated (payload: accepted|deferred|stalled|exhausted outcomes, rejection reasons, counters).
+- Accepted names applied (payload: candidate fields renamed, updated source/metadata, naming-memory updates).
+- Regularized tree emitted (payload: canonical repo-root tree, regularization manifest, placement mappings).
+- Review-needed artifact emitted (payload: phase, machine-readable report, concise human summary).
+- Maintained transform replayed (payload: applied|conflict|skipped outcome, transform metadata, replay report).
+- Release artifact set emitted (payload: processed-source artifact, unmodified-source artifact, release manifest, release notes).
+
+## 3. Business Rules & Invariants
+
+- Rule: The repo root is always the latest canonical editable recovered tree.
+- Rule: Per-run artifacts, evidence, queue state, and review reports live under `runs/`.
+- Rule: Buildability outranks readability; risky naming or regularization must not be accepted if it jeopardizes correctness.
+- Rule: Runtime completeness is desirable but not required for MVP progression if the output still builds and remains browsable.
+- Rule: Uncertainty should be surfaced in manifests and reports instead of silently guessed away.
+- Rule: For MVP, review-needed states should not halt the entire pipeline if later phases can proceed safely.
+- Rule: Later phases must consume deterministic machine-readable artifacts as source of truth.
+- Rule: LLM output may assist naming and ambiguous ranking, but must not become the source of truth for deterministic structure, matching, or safety decisions.
+- Rule: The root recovered tree is generated, not hand-maintained between runs.
+- Rule: Upgrades should start from raw ingest, reuse deterministic prior evidence where valid, then replay maintained transforms.
+- Rule: If manual fixes are needed because the code is not runnable, those fixes should become formal Phase 9 maintained transforms.
+- Invariant: Build-first is the current formal verification bar for successful regularization/publishing.
+- Invariant: If a more navigable regularization attempt breaks the build, the failed attempt must be surfaced for review rather than silently degraded.
+- Invariant: Review surfacing must include both machine-readable artifacts and concise human-readable summaries.
+
+## 4. Edge Cases Handled
+
+- Case: Dependency match confidence is low or colliding -> record as unresolved or review-needed instead of forcing externalization.
+- Case: Vendored replacement may drift from bundled behavior -> preserve bundled fallback implementations for validation and safety.
+- Case: Diff matching remains contested -> emit `ambiguous` artifacts and exclude those segments from automated lineage-dependent actions.
+- Case: Rename candidates lack sufficient evidence -> keep them visible in queue state, defer and retry deterministically, then allow terminal `stalled` or `exhausted` outcomes rather than retrying forever.
+- Case: Model response is low confidence, insufficiently specific, invalid, or collision-prone -> reject deterministically and feed structured reasons back into queue state.
+- Case: A more readable split or placement would make the tree fail to build -> surface the failed regularization attempt for review.
+- Case: Runtime behavior is incomplete after recovery -> allow manual rescue patches, but capture durable fixes as maintained transforms when they must persist across upgrades.
+- Case: Publication fails after artifacts are built -> keep local built artifacts and separate publication failure from build failure.
+- Case: Review-needed findings appear in MVP -> continue later safe phases while recording artifacts for later inspection.
+
+## 5. Candidate Bounded Contexts
+
+- Ingest & Snapshot Evidence: owns deterministic bundle ingest, segment records, and canonical projections.
+- Dependency Recovery: owns vendored package identification, confidence decisions, externalization, and fallback preservation.
+- Static Context Evidence: owns deterministic context extraction artifacts and evidence packets.
+- Snapshot Lineage & Change Detection: owns run-to-run matching, lineage, change classification, and upstream summaries.
+- Iterative Naming: owns relabel queue planning, batch execution handoff, semantic acceptance, safe rename application, and naming memory.
+- Codebase Regularization: owns deterministic file/folder placement, structural splitting, import/export reconstruction, and editable-tree emission.
+- Maintained Transform Replay: owns deterministic replay of long-lived transforms and replay conflict reporting.
+- Release Packaging: owns artifact packaging, provenance manifests, and optional publication.
+
+## 6. Candidate Workflow Slices
+
+- ingest-snapshot/deterministic-bundle-ingest: turn an upstream bundle into deterministic segment records and canonical source projection.
+- dependency-recovery/identify-vendored-packages: score dependency candidates and recover package boundaries.
+- dependency-recovery/externalize-accepted-dependencies: replace accepted vendored code with npm imports while preserving fallbacks.
+- static-context-evidence/extract-segment-context: emit canonical context packets and binding/link evidence.
+- snapshot-lineage/diff-adjacent-runs: classify changes, mint lineage, and produce relabel queues plus upstream summaries.
+- iterative-naming/plan-relabel-queue: compute candidate evidence, difficulty, priority, and batch-ready work items.
+- iterative-naming/execute-wave-batches: send persisted batch artifacts to the model provider in parallel waves.
+- iterative-naming/evaluate-and-apply-renames: validate wave results, accept safe names, update queue state, and refresh naming memory.
+- codebase-regularization/regularize-editable-tree: produce the canonical repo-root tree with deterministic placement and mappings.
+- maintained-transform-replay/replay-maintained-transforms: apply stored transforms safely and emit replay outcomes.
+- release-packaging/build-and-publish-artifacts: package processed and unmodified artifacts with release metadata.
+
+## 7. Shared Language Notes
+
+- Preferred term: Recovery Pipeline = the full release-oriented workflow that turns an upstream bundle snapshot into a buildable, browsable recovered tree plus release artifacts.
+- Preferred term: Recovered Tree = the canonical editable source tree emitted at repo root.
+- Preferred term: Build-first = the current formal invariant that the recovered tree must build even if runtime completeness is still partial.
+- Preferred term: Review-needed artifact = a machine-readable report plus concise human summary describing uncertainty, failure, or conflict that requires later inspection.
+- Preferred term: Maintained Transform = a durable replayable change stored outside the numbered upstream-processing pipeline and reapplied in Phase 9.
+- Preferred term: Naming Memory = accepted-name history reused to improve future relabel iterations.
+- Avoid: “original repo layout” when you mean the deterministic regularized editable tree.
+- Avoid: “runtime complete” when you only mean “buildable and browsable enough to inspect.”
diff --git a/design/feature/recovery-pipeline/status.md b/design/feature/recovery-pipeline/status.md
new file mode 100644
index 0000000..c49b4c7
--- /dev/null
+++ b/design/feature/recovery-pipeline/status.md
@@ -0,0 +1,124 @@
+# Design Status: Recovery Pipeline
+
+## Feature
+
+- Name: `Recovery Pipeline`
+- Feature slug: `recovery-pipeline`
+- Current phase: `Context & Workflow Decomposition`
+- Overall status: `Decomposition In Progress`
+- Security verification status: `Not Started`
+- Current workflow slice: `ingest-snapshot/deterministic-bundle-ingest`
+
+## Feature Artifacts
+
+- [x] `design/feature/recovery-pipeline/discovery.md`
+- [x] `design/feature/recovery-pipeline/design.md`
+- [x] `design/feature/recovery-pipeline/status.md`
+
+## Feature Discovery Gate
+
+- [x] feature goal and actor intents captured
+- [x] commands and events identified at feature level
+- [x] business rules and invariants captured at feature level
+- [x] edge cases captured at feature level
+- [x] candidate bounded contexts identified
+- [x] candidate workflow inventory identified
+- [x] project-wide shared-language updates captured
+- [x] approved for context and workflow decomposition
+
+## Context & Workflow Decomposition Gate
+
+- [x] bounded contexts confirmed
+- [x] feature steps mapped to workflow slices
+- [x] cross-context handoffs recorded
+- [x] per-context shared-language files created or updated
+- [x] workflow folders created with `01-decomposition.md`
+- [x] recommended slice order recorded
+- [ ] approved to begin slice discovery
+
+## Workflow Slice Tracker
+
+| Bounded Context | Workflow Slice | Slice Discovery | Core Sketch | Blueprint | Design Security | Assembly | Impl Security | Refactor | Notes |
+| :-------------- | :------------- | :-------------- | :---------- | :-------- | :-------------- | :------- | :------------ | :------- | :---- |
+| `ingest-snapshot` | `deterministic-bundle-ingest` | `Complete` | `Complete` | `Ready` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Foundational source-of-truth slice.` |
+| `dependency-recovery` | `identify-vendored-packages` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Shrinks app-authored surface before later phases.` |
+| `dependency-recovery` | `externalize-accepted-dependencies` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Depends on package identification decisions.` |
+| `static-context-evidence` | `extract-segment-context` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Produces deterministic evidence for downstream consumers.` |
+| `snapshot-lineage` | `diff-adjacent-runs` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Owns lineage and changed/new segment routing.` |
+| `iterative-naming` | `plan-relabel-queue` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Queue planning only.` |
+| `iterative-naming` | `execute-wave-batches` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Outbound model execution only.` |
+| `iterative-naming` | `evaluate-and-apply-renames` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Safe deterministic acceptance and application.` |
+| `codebase-regularization` | `regularize-editable-tree` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Must preserve build-first invariant.` |
+| `maintained-transform-replay` | `replay-maintained-transforms` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Carries maintained changes across upgrades.` |
+| `release-packaging` | `build-and-publish-artifacts` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Not Started` | `Release-oriented output only.` |
+
+## Current Slice Gates
+
+### Slice Discovery Gate
+
+- [x] selected slice named explicitly
+- [x] happy path captured
+- [x] edge cases captured
+- [x] business rules and invariants captured
+- [x] handoff assumptions captured
+- [x] context shared-language updates captured
+- [x] approved for core sketch
+
+### Core Sketch Gate
+
+- [x] required state is explicit
+- [x] command and events are explicit
+- [x] policy signature is explicit
+- [x] slice boundaries are explicit
+- [x] no cross-context decision logic inside the slice
+- [x] approved for blueprint
+
+### Blueprint Gate
+
+- [ ] domain types make illegal states harder to express
+- [ ] shared concepts reused appropriately
+- [ ] policy is pure
+- [ ] reducer/apply shape is explicit
+- [ ] workflow contract is explicit
+- [ ] approved for design security review or assembly
+
+### Design Security Gate
+
+- [ ] trust boundaries reviewed
+- [ ] authority and least privilege reviewed
+- [ ] sink and data-flow risks reviewed
+- [ ] blocking findings resolved or explicitly accepted
+- [ ] approved for assembly
+
+### Assembly Gate
+
+- [ ] tests added
+- [ ] implementation completed
+- [ ] types pass
+- [ ] tests passing
+- [ ] effect AST checks run for modified Effect files
+- [ ] approved for implementation security review or next slice
+
+### Implementation Security Gate
+
+- [ ] implementation security review completed or explicitly deferred
+- [ ] blocking findings resolved or explicitly accepted
+- [ ] approved for refactor consideration or next slice
+
+### Refactor Gate
+
+- [ ] diagnosis completed if structural changes were needed
+- [ ] execution completed if approved
+- [ ] verification rerun after refactor
+- [ ] slice complete
+
+## Open Questions / Blockers
+
+- Build-first is selected, but the exact build command set is still implementation-specific.
+- The release docs imply publication is optional; the exact publication handoff seam inside `release-packaging` is still open.
+
+## Context Handoff Notes
+
+- Read first: `design/feature/recovery-pipeline/discovery.md`
+- Current focus: `Context & Workflow Decomposition`
+- Do not change: `Buildability outranks readability, repo root is the latest editable tree, review-needed states continue in MVP, and uncertainty is surfaced through manifests and reports.`
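Reviewer note on the "Do not change" constraints: the build-first rule and the "surface, do not silently degrade" invariant can be written as a small pure policy. This is a sketch under stated assumptions, not the planned implementation. `Attempt`, `Outcome`, `choose_regularization`, and the integer `navigability` score are all hypothetical; the design documents do not define how navigability is measured or how a build result is obtained.

```python
# Hypothetical build-first acceptance policy for regularization attempts.
# None of these names come from the design docs; they are illustrative only.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Attempt:
    label: str
    builds: bool        # assumed result of whatever build check is eventually chosen
    navigability: int   # assumed score; higher means more browsable


@dataclass(frozen=True)
class Outcome:
    accepted: Optional[Attempt]
    review_needed: Tuple[Attempt, ...]


def choose_regularization(attempts):
    """Accept the most navigable attempt that builds; surface better-looking failures."""
    buildable = [a for a in attempts if a.builds]
    # Buildability outranks readability: only buildable attempts are candidates.
    accepted = max(buildable, key=lambda a: a.navigability, default=None)
    # A failed attempt that would have been more navigable is reported for review
    # instead of being dropped silently; the pipeline still continues.
    review_needed = tuple(
        a for a in attempts
        if not a.builds and (accepted is None or a.navigability > accepted.navigability)
    )
    return Outcome(accepted=accepted, review_needed=review_needed)


if __name__ == "__main__":
    outcome = choose_regularization([
        Attempt("flat", builds=True, navigability=1),
        Attempt("split-by-feature", builds=False, navigability=5),
        Attempt("grouped", builds=True, navigability=3),
    ])
    print(outcome.accepted.label, [a.label for a in outcome.review_needed])
```

Keeping the policy free of I/O matches the Blueprint Gate item "policy is pure": the build check runs outside the function and only its boolean result is passed in.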