From af3e18758be447ce440615c7b5db8a0aa015902d Mon Sep 17 00:00:00 2001 From: Elizabeth W Date: Mon, 25 May 2026 01:06:43 -0600 Subject: [PATCH] discovery and core sketch for ingest snapshot --- design/workflows/ingest-snapshot/CONTEXT.md | 29 +++++++ .../02-discovery.md | 51 ++++++++++++ .../03-core-sketch.md | 81 +++++++++++++++++++ 3 files changed, 161 insertions(+) create mode 100644 design/workflows/ingest-snapshot/CONTEXT.md create mode 100644 design/workflows/ingest-snapshot/deterministic-bundle-ingest/02-discovery.md create mode 100644 design/workflows/ingest-snapshot/deterministic-bundle-ingest/03-core-sketch.md diff --git a/design/workflows/ingest-snapshot/CONTEXT.md b/design/workflows/ingest-snapshot/CONTEXT.md new file mode 100644 index 0000000..f602738 --- /dev/null +++ b/design/workflows/ingest-snapshot/CONTEXT.md @@ -0,0 +1,29 @@ +# Ingest Snapshot Context + +**Snapshot Ingest**: the deterministic intake of one upstream bundle snapshot into stable per-run recovery artifacts. +_Avoid_: import step, parse pass + +**Run Identity**: the deterministic identity for one ingest run, derived from the upstream snapshot identity rather than manually assigned. +_Avoid_: ad hoc run id, operator-chosen id + +**Segment Record**: one deterministic ingest-level code unit produced from any AST slice boundary that can be proven stably. +_Avoid_: chunk, guessed module + +**Canonical Source Projection**: the normalized recovered code emitted from ingest for downstream phases to consume as source-of-truth input. +_Avoid_: formatted bundle, pretty output + +## Example dialogue + +> **Developer:** "Can Snapshot Ingest continue if the bundle does not parse?" +> **Domain Expert:** "No. Snapshot Ingest hard-stops because no trustworthy Segment Records can be emitted." +> +> **Developer:** "Who chooses the Run Identity?" +> **Domain Expert:** "The system derives Run Identity deterministically from the upstream snapshot identity." 
+> +> **Developer:** "What counts as a Segment Record?" +> **Domain Expert:** "Any AST slice boundary we can prove deterministically, not only wrapper modules." + +## Flagged ambiguities + +- "Module" is too narrow for this context because ingest may emit deterministic AST-slice boundaries that do not correspond to a bundler module wrapper. +- "Run ID" previously sounded operator-provided; resolved term is `Run Identity`, which is deterministically derived. diff --git a/design/workflows/ingest-snapshot/deterministic-bundle-ingest/02-discovery.md b/design/workflows/ingest-snapshot/deterministic-bundle-ingest/02-discovery.md new file mode 100644 index 0000000..05f4f08 --- /dev/null +++ b/design/workflows/ingest-snapshot/deterministic-bundle-ingest/02-discovery.md @@ -0,0 +1,51 @@ +# Slice Discovery: Deterministic Bundle Ingest + +- Bounded context: `ingest-snapshot` +- Workflow slug: `deterministic-bundle-ingest` + +## Happy Path + +- Operator selects one upstream bundle snapshot for recovery. +- The workflow derives `Run Identity` deterministically from the upstream snapshot identity. +- The bundle parses successfully. +- The workflow detects deterministic AST slice boundaries for the snapshot. +- For each proven segment boundary, the workflow emits a `Segment Record` with source slice, AST node type, canonical source projection, and stable hashes. +- The workflow emits machine-readable ingest artifacts for the run, including `manifest.json` and `segments.jsonl`. +- The workflow may emit a human-readable summary, but success is defined by the machine artifacts. +- Downstream contexts consume the emitted manifest, segment records, and canonical source projection as the source of truth for later phases. + +## Edge Cases + +- Bundle does not parse -> hard stop the slice because no trustworthy ingest artifacts can be emitted. +- An apparent boundary cannot be proven deterministically -> do not emit it as a separate segment record; keep only proven AST slice boundaries. 
+- Identical upstream snapshot is ingested again -> derive the same run identity inputs and emit the same deterministic machine artifacts. +- Previous run manifest is available but continuity is weak -> ingest may observe it for continuity hints, but the current run artifacts still come only from the current snapshot and deterministic ingest rules. +- Human-readable summary generation fails -> slice still succeeds if machine-readable artifacts were emitted correctly. + +## Business Rules & Invariants + +- Rule: `Run Identity` is derived deterministically from upstream snapshot identity rather than chosen manually. +- Rule: A `Segment Record` may come from any deterministic AST slice boundary that can be proven stably. +- Rule: Ingest emits machine-readable artifacts as the source of truth for later phases. +- Rule: Human-readable summary output is optional relative to core ingest success. +- Invariant: If the bundle cannot be parsed, ingest hard-stops rather than emitting speculative artifacts. +- Invariant: The workflow does not guess segment boundaries that it cannot prove deterministically. +- Invariant: Identical upstream snapshot inputs produce identical deterministic ingest outputs. + +## Required Decisions Owned by This Context + +- Whether the selected upstream snapshot is parseable enough to begin deterministic ingest. +- Which AST slice boundaries are proven enough to become `Segment Records`. +- Which machine-readable artifacts are required for ingest success. +- How deterministic run identity is derived from upstream snapshot identity. + +## Handoff Assumptions + +- `dependency-recovery` receives `manifest.json`, `segments.jsonl`, and canonical source projection as the authoritative ingest outputs. +- `static-context-evidence` receives stable segment records whose boundaries were decided only inside `ingest-snapshot`. +- `release-packaging` may later consume upstream snapshot identity recorded in the run manifest. 
+- Cross-run continuity hints from a previous manifest do not override the current slice's deterministic ingest decisions. + +## Open Questions + +- None currently inside this slice; broader build verification and publication-seam questions remain feature-level concerns. diff --git a/design/workflows/ingest-snapshot/deterministic-bundle-ingest/03-core-sketch.md b/design/workflows/ingest-snapshot/deterministic-bundle-ingest/03-core-sketch.md new file mode 100644 index 0000000..e018e13 --- /dev/null +++ b/design/workflows/ingest-snapshot/deterministic-bundle-ingest/03-core-sketch.md @@ -0,0 +1,81 @@ +# Core Sketch: Deterministic Bundle Ingest + +- Bounded context: `ingest-snapshot` +- Workflow slug: `deterministic-bundle-ingest` + +## Command + +- `IngestUpstreamSnapshot` +- Meaning: start `Snapshot Ingest` for one selected upstream bundle snapshot so the recovery pipeline has deterministic source-of-truth artifacts for later phases. + +## Required State + +State owned by `ingest-snapshot` and required to decide this workflow: + +- `SelectedSnapshot` + - upstream snapshot identity + - upstream bundle location or source reference + - optional upstream metadata intended for later release provenance +- `RunIdentityRules` + - deterministic derivation rules from upstream snapshot identity + - collision policy for repeated ingest of the same snapshot identity +- `SegmentBoundaryRules` + - deterministic AST slice boundary rules + - proof rules for when a candidate boundary is strong enough to become a `Segment Record` +- `IngestArtifactRequirements` + - required machine artifacts for success: `Run Manifest`, `segments.jsonl`, and `Canonical Source Projection` + - optional human-readable summary artifact + +## Observed Inputs + +Snapshots or handoffs read but not owned by this context: + +- optional previous `Run Manifest` reference used only for continuity hints +- upstream metadata needed later by `release-packaging` + +## Policy Signature (Pseudo) + +```text 
+deriveRunIdentity : SelectedSnapshot -> RunIdentity + +validateSnapshotIngest : + IngestUpstreamSnapshot -> SelectedSnapshot -> Result + +decideSegmentBoundaries : + SnapshotReady -> SegmentBoundaryRules -> Result, IngestRejected> + +validateRequiredArtifacts : + RunIdentity -> SegmentRecords -> IngestArtifactRequirements -> Result + +performSnapshotIngest : + IngestUpstreamSnapshot + -> IngestSnapshotState + -> Result +``` + +## Events + +### Success Event + +- `UpstreamSnapshotIngested` + - run identity + - upstream snapshot identity + - emitted `Run Manifest` reference + - emitted `Segment Record` set reference + - emitted `Canonical Source Projection` reference + - optional summary reference + +### Failure Event + +- `SnapshotIngestHardStopped` + - upstream snapshot identity when available + - failure reason + - failed stage such as parse failure, boundary proof failure, or required artifact failure + +## Boundary Notes + +- The `ingest-snapshot` context decides only whether one snapshot can be deterministically ingested and which boundaries become `Segment Records`. +- It does not decide package identity, dependency externalization, context heuristics, lineage matching, naming, regularization, transform replay, or release publication. +- Observing a previous `Run Manifest` does not let this slice reuse or override current ingest decisions; cross-run lineage belongs to `snapshot-lineage`. +- Human-readable summary generation is outside the hard success contract for this slice; required machine artifacts remain the source of truth. +- Feature-level orchestration decides when later phases may continue after downstream review-needed states; this slice only hard-stops when deterministic ingest itself is not trustworthy.
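The policy signatures and events above can be sketched as a small runnable example. This is a minimal illustration under stated assumptions, not the design's implementation. The pseudo signatures do not show their `Result` type parameters, so the sketch assumes the two events from the Events section as the success and failure outcomes. Python's `ast` module stands in for the real bundle parser, top-level statements stand in for "proven" AST slice boundaries, and the `run-` prefix, the 16-character truncation, and all field names are hypothetical.

```python
from __future__ import annotations

import ast
import hashlib
from dataclasses import dataclass
from typing import Union


def _sha256(text: str) -> str:
    """Stable content hash used for both run identity and segment hashes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class SelectedSnapshot:
    snapshot_identity: str  # upstream snapshot identity
    bundle_source: str      # bundle text (assumption: already loaded in memory)


@dataclass(frozen=True)
class SegmentRecord:
    node_type: str     # AST node type of the slice
    source_slice: str  # exact source text of the slice
    slice_hash: str    # stable hash of the slice text


@dataclass(frozen=True)
class UpstreamSnapshotIngested:
    run_identity: str
    snapshot_identity: str
    segments: tuple


@dataclass(frozen=True)
class SnapshotIngestHardStopped:
    snapshot_identity: str
    failed_stage: str
    reason: str


# Assumed shape of the elided Result type: success event or failure event.
IngestResult = Union[UpstreamSnapshotIngested, SnapshotIngestHardStopped]


def derive_run_identity(snapshot: SelectedSnapshot) -> str:
    """Derive the run identity from the snapshot identity only (hypothetical rule)."""
    return "run-" + _sha256(snapshot.snapshot_identity)[:16]


def perform_snapshot_ingest(snapshot: SelectedSnapshot) -> IngestResult:
    # Parse failure hard-stops: no speculative artifacts are emitted.
    try:
        tree = ast.parse(snapshot.bundle_source)
    except SyntaxError as exc:
        return SnapshotIngestHardStopped(
            snapshot.snapshot_identity, "parse failure", str(exc)
        )

    # Stand-in boundary rule: each top-level statement is one segment.
    segments = []
    for node in tree.body:
        text = ast.get_source_segment(snapshot.bundle_source, node)
        if text is None:
            continue  # boundary cannot be recovered, so do not guess one
        segments.append(SegmentRecord(type(node).__name__, text, _sha256(text)))

    return UpstreamSnapshotIngested(
        derive_run_identity(snapshot), snapshot.snapshot_identity, tuple(segments)
    )
```

Calling `perform_snapshot_ingest` twice with the same `SelectedSnapshot` returns equal results, which mirrors the invariant that identical upstream snapshot inputs produce identical ingest outputs. An unparseable bundle returns `SnapshotIngestHardStopped` with the failed stage recorded rather than a partial segment set.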