discovery and core sketch for ingest snapshot

2026-05-25 01:06:43 -06:00
parent ea73f4814f
commit af3e18758b
3 changed files with 161 additions and 0 deletions
@@ -0,0 +1,29 @@
# Ingest Snapshot Context

**Snapshot Ingest**: the deterministic intake of one upstream bundle snapshot into stable per-run recovery artifacts.
_Avoid_: import step, parse pass

**Run Identity**: the deterministic identity for one ingest run, derived from the upstream snapshot identity rather than manually assigned.
_Avoid_: ad hoc run id, operator-chosen id

**Segment Record**: one deterministic ingest-level code unit produced from any AST slice boundary that can be proven stably.
_Avoid_: chunk, guessed module

**Canonical Source Projection**: the normalized recovered code emitted from ingest for downstream phases to consume as source-of-truth input.
_Avoid_: formatted bundle, pretty output

## Example dialogue
> **Developer:** "Can Snapshot Ingest continue if the bundle does not parse?"
> **Domain Expert:** "No. Snapshot Ingest hard-stops because no trustworthy Segment Records can be emitted."
>
> **Developer:** "Who chooses the Run Identity?"
> **Domain Expert:** "The system derives Run Identity deterministically from the upstream snapshot identity."
>
> **Developer:** "What counts as a Segment Record?"
> **Domain Expert:** "Any AST slice boundary we can prove deterministically, not only wrapper modules."
## Flagged ambiguities
- "Module" is too narrow for this context because ingest may emit deterministic AST-slice boundaries that do not correspond to a bundler module wrapper.
- "Run ID" previously sounded operator-provided; resolved term is `Run Identity`, which is deterministically derived.
@@ -0,0 +1,51 @@
# Slice Discovery: Deterministic Bundle Ingest
- Bounded context: `ingest-snapshot`
- Workflow slug: `deterministic-bundle-ingest`
## Happy Path
- Operator selects one upstream bundle snapshot for recovery.
- The workflow derives `Run Identity` deterministically from the upstream snapshot identity.
- The bundle parses successfully.
- The workflow detects deterministic AST slice boundaries for the snapshot.
- For each proven segment boundary, the workflow emits a `Segment Record` with source slice, AST node type, canonical source projection, and stable hashes.
- The workflow emits machine-readable ingest artifacts for the run, including `manifest.json` and `segments.jsonl`.
- The workflow may emit a human-readable summary, but success is defined by the machine artifacts.
- Downstream contexts consume the emitted manifest, segment records, and canonical source projection as the source of truth for later phases.
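A minimal Python sketch of how the machine artifacts could be rendered deterministically, assuming SHA-256 for the stable hashes, character offsets for the source slice, and sorted-key JSON for both files. The `SegmentRecord` field names, `make_segment`, `canonical_json`, and `render_artifacts` are hypothetical; the slice only fixes which facts a record carries and which files are emitted.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SegmentRecord:
    # Hypothetical field names for the facts the slice requires per record.
    start: int              # character offset where the proven slice begins
    end: int                # character offset where it ends (exclusive)
    node_type: str          # AST node type at the boundary
    canonical_source: str   # canonical source projection for this segment
    source_sha256: str      # stable hash of the raw slice
    canonical_sha256: str   # stable hash of the canonical projection


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def make_segment(bundle: str, start: int, end: int,
                 node_type: str, canonical: str) -> SegmentRecord:
    return SegmentRecord(start, end, node_type, canonical,
                         sha256_hex(bundle[start:end]), sha256_hex(canonical))


def canonical_json(value) -> str:
    # Sorted keys, fixed separators, and default ASCII escaping make the output
    # independent of dict insertion order and safe to split on newlines.
    return json.dumps(value, sort_keys=True, separators=(",", ":"))


def render_artifacts(run_identity: str, snapshot_identity: str,
                     segments: list) -> dict:
    # Order by position so discovery order cannot change the output.
    ordered = sorted(segments, key=lambda s: (s.start, s.end))
    segments_jsonl = "".join(canonical_json(asdict(s)) + "\n" for s in ordered)
    manifest = {
        "run_identity": run_identity,
        "snapshot_identity": snapshot_identity,
        "segment_count": len(ordered),
        "segments_sha256": sha256_hex(segments_jsonl),
    }
    return {
        "manifest.json": canonical_json(manifest) + "\n",
        "segments.jsonl": segments_jsonl,
    }
```

Sorting by position and serializing with a fixed key order means `segments.jsonl` does not depend on the order in which boundaries were discovered, which is what lets the manifest pin a digest of that file.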
## Edge Cases
- Bundle does not parse -> hard-stop the slice because no trustworthy ingest artifacts can be emitted.
- An apparent boundary cannot be proven deterministically -> do not emit it as a separate segment record; keep only proven AST slice boundaries.
- Identical upstream snapshot is ingested again -> derive the same run identity inputs and emit the same deterministic machine artifacts.
- Previous run manifest is available but continuity is weak -> ingest may observe it for continuity hints, but the current run artifacts still come only from the current snapshot and deterministic ingest rules.
- Human-readable summary generation fails -> slice still succeeds if machine-readable artifacts were emitted correctly.
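The parse, boundary, and summary edge cases above can be sketched as one guarded routine. This assumes an injected parser that raises `SyntaxError` on an unparseable bundle and a candidate finder that marks each boundary with the verdict of the proof rules; `run_ingest`, `CandidateBoundary`, `IngestOutcome`, and `IngestHardStop` are hypothetical names. Treating an empty set of proven boundaries as a hard stop follows the `NonEmptyList` in the core sketch's policy signature.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class IngestHardStop(Exception):
    """Raised when ingest cannot emit trustworthy artifacts."""

    def __init__(self, stage: str, reason: str):
        super().__init__(f"{stage}: {reason}")
        self.stage = stage
        self.reason = reason


@dataclass(frozen=True)
class CandidateBoundary:
    start: int
    end: int
    node_type: str
    proven: bool  # hypothetical: verdict of the deterministic proof rules


@dataclass(frozen=True)
class IngestOutcome:
    boundaries: tuple         # proven boundaries only
    summary: Optional[str]    # None when summary generation failed


def run_ingest(bundle: str,
               parse: Callable[[str], object],
               find_candidates: Callable[[object], list],
               summarize: Callable[[tuple], str]) -> IngestOutcome:
    # Unparseable bundle: hard stop before anything is emitted.
    try:
        tree = parse(bundle)
    except SyntaxError as exc:
        raise IngestHardStop("parse", str(exc)) from exc

    # Unproven boundaries are dropped rather than guessed.
    proven = tuple(c for c in find_candidates(tree) if c.proven)
    if not proven:
        raise IngestHardStop("boundary-proof", "no provable segment boundary")

    # The summary is optional: any failure there leaves the slice successful.
    try:
        summary = summarize(proven)
    except Exception:
        summary = None
    return IngestOutcome(proven, summary)
```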
## Business Rules & Invariants
- Rule: `Run Identity` is derived deterministically from upstream snapshot identity rather than chosen manually.
- Rule: A `Segment Record` may come from any deterministic AST slice boundary that can be proven stably.
- Rule: Ingest emits machine-readable artifacts as the source of truth for later phases.
- Rule: Human-readable summary output is optional relative to core ingest success.
- Invariant: If the bundle cannot be parsed, ingest hard-stops rather than emitting speculative artifacts.
- Invariant: The workflow does not guess segment boundaries that it cannot prove deterministically.
- Invariant: Identical upstream snapshot inputs produce identical deterministic ingest outputs.
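One way to satisfy both the `Run Identity` rule and the identical-input invariant is to hash a canonical serialization of the upstream snapshot identity. The sketch below assumes the identity arrives as a small JSON-compatible dict, tags the derivation rules with a hypothetical `RUN_IDENTITY_RULES_VERSION` constant, and truncates a SHA-256 digest to 16 hex characters as an illustrative readability choice.

```python
import hashlib
import json

# Hypothetical version tag for the derivation rules; changing it changes every identity.
RUN_IDENTITY_RULES_VERSION = "v1"


def derive_run_identity(snapshot_identity: dict) -> str:
    # Canonical serialization: key order in the input cannot affect the result.
    payload = json.dumps(
        {"rules": RUN_IDENTITY_RULES_VERSION, "snapshot": snapshot_identity},
        sort_keys=True,
        separators=(",", ":"),
    )
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    # Truncated to 16 hex characters for readability (an illustrative choice).
    return "run-" + digest[:16]
```

Because keys are sorted before hashing, two descriptions of the same snapshot that differ only in key order map to the same run identity, while any change in the identity's values yields a different one.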
## Required Decisions Owned by This Context
- Whether the selected upstream snapshot is parseable enough to begin deterministic ingest.
- Which AST slice boundaries are proven enough to become `Segment Records`.
- Which machine-readable artifacts are required for ingest success.
- How deterministic run identity is derived from upstream snapshot identity.
## Handoff Assumptions
- `dependency-recovery` receives `manifest.json`, `segments.jsonl`, and the `Canonical Source Projection` as the authoritative ingest outputs.
- `static-context-evidence` receives stable segment records whose boundaries were decided only inside `ingest-snapshot`.
- `release-packaging` may later consume upstream snapshot identity recorded in the run manifest.
- Cross-run continuity hints from a previous manifest do not override the current slice's deterministic ingest decisions.
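The last handoff assumption can be made structural by keeping continuity hints outside the authoritative artifacts. In this hypothetical sketch, `finish_ingest` compares the current run's segment hashes with a previous `RunManifest`, but returns the overlap as a separate advisory value, so the manifest is identical whether or not a previous run was observed. The type and field names are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RunManifest:
    run_identity: str
    snapshot_identity: str
    segment_hashes: tuple  # decided only by the current snapshot and ingest rules


@dataclass(frozen=True)
class IngestResult:
    manifest: RunManifest         # authoritative artifact
    continuity_hints: frozenset   # advisory only; never written into the manifest


def finish_ingest(manifest: RunManifest, previous: Optional[RunManifest]) -> IngestResult:
    # The previous manifest is observed, not trusted: it can flag which current
    # segments look familiar, but it cannot add, drop, or reorder segment records.
    if previous is None:
        return IngestResult(manifest, frozenset())
    familiar = frozenset(manifest.segment_hashes) & frozenset(previous.segment_hashes)
    return IngestResult(manifest, familiar)
```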
## Open Questions
- None currently inside this slice; broader build verification and publication-seam questions remain feature-level concerns.
@@ -0,0 +1,81 @@
# Core Sketch: Deterministic Bundle Ingest
- Bounded context: `ingest-snapshot`
- Workflow slug: `deterministic-bundle-ingest`
## Command
- `IngestUpstreamSnapshot`
  - Meaning: start `Snapshot Ingest` for one selected upstream bundle snapshot so the recovery pipeline has deterministic source-of-truth artifacts for later phases.
## Required State
State owned by `ingest-snapshot` and required to decide this workflow:
- `SelectedSnapshot`
  - upstream snapshot identity
  - upstream bundle location or source reference
  - optional upstream metadata intended for later release provenance
- `RunIdentityRules`
  - deterministic derivation rules from upstream snapshot identity
  - collision policy for repeated ingest of the same snapshot identity
- `SegmentBoundaryRules`
  - deterministic AST slice boundary rules
  - proof rules for when a candidate boundary is strong enough to become a `Segment Record`
- `IngestArtifactRequirements`
  - required machine artifacts for success: `Run Manifest`, `segments.jsonl`, and `Canonical Source Projection`
  - optional human-readable summary artifact
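The four kinds of required state could be modeled as immutable records. This Python sketch is illustrative only: the field names, the string used for the collision policy, and the artifact labels in `IngestArtifactRequirements` are assumptions, and the callables stand in for rule logic the sketch does not define.

```python
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class SelectedSnapshot:
    snapshot_identity: str                               # upstream snapshot identity
    bundle_ref: str                                      # bundle location or source reference
    release_metadata: Optional[Mapping[str, str]] = None # kept for later release provenance


@dataclass(frozen=True)
class RunIdentityRules:
    derive: Callable[[str], str]                # snapshot identity -> run identity
    collision_policy: str = "reuse-existing"    # hypothetical policy label for repeated ingest


@dataclass(frozen=True)
class SegmentBoundaryRules:
    candidate_node_types: frozenset             # AST node types that may start a slice
    is_proven: Callable[[object], bool]         # proof rule for a candidate boundary


@dataclass(frozen=True)
class IngestArtifactRequirements:
    required: frozenset = frozenset(
        {"run_manifest", "segments.jsonl", "canonical_source_projection"})
    optional: frozenset = frozenset({"summary"})

    def missing(self, emitted: set) -> frozenset:
        # Success depends only on required artifacts; optional ones never block it.
        return self.required - frozenset(emitted)
```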
## Observed Inputs
Snapshots or handoffs read but not owned by this context:
- optional previous `Run Manifest` reference used only for continuity hints
- upstream metadata needed later by `release-packaging`
## Policy Signature (Pseudo)
```text
deriveRunIdentity : SelectedSnapshot -> RunIdentity

validateSnapshotIngest :
  IngestUpstreamSnapshot -> SelectedSnapshot -> Result<SnapshotReady, IngestRejected>

decideSegmentBoundaries :
  SnapshotReady -> SegmentBoundaryRules -> Result<NonEmptyList<SegmentRecord>, IngestRejected>

validateRequiredArtifacts :
  RunIdentity -> NonEmptyList<SegmentRecord> -> IngestArtifactRequirements -> Result<IngestArtifactsReady, IngestRejected>

performSnapshotIngest :
  IngestUpstreamSnapshot
    -> IngestSnapshotState
    -> Result<UpstreamSnapshotIngested, SnapshotIngestHardStopped>
```
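A minimal Python rendering of how `performSnapshotIngest` might compose the other policies, assuming a tiny hypothetical `Ok`/`Err` result type rather than any particular library. Steps are passed in as functions so the composition reads on its own; the first `Err` short-circuits into a single `SnapshotIngestHardStopped`, and an empty boundary list is rejected to honor the `NonEmptyList` in the signature. The snake_case function names and stage labels are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass(frozen=True)
class IngestRejected:
    stage: str   # hypothetical labels: "parse", "boundary-proof", "artifacts"
    reason: str


@dataclass(frozen=True)
class Ok:
    value: object


@dataclass(frozen=True)
class Err:
    error: IngestRejected


Result = Union[Ok, Err]


@dataclass(frozen=True)
class UpstreamSnapshotIngested:
    run_identity: str
    snapshot_identity: str
    segments: tuple


@dataclass(frozen=True)
class SnapshotIngestHardStopped:
    snapshot_identity: str
    stage: str
    reason: str


def _stopped(snapshot_identity: str, err: Err) -> SnapshotIngestHardStopped:
    return SnapshotIngestHardStopped(snapshot_identity, err.error.stage, err.error.reason)


def perform_snapshot_ingest(
    snapshot_identity: str,
    derive_run_identity: Callable[[str], str],
    validate_snapshot: Callable[[str], Result],
    decide_boundaries: Callable[[object], Result],
    validate_artifacts: Callable[[str, tuple], Result],
) -> Union[UpstreamSnapshotIngested, SnapshotIngestHardStopped]:
    run_identity = derive_run_identity(snapshot_identity)

    ready = validate_snapshot(snapshot_identity)
    if isinstance(ready, Err):
        return _stopped(snapshot_identity, ready)

    decided = decide_boundaries(ready.value)
    if isinstance(decided, Err):
        return _stopped(snapshot_identity, decided)
    segments = tuple(decided.value)
    if not segments:
        # The signature promises a NonEmptyList, so an empty result is a rejection.
        return SnapshotIngestHardStopped(snapshot_identity, "boundary-proof", "no segments")

    checked = validate_artifacts(run_identity, segments)
    if isinstance(checked, Err):
        return _stopped(snapshot_identity, checked)

    # Only a fully successful pipeline yields the success event.
    return UpstreamSnapshotIngested(run_identity, snapshot_identity, segments)
```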
## Events
### Success Event
- `UpstreamSnapshotIngested`
  - run identity
  - upstream snapshot identity
  - emitted `Run Manifest` reference
  - emitted `Segment Record` set reference
  - emitted `Canonical Source Projection` reference
  - optional summary reference
### Failure Event
- `SnapshotIngestHardStopped`
  - upstream snapshot identity when available
  - failure reason
  - failed stage, such as parse failure, boundary proof failure, or required artifact failure
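The event payloads above might be shaped as follows. The field names, the `*_ref` strings, and the `to_payload` wire format are hypothetical; the `FailedStage` enum closes the open-ended "such as" list to the three stages named here, which a real implementation may extend.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class FailedStage(str, Enum):
    PARSE = "parse_failure"
    BOUNDARY_PROOF = "boundary_proof_failure"
    REQUIRED_ARTIFACT = "required_artifact_failure"


@dataclass(frozen=True)
class UpstreamSnapshotIngested:
    run_identity: str
    snapshot_identity: str
    run_manifest_ref: str
    segment_records_ref: str
    canonical_source_ref: str
    summary_ref: Optional[str] = None        # absent when no summary was produced


@dataclass(frozen=True)
class SnapshotIngestHardStopped:
    failed_stage: FailedStage
    reason: str
    snapshot_identity: Optional[str] = None  # may be unknown if selection itself failed


def to_payload(event) -> dict:
    # Hypothetical wire shape: event name plus its fields, enums as their string values.
    body = {k: (v.value if isinstance(v, Enum) else v) for k, v in asdict(event).items()}
    return {"event": type(event).__name__, **body}
```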
## Boundary Notes
- The `ingest-snapshot` context decides only whether one snapshot can be deterministically ingested and which boundaries become `Segment Records`.
- It does not decide package identity, dependency externalization, context heuristics, lineage matching, naming, regularization, transform replay, or release publication.
- Observing a previous `Run Manifest` does not let this slice reuse or override current ingest decisions; cross-run lineage belongs to `snapshot-lineage`.
- Human-readable summary generation is outside the hard success contract for this slice; required machine artifacts remain the source of truth.
- Feature-level orchestration decides when later phases may continue after downstream review-needed states; this slice only hard-stops when deterministic ingest itself is not trustworthy.