From d15e09504a5007e5f1b47afa935bb1047973e0ad Mon Sep 17 00:00:00 2001
From: Elizabeth W
Date: Mon, 25 May 2026 01:28:43 -0600
Subject: [PATCH] design security scan and fixes, basically all trusted vs tainted input

---
 .../02-discovery.md                                |  4 +++
 .../03-core-sketch.md                              | 11 +++++--
 .../04-blueprint.fs                                | 31 ++++++++++++++-----
 .../workflows/ingest-snapshot/shared-model.fs      | 24 ++++++++------
 docs/reference/shared-language.md                  |  8 +++--
 5 files changed, 56 insertions(+), 22 deletions(-)

diff --git a/design/workflows/ingest-snapshot/deterministic-bundle-ingest/02-discovery.md b/design/workflows/ingest-snapshot/deterministic-bundle-ingest/02-discovery.md
index 05f4f08..03a1cdf 100644
--- a/design/workflows/ingest-snapshot/deterministic-bundle-ingest/02-discovery.md
+++ b/design/workflows/ingest-snapshot/deterministic-bundle-ingest/02-discovery.md
@@ -17,8 +17,10 @@
 ## Edge Cases
 
 - Bundle does not parse -> hard stop the slice because no trustworthy ingest artifacts can be emitted.
+- Bundle exceeds configured size or parse budget -> hard stop the slice before deeper ingest work to reduce resource-exhaustion risk.
 - An apparent boundary cannot be proven deterministically -> do not emit it as a separate segment record; keep only proven AST slice boundaries.
 - Identical upstream snapshot is ingested again -> derive the same run identity inputs and emit the same deterministic machine artifacts.
+- Previous run manifest is available but not verified -> do not use it for continuity hints.
 - Previous run manifest is available but continuity is weak -> ingest may observe it for continuity hints, but the current run artifacts still come only from the current snapshot and deterministic ingest rules.
 - Human-readable summary generation fails -> slice still succeeds if machine-readable artifacts were emitted correctly.
 
@@ -28,6 +30,8 @@
 - Rule: A `Segment Record` may come from any deterministic AST slice boundary that can be proven stably.
 - Rule: Ingest emits machine-readable artifacts as the source of truth for later phases.
 - Rule: Human-readable summary output is optional relative to core ingest success.
+- Rule: Bundle location input must be parsed into a trusted bundle location before ingest uses it.
+- Rule: A previous run manifest must be verified before it may influence continuity hints.
 - Invariant: If the bundle cannot be parsed, ingest hard-stops rather than emitting speculative artifacts.
 - Invariant: The workflow does not guess segment boundaries that it cannot prove deterministically.
 - Invariant: Identical upstream snapshot inputs produce identical deterministic ingest outputs.
diff --git a/design/workflows/ingest-snapshot/deterministic-bundle-ingest/03-core-sketch.md b/design/workflows/ingest-snapshot/deterministic-bundle-ingest/03-core-sketch.md
index e018e13..a02921f 100644
--- a/design/workflows/ingest-snapshot/deterministic-bundle-ingest/03-core-sketch.md
+++ b/design/workflows/ingest-snapshot/deterministic-bundle-ingest/03-core-sketch.md
@@ -14,7 +14,7 @@ State owned by `ingest-snapshot` and required to decide this workflow:
 
 - `SelectedSnapshot`
   - upstream snapshot identity
-  - upstream bundle location or source reference
+  - trusted bundle location derived from tainted ingest input
   - optional upstream metadata intended for later release provenance
 - `RunIdentityRules`
   - deterministic derivation rules from upstream snapshot identity
@@ -30,7 +30,7 @@ State owned by `ingest-snapshot` and required to decide this workflow:
 
 Snapshots or handoffs read but not owned by this context:
 
-- optional previous `Run Manifest` reference used only for continuity hints
+- optional verified previous `Run Manifest` reference used only for continuity hints
 - upstream metadata needed later by `release-packaging`
 
 ## Policy Signature (Pseudo)
@@ -38,6 +38,12 @@ Snapshots or handoffs read but not owned by this context:
 ```text
 deriveRunIdentity : SelectedSnapshot -> RunIdentity
 
+parseBundleLocation :
+  TaintedBundleInput -> Result
+
+validatePreviousRunManifest :
+  RunManifest -> Result
+
 validateSnapshotIngest :
   IngestUpstreamSnapshot -> SelectedSnapshot -> Result
 
@@ -77,5 +83,6 @@ performSnapshotIngest :
 - The `ingest-snapshot` context decides only whether one snapshot can be deterministically ingested and which boundaries become `Segment Records`.
 - It does not decide package identity, dependency externalization, context heuristics, lineage matching, naming, regularization, transform replay, or release publication.
 - Observing a previous `Run Manifest` does not let this slice reuse or override current ingest decisions; cross-run lineage belongs to `snapshot-lineage`.
+- A previous `Run Manifest` must be verified for schema and integrity before this slice may use it as a continuity hint.
 - Human-readable summary generation is outside the hard success contract for this slice; required machine artifacts remain the source of truth.
 - Feature-level orchestration decides when later phases may continue after downstream review-needed states; this slice only hard-stops when deterministic ingest itself is not trustworthy.
diff --git a/design/workflows/ingest-snapshot/deterministic-bundle-ingest/04-blueprint.fs b/design/workflows/ingest-snapshot/deterministic-bundle-ingest/04-blueprint.fs
index 1189fee..37ba6bc 100644
--- a/design/workflows/ingest-snapshot/deterministic-bundle-ingest/04-blueprint.fs
+++ b/design/workflows/ingest-snapshot/deterministic-bundle-ingest/04-blueprint.fs
@@ -4,10 +4,14 @@ open IngestSnapshot.SharedModel
 
 // 1. Primitives
 
-type TaintedBundleInput = TaintedBundleInput of BundleSource
+type TaintedBundleInput = TaintedBundleInput of TaintedBundleLocation
 
 type DerivedRunIdentity = DerivedRunIdentity of RunIdentity
 
+type MaxBundleBytes = MaxBundleBytes of int64
+
+type ParseBudget = ParseBudget of int64
+
 type BoundaryProof = BoundaryProof of string
 
 type RequiredArtifact =
@@ -18,6 +22,9 @@ type RequiredArtifact =
 type IngestFailureReason =
     | BundleNotParseable
     | RunIdentityCouldNotBeDerived
+    | PreviousRunManifestNotVerified
+    | BundleTooLarge of MaxBundleBytes
+    | ParseBudgetExceeded of ParseBudget
     | NoDeterministicBoundaryProven
     | RequiredArtifactMissing of RequiredArtifact
 
@@ -27,15 +34,15 @@ type IngestUpstreamSnapshot =
     { SnapshotIdentity: SnapshotIdentity
       BundleInput: TaintedBundleInput
       SnapshotMetadata: SnapshotMetadata option
-      PreviousRunManifest: RunManifest option }
+      PreviousRunManifest: VerifiedPreviousRunManifest option }
 
 // 3. Events (Facts)
 
 type UpstreamSnapshotIngested =
     { RunManifest: RunManifest
       SegmentRecords: SegmentRecord list
-      CanonicalProjectionPath: CanonicalProjectionPath
-      SummaryPath: SummaryPath option }
+      CanonicalProjectionPath: TrustedCanonicalProjectionPath
+      SummaryPath: TrustedSummaryPath option }
 
 type SnapshotIngestHardStopped =
     { SnapshotIdentity: SnapshotIdentity
@@ -54,17 +61,21 @@ type Error =
 type AwaitingSnapshotSelection =
     { RunIdentityRulesDescription: string
       BoundaryRulesDescription: string
-      RequiredArtifacts: RequiredArtifact list }
+      RequiredArtifacts: RequiredArtifact list
+      MaxBundleBytes: MaxBundleBytes
+      ParseBudget: ParseBudget }
 
 type SnapshotReady =
     { SelectedSnapshot: SelectedSnapshot
-      PreviousRunManifest: RunManifest option
-      RequiredArtifacts: RequiredArtifact list }
+      PreviousRunManifest: VerifiedPreviousRunManifest option
+      RequiredArtifacts: RequiredArtifact list
+      MaxBundleBytes: MaxBundleBytes
+      ParseBudget: ParseBudget }
 
 type DeterministicSegmentsReady =
     { RunIdentity: RunIdentity
       SelectedSnapshot: SelectedSnapshot
-      PreviousRunManifest: RunManifest option
+      PreviousRunManifest: VerifiedPreviousRunManifest option
       SegmentRecords: SegmentRecord list
       BoundaryProofs: BoundaryProof list
       RequiredArtifacts: RequiredArtifact list }
@@ -77,8 +88,12 @@ type State =
 
 // 6. Contract Signatures
 
+val parseBundleLocation : TaintedBundleInput -> Result
+
 val deriveRunIdentity : SelectedSnapshot -> Result
 
+val validatePreviousRunManifest : RunManifest -> Result
+
 val validateSnapshotSelection : State -> IngestUpstreamSnapshot -> Result
 
 val decideSegmentRecords : SnapshotReady -> Result
diff --git a/design/workflows/ingest-snapshot/shared-model.fs b/design/workflows/ingest-snapshot/shared-model.fs
index c395176..8401642 100644
--- a/design/workflows/ingest-snapshot/shared-model.fs
+++ b/design/workflows/ingest-snapshot/shared-model.fs
@@ -4,7 +4,9 @@ module IngestSnapshot.SharedModel
 
 type SnapshotIdentity = SnapshotIdentity of string
 
-type BundleSource = BundleSource of string
+type TaintedBundleLocation = TaintedBundleLocation of string
+
+type TrustedBundleLocation = TrustedBundleLocation of string
 
 type RunIdentity = RunIdentity of string
 
@@ -20,13 +22,13 @@ type NormalizedHash = NormalizedHash of string
 
 type ShapeHash = ShapeHash of string
 
-type ManifestPath = ManifestPath of string
+type TrustedManifestPath = TrustedManifestPath of string
 
-type SegmentsPath = SegmentsPath of string
+type TrustedSegmentsPath = TrustedSegmentsPath of string
 
-type CanonicalProjectionPath = CanonicalProjectionPath of string
+type TrustedCanonicalProjectionPath = TrustedCanonicalProjectionPath of string
 
-type SummaryPath = SummaryPath of string
+type TrustedSummaryPath = TrustedSummaryPath of string
 
 // 2. Shared compounds
 
@@ -36,9 +38,11 @@ type SnapshotMetadata =
 
 type SelectedSnapshot =
     { SnapshotIdentity: SnapshotIdentity
-      BundleSource: BundleSource
+      BundleLocation: TrustedBundleLocation
       SnapshotMetadata: SnapshotMetadata option }
 
+type VerifiedPreviousRunManifest = VerifiedPreviousRunManifest of RunManifest
+
 type SegmentHashes =
     { RawHash: RawHash
       NormalizedHash: NormalizedHash
@@ -54,7 +58,7 @@ type SegmentRecord =
 type RunManifest =
     { RunIdentity: RunIdentity
       SnapshotIdentity: SnapshotIdentity
-      ManifestPath: ManifestPath
-      SegmentsPath: SegmentsPath
-      CanonicalProjectionPath: CanonicalProjectionPath
-      SummaryPath: SummaryPath option }
+      ManifestPath: TrustedManifestPath
+      SegmentsPath: TrustedSegmentsPath
+      CanonicalProjectionPath: TrustedCanonicalProjectionPath
+      SummaryPath: TrustedSummaryPath option }
diff --git a/docs/reference/shared-language.md b/docs/reference/shared-language.md
index cb4c5de..7be6f01 100644
--- a/docs/reference/shared-language.md
+++ b/docs/reference/shared-language.md
@@ -32,8 +32,12 @@ Feature-specific naming choices should also be recorded in the relevant design a
 
 | Term | Meaning | Use this, not that | Notes |
 | :--- | :--- | :--- | :--- |
-| `` | `` | `` not `` | `` |
-| `` | `` | `` not `` | `` |
+| `Recovery Pipeline` | Release-oriented workflow that turns one upstream snapshot into a buildable, browsable recovered tree and release artifacts | `Recovery Pipeline` not `deobfuscation script chain` | Feature-level umbrella term used across contexts. |
+| `Recovered Tree` | Canonical editable source tree emitted at repo root for review and modification | `Recovered Tree` not `original repo layout` | The tree is reconstructed for usability, not historical fidelity. |
+| `Build-first` | Acceptance rule that preserves buildability even when readability improvements are still incomplete | `Build-first` not `runtime complete` | Current hard success invariant for the feature. |
+| `Review-needed Artifact` | Machine-readable report plus concise human summary that surfaces uncertainty, failure, or conflict | `Review-needed Artifact` not `warning log` | Explicit inspection seam rather than hidden failure. |
+| `Maintained Transform` | Durable replayable local change stored outside the numbered upstream-processing phases | `Maintained Transform` not `manual patch` | Reused by replay and release contexts. |
+| `Naming Memory` | Small reviewable history of accepted recovered names reused in later relabel iterations | `Naming Memory` not `rename cache` | Shared iterative-naming term with reviewer-facing meaning. |
 
 ## Review questions
 
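The new rule that tainted bundle location input must become a `TrustedBundleLocation` before ingest uses it can be sketched in F#, the language of the blueprint files. This is a minimal sketch under assumptions. The patch does not say what makes a location trustworthy, so the checks below are hypothetical: the location must be non-empty, contain no control characters, not be rooted, and contain no `..` segments. The `LocationRejection` error type is also hypothetical. The patch's `Result` leaves its type arguments unstated, so the arguments used here are illustrative.

```fsharp
// Hypothetical sketch: the acceptance rules are assumptions, not rules from this patch.
type TaintedBundleLocation = TaintedBundleLocation of string
type TrustedBundleLocation = TrustedBundleLocation of string
type TaintedBundleInput = TaintedBundleInput of TaintedBundleLocation

// Hypothetical rejection reasons, for illustration only.
type LocationRejection =
    | EmptyLocation
    | ControlCharacters
    | AbsoluteLocation
    | ParentTraversal

/// Parses tainted input into a trusted location, or explains the rejection.
let parseBundleLocation (TaintedBundleInput (TaintedBundleLocation raw)) : Result<TrustedBundleLocation, LocationRejection> =
    // Treat backslashes as separators so Windows-style traversal is caught too.
    let normalized = raw.Trim().Replace('\\', '/')
    // Rooted Unix paths and drive-letter paths would escape the ingest root.
    let isRooted =
        normalized.StartsWith("/", System.StringComparison.Ordinal)
        || (normalized.Length >= 2 && normalized.[1] = ':')
    if normalized = "" then
        Error EmptyLocation
    elif normalized |> Seq.exists (fun c -> System.Char.IsControl c) then
        Error ControlCharacters
    elif isRooted then
        Error AbsoluteLocation
    elif normalized.Split('/') |> Array.contains ".." then
        Error ParentTraversal
    else
        Ok (TrustedBundleLocation normalized)
```

In a real module the `TrustedBundleLocation` case would likely be made `private`, so callers cannot bypass the parser. It is public here only to keep the sketch short.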
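The size and parse-budget hard stop (`BundleTooLarge`, `ParseBudgetExceeded`) could be enforced in two steps: a cheap byte-count gate, then a metered parse. The following sketch assumes the budget is counted in tokens consumed. The patch declares `ParseBudget of int64` but does not define its unit. The helper names `checkBundleSize`, `parseWithBudget`, and `admitBundle` are hypothetical.

```fsharp
// Hypothetical sketch: type names mirror 04-blueprint.fs, but treating the
// parse budget as "one step per token" is an assumption.
type MaxBundleBytes = MaxBundleBytes of int64
type ParseBudget = ParseBudget of int64

type IngestFailureReason =
    | BundleTooLarge of MaxBundleBytes
    | ParseBudgetExceeded of ParseBudget

/// Cheap pre-check: reject an oversized bundle before any parsing starts.
let checkBundleSize (MaxBundleBytes limit as maxBytes) (bundleBytes: int64) =
    if bundleBytes > limit then Error (BundleTooLarge maxBytes) else Ok bundleBytes

/// Metered parse: each consumed token spends one unit of budget, so the work
/// done stays bounded even for adversarial input.
let parseWithBudget (ParseBudget steps as budget) (tokens: string list) =
    let rec loop remaining consumed rest =
        match rest with
        | [] -> Ok consumed
        | _ :: tail when remaining > 0L -> loop (remaining - 1L) (consumed + 1) tail
        | _ -> Error (ParseBudgetExceeded budget)
    loop steps 0 tokens

/// Size check first, then the metered parse; the first failure wins.
let admitBundle maxBytes budget (bundleBytes: int64) tokens =
    checkBundleSize maxBytes bundleBytes
    |> Result.bind (fun _ -> parseWithBudget budget tokens)
```

Running the byte check first rejects an oversized bundle before any parser work is spent on it. That ordering matches the edge case of hard-stopping "before deeper ingest work".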
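`validatePreviousRunManifest` is specified only as `RunManifest -> Result`, with verification covering schema and integrity. One way to sketch it has three steps: check the required fields, recompute a SHA-256 digest over a canonical encoding, and only then wrap the value in `VerifiedPreviousRunManifest`. This assumes a trusted expected digest is available from a source outside the manifest itself. The canonical encoding, `ManifestRejection`, and the digest source are all assumptions. `RunManifest` is trimmed to two fields for the sketch.

```fsharp
open System
open System.Security.Cryptography
open System.Text

type RunIdentity = RunIdentity of string
type SnapshotIdentity = SnapshotIdentity of string

// Trimmed for the sketch; the shared model's RunManifest also carries paths.
type RunManifest =
    { RunIdentity: RunIdentity
      SnapshotIdentity: SnapshotIdentity }

// Declared after RunManifest: F# types may only refer to earlier declarations.
type VerifiedPreviousRunManifest = VerifiedPreviousRunManifest of RunManifest

// Hypothetical rejection reasons.
type ManifestRejection =
    | MissingField of string
    | DigestMismatch

/// Hypothetical canonical encoding used as the input to the integrity digest.
let canonicalBytes (m: RunManifest) =
    let (RunIdentity run) = m.RunIdentity
    let (SnapshotIdentity snap) = m.SnapshotIdentity
    Encoding.UTF8.GetBytes(sprintf "run=%s\nsnapshot=%s\n" run snap)

let sha256Hex (bytes: byte[]) =
    use sha = SHA256.Create()
    sha.ComputeHash(bytes)
    |> Array.map (fun b -> b.ToString("x2"))
    |> String.concat ""

/// Schema check (required fields present), then integrity check (digest
/// matches a value obtained from a trusted source), then wrap as verified.
let validatePreviousRunManifest (expectedDigest: string) (m: RunManifest) =
    let (RunIdentity run) = m.RunIdentity
    let (SnapshotIdentity snap) = m.SnapshotIdentity
    if String.IsNullOrWhiteSpace run then
        Error (MissingField "RunIdentity")
    elif String.IsNullOrWhiteSpace snap then
        Error (MissingField "SnapshotIdentity")
    elif not (String.Equals(sha256Hex (canonicalBytes m), expectedDigest, StringComparison.OrdinalIgnoreCase)) then
        Error DigestMismatch
    else
        Ok (VerifiedPreviousRunManifest m)
```

Partially applying the digest, as in `validatePreviousRunManifest expectedDigest`, yields a function of shape `RunManifest -> Result<...>`, which fits the contract. The sketch declares `RunManifest` before `VerifiedPreviousRunManifest` because F# only lets a type refer to types declared earlier in the file or in the same `and` group.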
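The invariant that identical upstream snapshot inputs produce identical ingest outputs starts with `deriveRunIdentity`. This sketch assumes the run identity is a truncated SHA-256 of the snapshot identity string. The `run-` prefix, the 16-hex-character length, and taking a `SnapshotIdentity` instead of the full `SelectedSnapshot` are illustrative choices, not rules from the patch.

```fsharp
open System.Security.Cryptography
open System.Text

type SnapshotIdentity = SnapshotIdentity of string
type RunIdentity = RunIdentity of string

/// Hypothetical rule: hash only the snapshot identity, so re-ingesting the same
/// snapshot always yields the same run identity.
let deriveRunIdentity (SnapshotIdentity snapshot) : Result<RunIdentity, string> =
    if System.String.IsNullOrWhiteSpace snapshot then
        Error "snapshot identity is empty"
    else
        use sha = SHA256.Create()
        let hex =
            sha.ComputeHash(Encoding.UTF8.GetBytes(snapshot))
            |> Array.map (fun b -> b.ToString("x2"))
            |> String.concat ""
        // Keep the first 16 hex characters (64 bits) as a compact identifier.
        Ok (RunIdentity ("run-" + hex.Substring(0, 16)))
```

The hash covers only the snapshot identity. Re-ingesting the same snapshot therefore yields the same `RunIdentity`, and a previous run manifest cannot influence it.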