diff --git a/design/workflows/dependency-recovery/CONTEXT.md b/design/workflows/dependency-recovery/CONTEXT.md
new file mode 100644
index 0000000..eaa9c9a
--- /dev/null
+++ b/design/workflows/dependency-recovery/CONTEXT.md
@@ -0,0 +1,38 @@
+# Dependency Recovery Context
+
+**Vendored Package**: third-party code embedded inside the upstream bundle and considered for recovery as an external dependency.
+_Avoid_: library blob, bundled package blob
+
+**Dependency Decision**: the context-owned determination that a vendored candidate is accepted, rejected, or unresolved with recorded evidence and rationale.
+_Avoid_: match result, package guess
+
+**Acceptance Threshold**: the configurable confidence score boundary at or above which a vendored candidate is accepted automatically.
+_Avoid_: hard-coded cutoff, fixed confidence bar
+
+**Rejected Candidate**: a vendored candidate whose evidence deterministically supports that it is not a package match worth externalizing.
+_Avoid_: low-confidence maybe, unresolved miss
+
+**Unresolved Candidate**: a vendored candidate that remains plausible but is below the acceptance threshold, colliding, ambiguous, or otherwise unsafe to accept.
+_Avoid_: rejected maybe, ignored candidate
+
+**Fallback Preservation**: keeping bundled code available when externalization is unsafe or unresolved so later validation can compare behaviors safely.
+_Avoid_: leave old code around, dead backup code
+
+**Externalization**: replacing accepted vendored code with an external dependency reference without deleting the original bundled implementation.
+_Avoid_: strip dependency, remove vendor code
+
+## Example dialogue
+
+> **Developer:** "If a candidate scores below the configured threshold, do we reject it?"
+> **Domain Expert:** "No. If it still looks plausible, it stays unresolved until stronger evidence appears or a reviewer decides otherwise."
+>
+> **Developer:** "When do we mark a candidate rejected?"
+> **Domain Expert:** "Only when the evidence deterministically says it is not a vendored package match worth externalizing."
+>
+> **Developer:** "Does accepting a vendored package delete the bundled code?"
+> **Domain Expert:** "No. Externalization still preserves the bundled implementation as fallback evidence."
+
+## Flagged ambiguities
+
+- "Low confidence" alone does not mean `rejected`; low-confidence but still plausible candidates are `Unresolved Candidates`.
+- "Match result" was too scoring-shaped; the preferred term is `Dependency Decision` because this context owns a reviewer-facing decision with rationale.
diff --git a/design/workflows/dependency-recovery/identify-vendored-packages/02-discovery.md b/design/workflows/dependency-recovery/identify-vendored-packages/02-discovery.md
new file mode 100644
index 0000000..f645073
--- /dev/null
+++ b/design/workflows/dependency-recovery/identify-vendored-packages/02-discovery.md
@@ -0,0 +1,59 @@
+# Slice Discovery: Identify Vendored Packages
+
+- Bounded context: `dependency-recovery`
+- Workflow slug: `identify-vendored-packages`
+
+## Happy Path
+
+- The workflow receives deterministic ingest artifacts from `ingest-snapshot`, including the run manifest, segment records, and canonical source projection.
+- The workflow scans segment and segment-group candidates for vendored-package evidence.
+- The workflow recovers probable package boundaries when one vendored package spans multiple related segments.
+- The workflow computes a deterministic confidence score for each candidate match by combining the configured evidence signals.
+- The workflow compares each confidence score against the configurable acceptance threshold.
+- Candidates at or above the acceptance threshold become `accepted` dependency decisions.
+- Candidates that remain plausible but are below threshold, colliding, or ambiguous become `unresolved` dependency decisions.
+- Candidates whose evidence deterministically supports that they are not vendored package matches become `rejected` dependency decisions.
+- The workflow emits a manifest of accepted, rejected, and unresolved dependency decisions with evidence, rationale, boundary notes, and replacement planning hints.
+- Downstream `externalize-accepted-dependencies` consumes only the accepted decisions for externalization while preserving unresolved and rejected records for review and summaries.
+
+## Edge Cases
+
+- A package is split across multiple obfuscated wrappers -> recover one vendored candidate boundary spanning the related segments and record the grouping rationale.
+- Multiple package matches compete for the same segment group -> keep the stronger deterministic ranking, but if the competition remains plausible and unsafe to settle automatically, emit `unresolved` rather than forcing rejection.
+- A candidate has some evidence but stays below the configured acceptance threshold -> emit `unresolved`.
+- A candidate has strong counter-evidence that it is app-authored or otherwise not a vendored package match -> emit `rejected`.
+- Runtime traces are unavailable -> continue with static evidence only.
+- Runtime traces conflict with static evidence -> record the mixed provenance and prefer `unresolved` unless the conflict is resolved deterministically.
+- A candidate package boundary is only partially recoverable -> record the partial boundary notes and keep the decision unresolved unless the recoverable portion still supports a deterministic accepted or rejected decision.
+- Two runs use different threshold settings -> preserve the same confidence scores and evidence, but allow the configurable threshold to change which candidates become accepted.
+
+## Business Rules & Invariants
+
+- Rule: Confidence scoring is deterministic for identical ingest artifacts, evidence sources, and configuration.
+- Rule: Acceptance uses a configurable threshold rather than a hard-coded cutoff.
+- Rule: `Accepted` means the candidate score is at or above the configured acceptance threshold and the match is safe enough to externalize later.
+- Rule: `Unresolved` is preferred over `Rejected` when a candidate remains plausible but is below threshold, colliding, ambiguous, or otherwise unsafe to settle automatically.
+- Rule: `Rejected` is reserved for candidates whose evidence deterministically supports that they are not vendored package matches worth externalizing.
+- Rule: The workflow records auditable evidence and rationale for every dependency decision.
+- Rule: One vendored package candidate may map to multiple segments when the boundary recovery rationale supports it.
+- Invariant: This slice identifies and records dependency decisions only; it does not externalize code.
+- Invariant: Accepted, rejected, and unresolved candidates all remain visible in emitted manifests.
+- Invariant: Changing the acceptance threshold must not require redesigning the manifest format.
+
+## Required Decisions Owned by This Context
+
+- Which evidence signals are combined into deterministic vendored-package confidence scoring.
+- Which related segments belong to one vendored package candidate boundary.
+- Whether a candidate becomes accepted, rejected, or unresolved.
+- What evidence, rationale, ambiguity notes, and replacement planning hints must be persisted for downstream use.
+
+## Handoff Assumptions
+
+- `ingest-snapshot` provides deterministic run manifest, segment records, and canonical projection as the source of truth for candidate discovery.
+- `externalize-accepted-dependencies` consumes only accepted dependency decisions for replacement work.
+- Later review and release-summary seams consume accepted, rejected, and unresolved manifests without reopening this slice's decision logic.
+- Optional runtime traces act only as additional evidence inside this context and do not override deterministic decision recording requirements.
+
+## Open Questions
+
+- None currently inside this slice; threshold tuning and evidence-weight configuration remain implementation concerns within this context.
diff --git a/design/workflows/dependency-recovery/identify-vendored-packages/03-core-sketch.md b/design/workflows/dependency-recovery/identify-vendored-packages/03-core-sketch.md
new file mode 100644
index 0000000..0ebd67a
--- /dev/null
+++ b/design/workflows/dependency-recovery/identify-vendored-packages/03-core-sketch.md
@@ -0,0 +1,89 @@
+# Core Sketch: Identify Vendored Packages
+
+- Bounded context: `dependency-recovery`
+- Workflow slug: `identify-vendored-packages`
+
+## Command
+
+- `IdentifyVendoredPackages`
+- Meaning: evaluate deterministic ingest artifacts to decide which bundled code should be treated as a `Vendored Package` and recorded as accepted, rejected, or unresolved for later `Externalization`.
+
+## Required State
+
+State owned by `dependency-recovery` and required to decide this workflow:
+
+- `VendoredCandidateDiscoveryRules`
+  - allowed evidence signals for vendored-package discovery
+  - grouping rules for segment and segment-group candidates
+  - rationale rules for recovered package boundaries
+- `ConfidenceScoringRules`
+  - deterministic scoring weights or combination rules across evidence types
+  - ranking rules for competing package matches on the same candidate boundary
+  - tie-break rules when scores or evidence patterns compete
+- `AcceptanceThresholdPolicy`
+  - configurable `Acceptance Threshold`
+  - rules for when plausible but below-threshold or colliding candidates remain `Unresolved Candidates`
+  - rules for when counter-evidence is strong enough to produce a `Rejected Candidate`
+- `DependencyDecisionRequirements`
+  - required manifest fields for evidence summary, raw evidence references, confidence score, provenance, recovered boundary notes, ambiguity notes, replacement plan, and fallback reference
+  - required auditability rules for later review and downstream handoffs
+
+## Observed Inputs
+
+Snapshots or handoffs read but not owned by this context:
+
+- `Run Manifest` from `ingest-snapshot`
+- `Segment Record` set and canonical source projection from `ingest-snapshot`
+- optional runtime traces used only as additional or tie-break evidence
+- optional registry, tarball, or CDN package evidence used to compare candidate matches
+
+## Policy Signature (Pseudo)
+
+```text
+identifyCandidateBoundaries :
+  RunManifest -> SegmentRecords -> VendoredCandidateDiscoveryRules -> NonEmptyList
+
+scoreVendoredCandidate :
+  VendoredCandidate -> EvidenceSources -> ConfidenceScoringRules -> RankedCandidateMatches
+
+decideDependencyDecision :
+  RankedCandidateMatches
+  -> AcceptanceThresholdPolicy
+  -> DependencyDecision
+
+validateDecisionManifest :
+  DependencyDecisionSet -> DependencyDecisionRequirements -> Result
+
+performVendoredPackageIdentification :
+  IdentifyVendoredPackages
+  -> DependencyRecoveryState
+  -> Result
+```
+
+## Events
+
+### Success Event
+
+- `VendoredPackagesIdentified`
+  - run identity reference
+  - accepted dependency decisions
+  - rejected dependency decisions
+  - unresolved dependency decisions
+  - emitted decision manifest reference
+  - evidence artifact references
+
+### Failure Event
+
+- `VendoredPackageIdentificationHardStopped`
+  - run identity when available
+  - failed stage such as missing ingest artifacts, invalid candidate boundary inputs, or invalid decision-manifest requirements
+  - failure reason
+
+## Boundary Notes
+
+- The `dependency-recovery` context decides package candidacy, confidence ranking, and dependency decision state only for this slice.
+- This slice does not externalize accepted packages; that belongs to `dependency-recovery/externalize-accepted-dependencies`.
+- `ingest-snapshot` remains the source of truth for run manifest, segment boundaries, and canonical projection; this slice must not reopen ingest decisions.
+- Optional runtime traces and package-source comparisons act only as evidence inputs here and must not turn this slice into cross-context orchestration.
+- Feature-level orchestration decides whether unresolved or review-needed outcomes slow later phases; this slice only records accepted, rejected, and unresolved decisions with audit-ready evidence.
+- Threshold tuning and scoring-weight configuration stay inside this context's policy setup, but later feature phases consume only the emitted decisions and artifacts.
diff --git a/design/workflows/dependency-recovery/identify-vendored-packages/04-blueprint.fs b/design/workflows/dependency-recovery/identify-vendored-packages/04-blueprint.fs
new file mode 100644
index 0000000..7ecf1ac
--- /dev/null
+++ b/design/workflows/dependency-recovery/identify-vendored-packages/04-blueprint.fs
@@ -0,0 +1,149 @@
+module DependencyRecovery.IdentifyVendoredPackages
+
+open DependencyRecovery.SharedModel
+
+// 1. Primitives
+
+type TaintedRunManifestReference = TaintedRunManifestReference of string
+
+type TaintedSegmentRecordReference = TaintedSegmentRecordReference of string
+
+type TaintedCanonicalProjectionReference = TaintedCanonicalProjectionReference of string
+
+type TaintedRuntimeTraceReference = TaintedRuntimeTraceReference of string
+
+type TrustedRunManifestReference = TrustedRunManifestReference of string
+
+type TrustedSegmentRecordReference = TrustedSegmentRecordReference of string
+
+type TrustedCanonicalProjectionReference = TrustedCanonicalProjectionReference of string
+
+type TrustedRuntimeTraceReference = TrustedRuntimeTraceReference of string
+
+type CandidateDiscoveryRule = CandidateDiscoveryRule of string
+
+type ScoringRule = ScoringRule of string
+
+type TieBreakRule = TieBreakRule of string
+
+type RequirementRule = RequirementRule of string
+
+// 2. Commands (Inputs)
+
+type IdentifyVendoredPackages = {
+    runManifest: TaintedRunManifestReference
+    segmentRecords: TaintedSegmentRecordReference
+    canonicalProjection: TaintedCanonicalProjectionReference
+    runtimeTraces: TaintedRuntimeTraceReference option
+}
+
+// 3. Observed inputs and owned state
+
+type TrustedIngestArtifacts = {
+    runIdentity: RunIdentity
+    runManifest: TrustedRunManifestReference
+    segmentRecords: TrustedSegmentRecordReference
+    canonicalProjection: TrustedCanonicalProjectionReference
+    runtimeTraces: TrustedRuntimeTraceReference option
+}
+
+type VendoredCandidateDiscoveryRules = {
+    allowedSignals: EvidenceSignal list
+    groupingRules: CandidateDiscoveryRule list
+    boundaryRationaleRules: CandidateDiscoveryRule list
+}
+
+type ConfidenceScoringRules = {
+    scoringRules: ScoringRule list
+    rankingRules: ScoringRule list
+    tieBreakRules: TieBreakRule list
+}
+
+type AcceptanceThresholdPolicy = {
+    acceptanceThreshold: ConfidenceScore
+    unresolvedRules: RequirementRule list
+    rejectedRules: RequirementRule list
+}
+
+type DependencyDecisionRequirements = {
+    requiredManifestRules: RequirementRule list
+    auditabilityRules: RequirementRule list
+}
+
+type DependencyRecoveryState = {
+    candidateDiscoveryRules: VendoredCandidateDiscoveryRules
+    confidenceScoringRules: ConfidenceScoringRules
+    acceptanceThresholdPolicy: AcceptanceThresholdPolicy
+    dependencyDecisionRequirements: DependencyDecisionRequirements
+}
+
+// 4. Events (Facts)
+
+type VendoredPackagesIdentified = {
+    runIdentity: RunIdentity
+    acceptedDecisions: DependencyDecision list
+    rejectedDecisions: DependencyDecision list
+    unresolvedDecisions: DependencyDecision list
+    decisionManifest: EvidenceReference
+    evidenceArtifacts: EvidenceReference list
+}
+
+type VendoredPackageIdentificationFailureReason =
+    | MissingIngestArtifacts
+    | InvalidIngestArtifactReference
+    | NoCandidateBoundariesRecovered
+    | InvalidDecisionManifestRequirements
+
+type VendoredPackageIdentificationHardStopped = {
+    runIdentity: RunIdentity option
+    failedStage: string
+    reason: VendoredPackageIdentificationFailureReason
+}
+
+// 5. State (Aggregate)
+
+type DependencyIdentificationState =
+    | AwaitingVendoredPackageIdentification of DependencyRecoveryState
+    | VendoredPackageDecisionsRecorded of VendoredPackagesIdentified
+
+// 6. Parse and decision contracts
+
+val parseIngestArtifacts :
+    IdentifyVendoredPackages
+    -> Result
+
+val identifyCandidateBoundaries :
+    TrustedIngestArtifacts
+    -> VendoredCandidateDiscoveryRules
+    -> Result
+
+val scoreCandidateMatches :
+    CandidateBoundary
+    -> TrustedIngestArtifacts
+    -> ConfidenceScoringRules
+    -> Result
+
+val decideDependencyDecision :
+    AcceptanceThresholdPolicy
+    -> CandidateBoundary
+    -> CandidateMatch list
+    -> Result
+
+val validateDecisionManifest :
+    DependencyDecisionRequirements
+    -> DependencyDecision list
+    -> Result
+
+val decide :
+    DependencyIdentificationState
+    -> IdentifyVendoredPackages
+    -> Result
+
+val apply :
+    DependencyIdentificationState
+    -> VendoredPackagesIdentified
+    -> DependencyIdentificationState
+
+val workflow :
+    IdentifyVendoredPackages
+    -> Effect.Effect>
diff --git a/design/workflows/dependency-recovery/shared-model.fs b/design/workflows/dependency-recovery/shared-model.fs
new file mode 100644
index 0000000..b38260d
--- /dev/null
+++ b/design/workflows/dependency-recovery/shared-model.fs
@@ -0,0 +1,69 @@
+module DependencyRecovery.SharedModel
+
+type RunIdentity = RunIdentity of string
+
+type SegmentId = SegmentId of string
+
+type CandidateBoundaryId = CandidateBoundaryId of string
+
+type PackageName = PackageName of string
+
+type ConfidenceScore = private ConfidenceScore of int
+
+type EvidenceReference = EvidenceReference of string
+
+type BoundaryNote = BoundaryNote of string
+
+type AmbiguityNote = AmbiguityNote of string
+
+type ReplacementPlan = ReplacementPlan of string
+
+type FallbackReference = FallbackReference of string
+
+type Rationale = Rationale of string
+
+type EvidenceProvenance =
+    | Registry
+    | Tarball
+    | Cdn
+    | Runtime
+    | Static
+    | Mixed
+
+type EvidenceSignal =
+    | LicenseBanner
+    | PreservedPackageName
+    | SourceMapHint
+    | PreservedRequireString
+    | CharacteristicLiteralSet
+    | HelperSignature
+    | AstShapeFingerprint
+    | ExportSurfaceSimilarity
+    | DependencyGraphPosition
+    | ByteSimilarity
+    | RuntimeExecutionTrace
+
+type CandidateBoundary = {
+    boundaryId: CandidateBoundaryId
+    segmentIds: SegmentId list
+    boundaryNotes: BoundaryNote list
+}
+
+type EvidenceSummary = {
+    signals: EvidenceSignal list
+    rawEvidence: EvidenceReference list
+    provenance: EvidenceProvenance
+    rationale: Rationale
+}
+
+type CandidateMatch = {
+    packageName: PackageName
+    confidenceScore: ConfidenceScore
+    evidence: EvidenceSummary
+    ambiguityNotes: AmbiguityNote list
+}
+
+type DependencyDecision =
+    | AcceptedDecision of CandidateBoundary * CandidateMatch * ReplacementPlan * FallbackReference
+    | RejectedDecision of CandidateBoundary * CandidateMatch
+    | UnresolvedDecision of CandidateBoundary * CandidateMatch list
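The discovery notes in `02-discovery.md` describe the confidence score only as a deterministic combination of the configured evidence signals, leaving the weights as an implementation concern. The following is a minimal sketch of one way to get that determinism, assuming a TypeScript implementation (the `Effect.Effect` return type of `workflow` in `04-blueprint.fs` hints at one, but that is a guess). Only the signal names come from `EvidenceSignal` in `shared-model.fs`; the `scoreCandidate` helper, the integer weights, and the 0 to 100 scale are hypothetical.

```typescript
// Signal names copied from EvidenceSignal in shared-model.fs.
type EvidenceSignal =
  | "LicenseBanner"
  | "PreservedPackageName"
  | "SourceMapHint"
  | "PreservedRequireString"
  | "CharacteristicLiteralSet"
  | "HelperSignature"
  | "AstShapeFingerprint"
  | "ExportSurfaceSimilarity"
  | "DependencyGraphPosition"
  | "ByteSimilarity"
  | "RuntimeExecutionTrace";

// Hypothetical integer weights; signals without a weight contribute 0.
type ScoringWeights = Partial<Record<EvidenceSignal, number>>;

// Assumed scale for ConfidenceScore (the blueprint only says it wraps an int).
const MAX_SCORE = 100;

// Hypothetical helper: combine observed signals into one deterministic score.
function scoreCandidate(
  observed: readonly EvidenceSignal[],
  allowedSignals: readonly EvidenceSignal[],
  weights: ScoringWeights,
): number {
  // Keep only signals the discovery rules allow, and count each one once
  // so repeated detections cannot inflate the score.
  const distinct = observed.filter(
    (signal, index) => allowedSignals.indexOf(signal) !== -1 && observed.indexOf(signal) === index,
  );
  // With integer weights the sum is exact, so detection order cannot change it.
  const total = distinct.reduce((sum, signal) => sum + (weights[signal] ?? 0), 0);
  // Clamp into the assumed 0..MAX_SCORE range.
  return Math.max(0, Math.min(MAX_SCORE, total));
}
```

For example, under weights of 40 for `LicenseBanner` and 30 for `PreservedPackageName`, the observations `LicenseBanner, PreservedPackageName, LicenseBanner` score 70 in any order. Removing `RuntimeExecutionTrace` from the allowed list gives the static-only mode the "Runtime traces are unavailable" edge case calls for. A weighted sum is only one candidate combination rule; `ConfidenceScoringRules` leaves the choice open.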
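The accepted / unresolved / rejected rules in `02-discovery.md` reduce to a pure function over one boundary's candidate matches. The sketch below is a hypothetical TypeScript shape, not the blueprint's `decideDependencyDecision` (it takes a plain numeric threshold instead of `AcceptanceThresholdPolicy`). The `counterEvidence` flag stands in for whatever the `rejectedRules` compute, and treating a second plausible match that also clears the threshold as a collision is an assumption. Matches are ranked highest score first with a locale-independent name tie-break, so identical inputs always yield the same decision.

```typescript
// Simplified, hypothetical stand-ins for CandidateMatch and DependencyDecision in shared-model.fs.
interface CandidateMatch {
  packageName: string;
  confidenceScore: number;
  // Assumption: true when rejectedRules find deterministic counter-evidence.
  counterEvidence: boolean;
}

type DependencyDecision =
  | { kind: "accepted"; boundaryId: string; match: CandidateMatch }
  | { kind: "rejected"; boundaryId: string; match: CandidateMatch }
  | { kind: "unresolved"; boundaryId: string; matches: CandidateMatch[]; reason: string };

type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

// Code-unit string order, identical on every machine (unlike localeCompare).
function compareNames(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Rank by score (highest first), breaking ties by package name.
function rankMatches(matches: readonly CandidateMatch[]): CandidateMatch[] {
  return matches
    .slice()
    .sort((a, b) => b.confidenceScore - a.confidenceScore || compareNames(a.packageName, b.packageName));
}

function decideDependencyDecision(
  acceptanceThreshold: number,
  boundaryId: string,
  matches: readonly CandidateMatch[],
): Result<DependencyDecision, string> {
  if (matches.length === 0) {
    return { ok: false, error: `no candidate matches for boundary ${boundaryId}` };
  }
  const ranked = rankMatches(matches);
  const plausible = ranked.filter((m) => !m.counterEvidence);

  // Rejected only when every match carries deterministic counter-evidence.
  if (plausible.length === 0) {
    return { ok: true, value: { kind: "rejected", boundaryId, match: ranked[0] } };
  }

  const best = plausible[0];
  // Plausible but below threshold: unresolved, never rejected.
  if (best.confidenceScore < acceptanceThreshold) {
    return { ok: true, value: { kind: "unresolved", boundaryId, matches: plausible, reason: "below threshold" } };
  }

  // Assumed collision rule: a second plausible match also clears the threshold.
  const runnerUp = plausible.length > 1 ? plausible[1] : undefined;
  if (runnerUp !== undefined && runnerUp.confidenceScore >= acceptanceThreshold) {
    return { ok: true, value: { kind: "unresolved", boundaryId, matches: plausible, reason: "colliding" } };
  }

  return { ok: true, value: { kind: "accepted", boundaryId, match: best } };
}
```

With illustrative matches `lodash` (92) and `underscore` (60), a threshold of 80 accepts `lodash`, while a threshold of 95 leaves the same scores untouched and yields `unresolved`. That is the behavior the "Two runs use different threshold settings" edge case describes.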
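`validateDecisionManifest` and the `VendoredPackagesIdentified` event together require that every dependency decision carries auditable evidence and rationale, and that accepted, rejected, and unresolved records all remain visible. A hypothetical TypeScript sketch of that check over a flattened record shape follows. The `DecisionRecord` fields and the one-decision-per-boundary rule are assumptions. The fallback requirement for accepted decisions mirrors `AcceptedDecision` carrying a `FallbackReference`, since externalization keeps the bundled implementation.

```typescript
// Hypothetical, flattened view of one manifest entry; the real required fields
// come from DependencyDecisionRequirements and are not fixed by this sketch.
interface DecisionRecord {
  boundaryId: string;
  kind: "accepted" | "rejected" | "unresolved";
  rationale: string;
  evidenceRefs: string[];
  // Mirrors the FallbackReference carried by AcceptedDecision in shared-model.fs.
  fallbackReference?: string;
}

interface ManifestPartition {
  accepted: DecisionRecord[];
  rejected: DecisionRecord[];
  unresolved: DecisionRecord[];
}

type ManifestResult = { ok: true; value: ManifestPartition } | { ok: false; errors: string[] };

function validateDecisionManifest(records: readonly DecisionRecord[]): ManifestResult {
  const errors: string[] = [];
  const seen: string[] = [];
  records.forEach((record) => {
    // Assumption: exactly one decision per candidate boundary.
    if (seen.indexOf(record.boundaryId) !== -1) {
      errors.push(`${record.boundaryId}: duplicate decision`);
    }
    seen.push(record.boundaryId);
    // Every decision, whatever its kind, needs auditable rationale and evidence.
    if (record.rationale.trim() === "") {
      errors.push(`${record.boundaryId}: missing rationale`);
    }
    if (record.evidenceRefs.length === 0) {
      errors.push(`${record.boundaryId}: missing evidence references`);
    }
    // Externalization keeps the bundled code, so accepted decisions must point at it.
    if (record.kind === "accepted" && !record.fallbackReference) {
      errors.push(`${record.boundaryId}: accepted decision without fallback reference`);
    }
  });
  if (errors.length > 0) {
    return { ok: false, errors };
  }
  // Keep all three kinds visible instead of dropping rejected or unresolved records.
  return {
    ok: true,
    value: {
      accepted: records.filter((r) => r.kind === "accepted"),
      rejected: records.filter((r) => r.kind === "rejected"),
      unresolved: records.filter((r) => r.kind === "unresolved"),
    },
  };
}
```

This sketch collects every violation instead of stopping at the first, so a hard stop with `InvalidDecisionManifestRequirements` can report all problems at once. That is a design choice of the sketch, not something the blueprint specifies.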