doing individual items not entire groups

rules are list of options not strings
initial blueprint for dependency recovery design
2026-05-26 19:49:10 -06:00 · 2026-05-26 19:23:19 -06:00 · 2026-05-26 19:13:16 -06:00
5 changed files with 471 additions and 0 deletions
@@ -0,0 +1,38 @@
+# Dependency Recovery Context
+
+**Vendored Package**: third-party code embedded inside the upstream bundle and considered for recovery as an external dependency.
+_Avoid_: library blob, bundled package blob
+
+**Dependency Decision**: the context-owned determination that a vendored candidate is accepted, rejected, or unresolved, recorded together with its evidence and rationale.
+_Avoid_: match result, package guess
+
+**Acceptance Threshold**: the configurable confidence score boundary at or above which a vendored candidate is accepted automatically.
+_Avoid_: hard-coded cutoff, fixed confidence bar
+
+**Rejected Candidate**: a vendored candidate whose evidence deterministically establishes that it is not a package match worth externalizing.
+_Avoid_: low-confidence maybe, unresolved miss
+
+**Unresolved Candidate**: a vendored candidate that remains plausible but is below the acceptance threshold, colliding, ambiguous, or otherwise unsafe to accept.
+_Avoid_: rejected maybe, ignored candidate
+
+**Fallback Preservation**: keeping the bundled code available, both after externalization and whenever externalization is unsafe or unresolved, so later validation can compare bundled and external behavior safely.
+_Avoid_: leave old code around, dead backup code
+
+**Externalization**: replacing accepted vendored code with an external dependency reference without deleting the original bundled implementation.
+_Avoid_: strip dependency, remove vendor code
+
+## Example dialogue
+
+> **Developer:** "If a candidate scores below the configured threshold, do we reject it?"
+> **Domain Expert:** "No. If it still looks plausible, it stays unresolved until stronger evidence appears or a reviewer decides otherwise."
+>
+> **Developer:** "When do we mark a candidate rejected?"
+> **Domain Expert:** "Only when the evidence deterministically says it is not a vendored package match worth externalizing."
+>
+> **Developer:** "Does accepting a vendored package delete the bundled code?"
+> **Domain Expert:** "No. Externalization still preserves the bundled implementation as fallback evidence."
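The rules in this dialogue can be sketched as a small TypeScript classifier. This is an illustrative sketch, not the context's implementation: `CandidateAssessment`, its two boolean flags, and `classifyCandidate` are hypothetical names, and real evidence is far richer than two flags. Checking counter-evidence before the threshold is also an assumption.

```typescript
// Hypothetical sketch of the three Dependency Decision states described in the glossary.
type DecisionState = "accepted" | "rejected" | "unresolved";

interface CandidateAssessment {
  score: number; // deterministic confidence score
  hasDeterministicCounterEvidence: boolean; // hypothetical flag, e.g. proven app-authored code
  isUnsafeToSettle: boolean; // hypothetical flag: colliding, ambiguous, or conflicting evidence
}

function classifyCandidate(candidate: CandidateAssessment, acceptanceThreshold: number): DecisionState {
  // Rejection requires deterministic counter-evidence; a low score alone never rejects.
  if (candidate.hasDeterministicCounterEvidence) {
    return "rejected";
  }
  // Acceptance requires a score at or above the configured threshold and a safe match.
  if (candidate.score >= acceptanceThreshold && !candidate.isUnsafeToSettle) {
    return "accepted";
  }
  // Everything else stays unresolved until stronger evidence or a reviewer settles it.
  return "unresolved";
}
```

With an acceptance threshold of 80, a plausible candidate scoring 40 classifies as `"unresolved"`, not `"rejected"`.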
+
+## Flagged ambiguities
+
+- "Low confidence" alone does not mean `rejected`; low-confidence but still plausible candidates are `Unresolved Candidates`.
+- "Match result" was too scoring-shaped; the preferred term is `Dependency Decision` because this context owns a reviewer-facing decision with rationale.
@@ -0,0 +1,59 @@
+# Slice Discovery: Identify Vendored Packages
+
+- Bounded context: `dependency-recovery`
+- Workflow slug: `identify-vendored-packages`
+
+## Happy Path
+
+- The workflow receives deterministic ingest artifacts from `ingest-snapshot`, including the run manifest, segment records, and canonical source projection.
+- The workflow scans segment and segment-group candidates for vendored-package evidence.
+- The workflow recovers probable package boundaries when one vendored package spans multiple related segments.
+- The workflow computes a deterministic confidence score for each candidate match by combining the configured evidence signals.
+- The workflow compares each confidence score against the configurable acceptance threshold.
+- Candidates at or above the acceptance threshold become `accepted` dependency decisions.
+- Candidates that remain plausible but are below threshold, colliding, or ambiguous become `unresolved` dependency decisions.
+- Candidates whose evidence deterministically establishes that they are not vendored package matches become `rejected` dependency decisions.
+- The workflow emits a manifest of accepted, rejected, and unresolved dependency decisions with evidence, rationale, boundary notes, and replacement planning hints.
+- The downstream `externalize-accepted-dependencies` workflow consumes only accepted decisions; unresolved and rejected records remain in the manifest for review and summaries.
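A minimal TypeScript sketch of the manifest handoff in the happy path, assuming a flat, hypothetical `ManifestEntry` shape (the real fields are owned by `DependencyDecisionRequirements`). Every decision stays in the manifest, sorted by boundary id so identical inputs serialize identically, and the downstream view is a filter rather than a deletion.

```typescript
type ManifestDecisionState = "accepted" | "rejected" | "unresolved";

// Hypothetical, simplified manifest entry; real entries also carry evidence references and more.
interface ManifestEntry {
  boundaryId: string;
  state: ManifestDecisionState;
  confidenceScore: number;
  rationale: string;
}

function compareText(a: string, b: string): number {
  return a < b ? -1 : a > b ? 1 : 0;
}

// Emits every decision regardless of state, in a deterministic order.
function emitManifest(entries: readonly ManifestEntry[]): ManifestEntry[] {
  return [...entries].sort((a, b) => compareText(a.boundaryId, b.boundaryId));
}

// What externalization would read: a filtered view that leaves the manifest intact.
function acceptedForExternalization(manifest: readonly ManifestEntry[]): ManifestEntry[] {
  return manifest.filter((entry) => entry.state === "accepted");
}
```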
+
+## Edge Cases
+
+- A package is split across multiple obfuscated wrappers -> recover one vendored candidate boundary spanning the related segments and record the grouping rationale.
+- Multiple package matches compete for the same segment group -> rank them deterministically and keep the strongest match, but if the competition remains plausible and unsafe to settle automatically, emit `unresolved` rather than forcing rejection.
+- A candidate has some evidence but stays below the configured acceptance threshold -> emit `unresolved`.
+- A candidate has strong counter-evidence that it is app-authored or otherwise not a vendored package match -> emit `rejected`.
+- Runtime traces are unavailable -> continue with static evidence only.
+- Runtime traces conflict with static evidence -> record the mixed provenance and prefer `unresolved` unless the conflict is resolved deterministically.
+- A candidate package boundary is only partially recoverable -> record the partial boundary notes and keep the decision unresolved unless the recoverable portion still supports a deterministic accepted or rejected decision.
+- Two runs use different threshold settings -> preserve the same confidence scores and evidence, but allow the configurable threshold to change which candidates become accepted.
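The threshold and collision edge cases can be sketched together in TypeScript. Assumptions: matches arrive with precomputed integer scores, and two matches "collide" when their scores fall within a configurable `collisionMargin`; both that margin and `settleBoundary` are hypothetical. Rejection is left out because it needs counter-evidence, not a score.

```typescript
interface ScoredPackageMatch {
  packageName: string;
  score: number;
}

type BoundaryOutcome =
  | { state: "accepted"; packageName: string }
  | { state: "unresolved"; competingPackages: string[] };

function byScoreThenName(a: ScoredPackageMatch, b: ScoredPackageMatch): number {
  // Higher score first; package name breaks ties so ranking never depends on input order.
  if (a.score !== b.score) return b.score - a.score;
  return a.packageName < b.packageName ? -1 : a.packageName > b.packageName ? 1 : 0;
}

function settleBoundary(
  matches: readonly ScoredPackageMatch[],
  acceptanceThreshold: number,
  collisionMargin: number, // hypothetical configuration value
): BoundaryOutcome {
  const ranked = [...matches].sort(byScoreThenName);
  if (ranked.length === 0) {
    return { state: "unresolved", competingPackages: [] };
  }
  const best = ranked[0];
  // A close runner-up keeps the competition plausible, so it is not settled automatically.
  const collides = ranked.length > 1 && best.score - ranked[1].score <= collisionMargin;
  if (best.score >= acceptanceThreshold && !collides) {
    return { state: "accepted", packageName: best.packageName };
  }
  return { state: "unresolved", competingPackages: ranked.map((m) => m.packageName) };
}
```

With sample matches `lodash` at 85 and `underscore` at 60 and a margin of 5, the boundary is accepted as `lodash` at a threshold of 80 and unresolved at 90: the scores never change, only the outcome.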
+
+## Business Rules & Invariants
+
+- Rule: Confidence scoring is deterministic for identical ingest artifacts, evidence sources, and configuration.
+- Rule: Acceptance uses a configurable threshold rather than a hard-coded cutoff.
+- Rule: `Accepted` means the candidate score is at or above the configured acceptance threshold and the match is safe enough to externalize later.
+- Rule: `Unresolved` is preferred over `Rejected` when a candidate remains plausible but is below threshold, colliding, ambiguous, or otherwise unsafe to settle automatically.
+- Rule: `Rejected` is reserved for candidates whose evidence deterministically establishes that they are not vendored package matches worth externalizing.
+- Rule: The workflow records auditable evidence and rationale for every dependency decision.
+- Rule: One vendored package candidate may map to multiple segments when the boundary recovery rationale supports it.
+- Invariant: This slice identifies and records dependency decisions only; it does not externalize code.
+- Invariant: Accepted, rejected, and unresolved candidates all remain visible in emitted manifests.
+- Invariant: Changing the acceptance threshold must not require redesigning the manifest format.
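A TypeScript sketch of the determinism rule, assuming integer weights per evidence signal (the shared model's `ConfidenceScore` wraps an `int`). The weights, the chosen subset of signals, and the 0 to 100 clamp are hypothetical; the real weights are configuration owned by this context. Integer addition does not depend on the order in which signals were found, and de-duplication keeps a repeated signal from counting twice.

```typescript
// Subset of the shared model's EvidenceSignal cases; the weights below are hypothetical.
type ScoredSignal = "LicenseBanner" | "PreservedPackageName" | "SourceMapHint" | "HelperSignature" | "ByteSimilarity";

const exampleWeights: Readonly<Record<ScoredSignal, number>> = {
  LicenseBanner: 30,
  PreservedPackageName: 25,
  SourceMapHint: 20,
  HelperSignature: 15,
  ByteSimilarity: 10,
};

function scoreSignals(signals: readonly ScoredSignal[], weights: Readonly<Record<ScoredSignal, number>>): number {
  // Count each signal once, however many times the evidence reported it.
  const distinct = signals.filter((signal, index) => signals.indexOf(signal) === index);
  const total = distinct.reduce((sum, signal) => sum + weights[signal], 0);
  // Clamp to a hypothetical 0-100 score range.
  return Math.min(total, 100);
}
```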
+
+## Required Decisions Owned by This Context
+
+- Which evidence signals are combined into deterministic vendored-package confidence scoring.
+- Which related segments belong to one vendored package candidate boundary.
+- Whether a candidate becomes accepted, rejected, or unresolved.
+- What evidence, rationale, ambiguity notes, and replacement planning hints must be persisted for downstream use.
+
+## Handoff Assumptions
+
+- `ingest-snapshot` provides deterministic run manifest, segment records, and canonical projection as the source of truth for candidate discovery.
+- `externalize-accepted-dependencies` consumes only accepted dependency decisions for replacement work.
+- Later review and release-summary seams consume accepted, rejected, and unresolved manifests without reopening this slice's decision logic.
+- Optional runtime traces act only as additional evidence inside this context and do not override deterministic decision recording requirements.
+
+## Open Questions
+
+- None currently inside this slice; threshold tuning and evidence-weight configuration remain implementation concerns within this context.
@@ -0,0 +1,91 @@
+# Core Sketch: Identify Vendored Packages
+
+- Bounded context: `dependency-recovery`
+- Workflow slug: `identify-vendored-packages`
+
+## Command
+
+- `IdentifyVendoredPackage`
+- Meaning: evaluate one recovered vendored candidate boundary against deterministic evidence so this context can record a single `Dependency Decision` for later review and, if accepted, `Externalization`.
+
+## Required State
+
+State owned by `dependency-recovery` and required to decide this workflow:
+
+- `VendoredCandidateDiscoveryRules`
+  - allowed evidence signals for vendored-package discovery
+  - grouping rules for segment and segment-group candidates
+  - rationale rules for recovered package boundaries
+- `ConfidenceScoringRules`
+  - deterministic scoring weights or combination rules across evidence types
+  - ranking rules for competing package matches on the same candidate boundary
+  - tie-break rules when scores or evidence patterns compete
+- `AcceptanceThresholdPolicy`
+  - configurable `Acceptance Threshold`
+  - rules for when plausible but below-threshold or colliding candidates remain `Unresolved Candidates`
+  - rules for when counter-evidence is strong enough to produce a `Rejected Candidate`
+- `DependencyDecisionRequirements`
+  - required manifest fields for candidate package name, decision state, confidence score, matched segment ids, evidence summary, raw evidence references, provenance, recovered boundary notes, ambiguity notes, replacement plan, and fallback reference
+  - required auditability rules for later review and downstream handoffs
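Because the rules are lists of options rather than strings, the ranking and tie-break lists can drive a comparator chain directly. The TypeScript sketch below mirrors a few `RankingRule` and `TieBreakRule` cases as string-literal unions; the `RankableMatch` fields and what each comparator measures are assumptions, not settled semantics.

```typescript
// Subsets of the F# RankingRule and TieBreakRule cases, mirrored as string-literal unions.
type RankingRuleOption = "RankByTotalEvidenceWeight" | "RankByEvidenceDiversity" | "RankByBoundaryCoverage";
type TieBreakRuleOption = "PreferBroaderBoundaryCoverage" | "PreferStablePackageNameOrder";

// Hypothetical per-match facts that the comparators read.
interface RankableMatch {
  packageName: string;
  totalWeight: number;
  distinctSignals: number;
  coveredSegments: number;
}

type MatchComparator = (a: RankableMatch, b: RankableMatch) => number;

const rankingComparators: Record<RankingRuleOption, MatchComparator> = {
  RankByTotalEvidenceWeight: (a, b) => b.totalWeight - a.totalWeight,
  RankByEvidenceDiversity: (a, b) => b.distinctSignals - a.distinctSignals,
  RankByBoundaryCoverage: (a, b) => b.coveredSegments - a.coveredSegments,
};

const tieBreakComparators: Record<TieBreakRuleOption, MatchComparator> = {
  PreferBroaderBoundaryCoverage: (a, b) => b.coveredSegments - a.coveredSegments,
  PreferStablePackageNameOrder: (a, b) => (a.packageName < b.packageName ? -1 : a.packageName > b.packageName ? 1 : 0),
};

// Applies ranking rules first, then tie-break rules, each list in its configured order.
function rankMatches(
  matches: readonly RankableMatch[],
  rankingRules: readonly RankingRuleOption[],
  tieBreakRules: readonly TieBreakRuleOption[],
): RankableMatch[] {
  const chain = rankingRules
    .map((rule) => rankingComparators[rule])
    .concat(tieBreakRules.map((rule) => tieBreakComparators[rule]));
  return [...matches].sort((a, b) => {
    for (const compare of chain) {
      const result = compare(a, b);
      if (result !== 0) return result;
    }
    return 0;
  });
}
```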
+
+## Observed Inputs
+
+Snapshots or handoffs read but not owned by this context:
+
+- one recovered `Vendored Package` candidate boundary derived from `ingest-snapshot` artifacts by outer orchestration
+- the relevant `Run Manifest` facts and canonical source projection from `ingest-snapshot` for that candidate boundary
+- optional runtime traces for that candidate boundary used only as additional or tie-break evidence
+- optional registry, tarball, or CDN package evidence used to compare matches for that candidate boundary
+
+## Policy Signature (Pseudo)
+
+```text
+scoreVendoredCandidate :
+  VendoredCandidateBoundary
+  -> CandidateEvidenceSources
+  -> ConfidenceScoringRules
+  -> RankedCandidateMatches
+
+decideDependencyDecision :
+  VendoredCandidateBoundary
+  -> RankedCandidateMatches
+  -> AcceptanceThresholdPolicy
+  -> DependencyDecision
+
+validateDecisionRecord :
+  DependencyDecision
+  -> DependencyDecisionRequirements
+  -> Result<DependencyDecisionRecorded, VendoredPackageIdentificationHardStopped>
+
+performVendoredPackageIdentification :
+  IdentifyVendoredPackage
+  -> DependencyRecoveryState
+  -> Result<VendoredPackageIdentified, VendoredPackageIdentificationHardStopped>
+```
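A TypeScript sketch of how these pseudo signatures compose, assuming a minimal `Result` union in place of F#'s `Result`. The step bodies are hypothetical: `lookupScore` stands in for evidence scoring, only two decision states are produced, and decision-record validation is omitted for brevity. The point is the short-circuit: the first failing step becomes the hard stop, tagged with its stage.

```typescript
// Minimal stand-in for F#'s Result<'T, 'E>.
type Result<T, E> = { ok: true; value: T } | { ok: false; error: E };

interface IdentificationHardStop {
  failedStage: "CandidateInputParsingStage" | "CandidateScoringStage";
  reason: string;
}

interface IdentifyCommand { runManifest: string; candidateBoundary: string; }
interface ScoredCandidate { candidateBoundary: string; score: number; }
interface SketchDecision { candidateBoundary: string; score: number; state: "accepted" | "unresolved"; }

// Runs `next` only when the previous step succeeded; a failure passes through unchanged.
function andThen<A, B, E>(result: Result<A, E>, next: (value: A) => Result<B, E>): Result<B, E> {
  if (result.ok) return next(result.value);
  return result;
}

function parseCommand(command: IdentifyCommand): Result<IdentifyCommand, IdentificationHardStop> {
  if (command.runManifest === "") {
    return { ok: false, error: { failedStage: "CandidateInputParsingStage", reason: "MissingIngestArtifacts" } };
  }
  if (command.candidateBoundary === "") {
    return { ok: false, error: { failedStage: "CandidateInputParsingStage", reason: "InvalidCandidateBoundaryReference" } };
  }
  return { ok: true, value: command };
}

function performIdentification(
  command: IdentifyCommand,
  acceptanceThreshold: number,
  lookupScore: (boundary: string) => number | undefined, // hypothetical scoring stand-in
): Result<SketchDecision, IdentificationHardStop> {
  const scored = andThen(parseCommand(command), (input): Result<ScoredCandidate, IdentificationHardStop> => {
    const score = lookupScore(input.candidateBoundary);
    if (score === undefined) {
      return { ok: false, error: { failedStage: "CandidateScoringStage", reason: "InvalidCandidateBoundaryReference" } };
    }
    return { ok: true, value: { candidateBoundary: input.candidateBoundary, score } };
  });
  return andThen(scored, (candidate): Result<SketchDecision, IdentificationHardStop> => ({
    ok: true,
    value: { ...candidate, state: candidate.score >= acceptanceThreshold ? "accepted" : "unresolved" },
  }));
}
```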
+
+## Events
+
+### Success Event
+
+- `VendoredPackageIdentified`
+  - run identity reference
+  - evaluated candidate boundary reference
+  - emitted single dependency decision
+  - emitted decision record reference
+  - evidence artifact references for that candidate
+
+### Failure Event
+
+- `VendoredPackageIdentificationHardStopped`
+  - run identity when available
+  - failed stage: input parsing, candidate scoring, dependency decision, or decision-record validation
+  - failure reason, such as missing ingest artifacts, invalid candidate boundary inputs, or invalid decision-record requirements
+
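The success event is the only fact this slice applies to its aggregate. A TypeScript sketch of the F# `apply` contract, assuming a two-state aggregate that mirrors `DependencyIdentificationState`; the event fields are a simplified subset, and throwing on a second event is an assumption that enforces one recorded decision per invocation.

```typescript
// Simplified mirror of the VendoredPackageIdentified event (subset of its fields).
interface PackageIdentifiedEvent {
  runIdentity: string;
  candidateBoundary: string;
  decisionRecord: string;
}

// Hypothetical stand-in for DependencyRecoveryState.
interface RecoveryRulesSnapshot {
  acceptanceThreshold: number;
}

type IdentificationState =
  | { kind: "AwaitingVendoredPackageIdentification"; rules: RecoveryRulesSnapshot }
  | { kind: "VendoredPackageDecisionRecorded"; event: PackageIdentifiedEvent };

function applyIdentified(state: IdentificationState, event: PackageIdentifiedEvent): IdentificationState {
  if (state.kind === "VendoredPackageDecisionRecorded") {
    // Assumption: one invocation records exactly one decision, so a second event is refused.
    throw new Error(`decision already recorded for ${state.event.candidateBoundary}`);
  }
  return { kind: "VendoredPackageDecisionRecorded", event };
}
```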
+## Boundary Notes
+
+- The `dependency-recovery` context decides one candidate boundary's package candidacy, confidence ranking, and dependency decision state per workflow invocation.
+- Outer orchestration is responsible for discovering, selecting, and iterating across multiple candidate boundaries; this slice must not batch those decisions itself.
+- This slice does not externalize accepted packages; that belongs to `dependency-recovery/externalize-accepted-dependencies`.
+- `ingest-snapshot` remains the source of truth for run manifest, segment boundaries, and canonical projection; this slice must not reopen ingest decisions.
+- Optional runtime traces and package-source comparisons act only as evidence inputs here and must not turn this slice into cross-context orchestration.
+- Feature-level orchestration decides whether unresolved or review-needed outcomes slow later phases; this slice only records one audit-ready dependency decision at a time.
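A TypeScript sketch of the outer orchestration these boundary notes assume. `identifyOne` is a hypothetical stand-in for one workflow invocation and only ever sees a single boundary; iterating in sorted order and continuing past a hard stop are assumptions, since feature-level orchestration owns those policies.

```typescript
type BoundaryDecisionState = "accepted" | "rejected" | "unresolved";

interface BoundaryDecision { boundaryId: string; state: BoundaryDecisionState; }
interface BoundaryHardStop { boundaryId: string; reason: string; }

type IdentifyOutcome =
  | { ok: true; decision: BoundaryDecision }
  | { ok: false; reason: string };

// One invocation of the per-candidate workflow (hypothetical signature).
type IdentifyOne = (boundaryId: string) => IdentifyOutcome;

function orchestrateBoundaries(
  boundaryIds: readonly string[],
  identifyOne: IdentifyOne,
): { decisions: BoundaryDecision[]; hardStops: BoundaryHardStop[] } {
  const decisions: BoundaryDecision[] = [];
  const hardStops: BoundaryHardStop[] = [];
  // Sorted iteration keeps the combined output deterministic across runs.
  for (const boundaryId of [...boundaryIds].sort()) {
    const outcome = identifyOne(boundaryId); // exactly one boundary per invocation
    if (outcome.ok) {
      decisions.push(outcome.decision);
    } else {
      hardStops.push({ boundaryId, reason: outcome.reason });
    }
  }
  return { decisions, hardStops };
}
```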
@@ -0,0 +1,214 @@
+module DependencyRecovery.IdentifyVendoredPackages
+
+open DependencyRecovery.SharedModel
+
+// 1. Primitives
+
+type TaintedRunManifestReference = TaintedRunManifestReference of string
+
+type TaintedSegmentRecordReference = TaintedSegmentRecordReference of string
+
+type TaintedCanonicalProjectionReference = TaintedCanonicalProjectionReference of string
+
+type TaintedRuntimeTraceReference = TaintedRuntimeTraceReference of string
+
+type TrustedRunManifestReference = TrustedRunManifestReference of string
+
+type TrustedSegmentRecordReference = TrustedSegmentRecordReference of string
+
+type TrustedCanonicalProjectionReference = TrustedCanonicalProjectionReference of string
+
+type TrustedRuntimeTraceReference = TrustedRuntimeTraceReference of string
+
+type CandidateGroupingRule =
+    | GroupAdjacentSegments
+    | GroupStructurallyLinkedSegments
+    | GroupSharedLiteralClusters
+    | GroupSharedExportSurface
+
+type BoundaryRationaleRule =
+    | RecordAdjacentGroupingRationale
+    | RecordStructuralLinkRationale
+    | RecordSharedLiteralRationale
+    | RecordExportSurfaceRationale
+
+type ScoringRule =
+    | WeightLicenseBanner
+    | WeightPreservedPackageName
+    | WeightSourceMapHint
+    | WeightPreservedRequireString
+    | WeightCharacteristicLiteralSet
+    | WeightHelperSignature
+    | WeightAstShapeFingerprint
+    | WeightExportSurfaceSimilarity
+    | WeightDependencyGraphPosition
+    | WeightByteSimilarity
+    | WeightRuntimeExecutionTrace
+
+type RankingRule =
+    | RankByTotalEvidenceWeight
+    | RankByEvidenceDiversity
+    | RankByBoundaryCoverage
+    | RankByRuntimeSupport
+
+type TieBreakRule =
+    | PreferMoreSpecificPackageMatch
+    | PreferBroaderBoundaryCoverage
+    | PreferStaticEvidenceAgreement
+    | PreferStablePackageNameOrder
+
+type UnresolvedRule =
+    | KeepBelowThresholdCandidatesUnresolved
+    | KeepCollidingCandidatesUnresolved
+    | KeepAmbiguousCandidatesUnresolved
+    | KeepConflictingEvidenceCandidatesUnresolved
+
+type RejectedRule =
+    | RejectDeterministicallyAppAuthoredCandidates
+    | RejectDeterministicallyNonPackageCandidates
+    | RejectDeterministicallyContradictedCandidates
+
+type RequiredManifestField =
+    | CandidatePackageNameField
+    | DecisionStateField
+    | ConfidenceScoreField
+    | EvidenceSummaryField
+    | RawEvidenceReferencesField
+    | MatchedSegmentIdsField
+    | RecoveredBoundaryNotesField
+    | ReplacementPlanField
+    | FallbackReferenceField
+    | EvidenceProvenanceField
+    | AmbiguityNotesField
+
+type AuditabilityRule =
+    | RecordDecisionRationale
+    | RecordThresholdUsed
+    | RecordScoringInputs
+    | RecordCompetingMatches
+    | RecordDecisionTimestampOrder
+
+// 2. Commands (Inputs)
+
+type TaintedCandidateBoundaryReference = TaintedCandidateBoundaryReference of string
+
+type TrustedCandidateBoundaryReference = TrustedCandidateBoundaryReference of string
+
+type IdentifyVendoredPackage = {
+    runManifest: TaintedRunManifestReference
+    canonicalProjection: TaintedCanonicalProjectionReference
+    candidateBoundary: TaintedCandidateBoundaryReference
+    runtimeTraces: TaintedRuntimeTraceReference option
+}
+
+// 3. Observed inputs and owned state
+
+type TrustedCandidateInput = {
+    runIdentity: RunIdentity
+    runManifest: TrustedRunManifestReference
+    canonicalProjection: TrustedCanonicalProjectionReference
+    candidateBoundary: TrustedCandidateBoundaryReference
+    runtimeTraces: TrustedRuntimeTraceReference option
+}
+
+type VendoredCandidateDiscoveryRules = {
+    allowedSignals: EvidenceSignal list
+    groupingRules: CandidateGroupingRule list
+    boundaryRationaleRules: BoundaryRationaleRule list
+}
+
+type ConfidenceScoringRules = {
+    scoringRules: ScoringRule list
+    rankingRules: RankingRule list
+    tieBreakRules: TieBreakRule list
+}
+
+type AcceptanceThresholdPolicy = {
+    acceptanceThreshold: ConfidenceScore
+    unresolvedRules: UnresolvedRule list
+    rejectedRules: RejectedRule list
+}
+
+type DependencyDecisionRequirements = {
+    requiredManifestFields: RequiredManifestField list
+    auditabilityRules: AuditabilityRule list
+}
+
+type DependencyRecoveryState = {
+    candidateDiscoveryRules: VendoredCandidateDiscoveryRules
+    confidenceScoringRules: ConfidenceScoringRules
+    acceptanceThresholdPolicy: AcceptanceThresholdPolicy
+    dependencyDecisionRequirements: DependencyDecisionRequirements
+}
+
+// 4. Events (Facts)
+
+type DecisionRecordReference = DecisionRecordReference of string
+
+type VendoredPackageIdentified = {
+    runIdentity: RunIdentity
+    candidateBoundary: TrustedCandidateBoundaryReference
+    dependencyDecision: DependencyDecision
+    decisionRecord: DecisionRecordReference
+    evidenceArtifacts: EvidenceReference list
+}
+
+type VendoredPackageIdentificationStage =
+    | CandidateInputParsingStage
+    | CandidateScoringStage
+    | DependencyDecisionStage
+    | DecisionRecordValidationStage
+
+type VendoredPackageIdentificationFailureReason =
+    | MissingIngestArtifacts
+    | InvalidIngestArtifactReference
+    | InvalidCandidateBoundaryReference
+    | InvalidDecisionRecordRequirements
+
+type VendoredPackageIdentificationHardStopped = {
+    runIdentity: RunIdentity option
+    failedStage: VendoredPackageIdentificationStage
+    reason: VendoredPackageIdentificationFailureReason
+}
+
+// 5. State (Aggregate)
+
+type DependencyIdentificationState =
+    | AwaitingVendoredPackageIdentification of DependencyRecoveryState
+    | VendoredPackageDecisionRecorded of VendoredPackageIdentified
+
+// 6. Parse and decision contracts
+
+val parseCandidateInput :
+    IdentifyVendoredPackage
+    -> Result<TrustedCandidateInput, VendoredPackageIdentificationHardStopped>
+
+val scoreCandidateMatches :
+    TrustedCandidateInput
+    -> ConfidenceScoringRules
+    -> Result<CandidateMatch list, VendoredPackageIdentificationHardStopped>
+
+val decideDependencyDecision :
+    AcceptanceThresholdPolicy
+    -> TrustedCandidateBoundaryReference
+    -> CandidateMatch list
+    -> Result<DependencyDecision, VendoredPackageIdentificationHardStopped>
+
+val validateDecisionRecord :
+    DependencyDecisionRequirements
+    -> DependencyDecision
+    -> Result<DecisionRecordReference, VendoredPackageIdentificationHardStopped>
+
+val decide :
+    DependencyIdentificationState
+    -> IdentifyVendoredPackage
+    -> Result<VendoredPackageIdentified, VendoredPackageIdentificationHardStopped>
+
+val apply :
+    DependencyIdentificationState
+    -> VendoredPackageIdentified
+    -> DependencyIdentificationState
+
+val workflow :
+    IdentifyVendoredPackage
+    -> Effect.Effect<Result<VendoredPackageIdentified, VendoredPackageIdentificationHardStopped>>
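A TypeScript sketch of the `parseCandidateInput` contract, using tagged objects in place of the F# `Tainted*`/`Trusted*` single-case unions. The accepted reference format (a non-empty run of letters, digits, `.`, `_`, `/`, `-`) is a hypothetical rule, and the failure reasons reuse `VendoredPackageIdentificationFailureReason` case names. Absent runtime traces are fine; treating a present but malformed trace as a hard stop is an assumption.

```typescript
// Tagged wrappers standing in for the F# Tainted*/Trusted* reference types.
type TaintedReference = { readonly kind: "tainted"; readonly raw: string };
type TrustedReference = { readonly kind: "trusted"; readonly value: string };

interface IdentifyVendoredPackageCommand {
  runManifest: TaintedReference;
  canonicalProjection: TaintedReference;
  candidateBoundary: TaintedReference;
  runtimeTraces?: TaintedReference;
}

interface TrustedCandidateInput {
  runManifest: TrustedReference;
  canonicalProjection: TrustedReference;
  candidateBoundary: TrustedReference;
  runtimeTraces?: TrustedReference;
}

type ParseCandidateOutcome =
  | { ok: true; input: TrustedCandidateInput }
  | { ok: false; reason: "InvalidIngestArtifactReference" | "InvalidCandidateBoundaryReference" };

// Hypothetical format rule for references.
const referencePattern = /^[A-Za-z0-9._\/-]+$/;

function trustReference(reference: TaintedReference): TrustedReference | undefined {
  return referencePattern.test(reference.raw) ? { kind: "trusted", value: reference.raw } : undefined;
}

function parseCandidateInput(command: IdentifyVendoredPackageCommand): ParseCandidateOutcome {
  const runManifest = trustReference(command.runManifest);
  const canonicalProjection = trustReference(command.canonicalProjection);
  if (runManifest === undefined || canonicalProjection === undefined) {
    return { ok: false, reason: "InvalidIngestArtifactReference" };
  }
  const candidateBoundary = trustReference(command.candidateBoundary);
  if (candidateBoundary === undefined) {
    return { ok: false, reason: "InvalidCandidateBoundaryReference" };
  }
  // Absent traces stay absent; a present but malformed trace stops the run (assumption).
  let runtimeTraces: TrustedReference | undefined = undefined;
  if (command.runtimeTraces !== undefined) {
    runtimeTraces = trustReference(command.runtimeTraces);
    if (runtimeTraces === undefined) {
      return { ok: false, reason: "InvalidIngestArtifactReference" };
    }
  }
  return { ok: true, input: { runManifest, canonicalProjection, candidateBoundary, runtimeTraces } };
}
```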
@@ -0,0 +1,69 @@
+module DependencyRecovery.SharedModel
+
+type RunIdentity = RunIdentity of string
+
+type SegmentId = SegmentId of string
+
+type CandidateBoundaryId = CandidateBoundaryId of string
+
+type PackageName = PackageName of string
+
+type ConfidenceScore = private ConfidenceScore of int
+
+type EvidenceReference = EvidenceReference of string
+
+type BoundaryNote = BoundaryNote of string
+
+type AmbiguityNote = AmbiguityNote of string
+
+type ReplacementPlan = ReplacementPlan of string
+
+type FallbackReference = FallbackReference of string
+
+type Rationale = Rationale of string
+
+type EvidenceProvenance =
+    | Registry
+    | Tarball
+    | Cdn
+    | Runtime
+    | Static
+    | Mixed
+
+type EvidenceSignal =
+    | LicenseBanner
+    | PreservedPackageName
+    | SourceMapHint
+    | PreservedRequireString
+    | CharacteristicLiteralSet
+    | HelperSignature
+    | AstShapeFingerprint
+    | ExportSurfaceSimilarity
+    | DependencyGraphPosition
+    | ByteSimilarity
+    | RuntimeExecutionTrace
+
+type CandidateBoundary = {
+    boundaryId: CandidateBoundaryId
+    segmentIds: SegmentId list
+    boundaryNotes: BoundaryNote list
+}
+
+type EvidenceSummary = {
+    signals: EvidenceSignal list
+    rawEvidence: EvidenceReference list
+    provenance: EvidenceProvenance
+    rationale: Rationale
+}
+
+type CandidateMatch = {
+    packageName: PackageName
+    confidenceScore: ConfidenceScore
+    evidence: EvidenceSummary
+    ambiguityNotes: AmbiguityNote list
+}
+
+type DependencyDecision =
+    | AcceptedDecision of CandidateBoundary * CandidateMatch * ReplacementPlan * FallbackReference
+    | RejectedDecision of CandidateBoundary * CandidateMatch
+    | UnresolvedDecision of CandidateBoundary * CandidateMatch list
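The shared model keeps the `ConfidenceScore` constructor private, which implies scores are created through a validating function. A TypeScript sketch of that idea with a branded number; the 0 to 100 range and the names `createConfidenceScore` and `meetsAcceptanceThreshold` are hypothetical, while the integer restriction mirrors `ConfidenceScore of int`.

```typescript
// Branded number: by convention, only createConfidenceScore produces one.
type ConfidenceScore = number & { readonly __brand: "ConfidenceScore" };

const MIN_CONFIDENCE = 0; // hypothetical lower bound
const MAX_CONFIDENCE = 100; // hypothetical upper bound

function createConfidenceScore(value: number): ConfidenceScore | undefined {
  // Integers only, mirroring `ConfidenceScore of int`; NaN also fails the first check.
  if (Math.floor(value) !== value || value < MIN_CONFIDENCE || value > MAX_CONFIDENCE) {
    return undefined;
  }
  return value as ConfidenceScore;
}

// Accepted means "at or above" the configured threshold.
function meetsAcceptanceThreshold(score: ConfidenceScore, threshold: ConfidenceScore): boolean {
  return score >= threshold;
}
```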
Author	SHA1	Message	Date
ada	8fca79e968	doing individual items not entire groups	2026-05-26 19:49:10 -06:00
ada	2c4df3998a	rules are list of options not strings	2026-05-26 19:23:19 -06:00
ada	121336a190	initial blueprint for dependency recovery design	2026-05-26 19:13:16 -06:00