Compare commits

3 Commits

Author SHA1 Message Date
ada 8fca79e968 doing individual items not entire groups 2026-05-26 19:49:10 -06:00
ada 2c4df3998a rules are list of options not strings 2026-05-26 19:23:19 -06:00
ada 121336a190 initial blueprint for dependency recovery design 2026-05-26 19:13:16 -06:00
5 changed files with 471 additions and 0 deletions
@@ -0,0 +1,38 @@
# Dependency Recovery Context
**Vendored Package**: third-party code embedded inside the upstream bundle and considered for recovery as an external dependency.
_Avoid_: library blob, bundled package blob
**Dependency Decision**: the context-owned determination that a vendored candidate is accepted, rejected, or unresolved with recorded evidence and rationale.
_Avoid_: match result, package guess
**Acceptance Threshold**: the configurable confidence score boundary at or above which a vendored candidate is accepted automatically.
_Avoid_: hard-coded cutoff, fixed confidence bar
**Rejected Candidate**: a vendored candidate whose evidence deterministically supports that it is not a package match worth externalizing.
_Avoid_: low-confidence maybe, unresolved miss
**Unresolved Candidate**: a vendored candidate that remains plausible but is below the acceptance threshold, colliding, ambiguous, or otherwise unsafe to accept.
_Avoid_: rejected maybe, ignored candidate
**Fallback Preservation**: keeping bundled code available when externalization is unsafe or unresolved so later validation can compare behaviors safely.
_Avoid_: leave old code around, dead backup code
**Externalization**: replacing accepted vendored code with an external dependency reference without deleting the original bundled implementation.
_Avoid_: strip dependency, remove vendor code
## Example dialogue
> **Developer:** "If a candidate scores below the configured threshold, do we reject it?"
> **Domain Expert:** "No. If it still looks plausible, it stays unresolved until stronger evidence appears or a reviewer decides otherwise."
>
> **Developer:** "When do we mark a candidate rejected?"
> **Domain Expert:** "Only when the evidence deterministically says it is not a vendored package match worth externalizing."
>
> **Developer:** "Does accepting a vendored package delete the bundled code?"
> **Domain Expert:** "No. Externalization still preserves the bundled implementation as fallback evidence."
## Flagged ambiguities
- "Low confidence" alone does not mean `rejected`; low-confidence but still plausible candidates are `Unresolved Candidates`.
- "Match result" was too scoring-shaped; the preferred term is `Dependency Decision` because this context owns a reviewer-facing decision with rationale.
@@ -0,0 +1,59 @@
# Slice Discovery: Identify Vendored Packages
- Bounded context: `dependency-recovery`
- Workflow slug: `identify-vendored-packages`
## Happy Path
- The workflow receives deterministic ingest artifacts from `ingest-snapshot`, including the run manifest, segment records, and canonical source projection.
- The workflow scans segment and segment-group candidates for vendored-package evidence.
- The workflow recovers probable package boundaries when one vendored package spans multiple related segments.
- The workflow computes a deterministic confidence score for each candidate match by combining the configured evidence signals.
- The workflow compares each confidence score against the configurable acceptance threshold.
- Candidates at or above the acceptance threshold become `accepted` dependency decisions.
- Candidates that remain plausible but are below threshold, colliding, or ambiguous become `unresolved` dependency decisions.
- Candidates whose evidence deterministically supports that they are not vendored package matches become `rejected` dependency decisions.
- The workflow emits a manifest of accepted, rejected, and unresolved dependency decisions with evidence, rationale, boundary notes, and replacement planning hints.
- Downstream `externalize-accepted-dependencies` consumes only the accepted decisions for externalization; unresolved and rejected records stay in the manifest for review and summaries.
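
The threshold comparison in the steps above could be sketched roughly as follows. This is a minimal illustration, not the slice's actual decision logic: the `Candidate` record, its boolean flags, the integer score, and the order of the checks are all assumptions made for the example.

```fsharp
// Hypothetical, simplified types; the real model also carries evidence and rationale.
type Outcome =
    | Accepted
    | Unresolved
    | Rejected

type Candidate =
    { Score: int                              // assumed integer confidence score
      HasDeterministicCounterEvidence: bool   // assumed flag: evidence says "not a vendored package"
      IsCollidingOrAmbiguous: bool }          // assumed flag: unsafe to settle automatically

// Assumed check order: counter-evidence first, then safety, then the threshold.
let classify (threshold: int) (c: Candidate) : Outcome =
    if c.HasDeterministicCounterEvidence then Rejected
    elif c.IsCollidingOrAmbiguous then Unresolved
    elif c.Score >= threshold then Accepted
    else Unresolved
```

Note that a low score alone never yields `Rejected` in this sketch; a below-threshold candidate without counter-evidence stays `Unresolved`.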
## Edge Cases
- A package is split across multiple obfuscated wrappers -> recover one vendored candidate boundary spanning the related segments and record the grouping rationale.
- Multiple package matches compete for the same segment group -> keep the stronger deterministic ranking, but if the competition remains plausible and unsafe to settle automatically, emit `unresolved` rather than forcing rejection.
- A candidate has some evidence but stays below the configured acceptance threshold -> emit `unresolved`.
- A candidate has strong counter-evidence that it is app-authored or otherwise not a vendored package match -> emit `rejected`.
- Runtime traces are unavailable -> continue with static evidence only.
- Runtime traces conflict with static evidence -> record the mixed provenance and prefer `unresolved` unless the conflict is resolved deterministically.
- A candidate package boundary is only partially recoverable -> record the partial boundary notes and keep the decision unresolved unless the recoverable portion still supports a deterministic accepted or rejected decision.
- Two runs use different threshold settings -> preserve the same confidence scores and evidence, but allow the configurable threshold to change which candidates become accepted.
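
The last edge case can be illustrated with a small sketch in which two runs share the same scored candidates and differ only in threshold. The package names and scores below are invented for the example.

```fsharp
// Hypothetical scored candidates: (package name, confidence score).
let scored = [ "pkg-a", 92; "pkg-b", 78; "pkg-c", 41 ]

// Only the threshold differs between runs; scores are not recomputed here.
let acceptedAt (threshold: int) : string list =
    scored
    |> List.filter (fun (_, score) -> score >= threshold)
    |> List.map fst

let strictRun = acceptedAt 90    // ["pkg-a"]
let lenientRun = acceptedAt 75   // ["pkg-a"; "pkg-b"]
```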
## Business Rules & Invariants
- Rule: Confidence scoring is deterministic for identical ingest artifacts, evidence sources, and configuration.
- Rule: Acceptance uses a configurable threshold rather than a hard-coded cutoff.
- Rule: `Accepted` means the candidate score is at or above the configured acceptance threshold and the match is safe enough to externalize later.
- Rule: `Unresolved` is preferred over `Rejected` when a candidate remains plausible but is below threshold, colliding, ambiguous, or otherwise unsafe to settle automatically.
- Rule: `Rejected` is reserved for candidates whose evidence deterministically supports that they are not vendored package matches worth externalizing.
- Rule: The workflow records auditable evidence and rationale for every dependency decision.
- Rule: One vendored package candidate may map to multiple segments when the boundary recovery rationale supports it.
- Invariant: This slice identifies and records dependency decisions only; it does not externalize code.
- Invariant: Accepted, rejected, and unresolved candidates all remain visible in emitted manifests.
- Invariant: Changing the acceptance threshold must not require redesigning the manifest format.
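
One way to satisfy the determinism rule is to keep scoring a pure function of the observed signals and the configured weights. The sketch below assumes an additive weighting capped at 100; the weights, the cap, and the reduced `Signal` type are illustrative choices, not values taken from this design.

```fsharp
// Subset of signals, named after cases of the shared model's EvidenceSignal.
type Signal =
    | LicenseBanner
    | PreservedPackageName
    | SourceMapHint
    | ByteSimilarity

// Hypothetical weights; real values would come from configuration.
let weights =
    Map.ofList
        [ LicenseBanner, 30
          PreservedPackageName, 25
          SourceMapHint, 20
          ByteSimilarity, 15 ]

// Pure function: no clock, randomness, or I/O, so identical inputs give identical scores.
let score (weights: Map<Signal, int>) (observed: Signal list) : int =
    observed
    |> List.distinct   // a repeated signal is counted once
    |> List.sumBy (fun s -> weights |> Map.tryFind s |> Option.defaultValue 0)
    |> min 100         // assumed upper bound for the score
```

Because the result does not depend on the order of `observed`, two runs over the same evidence and configuration produce the same score.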
## Required Decisions Owned by This Context
- Which evidence signals are combined into deterministic vendored-package confidence scoring.
- Which related segments belong to one vendored package candidate boundary.
- Whether a candidate becomes accepted, rejected, or unresolved.
- What evidence, rationale, ambiguity notes, and replacement planning hints must be persisted for downstream use.
## Handoff Assumptions
- `ingest-snapshot` provides a deterministic run manifest, segment records, and canonical projection as the source of truth for candidate discovery.
- `externalize-accepted-dependencies` consumes only accepted dependency decisions for replacement work.
- Later review and release-summary seams consume accepted, rejected, and unresolved manifests without reopening this slice's decision logic.
- Optional runtime traces act only as additional evidence inside this context and do not override deterministic decision recording requirements.
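
The handoff to `externalize-accepted-dependencies` could look something like the sketch below, where a consumer selects accepted decisions and ignores the rest without altering the manifest. The `Decision` type here is a trimmed-down, hypothetical stand-in for the shared model's `DependencyDecision`.

```fsharp
// Hypothetical, reduced decision type; the shared model carries richer payloads.
type Decision =
    | AcceptedDecision of packageName: string * replacementPlan: string
    | RejectedDecision of packageName: string
    | UnresolvedDecision of packageNames: string list

// Select only accepted decisions; the input manifest itself is left unchanged,
// so rejected and unresolved records remain available to review consumers.
let acceptedOnly (manifest: Decision list) : (string * string) list =
    manifest
    |> List.choose (function
        | AcceptedDecision (name, plan) -> Some (name, plan)
        | RejectedDecision _
        | UnresolvedDecision _ -> None)
```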
## Open Questions
- None currently inside this slice; threshold tuning and evidence-weight configuration remain implementation concerns within this context.
@@ -0,0 +1,91 @@
# Core Sketch: Identify Vendored Packages
- Bounded context: `dependency-recovery`
- Workflow slug: `identify-vendored-packages`
## Command
- `IdentifyVendoredPackage`
  - Meaning: evaluate one recovered vendored candidate boundary against deterministic evidence so this context can record a single `Dependency Decision` for later `Externalization`.
## Required State
State owned by `dependency-recovery` and required to decide this workflow:
- `VendoredCandidateDiscoveryRules`
  - allowed evidence signals for vendored-package discovery
  - grouping rules for segment and segment-group candidates
  - rationale rules for recovered package boundaries
- `ConfidenceScoringRules`
  - deterministic scoring weights or combination rules across evidence types
  - ranking rules for competing package matches on the same candidate boundary
  - tie-break rules when scores or evidence patterns compete
- `AcceptanceThresholdPolicy`
  - configurable `Acceptance Threshold`
  - rules for when plausible but below-threshold or colliding candidates remain `Unresolved Candidates`
  - rules for when counter-evidence is strong enough to produce a `Rejected Candidate`
- `DependencyDecisionRequirements`
  - required manifest fields for evidence summary, raw evidence references, confidence score, provenance, recovered boundary notes, ambiguity notes, replacement plan, and fallback reference
  - required auditability rules for later review and downstream handoffs
## Observed Inputs
Snapshots or handoffs read but not owned by this context:
- one recovered `Vendored Package` candidate boundary derived from `ingest-snapshot` artifacts by outer orchestration
- the relevant `Run Manifest` facts and canonical source projection from `ingest-snapshot` for that candidate boundary
- optional runtime traces for that candidate boundary used only as additional or tie-break evidence
- optional registry, tarball, or CDN package evidence used to compare matches for that candidate boundary
## Policy Signature (Pseudo)
```text
scoreVendoredCandidate :
  VendoredCandidateBoundary
  -> CandidateEvidenceSources
  -> ConfidenceScoringRules
  -> RankedCandidateMatches

decideDependencyDecision :
  VendoredCandidateBoundary
  -> RankedCandidateMatches
  -> AcceptanceThresholdPolicy
  -> DependencyDecision

validateDecisionRecord :
  DependencyDecision
  -> DependencyDecisionRequirements
  -> Result<DependencyDecisionRecorded, DependencyDecisionRejected>

performVendoredPackageIdentification :
  IdentifyVendoredPackage
  -> DependencyRecoveryState
  -> Result<VendoredPackageIdentified, VendoredPackageIdentificationHardStopped>
```
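
The staged shape of the signatures above can be composed with `Result.bind`, so the first failing stage short-circuits into a hard stop. The following is a rough sketch only: `HardStop`, `parseInput`, `scoreCandidate`, `decide`, the string payloads, and the fixed score of 85 are hypothetical stand-ins, not the contracts defined for this slice.

```fsharp
// Hypothetical failure type standing in for the hard-stop event.
type HardStop = HardStop of stage: string * reason: string

let parseInput (boundaryRef: string) : Result<string, HardStop> =
    if System.String.IsNullOrWhiteSpace boundaryRef then
        Error (HardStop ("parse", "invalid candidate boundary reference"))
    else
        Ok (boundaryRef.Trim())

let scoreCandidate (boundary: string) : Result<string * int, HardStop> =
    // Placeholder score; real scoring would combine configured evidence signals.
    Ok (boundary, 85)

let decide (threshold: int) (boundary: string, score: int) : Result<string, HardStop> =
    if score >= threshold then Ok (sprintf "%s: accepted" boundary)
    else Ok (sprintf "%s: unresolved" boundary)

// One candidate boundary per invocation; any Error skips the remaining stages.
let identify (threshold: int) (boundaryRef: string) : Result<string, HardStop> =
    parseInput boundaryRef
    |> Result.bind scoreCandidate
    |> Result.bind (decide threshold)
```

For example, `identify 80 "b-1"` evaluates to `Ok "b-1: accepted"`, while a blank reference stops at the parse stage and never reaches scoring.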
## Events
### Success Event
- `VendoredPackageIdentified`
  - run identity reference
  - evaluated candidate boundary reference
  - emitted single dependency decision
  - emitted decision record reference
  - evidence artifact references for that candidate
### Failure Event
- `VendoredPackageIdentificationHardStopped`
  - run identity when available
  - failed stage such as missing ingest artifacts, invalid candidate boundary inputs, or invalid decision-manifest requirements
  - failure reason
## Boundary Notes
- The `dependency-recovery` context decides one candidate boundary's package candidacy, confidence ranking, and dependency decision state per workflow invocation.
- Outer orchestration is responsible for discovering, selecting, and iterating across multiple candidate boundaries; this slice must not batch those decisions itself.
- This slice does not externalize accepted packages; that belongs to `dependency-recovery/externalize-accepted-dependencies`.
- `ingest-snapshot` remains the source of truth for run manifest, segment boundaries, and canonical projection; this slice must not reopen ingest decisions.
- Optional runtime traces and package-source comparisons act only as evidence inputs here and must not turn this slice into cross-context orchestration.
- Feature-level orchestration decides whether unresolved or review-needed outcomes slow later phases; this slice only records one audit-ready dependency decision at a time.
@@ -0,0 +1,214 @@
module DependencyRecovery.IdentifyVendoredPackages
open DependencyRecovery.SharedModel
// 1. Primitives
type TaintedRunManifestReference = TaintedRunManifestReference of string
type TaintedSegmentRecordReference = TaintedSegmentRecordReference of string
type TaintedCanonicalProjectionReference = TaintedCanonicalProjectionReference of string
type TaintedRuntimeTraceReference = TaintedRuntimeTraceReference of string
type TrustedRunManifestReference = TrustedRunManifestReference of string
type TrustedSegmentRecordReference = TrustedSegmentRecordReference of string
type TrustedCanonicalProjectionReference = TrustedCanonicalProjectionReference of string
type TrustedRuntimeTraceReference = TrustedRuntimeTraceReference of string
type CandidateGroupingRule =
    | GroupAdjacentSegments
    | GroupStructurallyLinkedSegments
    | GroupSharedLiteralClusters
    | GroupSharedExportSurface
type BoundaryRationaleRule =
    | RecordAdjacentGroupingRationale
    | RecordStructuralLinkRationale
    | RecordSharedLiteralRationale
    | RecordExportSurfaceRationale
type ScoringRule =
    | WeightLicenseBanner
    | WeightPreservedPackageName
    | WeightSourceMapHint
    | WeightPreservedRequireString
    | WeightCharacteristicLiteralSet
    | WeightHelperSignature
    | WeightAstShapeFingerprint
    | WeightExportSurfaceSimilarity
    | WeightDependencyGraphPosition
    | WeightByteSimilarity
    | WeightRuntimeExecutionTrace
type RankingRule =
    | RankByTotalEvidenceWeight
    | RankByEvidenceDiversity
    | RankByBoundaryCoverage
    | RankByRuntimeSupport
type TieBreakRule =
    | PreferMoreSpecificPackageMatch
    | PreferBroaderBoundaryCoverage
    | PreferStaticEvidenceAgreement
    | PreferStablePackageNameOrder
type UnresolvedRule =
    | KeepBelowThresholdCandidatesUnresolved
    | KeepCollidingCandidatesUnresolved
    | KeepAmbiguousCandidatesUnresolved
    | KeepConflictingEvidenceCandidatesUnresolved
type RejectedRule =
    | RejectDeterministicallyAppAuthoredCandidates
    | RejectDeterministicallyNonPackageCandidates
    | RejectDeterministicallyContradictedCandidates
type RequiredManifestField =
    | CandidatePackageNameField
    | DecisionStateField
    | ConfidenceScoreField
    | EvidenceSummaryField
    | RawEvidenceReferencesField
    | MatchedSegmentIdsField
    | RecoveredBoundaryNotesField
    | ReplacementPlanField
    | FallbackReferenceField
    | EvidenceProvenanceField
    | AmbiguityNotesField
type AuditabilityRule =
    | RecordDecisionRationale
    | RecordThresholdUsed
    | RecordScoringInputs
    | RecordCompetingMatches
    | RecordDecisionTimestampOrder
// 2. Commands (Inputs)
type TaintedCandidateBoundaryReference = TaintedCandidateBoundaryReference of string
type TrustedCandidateBoundaryReference = TrustedCandidateBoundaryReference of string
type IdentifyVendoredPackage = {
    runManifest: TaintedRunManifestReference
    canonicalProjection: TaintedCanonicalProjectionReference
    candidateBoundary: TaintedCandidateBoundaryReference
    runtimeTraces: TaintedRuntimeTraceReference option
}
// 3. Observed inputs and owned state
type TrustedCandidateInput = {
    runIdentity: RunIdentity
    runManifest: TrustedRunManifestReference
    canonicalProjection: TrustedCanonicalProjectionReference
    candidateBoundary: TrustedCandidateBoundaryReference
    runtimeTraces: TrustedRuntimeTraceReference option
}
type VendoredCandidateDiscoveryRules = {
    allowedSignals: EvidenceSignal list
    groupingRules: CandidateGroupingRule list
    boundaryRationaleRules: BoundaryRationaleRule list
}
type ConfidenceScoringRules = {
    scoringRules: ScoringRule list
    rankingRules: RankingRule list
    tieBreakRules: TieBreakRule list
}
type AcceptanceThresholdPolicy = {
    acceptanceThreshold: ConfidenceScore
    unresolvedRules: UnresolvedRule list
    rejectedRules: RejectedRule list
}
type DependencyDecisionRequirements = {
    requiredManifestFields: RequiredManifestField list
    auditabilityRules: AuditabilityRule list
}
type DependencyRecoveryState = {
    candidateDiscoveryRules: VendoredCandidateDiscoveryRules
    confidenceScoringRules: ConfidenceScoringRules
    acceptanceThresholdPolicy: AcceptanceThresholdPolicy
    dependencyDecisionRequirements: DependencyDecisionRequirements
}
// 4. Events (Facts)
type DecisionRecordReference = DecisionRecordReference of string
type VendoredPackageIdentified = {
    runIdentity: RunIdentity
    candidateBoundary: TrustedCandidateBoundaryReference
    dependencyDecision: DependencyDecision
    decisionRecord: DecisionRecordReference
    evidenceArtifacts: EvidenceReference list
}
type VendoredPackageIdentificationStage =
    | CandidateInputParsingStage
    | CandidateScoringStage
    | DependencyDecisionStage
    | DecisionRecordValidationStage
type VendoredPackageIdentificationFailureReason =
    | MissingIngestArtifacts
    | InvalidIngestArtifactReference
    | InvalidCandidateBoundaryReference
    | InvalidDecisionRecordRequirements
type VendoredPackageIdentificationHardStopped = {
    runIdentity: RunIdentity option
    failedStage: VendoredPackageIdentificationStage
    reason: VendoredPackageIdentificationFailureReason
}
// 5. State (Aggregate)
type DependencyIdentificationState =
    | AwaitingVendoredPackageIdentification of DependencyRecoveryState
    | VendoredPackageDecisionRecorded of VendoredPackageIdentified
// 6. Parse and decision contracts
val parseCandidateInput :
    IdentifyVendoredPackage
    -> Result<TrustedCandidateInput, VendoredPackageIdentificationHardStopped>
val scoreCandidateMatches :
    TrustedCandidateInput
    -> ConfidenceScoringRules
    -> Result<CandidateMatch list, VendoredPackageIdentificationHardStopped>
val decideDependencyDecision :
    AcceptanceThresholdPolicy
    -> TrustedCandidateBoundaryReference
    -> CandidateMatch list
    -> Result<DependencyDecision, VendoredPackageIdentificationHardStopped>
val validateDecisionRecord :
    DependencyDecisionRequirements
    -> DependencyDecision
    -> Result<DecisionRecordReference, VendoredPackageIdentificationHardStopped>
val decide :
    DependencyIdentificationState
    -> IdentifyVendoredPackage
    -> Result<VendoredPackageIdentified, VendoredPackageIdentificationHardStopped>
val apply :
    DependencyIdentificationState
    -> VendoredPackageIdentified
    -> DependencyIdentificationState
val workflow :
    IdentifyVendoredPackage
    -> Effect.Effect<Result<VendoredPackageIdentified, VendoredPackageIdentificationHardStopped>>
@@ -0,0 +1,69 @@
module DependencyRecovery.SharedModel
type RunIdentity = RunIdentity of string
type SegmentId = SegmentId of string
type CandidateBoundaryId = CandidateBoundaryId of string
type PackageName = PackageName of string
type ConfidenceScore = private ConfidenceScore of int
type EvidenceReference = EvidenceReference of string
type BoundaryNote = BoundaryNote of string
type AmbiguityNote = AmbiguityNote of string
type ReplacementPlan = ReplacementPlan of string
type FallbackReference = FallbackReference of string
type Rationale = Rationale of string
type EvidenceProvenance =
    | Registry
    | Tarball
    | Cdn
    | Runtime
    | Static
    | Mixed
type EvidenceSignal =
    | LicenseBanner
    | PreservedPackageName
    | SourceMapHint
    | PreservedRequireString
    | CharacteristicLiteralSet
    | HelperSignature
    | AstShapeFingerprint
    | ExportSurfaceSimilarity
    | DependencyGraphPosition
    | ByteSimilarity
    | RuntimeExecutionTrace
type CandidateBoundary = {
    boundaryId: CandidateBoundaryId
    segmentIds: SegmentId list
    boundaryNotes: BoundaryNote list
}
type EvidenceSummary = {
    signals: EvidenceSignal list
    rawEvidence: EvidenceReference list
    provenance: EvidenceProvenance
    rationale: Rationale
}
type CandidateMatch = {
    packageName: PackageName
    confidenceScore: ConfidenceScore
    evidence: EvidenceSummary
    ambiguityNotes: AmbiguityNote list
}
type DependencyDecision =
    | AcceptedDecision of CandidateBoundary * CandidateMatch * ReplacementPlan * FallbackReference
    | RejectedDecision of CandidateBoundary * CandidateMatch
    | UnresolvedDecision of CandidateBoundary * CandidateMatch list