Compare commits: 3 commits

| Author | SHA1 | Date |
|---|---|---|
| | 8fca79e968 | |
| | 2c4df3998a | |
| | 121336a190 | |

@@ -0,0 +1,38 @@
# Dependency Recovery Context

**Vendored Package**: third-party code embedded inside the upstream bundle and considered for recovery as an external dependency.
_Avoid_: library blob, bundled package blob

**Dependency Decision**: the context-owned determination that a vendored candidate is accepted, rejected, or unresolved with recorded evidence and rationale.
_Avoid_: match result, package guess

**Acceptance Threshold**: the configurable confidence score boundary at or above which a vendored candidate is accepted automatically.
_Avoid_: hard-coded cutoff, fixed confidence bar

**Rejected Candidate**: a vendored candidate whose evidence deterministically supports that it is not a package match worth externalizing.
_Avoid_: low-confidence maybe, unresolved miss

**Unresolved Candidate**: a vendored candidate that remains plausible but is below the acceptance threshold, colliding, ambiguous, or otherwise unsafe to accept.
_Avoid_: rejected maybe, ignored candidate

**Fallback Preservation**: keeping bundled code available when externalization is unsafe or unresolved so later validation can compare behaviors safely.
_Avoid_: leave old code around, dead backup code

**Externalization**: replacing accepted vendored code with an external dependency reference without deleting the original bundled implementation.
_Avoid_: strip dependency, remove vendor code

## Example dialogue

> **Developer:** "If a candidate scores below the configured threshold, do we reject it?"
> **Domain Expert:** "No. If it still looks plausible, it stays unresolved until stronger evidence appears or a reviewer decides otherwise."
>
> **Developer:** "When do we mark a candidate rejected?"
> **Domain Expert:** "Only when the evidence deterministically says it is not a vendored package match worth externalizing."
>
> **Developer:** "Does accepting a vendored package delete the bundled code?"
> **Domain Expert:** "No. Externalization still preserves the bundled implementation as fallback evidence."

## Flagged ambiguities

- "Low confidence" alone does not mean `rejected`; low-confidence but still plausible candidates are `Unresolved Candidates`.
- "Match result" was too scoring-shaped; the preferred term is `Dependency Decision` because this context owns a reviewer-facing decision with rationale.
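
The distinction between `rejected` and `unresolved` can be sketched in F#. This is a minimal illustration under stated assumptions, not this context's implementation: `CandidateAssessment`, its fields, `classify`, and the integer score scale are all hypothetical.

```fsharp
// Hypothetical sketch: type names, field names, and the integer score scale
// are assumptions made for illustration only.
type DecisionState =
    | Accepted
    | Rejected
    | Unresolved

type CandidateAssessment =
    { score: int                          // deterministic confidence score
      deterministicCounterEvidence: bool  // e.g. proven to be app-authored
      collidingOrAmbiguous: bool }        // competing matches or unclear boundary

// Rejection requires deterministic counter-evidence; a low score alone
// leaves a plausible candidate unresolved.
let classify (acceptanceThreshold: int) (candidate: CandidateAssessment) : DecisionState =
    if candidate.deterministicCounterEvidence then Rejected
    elif candidate.collidingOrAmbiguous then Unresolved
    elif candidate.score >= acceptanceThreshold then Accepted
    else Unresolved
```

With a threshold of 80, a plausible candidate scoring 40 classifies as `Unresolved`, while the same candidate with deterministic counter-evidence classifies as `Rejected`.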
@@ -0,0 +1,59 @@
# Slice Discovery: Identify Vendored Packages

- Bounded context: `dependency-recovery`
- Workflow slug: `identify-vendored-packages`

## Happy Path

- The workflow receives deterministic ingest artifacts from `ingest-snapshot`, including the run manifest, segment records, and canonical source projection.
- The workflow scans segment and segment-group candidates for vendored-package evidence.
- The workflow recovers probable package boundaries when one vendored package spans multiple related segments.
- The workflow computes a deterministic confidence score for each candidate match by combining the configured evidence signals.
- The workflow compares each confidence score against the configurable acceptance threshold.
- Candidates at or above the acceptance threshold become `accepted` dependency decisions.
- Candidates that remain plausible but are below threshold, colliding, or ambiguous become `unresolved` dependency decisions.
- Candidates whose evidence deterministically supports that they are not vendored package matches become `rejected` dependency decisions.
- The workflow emits a manifest of accepted, rejected, and unresolved dependency decisions with evidence, rationale, boundary notes, and replacement planning hints.
- Downstream `externalize-accepted-dependencies` consumes only the accepted decisions for externalization while preserving unresolved and rejected records for review and summaries.
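
The scoring step above, which combines configured evidence signals into one deterministic score, might look like the following F# sketch. The signal names are a subset of this context's evidence signals; the weights, the 0 to 100 scale, and the capping rule are assumptions for illustration only.

```fsharp
// Hypothetical sketch: the weights, the 0-100 scale, and the cap are assumed.
type EvidenceSignal =
    | LicenseBanner
    | PreservedPackageName
    | SourceMapHint
    | ByteSimilarity

// Assumed per-signal weights; real values would come from configuration.
let weightOf (signal: EvidenceSignal) : int =
    match signal with
    | LicenseBanner -> 30
    | PreservedPackageName -> 35
    | SourceMapHint -> 25
    | ByteSimilarity -> 20

// Deterministic: the same set of signals always yields the same score,
// regardless of input order or duplicated observations.
let confidenceScore (signals: EvidenceSignal list) : int =
    signals
    |> List.distinct
    |> List.sumBy weightOf
    |> min 100
```

Deduplicating before summing is one way to keep the score independent of how many times the same signal was observed; a different combination rule would be equally valid as long as it stays deterministic.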

## Edge Cases

- A package is split across multiple obfuscated wrappers -> recover one vendored candidate boundary spanning the related segments and record the grouping rationale.
- Multiple package matches compete for the same segment group -> keep the stronger deterministic ranking, but if the competition remains plausible and unsafe to settle automatically, emit `unresolved` rather than forcing rejection.
- A candidate has some evidence but stays below the configured acceptance threshold -> emit `unresolved`.
- A candidate has strong counter-evidence that it is app-authored or otherwise not a vendored package match -> emit `rejected`.
- Runtime traces are unavailable -> continue with static evidence only.
- Runtime traces conflict with static evidence -> record the mixed provenance and prefer `unresolved` unless the conflict is resolved deterministically.
- A candidate package boundary is only partially recoverable -> record the partial boundary notes and keep the decision unresolved unless the recoverable portion still supports a deterministic accepted or rejected decision.
- Two runs use different threshold settings -> preserve the same confidence scores and evidence, but allow the configurable threshold to change which candidates become accepted.
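
The last edge case can be illustrated with a short F# sketch: the scores stay fixed and only the threshold differs between runs. The package names, scores, and the `acceptedAt` helper are made-up sample data and hypothetical names.

```fsharp
// Hypothetical sketch: package names and scores are made-up sample data.
let scoredCandidates = [ "pkg-a", 92; "pkg-b", 78; "pkg-c", 55 ]

// Only the threshold varies between runs; scores and evidence stay untouched.
let acceptedAt (threshold: int) (candidates: (string * int) list) : string list =
    candidates
    |> List.filter (fun (_, score) -> score >= threshold)
    |> List.map fst

let strictRun = acceptedAt 90 scoredCandidates   // only pkg-a is accepted
let lenientRun = acceptedAt 75 scoredCandidates  // pkg-a and pkg-b are accepted
```

In both runs `pkg-c` falls below the threshold, so under the rules in this slice it would remain unresolved rather than become rejected.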

## Business Rules & Invariants

- Rule: Confidence scoring is deterministic for identical ingest artifacts, evidence sources, and configuration.
- Rule: Acceptance uses a configurable threshold rather than a hard-coded cutoff.
- Rule: `Accepted` means the candidate score is at or above the configured acceptance threshold and the match is safe enough to externalize later.
- Rule: `Unresolved` is preferred over `Rejected` when a candidate remains plausible but is below threshold, colliding, ambiguous, or otherwise unsafe to settle automatically.
- Rule: `Rejected` is reserved for candidates whose evidence deterministically supports that they are not vendored package matches worth externalizing.
- Rule: The workflow records auditable evidence and rationale for every dependency decision.
- Rule: One vendored package candidate may map to multiple segments when the boundary recovery rationale supports it.
- Invariant: This slice identifies and records dependency decisions only; it does not externalize code.
- Invariant: Accepted, rejected, and unresolved candidates all remain visible in emitted manifests.
- Invariant: Changing the acceptance threshold must not require redesigning the manifest format.

## Required Decisions Owned by This Context

- Which evidence signals are combined into deterministic vendored-package confidence scoring.
- Which related segments belong to one vendored package candidate boundary.
- Whether a candidate becomes accepted, rejected, or unresolved.
- What evidence, rationale, ambiguity notes, and replacement planning hints must be persisted for downstream use.

## Handoff Assumptions

- `ingest-snapshot` provides the deterministic run manifest, segment records, and canonical projection as the source of truth for candidate discovery.
- `externalize-accepted-dependencies` consumes only accepted dependency decisions for replacement work.
- Later review and release-summary seams consume accepted, rejected, and unresolved manifests without reopening this slice's decision logic.
- Optional runtime traces act only as additional evidence inside this context and do not override deterministic decision recording requirements.

## Open Questions

- None currently inside this slice; threshold tuning and evidence-weight configuration remain implementation concerns within this context.
@@ -0,0 +1,91 @@
# Core Sketch: Identify Vendored Packages

- Bounded context: `dependency-recovery`
- Workflow slug: `identify-vendored-packages`

## Command

- `IdentifyVendoredPackage`
  - Meaning: evaluate one recovered vendored candidate boundary against deterministic evidence so this context can record a single `Dependency Decision` for later `Externalization`.

## Required State

State owned by `dependency-recovery` and required to decide this workflow:

- `VendoredCandidateDiscoveryRules`
  - allowed evidence signals for vendored-package discovery
  - grouping rules for segment and segment-group candidates
  - rationale rules for recovered package boundaries
- `ConfidenceScoringRules`
  - deterministic scoring weights or combination rules across evidence types
  - ranking rules for competing package matches on the same candidate boundary
  - tie-break rules when scores or evidence patterns compete
- `AcceptanceThresholdPolicy`
  - configurable `Acceptance Threshold`
  - rules for when plausible but below-threshold or colliding candidates remain `Unresolved Candidates`
  - rules for when counter-evidence is strong enough to produce a `Rejected Candidate`
- `DependencyDecisionRequirements`
  - required manifest fields for evidence summary, raw evidence references, confidence score, provenance, recovered boundary notes, ambiguity notes, replacement plan, and fallback reference
  - required auditability rules for later review and downstream handoffs

## Observed Inputs

Snapshots or handoffs read but not owned by this context:

- one recovered `Vendored Package` candidate boundary derived from `ingest-snapshot` artifacts by outer orchestration
- the relevant `Run Manifest` facts and canonical source projection from `ingest-snapshot` for that candidate boundary
- optional runtime traces for that candidate boundary used only as additional or tie-break evidence
- optional registry, tarball, or CDN package evidence used to compare matches for that candidate boundary

## Policy Signature (Pseudo)

```text
scoreVendoredCandidate :
    VendoredCandidateBoundary
    -> CandidateEvidenceSources
    -> ConfidenceScoringRules
    -> RankedCandidateMatches

decideDependencyDecision :
    VendoredCandidateBoundary
    -> RankedCandidateMatches
    -> AcceptanceThresholdPolicy
    -> DependencyDecision

validateDecisionRecord :
    DependencyDecision
    -> DependencyDecisionRequirements
    -> Result<DependencyDecisionRecorded, DependencyDecisionRejected>

performVendoredPackageIdentification :
    IdentifyVendoredPackage
    -> DependencyRecoveryState
    -> Result<VendoredPackageIdentified, VendoredPackageIdentificationHardStopped>
```
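
One way the stages in the signature above could compose is as a chain of `Result` values, where the first failure short-circuits into a hard stop. The F# sketch below is only an illustration: every type is a simplified stand-in, the stage bodies are stubs, and names such as `HardStop`, `parseInput`, `scoreCandidate`, `decideState`, and `identify` are hypothetical.

```fsharp
// Hypothetical sketch: simplified stand-in types and stubbed stage bodies.
type HardStop = HardStop of stage: string * reason: string

type DecisionState =
    | Accepted
    | Unresolved

type Command = { candidateBoundary: string }
type Decision = { boundary: string; state: DecisionState; score: int }

// Parse stage: turn an untrusted reference into a trusted one or hard-stop.
let parseInput (command: Command) : Result<string, HardStop> =
    if System.String.IsNullOrWhiteSpace(command.candidateBoundary) then
        Error (HardStop ("parse", "invalid candidate boundary reference"))
    else
        Ok command.candidateBoundary

// Scoring stage (stub): a real scorer would combine evidence signals.
let scoreCandidate (boundary: string) : Result<string * int, HardStop> =
    Ok (boundary, 85)

// Decision stage: compare the score against the configured threshold.
let decideState (threshold: int) (boundary: string, score: int) : Result<Decision, HardStop> =
    let state = if score >= threshold then Accepted else Unresolved
    Ok { boundary = boundary; state = state; score = score }

// Each stage runs only if the previous one succeeded; the first Error
// becomes the hard-stop result of the whole invocation.
let identify (threshold: int) (command: Command) : Result<Decision, HardStop> =
    parseInput command
    |> Result.bind scoreCandidate
    |> Result.bind (decideState threshold)
```

Calling `identify 80 { candidateBoundary = "boundary-1" }` yields an accepted decision with the stubbed score of 85, while an empty boundary reference yields an `Error` from the parse stage without running the later stages.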

## Events

### Success Event

- `VendoredPackageIdentified`
  - run identity reference
  - evaluated candidate boundary reference
  - emitted single dependency decision
  - emitted decision record reference
  - evidence artifact references for that candidate

### Failure Event

- `VendoredPackageIdentificationHardStopped`
  - run identity when available
  - failed stage such as missing ingest artifacts, invalid candidate boundary inputs, or invalid decision-manifest requirements
  - failure reason

## Boundary Notes

- The `dependency-recovery` context decides one candidate boundary's package candidacy, confidence ranking, and dependency decision state per workflow invocation.
- Outer orchestration is responsible for discovering, selecting, and iterating across multiple candidate boundaries; this slice must not batch those decisions itself.
- This slice does not externalize accepted packages; that belongs to `dependency-recovery/externalize-accepted-dependencies`.
- `ingest-snapshot` remains the source of truth for run manifest, segment boundaries, and canonical projection; this slice must not reopen ingest decisions.
- Optional runtime traces and package-source comparisons act only as evidence inputs here and must not turn this slice into cross-context orchestration.
- Feature-level orchestration decides whether unresolved or review-needed outcomes slow later phases; this slice only records one audit-ready dependency decision at a time.
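
The split between this slice and outer orchestration can be sketched as follows. This is an illustrative F# sketch, not the repository's orchestration code: `identifyOne` stands in for a single invocation of this slice, and `identifyAll`, `needsReview`, the boundary ids, and the scores are hypothetical.

```fsharp
// Hypothetical sketch: function names, boundary ids, and scores are made up.
type DecisionState =
    | Accepted
    | Rejected
    | Unresolved

// One invocation of the slice decides exactly one candidate boundary.
let identifyOne (threshold: int) (boundaryId: string, score: int) : string * DecisionState =
    let state = if score >= threshold then Accepted else Unresolved
    boundaryId, state

// Outer orchestration owns the iteration; the slice itself never batches.
let identifyAll (threshold: int) (boundaries: (string * int) list) : (string * DecisionState) list =
    boundaries |> List.map (identifyOne threshold)

// Feature-level orchestration can inspect unresolved outcomes afterwards.
let needsReview (decisions: (string * DecisionState) list) : string list =
    decisions
    |> List.filter (fun (_, state) -> state = Unresolved)
    |> List.map fst
```

Keeping `identifyOne` free of any list handling mirrors the rule that batching belongs to the caller, so each decision can be recorded and audited on its own.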
@@ -0,0 +1,214 @@
module DependencyRecovery.IdentifyVendoredPackages

open DependencyRecovery.SharedModel

// 1. Primitives

type TaintedRunManifestReference = TaintedRunManifestReference of string

type TaintedSegmentRecordReference = TaintedSegmentRecordReference of string

type TaintedCanonicalProjectionReference = TaintedCanonicalProjectionReference of string

type TaintedRuntimeTraceReference = TaintedRuntimeTraceReference of string

type TrustedRunManifestReference = TrustedRunManifestReference of string

type TrustedSegmentRecordReference = TrustedSegmentRecordReference of string

type TrustedCanonicalProjectionReference = TrustedCanonicalProjectionReference of string

type TrustedRuntimeTraceReference = TrustedRuntimeTraceReference of string

type CandidateGroupingRule =
    | GroupAdjacentSegments
    | GroupStructurallyLinkedSegments
    | GroupSharedLiteralClusters
    | GroupSharedExportSurface

type BoundaryRationaleRule =
    | RecordAdjacentGroupingRationale
    | RecordStructuralLinkRationale
    | RecordSharedLiteralRationale
    | RecordExportSurfaceRationale

type ScoringRule =
    | WeightLicenseBanner
    | WeightPreservedPackageName
    | WeightSourceMapHint
    | WeightPreservedRequireString
    | WeightCharacteristicLiteralSet
    | WeightHelperSignature
    | WeightAstShapeFingerprint
    | WeightExportSurfaceSimilarity
    | WeightDependencyGraphPosition
    | WeightByteSimilarity
    | WeightRuntimeExecutionTrace

type RankingRule =
    | RankByTotalEvidenceWeight
    | RankByEvidenceDiversity
    | RankByBoundaryCoverage
    | RankByRuntimeSupport

type TieBreakRule =
    | PreferMoreSpecificPackageMatch
    | PreferBroaderBoundaryCoverage
    | PreferStaticEvidenceAgreement
    | PreferStablePackageNameOrder

type UnresolvedRule =
    | KeepBelowThresholdCandidatesUnresolved
    | KeepCollidingCandidatesUnresolved
    | KeepAmbiguousCandidatesUnresolved
    | KeepConflictingEvidenceCandidatesUnresolved

type RejectedRule =
    | RejectDeterministicallyAppAuthoredCandidates
    | RejectDeterministicallyNonPackageCandidates
    | RejectDeterministicallyContradictedCandidates

type RequiredManifestField =
    | CandidatePackageNameField
    | DecisionStateField
    | ConfidenceScoreField
    | EvidenceSummaryField
    | RawEvidenceReferencesField
    | MatchedSegmentIdsField
    | RecoveredBoundaryNotesField
    | ReplacementPlanField
    | FallbackReferenceField
    | EvidenceProvenanceField
    | AmbiguityNotesField

type AuditabilityRule =
    | RecordDecisionRationale
    | RecordThresholdUsed
    | RecordScoringInputs
    | RecordCompetingMatches
    | RecordDecisionTimestampOrder

// 2. Commands (Inputs)

type TaintedCandidateBoundaryReference = TaintedCandidateBoundaryReference of string

type TrustedCandidateBoundaryReference = TrustedCandidateBoundaryReference of string

type IdentifyVendoredPackage = {
    runManifest: TaintedRunManifestReference
    canonicalProjection: TaintedCanonicalProjectionReference
    candidateBoundary: TaintedCandidateBoundaryReference
    runtimeTraces: TaintedRuntimeTraceReference option
}

// 3. Observed inputs and owned state

type TrustedCandidateInput = {
    runIdentity: RunIdentity
    runManifest: TrustedRunManifestReference
    canonicalProjection: TrustedCanonicalProjectionReference
    candidateBoundary: TrustedCandidateBoundaryReference
    runtimeTraces: TrustedRuntimeTraceReference option
}

type VendoredCandidateDiscoveryRules = {
    allowedSignals: EvidenceSignal list
    groupingRules: CandidateGroupingRule list
    boundaryRationaleRules: BoundaryRationaleRule list
}

type ConfidenceScoringRules = {
    scoringRules: ScoringRule list
    rankingRules: RankingRule list
    tieBreakRules: TieBreakRule list
}

type AcceptanceThresholdPolicy = {
    acceptanceThreshold: ConfidenceScore
    unresolvedRules: UnresolvedRule list
    rejectedRules: RejectedRule list
}

type DependencyDecisionRequirements = {
    requiredManifestFields: RequiredManifestField list
    auditabilityRules: AuditabilityRule list
}

type DependencyRecoveryState = {
    candidateDiscoveryRules: VendoredCandidateDiscoveryRules
    confidenceScoringRules: ConfidenceScoringRules
    acceptanceThresholdPolicy: AcceptanceThresholdPolicy
    dependencyDecisionRequirements: DependencyDecisionRequirements
}

// 4. Events (Facts)

type DecisionRecordReference = DecisionRecordReference of string

type VendoredPackageIdentified = {
    runIdentity: RunIdentity
    candidateBoundary: TrustedCandidateBoundaryReference
    dependencyDecision: DependencyDecision
    decisionRecord: DecisionRecordReference
    evidenceArtifacts: EvidenceReference list
}

type VendoredPackageIdentificationStage =
    | CandidateInputParsingStage
    | CandidateScoringStage
    | DependencyDecisionStage
    | DecisionRecordValidationStage

type VendoredPackageIdentificationFailureReason =
    | MissingIngestArtifacts
    | InvalidIngestArtifactReference
    | InvalidCandidateBoundaryReference
    | InvalidDecisionRecordRequirements

type VendoredPackageIdentificationHardStopped = {
    runIdentity: RunIdentity option
    failedStage: VendoredPackageIdentificationStage
    reason: VendoredPackageIdentificationFailureReason
}

// 5. State (Aggregate)

type DependencyIdentificationState =
    | AwaitingVendoredPackageIdentification of DependencyRecoveryState
    | VendoredPackageDecisionRecorded of VendoredPackageIdentified

// 6. Parse and decision contracts

val parseCandidateInput :
    IdentifyVendoredPackage
    -> Result<TrustedCandidateInput, VendoredPackageIdentificationHardStopped>

val scoreCandidateMatches :
    TrustedCandidateInput
    -> ConfidenceScoringRules
    -> Result<CandidateMatch list, VendoredPackageIdentificationHardStopped>

val decideDependencyDecision :
    AcceptanceThresholdPolicy
    -> TrustedCandidateBoundaryReference
    -> CandidateMatch list
    -> Result<DependencyDecision, VendoredPackageIdentificationHardStopped>

val validateDecisionRecord :
    DependencyDecisionRequirements
    -> DependencyDecision
    -> Result<DecisionRecordReference, VendoredPackageIdentificationHardStopped>

val decide :
    DependencyIdentificationState
    -> IdentifyVendoredPackage
    -> Result<VendoredPackageIdentified, VendoredPackageIdentificationHardStopped>

val apply :
    DependencyIdentificationState
    -> VendoredPackageIdentified
    -> DependencyIdentificationState

val workflow :
    IdentifyVendoredPackage
    -> Effect.Effect<Result<VendoredPackageIdentified, VendoredPackageIdentificationHardStopped>>
@@ -0,0 +1,69 @@
module DependencyRecovery.SharedModel

type RunIdentity = RunIdentity of string

type SegmentId = SegmentId of string

type CandidateBoundaryId = CandidateBoundaryId of string

type PackageName = PackageName of string

type ConfidenceScore = private ConfidenceScore of int

type EvidenceReference = EvidenceReference of string

type BoundaryNote = BoundaryNote of string

type AmbiguityNote = AmbiguityNote of string

type ReplacementPlan = ReplacementPlan of string

type FallbackReference = FallbackReference of string

type Rationale = Rationale of string

type EvidenceProvenance =
    | Registry
    | Tarball
    | Cdn
    | Runtime
    | Static
    | Mixed

type EvidenceSignal =
    | LicenseBanner
    | PreservedPackageName
    | SourceMapHint
    | PreservedRequireString
    | CharacteristicLiteralSet
    | HelperSignature
    | AstShapeFingerprint
    | ExportSurfaceSimilarity
    | DependencyGraphPosition
    | ByteSimilarity
    | RuntimeExecutionTrace

type CandidateBoundary = {
    boundaryId: CandidateBoundaryId
    segmentIds: SegmentId list
    boundaryNotes: BoundaryNote list
}

type EvidenceSummary = {
    signals: EvidenceSignal list
    rawEvidence: EvidenceReference list
    provenance: EvidenceProvenance
    rationale: Rationale
}

type CandidateMatch = {
    packageName: PackageName
    confidenceScore: ConfidenceScore
    evidence: EvidenceSummary
    ambiguityNotes: AmbiguityNote list
}

type DependencyDecision =
    | AcceptedDecision of CandidateBoundary * CandidateMatch * ReplacementPlan * FallbackReference
    | RejectedDecision of CandidateBoundary * CandidateMatch
    | UnresolvedDecision of CandidateBoundary * CandidateMatch list