Empirical validation addendum to the TAC-QSW Conjecture Technical White Paper v1.0
The TAC-QSW Conjecture predicted that the Spearman rank correlation between v1 (semantic analysis) and v2 (QSW structural analysis) Eligibility Index scores would exceed 0.70 across a sufficient dataset. Phase 1 validation, conducted across 69 valid v1/v2 comparison pairs from the biomedical investment domain (including drug development and M&A decisions), confirms this prediction, with a post-amendment Spearman ρ of 0.739.
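For readers who want to reproduce the headline statistic, the following is a minimal sketch of a Spearman rank correlation computed as the Pearson correlation of tie-averaged ranks, then compared against the 0.70 threshold. The function names and the sample scores are illustrative assumptions; they are not the platform's code and not Phase 1 data.

```python
from statistics import mean

THRESHOLD = 0.70  # correlation level stated in the conjecture


def average_ranks(values):
    """Assign 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one input is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical Eligibility Index scores, for illustration only.
v1_scores = [82, 64, 91, 45, 73, 58]
v2_scores = [70, 60, 85, 50, 55, 52]

rho = spearman_rho(v1_scores, v2_scores)
prediction_holds = rho > THRESHOLD
```

Because the statistic depends only on ranks, a constant offset between v1 and v2 scores leaves ρ unchanged; systematic bias therefore has to be reported as a separate metric.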
All 69 valid comparison pairs were drawn from the biomedical investment domain, covering drug development advancement decisions, M&A transactions, clinical trial initiation, regulatory filing decisions, and portfolio investment decisions. Evidence layers included clinical trial data, regulatory status, financial terms, strategic positioning, and market analysis.
Every decision submitted to the TAC-3D platform triggers both engines in sequence: v1 (Claude Sonnet semantic analysis) runs first, producing a structured JSON result including Eligibility Index, verdict, structural summary, and tension map. Upon v1 completion, v2 (QSW engine) is triggered as a background process, receiving both the original evidence layers and v1 context as input.
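The sequencing described above can be sketched as follows. This is an illustration under stated assumptions, not the TAC-3D implementation: `run_v1`, `run_v2`, `screen_decision`, and the `V1Result` fields are hypothetical stand-ins, and a thread pool is used here only as a simple substitute for whatever background mechanism the platform actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class V1Result:
    """Hypothetical shape of the structured v1 result described in the text."""
    eligibility_index: float
    verdict: str
    structural_summary: str
    tension_map: dict = field(default_factory=dict)


def run_v1(evidence_layers):
    """Stand-in for the v1 semantic analysis; returns a fixed placeholder."""
    return V1Result(75.0, "PLACEHOLDER", "stub summary")


def run_v2(evidence_layers, v1_context):
    """Stand-in for the QSW engine; records whether v1 context was received."""
    return {"eligibility_index": 70.0, "used_v1_context": v1_context is not None}


def screen_decision(evidence_layers, executor):
    """Run v1 to completion, then start v2 in the background with v1 context."""
    v1 = run_v1(evidence_layers)
    v2_future = executor.submit(run_v2, evidence_layers, v1)
    return v1, v2_future


with ThreadPoolExecutor(max_workers=1) as pool:
    v1_result, v2_future = screen_decision({"clinical_trial_data": "placeholder"}, pool)
    v2_result = v2_future.result()  # block until the background v2 run finishes
```

The point of the sketch is the ordering: v2 cannot start until v1 has produced its result, because that result is part of v2's input.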
The original protocol specified that v2 would run independently of v1, using only the evidence layers as input. During Phase 1 accumulation, a systematic divergence was identified: v1 scores consistently exceeded v2 scores by 20–50 points in cases where the evidence layers referenced well-established clinical outcomes or regulatory approvals. Investigation revealed that v2's compatibility matrix generator lacked access to the domain knowledge encoded in v1's semantic analysis.
In validation on identical test cases, the v1-Context Injection Protocol reduced mean absolute divergence from 49.7 points to between 4.1 and 10.0 points, indicating that the original divergence was attributable to domain knowledge asymmetry rather than structural inference failure.
Of 104 total screenings accumulated during Phase 1, 35 were excluded from the primary dataset. Exclusion criteria: (a) rate limit errors during v1 execution resulting in null v1-context transmission; (b) compatibility matrix generation failures producing degenerate outputs (T=100, A=100 without valid cross-layer computation); (c) duplicate decision submissions.
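The three exclusion criteria can be expressed as a simple filter. The sketch below assumes a hypothetical record layout (`decision_id`, `v1_context`, `T`, `A`, `cross_layer_valid`); these field names are not taken from the platform and are used only to make the rules concrete.

```python
def filter_valid_pairs(screenings):
    """Split screenings into valid pairs and (decision_id, reason) exclusions."""
    seen_ids = set()
    valid, excluded = [], []
    for s in screenings:
        if s["v1_context"] is None:
            # (a) v1 hit a rate limit, so no v1 context reached v2
            reason = "null_v1_context"
        elif s["T"] == 100 and s["A"] == 100 and not s["cross_layer_valid"]:
            # (b) degenerate matrix output without valid cross-layer computation
            reason = "degenerate_matrix"
        elif s["decision_id"] in seen_ids:
            # (c) the same decision was already accepted once
            reason = "duplicate"
        else:
            reason = None

        if reason is None:
            seen_ids.add(s["decision_id"])
            valid.append(s)
        else:
            excluded.append((s["decision_id"], reason))
    return valid, excluded


# Hypothetical records, for illustration only.
sample = [
    {"decision_id": "d1", "v1_context": {"e": 80}, "T": 62, "A": 71, "cross_layer_valid": True},
    {"decision_id": "d2", "v1_context": None, "T": 55, "A": 60, "cross_layer_valid": True},
    {"decision_id": "d3", "v1_context": {"e": 70}, "T": 100, "A": 100, "cross_layer_valid": False},
    {"decision_id": "d1", "v1_context": {"e": 80}, "T": 62, "A": 71, "cross_layer_valid": True},
]
valid_pairs, exclusions = filter_valid_pairs(sample)
```

One design choice in this sketch: a resubmission counts as a duplicate only if an earlier submission of the same decision was accepted, so a decision whose first attempt failed can still enter the dataset on retry.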
All 69 valid comparison pairs were classified into four divergence categories based on verdict agreement and E-score differential:
v1 Eligibility Index scores exceed v2 scores by an average of 12.9 points across the dataset. This systematic bias is consistent with the conjecture's prediction in Section 5.3: v1 introduces semantic framing and domain context that go beyond structural compatibility analysis. Decisions with strong domain validation (approved therapies, completed trials) receive higher v1 scores because Claude's training knowledge confirms the evidence's real-world validity, information that is unavailable to pure structural analysis.
The amendment to inject v1 structural context into the v2 compatibility matrix generation produced a measurable improvement in correlation:
| Metric | Before Amendment | After Amendment | Change |
|---|---|---|---|
| Spearman ρ | 0.555 | 0.739 | +0.184 |
| Exact verdict match | 21.7% | 27.5% | +5.8pp |
| E-score within 15pts | 42.6% | 52.2% | +9.6pp |
| Avg absolute diff | 18.9 pts | 15.6 pts | −3.3 pts |
| Systematic bias (v1−v2) | +10.2 pts | +12.9 pts | +2.7 pts |
The increase in systematic bias after amendment reflects the dataset composition: post-amendment comparisons include more cases with strong domain knowledge (approved therapies, completed Phase 3 trials) where v1 correctly assigns high scores based on real-world outcomes. This is an expected artifact of domain-weighted sampling, not a regression.
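The agreement metrics reported in the table above (other than Spearman ρ) can be computed along the following lines. This is a sketch under assumptions: the record keys and verdict labels are hypothetical, the sample values are not Phase 1 data, and "within 15 pts" is treated as inclusive of a difference of exactly 15.

```python
def agreement_metrics(pairs, tolerance=15):
    """Compute verdict agreement, closeness, mean absolute diff, and signed bias."""
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one comparison pair is required")
    diffs = [p["v1_e"] - p["v2_e"] for p in pairs]  # signed v1 - v2 difference
    return {
        "exact_verdict_match": sum(p["v1_verdict"] == p["v2_verdict"] for p in pairs) / n,
        "within_tolerance": sum(abs(d) <= tolerance for d in diffs) / n,
        "avg_absolute_diff": sum(abs(d) for d in diffs) / n,
        "systematic_bias": sum(diffs) / n,  # positive means v1 scores higher
    }


# Hypothetical comparison pairs, for illustration only.
sample_pairs = [
    {"v1_e": 80, "v2_e": 70, "v1_verdict": "GO", "v2_verdict": "GO"},
    {"v1_e": 60, "v2_e": 75, "v1_verdict": "HOLD", "v2_verdict": "GO"},
    {"v1_e": 90, "v2_e": 50, "v1_verdict": "GO", "v2_verdict": "NO_GO"},
    {"v1_e": 55, "v2_e": 55, "v1_verdict": "HOLD", "v2_verdict": "HOLD"},
]
metrics = agreement_metrics(sample_pairs)
```

Average absolute difference and systematic bias can move in opposite directions, as they do in the table: positive and negative differences cancel in the signed mean but not in the absolute one.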
Phase 2 targets 200 valid comparison pairs across three domains: biomedical investment (extending Phase 1), legal decision-making (via TAC-Legal), and general strategic decisions (via TAC Agent). Cross-domain comparison will test whether the conjecture holds outside the biomedical domain.
Regression analysis against the Phase 1 dataset will derive domain-specific weighting coefficients for the Eligibility Index formula. The hypothesis is that biomedical investment decisions require higher Convergence weighting (0.35 vs 0.25 default) due to the multi-scenario nature of clinical development.
The 16 DOMAIN_KNOWLEDGE divergence cases and 34 MIXED cases will be manually reviewed to classify each divergence as: (a) a semantic framing artifact, (b) a genuine structural incompatibility missed by v1, or (c) a QSW calibration issue. Cases in category (b), where v2 correctly identifies structural problems that v1 missed, are the most theoretically valuable.