Predict which clinical-stage drugs are most likely to have catastrophic safety failures — based on modality, target, and 30 years of historical patterns.
A pure-Python analysis tool built on a curated dataset of 63 notable drug safety events across 8 therapeutic modalities (1993–2024). No ML, no dependencies — just historical rates and risk multipliers that surface real patterns in drug development failure.
Built by Kyle Stringfellow — environmental consultant turned biotech tools developer.
Drug safety failures destroy shareholder value and harm patients. Certain modalities — gene therapy, CAR-T, ADCs — have well-documented patterns of catastrophic events that repeat across programs. This tool quantifies those patterns so you can assess risk before it materializes.
Key findings from the dataset:
- CAR-T has the highest patient death rate: 37.5% of its recorded events (3 of 8)
- Gene therapy leads in clinical holds (33.3% of its events) and has the highest modality base risk score (72/100)
- Cell therapy and small molecules tie for the highest safety termination rate (60%)
- Safety terminations in cell therapy carry an average stock impact of -47%, the worst of any modality
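Averages like the -47% figure come from grouping events by modality and event type, then averaging the stock impact. Below is a minimal sketch of that aggregation. The field names (`modality`, `event_type`, `stock_impact`) and the records are placeholders, not entries from safety_events.json. Skipping events with no recorded stock impact, such as academic sponsors without a ticker, is an assumption about how the tool handles them.

```python
from collections import defaultdict
from statistics import mean

# Placeholder records with hypothetical field names; values are made up
# for illustration and are not taken from safety_events.json.
EVENTS = [
    {"modality": "cell_therapy", "event_type": "safety_termination", "stock_impact": -50.0},
    {"modality": "cell_therapy", "event_type": "safety_termination", "stock_impact": -40.0},
    {"modality": "adc", "event_type": "safety_termination", "stock_impact": -20.0},
    {"modality": "adc", "event_type": "clinical_hold", "stock_impact": -10.0},
    {"modality": "adc", "event_type": "safety_termination", "stock_impact": None},  # no price data
]


def mean_impact_by_modality(events, event_type):
    """Average stock impact (%) per modality for one event type.

    Events without a recorded stock impact are skipped (assumed behavior).
    """
    groups = defaultdict(list)
    for e in events:
        if e["event_type"] == event_type and e.get("stock_impact") is not None:
            groups[e["modality"]].append(e["stock_impact"])
    # Most negative (worst) average first
    return sorted(((m, mean(v)) for m, v in groups.items()), key=lambda kv: kv[1])


for modality, avg in mean_impact_by_modality(EVENTS, "safety_termination"):
    print("{:<15} {:+.1f}%".format(modality, avg))
```

With these placeholder records, `cell_therapy` averages -45.0% and sorts ahead of `adc` at -20.0%.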
What the tool includes:
- Curated Dataset — 63 safety events with drug name, company, ticker, modality, indication, phase, event type, stock impact, and source citations
- Risk Scoring Engine — Composite 0–100 score from modality base risk, target modifiers, indication category, and clinical phase
- Statistical Analysis — Termination rates, clinical hold frequency, death rates by modality/phase, and stock impact by event type
- Time Trends — Tracks whether newer modalities are getting safer over time (spoiler: it's complicated)
- Company Lookup — Compare any public biotech by ticker against the safety event database
- Web Dashboard — Static HTML visualization of key findings, deployable to GitHub Pages
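Each event record carries the fields listed above. The following is a minimal sketch of loading and validating such records with the standard library. The key names are assumptions, including `year`, which is inferred from the `--year` filter. The real safety_events.json may use different names or wrap the list in an outer object, and the sample record is a placeholder.

```python
import json

# Assumed key names for the fields listed above; safety_events.json may
# use different names or wrap the list in an outer object.
REQUIRED_FIELDS = {
    "drug", "company", "ticker", "modality", "indication",
    "phase", "event_type", "year", "stock_impact", "source",
}


def load_events(text):
    """Parse a JSON array of event objects, rejecting incomplete records."""
    events = json.loads(text)
    if not isinstance(events, list):
        raise ValueError("expected a JSON array of events")
    for i, event in enumerate(events):
        missing = REQUIRED_FIELDS.difference(event)  # iterating a dict yields its keys
        if missing:
            raise ValueError("event {}: missing {}".format(i, sorted(missing)))
    return events


# Placeholder record; null ticker/stock_impact is assumed to mark private sponsors.
SAMPLE = """[{"drug": "drug-a", "company": "Example Co", "ticker": null,
  "modality": "gene_therapy", "indication": "SMA", "phase": "Phase 1",
  "event_type": "clinical_hold", "year": 2020, "stock_impact": null,
  "source": "placeholder citation"}]"""

events = load_events(SAMPLE)
print(len(events), events[0]["modality"])
```

To read the file itself, pass its contents in, e.g. `with open("safety_events.json") as f: events = load_events(f.read())`, assuming the top level is an array.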
```bash
# No external dependencies — pure Python 3.6+
git clone https://github.com/kstring99/safety-analyzer.git
cd safety-analyzer
```

```bash
python3 main.py stats
```

```
======================================================================
DRUG SAFETY EVENT ANALYSIS
Dataset: 63 events across 8 modalities
======================================================================

--- Safety Termination Rate by Modality ---
Modality                   Events  Terminated      Rate
-------------------------------------------------------
Cell Therapy                    5           3     60.0%
Small Molecule                 10           6     60.0%
mRNA/LNP                        8           3     37.5%
ADC                             7           2     28.6%
Gene Therapy                    9           2     22.2%
Monoclonal Antibody            10           2     20.0%
Bispecific                      6           0      0.0%
CAR-T                           8           0      0.0%

--- Patient Death Events by Modality ---
Modality                   Events  Deaths      Rate
---------------------------------------------------
CAR-T                           8       3     37.5%
Gene Therapy                    9       3     33.3%
Cell Therapy                    5       1     20.0%
Small Molecule                 10       2     20.0%
Monoclonal Antibody            10       1     10.0%
...
```
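The rates in these tables are simple proportions: matching events divided by all events for a modality. Below is a minimal sketch that reproduces the termination table. It assumes events are dicts with hypothetical `modality` and `event_type` keys and a hypothetical `"safety_termination"` label. The placeholder records are generated from the published counts, not read from the dataset. Breaking ties alphabetically is also an assumption; it happens to match the ordering shown in the output.

```python
from collections import Counter


def rate_by_modality(events, event_type):
    """Return (modality, n_events, n_matching, pct) rows, highest rate first.

    Ties are broken alphabetically (an assumption that matches the sample output).
    """
    totals = Counter(e["modality"] for e in events)
    hits = Counter(e["modality"] for e in events if e["event_type"] == event_type)
    rows = [(m, n, hits[m], 100.0 * hits[m] / n) for m, n in totals.items()]
    rows.sort(key=lambda r: (-r[3], r[0]))
    return rows


# (events, terminated) per modality, copied from the table above and expanded
# into placeholder records; "safety_termination" is a hypothetical label.
PUBLISHED = {
    "Cell Therapy": (5, 3), "Small Molecule": (10, 6), "mRNA/LNP": (8, 3),
    "ADC": (7, 2), "Gene Therapy": (9, 2), "Monoclonal Antibody": (10, 2),
    "Bispecific": (6, 0), "CAR-T": (8, 0),
}
events = []
for modality, (n, k) in PUBLISHED.items():
    events += [{"modality": modality, "event_type": "safety_termination"}] * k
    events += [{"modality": modality, "event_type": "other"}] * (n - k)

print("{:<25}{:>8}{:>12}{:>10}".format("Modality", "Events", "Terminated", "Rate"))
for m, n, k, pct in rate_by_modality(events, "safety_termination"):
    print("{:<25}{:>8}{:>12}{:>9.1f}%".format(m, n, k, pct))
```

Passing a different event type, such as a hypothetical `"patient_death"` label, gives the death-rate table from the same function.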
```bash
python3 main.py score --modality gene_therapy --target AAV9 --indication SMA --phase "Phase 1"
```

```
======================================================================
SAFETY RISK ASSESSMENT
======================================================================
Modality: Gene Therapy
Target: AAV9
Indication: SMA
Ind. Class: rare_disease
Phase: Phase 1

--- Risk Components ---
Modality base risk: 72/100
Target risk modifier: 1.3x
Indication risk modifier: 1.1x
Phase risk modifier: 1.3x

OVERALL SAFETY RISK SCORE: 64/100 [HIGH]
[################################------------------]

--- Comparable Historical Events (5) ---
[1999] OTC transgene (adenovirus vector) — University of Pennsylvania
  Patient Death
  Jesse Gelsinger died from a massive immune reaction to the adenoviral vector...

[2021] Zynteglo (betibeglogene autotemcel) — bluebird bio
  Clinical Hold (stock: -37.5%)
  Clinical hold placed after two patients developed myelodysplastic syndrome...
```
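The `events` and `compare` commands look up records by field value. Below is a minimal sketch of that kind of filtering. It assumes each event is a dict with hypothetical `modality`, `event_type`, `year`, and `ticker` keys, and that tickers match case-insensitively. The matching rules in main.py may differ, and the records are placeholders.

```python
# Placeholder records with hypothetical field names, not dataset entries.
EVENTS = [
    {"drug": "drug-a", "ticker": "AAAA", "modality": "car_t",
     "event_type": "patient_death", "year": 2017},
    {"drug": "drug-b", "ticker": "BBBB", "modality": "gene_therapy",
     "event_type": "clinical_hold", "year": 2021},
    {"drug": "drug-c", "ticker": "BBBB", "modality": "gene_therapy",
     "event_type": "patient_death", "year": 2023},
]


def filter_events(events, modality=None, event_type=None, year=None, ticker=None):
    """Return events that match every filter given (None means 'any')."""
    if ticker is not None:
        ticker = ticker.upper()  # assumed: tickers compared case-insensitively
    wanted = {"modality": modality, "event_type": event_type,
              "year": year, "ticker": ticker}
    return [e for e in events
            if all(v is None or e.get(k) == v for k, v in wanted.items())]


def names(events):
    """Drug names only, for compact printing."""
    return [e["drug"] for e in events]


print(names(filter_events(EVENTS, event_type="patient_death")))  # ['drug-a', 'drug-c']
print(names(filter_events(EVENTS, ticker="bbbb", year=2023)))    # ['drug-c']
```

Combining filters narrows the result, because an event must match every filter that is set.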
```bash
python3 main.py events --modality car_t
python3 main.py events --event-type patient_death
python3 main.py events --year 2023
```

```bash
python3 main.py compare BLUE
python3 main.py compare MRNA
```

Open index.html in a browser or visit the live GitHub Pages site to see an interactive visualization of:
- Risk scores by modality (bar chart)
- Patient death rates by modality
- Safety termination rates
- Stock impact by event type
- Full event timeline
```
safety-analyzer/
├── main.py              # CLI interface (4 commands: stats, score, events, compare)
├── safety_events.json   # Curated dataset: 63 events, 8 modalities, 1993–2024
├── safety_stats.py      # Statistical analysis (termination, holds, deaths, stock impact, trends)
├── risk_scorer.py       # Risk scoring engine (base risk × target × indication × phase)
├── index.html           # Web dashboard: dark theme, static HTML/JS
└── README.md
```
The scoring algorithm combines four factors into a 0–100 composite score:
| Factor | Range | Example |
|---|---|---|
| Modality base risk | 30–72 | Gene Therapy = 72, Small Molecule = 30 |
| Target modifier | 1.0–1.5x | Adenovirus = 1.5x, CD20 = 1.0x |
| Indication modifier | 0.85–1.5x | Healthy volunteers = 1.5x, Heme malignancy = 0.85x |
| Phase modifier | 0.8–1.3x | Phase 1 = 1.3x, Post-market = 0.8x |
```
score = min(100, base × target × indication × phase × 100 / 210)
```
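Below is a minimal Python sketch of this formula. The base risks come from the modality table that follows, and the modifiers are the values quoted in this README. Several details are assumptions:

- Dictionary keys other than `gene_therapy` and `car_t`, which appear in the CLI examples, are assumed spellings.
- Unlisted targets, indications, and phases fall back to a neutral 1.0.
- Scores are rounded to the nearest integer, chosen because this reproduces the 64/100 in the AAV9 example.
- The 50-character bar width is inferred from that example's output.

The constant 210 is close to the largest product the ranges allow (72 × 1.5 × 1.5 × 1.3 = 210.6), so the cap only comes into play at the very top of the range.

```python
# Base risks from the modality table in this README; key spellings other
# than gene_therapy and car_t are assumptions.
BASE_RISK = {
    "gene_therapy": 72, "car_t": 68, "cell_therapy": 55, "bispecific": 52,
    "adc": 48, "mrna_lnp": 38, "monoclonal_antibody": 35, "small_molecule": 30,
}
# Only the modifier values quoted in this README; unlisted keys fall back
# to an assumed neutral 1.0.
TARGET_MOD = {"adenovirus": 1.5, "aav9": 1.3, "cd20": 1.0}
INDICATION_MOD = {"healthy_volunteers": 1.5, "rare_disease": 1.1, "heme_malignancy": 0.85}
PHASE_MOD = {"phase 1": 1.3, "post-market": 0.8}
NORMALIZER = 210  # close to the largest product: 72 * 1.5 * 1.5 * 1.3 = 210.6


def risk_score(modality, target, indication_class, phase):
    """Composite 0-100 score: base x target x indication x phase x 100 / 210."""
    raw = (BASE_RISK[modality]
           * TARGET_MOD.get(target.lower(), 1.0)
           * INDICATION_MOD.get(indication_class.lower(), 1.0)
           * PHASE_MOD.get(phase.lower(), 1.0))
    return min(100, round(raw * 100 / NORMALIZER))  # rounding step is assumed


def risk_bar(score, width=50):
    """Text bar like the CLI output: one '#' per 2 points at width 50."""
    filled = round(score * width / 100)
    return "[" + "#" * filled + "-" * (width - filled) + "]"


score = risk_score("gene_therapy", "AAV9", "rare_disease", "Phase 1")
print("OVERALL SAFETY RISK SCORE: {}/100".format(score))
print(risk_bar(score))
```

For the AAV9 example, the raw product is 72 × 1.3 × 1.1 × 1.3 ≈ 133.85, which normalizes to about 63.7 and rounds to 64. The bar then has 32 filled characters, matching the CLI output.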
| Modality | Base Risk | Key Risks | Notable Events |
|---|---|---|---|
| Gene Therapy | 72 | Immunogenicity, hepatotoxicity, insertional oncogenesis | Gelsinger (1999), Zolgensma liver failure, bluebird MDS |
| CAR-T | 68 | CRS, ICANS, T-cell malignancy | JCAR015 cerebral edema deaths, secondary cancers |
| Cell Therapy | 55 | GVHD, contamination, manufacturing failures | Allogeneic transplant mortality, Provenge (-95% stock) |
| Bispecific | 52 | CRS, ICANS, infections from B-cell depletion | High CRS rates (39–72%) across approved products |
| ADC | 48 | Payload toxicity (ILD, VOD, skin), off-target effects | Mylotarg withdrawal, Enhertu ILD, Blenrep keratopathy |
| mRNA/LNP | 38 | Myocarditis, reactogenicity, immunogenicity | COVID vaccine myocarditis, CureVac failure (-52%) |
| Monoclonal Antibody | 35 | PML, immune-mediated AEs, cytokine storm | TGN1412 catastrophe, Tysabri PML (-43%) |
| Small Molecule | 30 | Organ toxicity, cardiovascular risk | Vioxx (88K+ excess CV events), Fen-Phen ($21B settlements) |
All events sourced from public records:
- FDA safety communications, clinical holds, and prescribing information
- Published clinical trial results (NEJM, Lancet, JAMA, Blood)
- SEC filings and company press releases
- MHRA and EMA regulatory actions
- CDC VAERS data
Design choices:
- No ML — Historical rates and risk multipliers are more interpretable and auditable than black-box models for this dataset size
- No dependencies — Pure Python standard library only. Runs anywhere Python 3.6+ exists
- Public data only — Every event includes a source citation
- Extensible — JSON dataset and modular scoring make it straightforward to add new events or modalities
Kyle Stringfellow — Environmental consultant building biotech and data analysis tools. Learning to code by solving real problems in drug safety, environmental compliance, and scientific data management.
- GitHub: @kstring99