kstring99/safety-analyzer

Safety Analyzer

Predict which clinical-stage drugs are most likely to have catastrophic safety failures — based on modality, target, and 30 years of historical patterns.

A pure-Python analysis tool built on a curated dataset of 63 notable drug safety events across 8 therapeutic modalities (1993–2024). No ML, no dependencies — just historical rates and risk multipliers that surface real patterns in drug development failure.

Built by Kyle Stringfellow — environmental consultant turned biotech tools developer.


Why This Exists

Drug safety failures destroy shareholder value and harm patients. Certain modalities — gene therapy, CAR-T, ADCs — have well-documented patterns of catastrophic events that repeat across programs. This tool quantifies those patterns so you can assess risk before it materializes.

Key findings from the dataset:

  • CAR-T has the highest patient death rate (37.5% of recorded events)
  • Gene therapy leads in clinical holds (33.3%) and has the highest base risk score (72/100)
  • Cell therapy and small molecules tie for the highest safety termination rate (60.0%: 3 of 5 and 6 of 10 recorded events, respectively)
  • Safety terminations in cell therapy average -47% stock impact — worst of any modality

Features

  • Curated Dataset — 63 safety events with drug name, company, ticker, modality, indication, phase, event type, stock impact, and source citations
  • Risk Scoring Engine — Composite 0–100 score from modality base risk, target modifiers, indication category, and clinical phase
  • Statistical Analysis — Termination rates, clinical hold frequency, death rates by modality/phase, and stock impact by event type
  • Time Trends — Tracks whether newer modalities are getting safer over time (spoiler: it's complicated)
  • Company Lookup — Compare any public biotech by ticker against the safety event database
  • Web Dashboard — Static HTML visualization of key findings, deployable to GitHub Pages

Quick Start

# No external dependencies — pure Python 3.6+
git clone https://github.com/kstring99/safety-analyzer.git
cd safety-analyzer

View safety statistics by modality

python3 main.py stats
======================================================================
DRUG SAFETY EVENT ANALYSIS
Dataset: 63 events across 8 modalities
======================================================================

--- Safety Termination Rate by Modality ---
Modality                    Events   Terminated     Rate
-------------------------------------------------------
Cell Therapy                     5            3    60.0%
Small Molecule                  10            6    60.0%
mRNA/LNP                         8            3    37.5%
ADC                              7            2    28.6%
Gene Therapy                     9            2    22.2%
Monoclonal Antibody             10            2    20.0%
Bispecific                       6            0     0.0%
CAR-T                            8            0     0.0%

--- Patient Death Events by Modality ---
Modality                    Events   Deaths     Rate
---------------------------------------------------
CAR-T                            8        3    37.5%
Gene Therapy                     9        3    33.3%
Cell Therapy                     5        1    20.0%
Small Molecule                  10        2    20.0%
Monoclonal Antibody             10        1    10.0%
...
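
The rate tables above are simple per-modality proportions: matching events divided by all events for that modality. A minimal sketch of that calculation follows. The record keys (`modality`, `event_type`) and the `safety_termination` and `clinical_hold` values are assumptions for illustration; the actual schema of safety_events.json and the code in safety_stats.py may differ.

```python
from collections import Counter

# Hypothetical records; the real safety_events.json schema may differ.
events = [
    {"modality": "cell_therapy", "event_type": "safety_termination"},
    {"modality": "cell_therapy", "event_type": "clinical_hold"},
    {"modality": "car_t", "event_type": "patient_death"},
    {"modality": "car_t", "event_type": "clinical_hold"},
    {"modality": "car_t", "event_type": "patient_death"},
]


def rate_by_modality(records, event_type):
    """Return {modality: (events, matches, percent)}, highest percent first."""
    totals = Counter(r["modality"] for r in records)
    matches = Counter(r["modality"] for r in records if r["event_type"] == event_type)
    rows = {m: (n, matches[m], 100.0 * matches[m] / n) for m, n in totals.items()}
    return dict(sorted(rows.items(), key=lambda kv: kv[1][2], reverse=True))


# Print in a column layout similar to the CLI output
for modality, (n, hits, pct) in rate_by_modality(events, "patient_death").items():
    print(f"{modality:<28}{n:>6}{hits:>9}{pct:>8.1f}%")
```

Because the denominator is the number of recorded events, not the number of patients or programs, these rates describe the curated dataset and are not incidence estimates.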

Score a drug's safety risk

python3 main.py score --modality gene_therapy --target AAV9 --indication SMA --phase "Phase 1"
======================================================================
SAFETY RISK ASSESSMENT
======================================================================

  Modality:    Gene Therapy
  Target:      AAV9
  Indication:  SMA
  Ind. Class:  rare_disease
  Phase:       Phase 1

--- Risk Components ---
  Modality base risk:       72/100
  Target risk modifier:     1.3x
  Indication risk modifier: 1.1x
  Phase risk modifier:      1.3x

  OVERALL SAFETY RISK SCORE: 64/100 [HIGH]
  [################################------------------]

--- Comparable Historical Events (5) ---

  [1999] OTC transgene (adenovirus vector) — University of Pennsylvania
    Patient Death
    Jesse Gelsinger died from a massive immune reaction to the adenoviral vector...

  [2021] Zynteglo (betibeglogene autotemcel) — bluebird bio
    Clinical Hold (stock: -37.5%)
    Clinical hold placed after two patients developed myelodysplastic syndrome...
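
The bar under the overall score appears to be 50 characters wide, with 32 cells filled for a score of 64. A small sketch of one way to draw such a bar is shown here; `render_bar` is a hypothetical helper, not a function taken from main.py.

```python
def render_bar(score, width=50):
    """Draw a 0-100 score as a fixed-width bar of '#' and '-' characters."""
    score = max(0, min(100, score))          # clamp out-of-range input
    filled = round(score * width / 100)      # cells to fill
    return "[" + "#" * filled + "-" * (width - filled) + "]"


print(render_bar(64))
```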

Filter and browse safety events

python3 main.py events --modality car_t
python3 main.py events --event-type patient_death
python3 main.py events --year 2023
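
For readers curious how flags like these can map onto a filter, here is a minimal sketch using `argparse` from the standard library. The flag names come from the commands above; the record keys, the sample records, and the `filter_events` helper are assumptions for illustration and are not taken from main.py.

```python
import argparse

# Hypothetical records; field names are illustrative, not the real schema.
EVENTS = [
    {"drug": "Drug A", "modality": "car_t", "event_type": "patient_death", "year": 2023},
    {"drug": "Drug B", "modality": "car_t", "event_type": "clinical_hold", "year": 2021},
    {"drug": "Drug C", "modality": "gene_therapy", "event_type": "patient_death", "year": 2023},
]


def build_parser():
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command")
    events = sub.add_parser("events", help="filter and list safety events")
    events.add_argument("--modality")
    events.add_argument("--event-type")      # stored as args.event_type
    events.add_argument("--year", type=int)
    return parser


def filter_events(records, modality=None, event_type=None, year=None):
    """Keep records that match every filter that was supplied."""
    wanted = {"modality": modality, "event_type": event_type, "year": year}
    active = {k: v for k, v in wanted.items() if v is not None}
    return [r for r in records if all(r.get(k) == v for k, v in active.items())]


args = build_parser().parse_args(["events", "--modality", "car_t", "--year", "2023"])
for record in filter_events(EVENTS, args.modality, args.event_type, args.year):
    print(record["drug"], record["event_type"])
```

Filters left unset are skipped, so combining flags narrows the result while a bare `events` command would return everything.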

Look up a company's safety profile

python3 main.py compare BLUE
python3 main.py compare MRNA

Web Dashboard

Open index.html in a browser or visit the live GitHub Pages site to see an interactive visualization of:

  • Risk scores by modality (bar chart)
  • Patient death rates by modality
  • Safety termination rates
  • Stock impact by event type
  • Full event timeline

Project Structure

safety-analyzer/
├── main.py              # Command-line interface (4 commands: stats, score, events, compare)
├── safety_events.json   # Curated dataset — 63 events, 8 modalities, 1993–2024
├── safety_stats.py      # Statistical analysis (termination, holds, deaths, stock impact, trends)
├── risk_scorer.py       # Risk scoring engine (base risk × target × indication × phase)
├── index.html           # Web dashboard — dark theme, static HTML/JS
└── README.md

How the Risk Score Works

The scoring algorithm combines four factors into a 0–100 composite score:

Factor                Range       Example
------------------------------------------------------------------------------------
Modality base risk    30–72       Gene Therapy = 72, Small Molecule = 30
Target modifier       1.0–1.5x    Adenovirus = 1.5x, CD20 = 1.0x
Indication modifier   0.85–1.5x   Healthy volunteers = 1.5x, Heme malignancy = 0.85x
Phase modifier        0.8–1.3x    Phase 1 = 1.3x, Post-market = 0.8x

score = min(100, base × target × indication × phase × 100 / 210)
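
The formula can be checked against the AAV9 example from the `score` command. The divisor 210 is close to the largest product the documented ranges allow (72 × 1.5 × 1.5 × 1.3 = 210.6), so the worst case lands at the 100 cap. The sketch below is a direct transcription of the formula, not the code in risk_scorer.py; the function name and the rounding to a whole number are assumptions based on the integer scores shown in the output.

```python
def composite_score(base, target=1.0, indication=1.0, phase=1.0):
    """Return a 0-100 risk score from a base risk and three multipliers."""
    raw = base * target * indication * phase * 100 / 210
    # Rounding to the nearest integer is an assumption; the README only
    # shows whole-number scores.
    return min(100, round(raw))


# Gene therapy, AAV9, rare disease, Phase 1 (values from the example output)
example = composite_score(72, target=1.3, indication=1.1, phase=1.3)
print(example)

# Largest product allowed by the documented ranges
ceiling = composite_score(72, target=1.5, indication=1.5, phase=1.3)
print(ceiling)
```

With the example inputs the raw value is about 63.7, which matches the 64/100 reported by the CLI.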

Modalities Covered

Modality             Base Risk  Key Risks                                                Notable Events
----------------------------------------------------------------------------------------------------------------------------------------------------
Gene Therapy         72         Immunogenicity, hepatotoxicity, insertional oncogenesis  Gelsinger (1999), Zolgensma liver failure, bluebird MDS
CAR-T                68         CRS, ICANS, T-cell malignancy                            JCAR015 cerebral edema deaths, secondary cancers
Cell Therapy         55         GVHD, contamination, manufacturing failures              Allogeneic transplant mortality, Provenge (-95% stock)
Bispecific           52         CRS, ICANS, infections from B-cell depletion             High CRS rates (39–72%) across approved products
ADC                  48         Payload toxicity (ILD, VOD, skin), off-target effects    Mylotarg withdrawal, Enhertu ILD, Blenrep keratopathy
mRNA/LNP             38         Myocarditis, reactogenicity, immunogenicity              COVID vaccine myocarditis, CureVac failure (-52%)
Monoclonal Antibody  35         PML, immune-mediated AEs, cytokine storm                 TGN1412 catastrophe, Tysabri PML (-43%)
Small Molecule       30         Organ toxicity, cardiovascular risk                      Vioxx (88K+ excess CV events), Fen-Phen ($21B settlements)

Data Sources

All events sourced from public records:

  • FDA safety communications, clinical holds, and prescribing information
  • Published clinical trial results (NEJM, Lancet, JAMA, Blood)
  • SEC filings and company press releases
  • MHRA and EMA regulatory actions
  • CDC VAERS data

Design Decisions

  • No ML — Historical rates and risk multipliers are more interpretable and auditable than black-box models for this dataset size
  • No dependencies — Pure Python standard library only. Runs anywhere Python 3.6+ exists
  • Public data only — Every event includes a source citation
  • Extensible — JSON dataset and modular scoring make it straightforward to add new events or modalities
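
As an illustration of the extensibility point, a new event can be checked for completeness before it is appended to the dataset. The key names below are assumptions derived from the field list under Features; the real safety_events.json may name or nest them differently, and the sample record is a placeholder, not a real program.

```python
import json

# Assumed key names, derived from the field list in Features; the real
# safety_events.json may name or nest these differently.
REQUIRED_FIELDS = (
    "drug_name", "company", "ticker", "modality", "indication",
    "phase", "event_type", "stock_impact", "sources",
)


def missing_fields(event):
    """Return the required keys that are absent from an event record."""
    return [key for key in REQUIRED_FIELDS if key not in event]


def append_event(events, event):
    """Return a new list with the event added, or raise if it is incomplete."""
    missing = missing_fields(event)
    if missing:
        raise ValueError("event is missing: " + ", ".join(missing))
    return events + [event]


new_event = {
    "drug_name": "Example-001",      # placeholder, not a real program
    "company": "Example Bio",
    "ticker": "EXMP",
    "modality": "gene_therapy",
    "indication": "example indication",
    "phase": "Phase 1",
    "event_type": "clinical_hold",   # hypothetical value
    "stock_impact": None,            # e.g. unknown, or not publicly traded
    "sources": ["https://example.com/placeholder"],
}

dataset = append_event([], new_event)
print(json.dumps(dataset, indent=2))
```

Rejecting incomplete records up front keeps the "every event includes a source citation" rule enforceable as the dataset grows.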

Author

Kyle Stringfellow — Environmental consultant building biotech and data analysis tools. Learning to code by solving real problems in drug safety, environmental compliance, and scientific data management.
