A modular, diagnostic-first pipeline for time-to-event data analysis in clinical research. The pipeline automatically inspects your dataset for competing risks, recurrent events, time-varying exposures, immortal time bias, left truncation, informative censoring, and clustering, then routes each analysis to the appropriate statistical method. Every stage is an independent, reusable R module that can be composed per-project and rendered into a publication-ready manuscript via Quarto.
- Diagnostic-first routing -- the pipeline examines data characteristics before choosing a model, not the other way around
- 7 simulated scenarios covering standard KM, competing risks, recurrent/multistate events, time-varying exposures, and advanced adjustments
- 10 composable modules from data intake through manuscript rendering
- Quarto IMRAD book with full project guide deployed to GitHub Pages
- AMA and APA citation styles included for journal submission
```mermaid
flowchart TD
    A[Start: Raw Data] --> B[sa-data-intake<br>Clean + Validate]
    B --> C[sa-diagnostics<br>Detect Data Features]
    C --> D{Competing<br>risks?}
    D -- Yes --> E[sa-competing-risks<br>Fine-Gray / Cause-Specific]
    D -- No --> F{Recurrent or<br>multistate?}
    F -- Yes --> G[sa-recurrent-multistate<br>Multistate Models]
    F -- No --> H[sa-standard-km<br>KM + Cox]
    E --> I{Time-varying<br>exposure?}
    G --> I
    H --> I
    I -- Yes --> J[sa-time-varying<br>Time-Dependent Cox]
    I -- No --> K{Clustering or<br>advanced issues?}
    J --> K
    K -- Yes --> L[sa-advanced-adjustments<br>Frailty / IPW / Truncation]
    K -- No --> M[sa-publication-figures<br>Assemble Panels]
    L --> M
    M --> N[sa-manuscript-quarto<br>Render IMRAD Paper]
```
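The branching in the flowchart can be read as a small routing function. The sketch below is illustrative only: the `Diagnostics` class, its field names, and `route` are hypothetical names chosen for this example, not part of the pipeline's actual interface. It assumes the diagnostics stage reports four yes/no flags and returns the module names in the order the flowchart visits them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Diagnostics:
    """Hypothetical yes/no flags that a diagnostics stage might report."""
    competing_risks: bool = False
    recurrent_or_multistate: bool = False
    time_varying_exposure: bool = False
    clustering_or_advanced: bool = False


def route(diag: Diagnostics) -> list:
    """Return module names in the order the flowchart visits them."""
    path = ["sa-data-intake", "sa-diagnostics"]
    # First decision: exactly one primary model family is chosen,
    # and competing risks are checked before recurrent/multistate events.
    if diag.competing_risks:
        path.append("sa-competing-risks")
    elif diag.recurrent_or_multistate:
        path.append("sa-recurrent-multistate")
    else:
        path.append("sa-standard-km")
    # Later decisions add optional stages on top of the primary model.
    if diag.time_varying_exposure:
        path.append("sa-time-varying")
    if diag.clustering_or_advanced:
        path.append("sa-advanced-adjustments")
    # Every path ends with figure assembly and manuscript rendering.
    path += ["sa-publication-figures", "sa-manuscript-quarto"]
    return path


example_path = route(Diagnostics(competing_risks=True, time_varying_exposure=True))
```

Note that, as drawn, the flowchart sends a dataset with both competing risks and recurrent events down the competing-risks branch only; the `if`/`elif` ordering mirrors that.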
```bash
# 1. Clone the repository
git clone https://github.com/htlin222/survival-pipe.git
cd survival-pipe

# 2. Bootstrap environment (R, Python, renv, .venv)
bash setup.sh

# 3. Verify everything is ready
bash verify_environment.sh

# 4. Start a new analysis project
# Tell the agent: "Start project my-study"
```

| Module | Stage | What It Does |
|---|---|---|
| `sa-data-intake` | Data prep | Ingest, clean, validate raw data |
| `sa-diagnostics` | Feature detection | Detect competing risks, clustering, truncation |
| `sa-standard-km` | Kaplan-Meier | KM curves, log-rank tests, Cox PH models |
| `sa-competing-risks` | Competing risks | Fine-Gray and cause-specific hazard models |
| `sa-recurrent-multistate` | Recurrent/multistate | Recurrent events, multistate transition models |
| `sa-time-varying` | Time-varying | Time-dependent covariates, landmark analysis |
| `sa-advanced-adjustments` | Advanced | Frailty, IPW, left truncation, informative censoring |
| `sa-publication-figures` | Figure assembly | Multi-panel publication figures (300 DPI) |
| `sa-manuscript-quarto` | Manuscript | Quarto-rendered IMRAD paper with tables/figures |
| `sa-end-to-end` | Integration | Full pipeline orchestration and validation |
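To make the feature-detection stage more concrete, here is a minimal sketch of the kind of checks such a stage could run. It is not the `sa-diagnostics` implementation: the function name, the row keys (`id`, `entry`, `time`, `event`, `center`), and the event coding (0 for censored, positive integers for event types) are all assumptions made for this example, and the rules are deliberately crude heuristics.

```python
from collections import Counter


def detect_features(rows):
    """Flag data features from rows with hypothetical keys:
    id, entry, time, event (0 = censored, 1, 2, ... = event type), center."""
    # More than one distinct non-censored event type suggests competing risks.
    event_types = {r["event"] for r in rows if r["event"] != 0}
    # More than one event for the same subject suggests recurrent events.
    events_per_subject = Counter(r["id"] for r in rows if r["event"] != 0)
    # A subject whose earliest entry time is after time zero entered late.
    first_entry = {}
    for r in rows:
        first_entry[r["id"]] = min(first_entry.get(r["id"], r["entry"]), r["entry"])
    # Several centers, at least one holding multiple subjects, suggests clustering.
    subjects_per_center = {}
    for r in rows:
        subjects_per_center.setdefault(r["center"], set()).add(r["id"])
    return {
        "competing_risks": len(event_types) > 1,
        "recurrent_events": any(n > 1 for n in events_per_subject.values()),
        "left_truncation": any(t > 0 for t in first_entry.values()),
        "clustering": len(subjects_per_center) > 1
        and any(len(ids) > 1 for ids in subjects_per_center.values()),
    }


# Tiny made-up dataset, for illustration only.
rows = [
    {"id": 1, "entry": 0.0, "time": 5.0, "event": 1, "center": "A"},
    {"id": 2, "entry": 0.0, "time": 7.0, "event": 2, "center": "A"},
    {"id": 3, "entry": 2.0, "time": 9.0, "event": 0, "center": "B"},
]
flags = detect_features(rows)
```

For the three rows above, the flags for competing risks, left truncation, and clustering come out true, while the recurrent-events flag stays false because no subject has more than one event.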
| Scenario | Description | Key Feature Tested |
|---|---|---|
| 1 | Standard time-to-event | KM + Cox PH baseline |
| 2 | Competing risks (death vs relapse) | Fine-Gray subdistribution |
| 3 | Recurrent events | Multistate transition models |
| 4 | Time-varying drug exposure | Time-dependent Cox |
| 5 | Immortal time bias | Landmark / time-varying fix |
| 6 | Left truncation + late entry | Delayed entry adjustment |
| 7 | Clustered data (multi-center) | Frailty / robust variance |
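Scenarios 4 and 5 both depend on treating exposure as time-varying. One common way to do this, sketched below, is to split each subject's follow-up at the moment exposure starts, so that the waiting time before treatment counts as unexposed rather than being credited to the treated group (the source of immortal time bias). This is a generic illustration, not the pipeline's code: the function and key names are hypothetical.

```python
from typing import Optional


def split_at_exposure(entry: float, exit_time: float, event: int,
                      exposure_start: Optional[float]) -> list:
    """Split one subject's follow-up into (start, stop] intervals
    carrying a time-varying exposure flag."""
    # Never exposed, or exposed only at/after the end of follow-up:
    # a single unexposed interval.
    if exposure_start is None or exposure_start >= exit_time:
        return [{"start": entry, "stop": exit_time, "exposed": 0, "event": event}]
    # Exposed from the start of follow-up: a single exposed interval.
    if exposure_start <= entry:
        return [{"start": entry, "stop": exit_time, "exposed": 1, "event": event}]
    # Otherwise the time before exposure is counted as unexposed, and the
    # event (if any) is attached to the final, exposed interval.
    return [
        {"start": entry, "stop": exposure_start, "exposed": 0, "event": 0},
        {"start": exposure_start, "stop": exit_time, "exposed": 1, "event": event},
    ]


# A subject followed from 0 to 10 who starts the drug at 4 and has the event at 10.
intervals = split_at_exposure(0.0, 10.0, 1, 4.0)
```

Rows in this (start, stop] layout are the usual input for a counting-process Cox model, for example `coxph(Surv(start, stop, event) ~ exposed)` in R's survival package.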
The full project guide is available at htlin222.github.io/survival-pipe, built as a Quarto book from the pages/ directory.
If you use this pipeline in your research, please cite it. See CITATION.cff for the structured citation, or use the BibTeX entry in citation.bib:
```bibtex
@software{lin2025survivalpipe,
  author  = {Lin, Hsieh-Ting},
  title   = {survival-pipe: Diagnostic-First Survival Analysis Pipeline},
  year    = {2025},
  url     = {https://github.com/htlin222/survival-pipe},
  version = {0.1.0}
}
```

- R >= 4.2
- Python >= 3.12
- Quarto CLI
- gfortran (for frailtypack; `brew install gcc` on macOS)
- uv (Python package manager)
This project is licensed under the MIT License.