A production-grade heart disease risk assessment platform: not a notebook, a system. Calibrated ML · Deep Clinical XAI · Conformal Prediction · MongoDB Atlas · Live on Streamlit Cloud.
https://cardiolens-heart.streamlit.app
Enter any patient profile → get an instantaneous risk score, a full clinical explanation of every contributing factor, a "what-if" counterfactual showing what to change, a 90% prediction interval with a formal coverage guarantee, and a downloadable PDF clinical report.
| Output | Description |
|---|---|
| Risk Score | Calibrated probability of heart disease (0–100%) |
| Risk Tier | Low / Moderate / High / Critical with clinical guidance |
| SHAP Explanation | Per-feature attribution: which factors drove this specific prediction |
| LIME Cross-check | Independent second explainer validating SHAP |
| Deep Clinical Reasoning | What the value means → how it affects the heart → what it leads to |
| Combined Verdict | Synthesised medical conclusion across all top factors |
| Counterfactual | "If cholesterol drops from X to Y, risk falls from 74% to 28%" |
| Prediction Interval | 90% conformal interval with a formal coverage guarantee |
| PDF Report | Downloadable clinical-grade patient report |
| MongoDB Persistence | Every prediction saved with full metadata for cohort analysis |
| Typical college project | CardioLens |
|---|---|
| Single model, one accuracy number | 4 models, 5-fold CV, isotonic calibration, Brier score |
| SHAP bar chart | SHAP + LIME + interaction values + counterfactuals + conformal intervals |
| "The model predicted X" | Full clinical reasoning: mechanism → consequence → action |
| Point estimate only | 90% prediction interval with formal coverage guarantee |
| Local notebook | Live deployed at cardiolens-heart.streamlit.app |
| No persistence | MongoDB Atlas: patient history, tier analytics, cohort insights |
| No fairness analysis | Subgroup AUC and recall audit across sex |
┌─────────────────────────────────────────────────────────────┐
│                 Streamlit Dashboard (Live)                  │
│    Blood-flow canvas · Gauge · Radar · SHAP · LIME · PDF    │
└──────────────┬──────────────┬──────────────┬────────────────┘
               │              │              │
       ┌───────▼──────┐ ┌─────▼──────┐ ┌─────▼───────────┐
       │ ML Core      │ │ XAI Layer  │ │ Uncertainty     │
       │ RF champion  │ │ SHAP+LIME  │ │ Split-conformal │
       │ AUC 0.969    │ │ Clinical   │ │ 90% guarantee   │
       │ Brier 0.077  │ │ reasoning  │ │                 │
       └──────────────┘ └────────────┘ └─────────────────┘
               │
       ┌───────▼──────────────┐  ┌────────────────────────┐
       │ MongoDB Atlas        │  │ Azure ML (code-ready)  │
       │ predictions coll.    │  │ Managed Endpoint       │
       │ model_registry coll. │  │ Blob Storage           │
       │ Cohort analytics     │  │ Application Insights   │
       └──────────────────────┘  └────────────────────────┘
Champion: Random Forest, selected by 5-fold stratified cross-validation
| Metric | Score | What it means |
|---|---|---|
| AUC-ROC | 0.9692 | Ranks a diseased patient above a healthy one about 97% of the time |
| Accuracy | 0.9016 | 90% of patients correctly classified |
| Recall | 0.9286 | Catches 93 out of 100 real cardiac cases |
| Precision | 0.8667 | 87% of flagged patients truly have disease |
| F1 Score | 0.8966 | Strong balance of precision and recall |
| Brier Score | 0.0766 | Excellent calibration: predicted probabilities closely match observed outcomes |
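
To make the AUC-ROC and Brier rows concrete, here is a minimal pure-Python sketch of both metrics. The labels and probabilities are made up for illustration and the helper names `auc_pairwise` and `brier_score` are hypothetical; the project itself may well compute these with scikit-learn instead.

```python
from itertools import product


def auc_pairwise(y_true, y_prob):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p, n in product(pos, neg)
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true, y_prob):
    """Mean squared gap between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Made-up labels and probabilities, for illustration only.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1]

auc = auc_pairwise(y_true, y_prob)    # 8 of 9 pairs ranked correctly
brier = brier_score(y_true, y_prob)   # lower is better; 0 is perfect
```

The pairwise form is what "ranks sick above healthy" means: AUC depends only on the ordering of scores, while the Brier score also penalises probabilities that are ordered correctly but too extreme or too timid.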
All 4 candidates (CV AUC):
| Model | CV AUC | Description |
|---|---|---|
| Random Forest | 0.8792 ± 0.0376 | Champion: 400 trees, majority vote |
| Logistic Regression | 0.8682 ± 0.0532 | Linear baseline, highly competitive |
| XGBoost | 0.8508 ± 0.0425 | Sequential error-correcting trees |
| LightGBM | 0.8481 ± 0.0418 | Leaf-wise boosting |
Fairness audit:
| Subgroup | n | Accuracy | AUC | Recall |
|---|---|---|---|---|
| Female | 20 | 0.950 | 1.000 | 0.857 |
| Male | 41 | 0.878 | 0.950 | 0.952 |
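
A subgroup audit of this kind can be sketched in a few lines. The function below is a hypothetical helper, not the code in `src/evaluation.py`, and it works on hard 0/1 predictions with toy data; AUC per subgroup is omitted to keep it short.

```python
from collections import defaultdict


def subgroup_report(groups, y_true, y_pred):
    """Return {group: {"n", "accuracy", "recall"}} from hard 0/1 predictions."""
    buckets = defaultdict(list)
    for g, y, yhat in zip(groups, y_true, y_pred):
        buckets[g].append((y, yhat))

    report = {}
    for g, pairs in buckets.items():
        correct = sum(1 for y, yhat in pairs if y == yhat)
        positives = [(y, yhat) for y, yhat in pairs if y == 1]
        true_pos = sum(1 for _, yhat in positives if yhat == 1)
        report[g] = {
            "n": len(pairs),
            "accuracy": correct / len(pairs),
            # Recall is undefined when a subgroup has no positive cases.
            "recall": true_pos / len(positives) if positives else None,
        }
    return report


# Toy data, for illustration only.
groups = ["F", "F", "F", "M", "M", "M", "M"]
y_true = [1, 0, 1, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 1, 0]

report = subgroup_report(groups, y_true, y_pred)
```

With subgroups this small, each metric moves in coarse steps. For instance, a recall of 0.857 is consistent with 6 of 7 positive cases being caught, so a single additional miss would shift it noticeably.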
TreeExplainer computes exact Shapley values for tree ensembles, so the feature attribution is exact rather than a sampling approximation. Every prediction decomposes as:
final_risk = baseline + SHAP(age) + SHAP(chol) + SHAP(oldpeak) + ... + SHAP(thal)
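
The additive decomposition above can be checked on a toy example. The sketch below computes exact Shapley values by brute-force enumeration of feature coalitions, which is not how TreeExplainer works internally (it exploits tree structure to avoid the exponential cost). The `toy_risk` function, the baseline patient, and all coefficients are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition are replaced by their baseline value.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {k: (x[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {name})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[name] = total
    return phi


def toy_risk(p):
    """Hypothetical risk function with an interaction (not the CardioLens model)."""
    return 0.002 * p["age"] + 0.05 * p["oldpeak"] + 0.01 * p["oldpeak"] * p["ca"]


patient = {"age": 54, "oldpeak": 2.3, "ca": 2}
baseline = {"age": 50, "oldpeak": 1.0, "ca": 0}

phi = shapley_values(toy_risk, patient, baseline)
reconstructed = toy_risk(baseline) + sum(phi.values())
```

Here `reconstructed` equals `toy_risk(patient)` up to floating-point error, which is the same additivity property the formula expresses: baseline plus per-feature contributions recovers the model output.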
LIME generates 5,000 perturbed patient variants, scores all of them, and fits a local linear model, completely independently of SHAP. Agreement between SHAP and LIME indicates a high-confidence explanation.
The key differentiator. For each top feature, three layers of medical explanation are generated based on the actual raw value:
Example (ST depression = 2.3 mm):
- What it means: "Marked ST segment depression β strong objective evidence of myocardial ischaemia."
- How it affects the heart: "During ischaemia, the subendocardial myocardium depolarises abnormally, shifting the ECG's ST segment downward. Depression ≥2 mm is a Class I indication for cardiac investigation."
- What it leads to: "Predicts multi-vessel coronary disease. Associated with 5-10x increased cardiac event risk vs a negative stress test."
Example (reversible thallium defect, `thal`):
- What it means: "Thallium scan shows reduced blood flow under stress that recovers at rest β the direct imaging definition of myocardial ischaemia."
- How it affects the heart: "A critically narrowed coronary artery cannot increase flow during stress. Thallium creates a cold spot in underperfused zones. At rest, flow recovers."
- What it leads to: "Class I indication for coronary angiography. Tissue at immediate risk of infarction if the causative stenosis is untreated."
All top factors synthesised into one conclusive medical verdict:
"The dominant contributors are ST depression and thalassemia status, each independently associated with significant coronary artery disease. Together, this warrants urgent cardiovascular evaluation. Coronary angiography should be strongly considered."
Greedy search over modifiable features (cholesterol, BP, heart rate, ST depression) finds the minimum real-world intervention:
"If your cholesterol drops from 2.03 to 0.53 (scaled units), predicted risk falls from 30.7% to 14.4%."
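
A greedy search of this kind can be sketched as follows. Everything here is an assumption made for illustration: the logistic `toy_risk` model, its coefficients, the step size, and the function name `greedy_counterfactual`. The real search in `src/xai.py` may differ, for example by bounding each feature to a clinically plausible range.

```python
import math


def toy_risk(z):
    """Hypothetical logistic model on scaled features (not the CardioLens model)."""
    score = -1.0 + 0.6 * z["chol"] + 0.4 * z["trestbps"] + 0.9 * z["oldpeak"]
    return 1.0 / (1.0 + math.exp(-score))


def greedy_counterfactual(model, patient, modifiable, target, step=0.25, max_steps=40):
    """Repeatedly apply the single-feature step that lowers risk the most."""
    current = dict(patient)
    for _ in range(max_steps):
        risk = model(current)
        if risk <= target:
            break
        best = None
        for name in modifiable:
            for delta in (-step, step):
                trial = dict(current)
                trial[name] += delta
                trial_risk = model(trial)
                if trial_risk < risk and (best is None or trial_risk < best[0]):
                    best = (trial_risk, trial)
        if best is None:  # no single step helps, so stop searching
            break
        current = best[1]
    changes = {
        k: (patient[k], current[k])
        for k in modifiable
        if current[k] != patient[k]
    }
    return current, changes


patient = {"chol": 2.0, "trestbps": 0.5, "oldpeak": 1.0}  # scaled units
counterfactual, changes = greedy_counterfactual(
    toy_risk, patient, ["chol", "trestbps", "oldpeak"], target=0.5
)
```

`changes` maps each altered feature to its (before, after) pair in scaled units, which is the raw material for a sentence like the one quoted above. A greedy search is fast but not guaranteed to find the smallest possible intervention.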
Split-conformal prediction provides a 90% marginal coverage guarantee that holds as a theorem whenever calibration and test patients are exchangeable, not as an estimate. It uses 46 validation patients as the calibration set and requires no Bayesian assumptions.
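
The mechanics of split-conformal calibration fit in a few lines. This sketch assumes the nonconformity score is the absolute residual between the 0/1 outcome and the predicted probability, which may not be the score used in `src/uncertainty.py`; the calibration scores below are synthetic.

```python
import math


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[rank - 1]


def conformal_interval(p_hat, q):
    """Interval p_hat +/- q, clipped to the valid probability range."""
    return max(0.0, p_hat - q), min(1.0, p_hat + q)


# Synthetic calibration residuals |y - p_hat| for 46 patients (illustrative only).
calibration_scores = [i / 100 for i in range(1, 47)]  # 0.01, 0.02, ..., 0.46

q = conformal_quantile(calibration_scores, alpha=0.10)
lower, upper = conformal_interval(0.73, q)
```

With 46 calibration patients and 90% target coverage, the rank is ceil(47 × 0.9) = 43, so the interval half-width is the 43rd smallest calibration score. The `(n + 1)` correction is what turns an empirical quantile into a finite-sample guarantee.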
predictions collection, one document per inference:
{
  "patient_id": "P-0001",
  "timestamp": "2025-05-08T14:32:11Z",
  "input_vitals": { "age": 54, "chol": 240, "oldpeak": 2.3 },
  "risk_score": 0.73,
  "risk_tier": "High",
  "shap_values": { "ca": 0.18, "thal": 0.15, "oldpeak": 0.14 },
  "counterfactual": "If cholesterol drops from 2.03 to 0.53...",
  "interval": { "lower": 0.55, "upper": 0.91 }
}

model_registry collection: in-house MLflow-lite tracking of every trained model.
5 aggregation pipelines:
- Recent predictions timeline
- Risk tier distribution chart
- Top risk drivers across High/Critical cohort (`$objectToArray` on SHAP sub-document)
- Weekly volume and mean risk trend
- Per-patient history search
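
As an illustration of the "top risk drivers" pipeline, the sketch below builds an aggregation that flattens the `shap_values` sub-document with `$objectToArray` and ranks features by mean absolute SHAP. The function name, the `mean_abs_shap` field, and the ranking criterion are assumptions; the pipelines in `mongo/analytics.py` may be structured differently.

```python
def top_risk_drivers_pipeline(limit=5):
    """Build a pipeline ranking features by mean |SHAP| in the High/Critical cohort."""
    return [
        {"$match": {"risk_tier": {"$in": ["High", "Critical"]}}},
        # Turn {"ca": 0.18, ...} into [{"k": "ca", "v": 0.18}, ...].
        {"$project": {"shap": {"$objectToArray": "$shap_values"}}},
        # One output document per (prediction, feature) pair.
        {"$unwind": "$shap"},
        {"$group": {
            "_id": "$shap.k",
            "mean_abs_shap": {"$avg": {"$abs": "$shap.v"}},
            "n": {"$sum": 1},
        }},
        {"$sort": {"mean_abs_shap": -1}},
        {"$limit": limit},
    ]


pipeline = top_risk_drivers_pipeline(limit=3)
# With PyMongo this would be run as: db["predictions"].aggregate(pipeline)
```

`$objectToArray` is needed because the feature names are keys of the sub-document rather than values, and `$group` can only group on values.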
cardiolens/
├── src/
│   ├── data_loader.py     # Load → clean → stratified split → StandardScaler
│   ├── models.py          # 4 candidates + CV champion + isotonic calibration
│   ├── train.py           # End-to-end training entrypoint
│   ├── evaluation.py      # Metrics + ROC + Brier + fairness audit
│   ├── xai.py             # SHAP + LIME + deep clinical reasoning engine
│   │                      # + counterfactuals + combined conclusion
│   ├── uncertainty.py     # Split-conformal prediction
│   ├── risk.py            # 4-tier risk stratification
│   └── reports.py         # ReportLab PDF generator
├── app/
│   └── streamlit_app.py   # 5-tab dashboard with live MongoDB + animations
├── mongo/
│   ├── schemas.py         # Pydantic v2 document schemas
│   ├── client.py          # PyMongo client + index creation
│   └── analytics.py       # 5 aggregation pipelines
├── azure/
│   ├── deploy.py          # Azure ML SDK v2 deployment
│   ├── score.py           # Scoring script with App Insights logging
│   └── blob_upload.py     # Push artifact to Blob Storage
├── data/cleveland.csv     # UCI Cleveland Heart Disease (303 patients)
├── reports/               # Model artifact + metrics + PDFs
├── requirements.txt
└── .env.example
git clone https://github.com/Aadithyaar22/cardiolens.git
cd cardiolens
# Create environment
conda create -n cardiolens python=3.11 -y
conda activate cardiolens
pip install -r requirements.txt
# Mac M-series only
brew install libomp
# Train the model
python -m src.train
# Launch
streamlit run app/streamlit_app.py

Create .env in the project root:
# MongoDB Atlas (free M0 tier)
MONGO_URI=mongodb+srv://<user>:<password>@cluster0.xxxxx.mongodb.net/
CARDIOLENS_DB=cardiolens
# Azure ML (optional)
AZURE_SUBSCRIPTION_ID=
AZURE_RESOURCE_GROUP=cardiolens-rg
AZURE_WORKSPACE_NAME=cardiolens-workspace
AZURE_ENDPOINT_URI=
AZURE_ENDPOINT_KEY=
AZURE_BLOB_CONNECTION_STRING=

| Category | Technologies |
|---|---|
| ML | scikit-learn, XGBoost, LightGBM, Random Forest |
| Explainability | SHAP (TreeExplainer), LIME, custom clinical reasoning engine |
| Uncertainty | Split-conformal prediction (custom) |
| Dashboard | Streamlit, Plotly, HTML5 Canvas |
| Database | MongoDB Atlas, PyMongo, Pydantic v2 |
| Cloud | Azure ML, Azure Blob Storage, Azure Application Insights |
| Reports | ReportLab |
| Deployment | Streamlit Community Cloud |
| Language | Python 3.11 |
UCI Cleveland Heart Disease Dataset
- 303 patients · 13 features · Collected 1988, Cleveland Clinic Foundation
- Binary target: 0 = no disease, 1 = disease (any severity)
- UCI ML Repository
CardioLens is an educational and research project. It is not validated for clinical use and does not constitute medical advice. All predictions and explanations are for demonstration purposes only. Consult a qualified clinician for any medical decisions.
MIT License: free to use, modify, and distribute with attribution.