Aadithyaar22/cardiolens

🫀 CardioLens

A production-grade heart disease risk assessment platform: not a notebook, a system. Calibrated ML · Deep Clinical XAI · Conformal Prediction · MongoDB Atlas · Live on Streamlit Cloud.


🌐 Live Demo

https://cardiolens-heart.streamlit.app

Enter any patient profile → get an instantaneous risk score, a full clinical explanation of every contributing factor, a "what-if" counterfactual showing what to change, a 90% prediction interval with a formal coverage guarantee, and a downloadable PDF clinical report.


🎯 What CardioLens Does

| Output | Description |
| --- | --- |
| Risk Score | Calibrated probability of heart disease (0–100%) |
| Risk Tier | Low / Moderate / High / Critical with clinical guidance |
| SHAP Explanation | Per-feature attribution: which factors drove this specific prediction |
| LIME Cross-check | Independent second explainer validating SHAP |
| Deep Clinical Reasoning | What the value means → how it affects the heart → what it leads to |
| Combined Verdict | Synthesised medical conclusion across all top factors |
| Counterfactual | "If cholesterol drops from X to Y, risk falls from 74% to 28%" |
| Prediction Interval | 90% conformal interval with a formal coverage guarantee |
| PDF Report | Downloadable clinical-grade patient report |
| MongoDB Persistence | Every prediction saved with full metadata for cohort analysis |
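
The Risk Tier row maps a calibrated probability onto four labels. A minimal sketch of such a mapping is shown here; the cut-points 0.25 / 0.50 / 0.75 are hypothetical, since the README does not state the thresholds used in src/risk.py.

```python
from bisect import bisect_right

# Hypothetical cut-points: the project's real thresholds live in src/risk.py
# and are not stated in this README.
TIER_BOUNDS = (0.25, 0.50, 0.75)
TIER_NAMES = ("Low", "Moderate", "High", "Critical")


def risk_tier(probability: float) -> str:
    """Return the tier whose half-open range [lower, upper) contains the probability."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return TIER_NAMES[bisect_right(TIER_BOUNDS, probability)]


print(risk_tier(0.73))  # High
```

With these assumed cut-points, a score of 0.73 lands in the High tier, which happens to agree with the sample MongoDB document later in this README.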

✨ What Makes This Different

| Typical college project | CardioLens |
| --- | --- |
| Single model, one accuracy number | 4 models, 5-fold CV, isotonic calibration, Brier score |
| SHAP bar chart | SHAP + LIME + interaction values + counterfactuals + conformal intervals |
| "The model predicted X" | Full clinical reasoning: mechanism → consequence → action |
| Point estimate only | 90% prediction interval with formal coverage guarantee |
| Local notebook | Live deployed at cardiolens-heart.streamlit.app |
| No persistence | MongoDB Atlas: patient history, tier analytics, cohort insights |
| No fairness analysis | Subgroup AUC and recall audit across sex |

🏗 Architecture

┌──────────────────────────────────────────────────────────────┐
│              Streamlit Dashboard (Live)                      │
│  Blood-flow canvas · Gauge · Radar · SHAP · LIME · PDF       │
└──────────────┬──────────────┬──────────────┬─────────────────┘
               │              │              │
       ┌───────▼──────┐ ┌─────▼──────┐ ┌─────▼───────────┐
       │   ML Core    │ │ XAI Layer  │ │  Uncertainty    │
       │ RF champion  │ │ SHAP+LIME  │ │ Split-conformal │
       │ AUC 0.969    │ │ Clinical   │ │ 90% guarantee   │
       │ Brier 0.077  │ │ reasoning  │ │                 │
       └──────────────┘ └────────────┘ └─────────────────┘
               │
       ┌───────▼──────────────┐     ┌─────────────────────────┐
       │   MongoDB Atlas      │     │   Azure ML (code-ready) │
       │ predictions coll.    │     │ Managed Endpoint        │
       │ model_registry coll. │     │ Blob Storage            │
       │ Cohort analytics     │     │ Application Insights    │
       └──────────────────────┘     └─────────────────────────┘

📊 Model Performance

Champion: Random Forest, selected by 5-fold stratified cross-validation

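| Metric | Score | What it means |
| --- | --- | --- |
| AUC-ROC | 0.9692 | A randomly chosen patient with disease is ranked above a randomly chosen healthy patient about 97% of the time |
| Accuracy | 0.9016 | 90% of patients correctly classified |
| Recall | 0.9286 | Catches about 93 of every 100 true cardiac cases |
| Precision | 0.8667 | 87% of flagged patients truly have disease |
| F1 Score | 0.8966 | Strong balance of precision and recall |
| Brier Score | 0.0766 | Low mean squared error of the predicted probabilities (0 is perfect), which indicates good calibration |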
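
As a sanity check on how the threshold metrics relate, the sketch below recomputes them from one confusion matrix. The counts are an assumption: they are inferred from the rounded scores above and from the 61 test patients (20 female + 41 male) listed in the fairness audit, not copied from the repository.

```python
# Counts inferred from the reported metrics and the 61 test patients in the
# fairness audit; they are an assumption, not values taken from the repo.
tp, fn, fp, tn = 26, 2, 4, 29

total = tp + fn + fp + tn
accuracy = (tp + tn) / total
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)

print(f"n={total} acc={accuracy:.4f} rec={recall:.4f} prec={precision:.4f} f1={f1:.4f}")
# n=61 acc=0.9016 rec=0.9286 prec=0.8667 f1=0.8966
```

Under these counts the model misses 2 of 28 true cases and raises 4 false alarms among 33 healthy patients.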

All 4 candidates (CV AUC):

| Model | CV AUC | Description |
| --- | --- | --- |
| 🏆 Random Forest | 0.8792 ± 0.0376 | Champion: 400 trees, majority vote |
| Logistic Regression | 0.8682 ± 0.0532 | Linear baseline, highly competitive |
| XGBoost | 0.8508 ± 0.0425 | Sequential error-correcting trees |
| LightGBM | 0.8481 ± 0.0418 | Leaf-wise boosting |

Fairness audit:

| Subgroup | n | Accuracy | AUC | Recall |
| --- | --- | --- | --- | --- |
| Female | 20 | 0.950 | 1.000 | 0.857 |
| Male | 41 | 0.878 | 0.950 | 0.952 |
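
A subgroup recall audit of this kind can be sketched in a few lines. The records below are synthetic and contain positives only; their counts (6 of 7 female cases and 20 of 21 male cases caught) are inferred from the rounded recall values in the table and are an assumption, not data from the repository.

```python
from collections import defaultdict


def subgroup_recall(records):
    """records: iterable of (group, y_true, y_pred) tuples with binary labels."""
    true_pos = defaultdict(int)
    positives = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            positives[group] += 1
            true_pos[group] += int(y_pred == 1)
    return {g: true_pos[g] / positives[g] for g in positives}


# Synthetic positives sized to match the table (assumed counts).
records = (
    [("Female", 1, 1)] * 6 + [("Female", 1, 0)]
    + [("Male", 1, 1)] * 20 + [("Male", 1, 0)]
)
recalls = subgroup_recall(records)
print({g: round(r, 3) for g, r in recalls.items()})
# {'Female': 0.857, 'Male': 0.952}
```

If these inferred counts are right, the recall gap comes down to a single missed case in each group, so with only 20 female test patients the subgroup figures should be read with wide error bars.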

🔬 Explainability: The Core Innovation

SHAP (Primary Explainer)

TreeExplainer computes exact Shapley values for tree ensembles rather than a sampling-based approximation, so the attributions for each prediction sum exactly to the gap between the model output and the baseline. Every prediction decomposes as:

final_risk = baseline + SHAP(age) + SHAP(chol) + SHAP(oldpeak) + ... + SHAP(thal)
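
The additivity in the formula above can be checked on a toy model. This sketch computes exact Shapley values by brute-force enumeration of feature coalitions, which is exponential in the number of features and is not the polynomial-time algorithm TreeExplainer uses. The `toy_risk` function, its coefficients, and the reference patient are all made up for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values by enumerating every coalition of features.

    Features outside a coalition take their value from `background`.
    """
    names = list(x)
    n = len(names)

    def value(coalition):
        mixed = {f: (x[f] if f in coalition else background[f]) for f in names}
        return predict(mixed)

    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


# Toy risk function with an interaction term; NOT the CardioLens model.
def toy_risk(p):
    return 0.10 + 0.02 * p["oldpeak"] + 0.0001 * p["chol"] + 0.05 * p["oldpeak"] * p["thal"]


patient = {"oldpeak": 2.3, "chol": 240, "thal": 1}
reference = {"oldpeak": 0.0, "chol": 200, "thal": 0}

phi = shapley_values(toy_risk, patient, reference)
baseline = toy_risk(reference)

# Additivity: baseline plus all attributions reproduces the prediction.
assert abs(baseline + sum(phi.values()) - toy_risk(patient)) < 1e-9
```

The interaction between `oldpeak` and `thal` is split evenly between the two features, while `chol`, which enters additively, receives exactly its own marginal effect.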

LIME (Independent Cross-check)

LIME generates 5,000 perturbed variants of the patient, scores each with the model, and fits a local linear surrogate, a procedure completely independent of SHAP. When SHAP and LIME agree on the leading factors, the explanation can be trusted with more confidence.
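
One simple way to quantify agreement between the two explainers is the share of top-k features that both select with the same sign. The metric below is an assumption about how such a check could look, not necessarily what src/xai.py does. The SHAP numbers reuse the sample document shown later in this README (plus an invented `chol` value), and the LIME weights are invented.

```python
def top_k_agreement(shap_attr, lime_attr, k=3):
    """Share of the top-k features (by absolute attribution) that both
    explainers select AND to which both assign the same sign."""
    def top(attr):
        return sorted(attr, key=lambda f: abs(attr[f]), reverse=True)[:k]

    shared = set(top(shap_attr)) & set(top(lime_attr))
    same_sign = [f for f in shared if shap_attr[f] * lime_attr[f] > 0]
    return len(same_sign) / k


# Illustrative attributions only.
shap_attr = {"ca": 0.18, "thal": 0.15, "oldpeak": 0.14, "chol": 0.02}
lime_attr = {"ca": 0.21, "oldpeak": 0.16, "chol": 0.09, "thal": 0.05}

agreement = top_k_agreement(shap_attr, lime_attr)
print(f"top-3 agreement: {agreement:.2f}")
```

Here the explainers share two of their three leading features (`ca` and `oldpeak`), so the agreement is two thirds; a dashboard could flag explanations that fall below a chosen cutoff.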

Deep Clinical Reasoning Engine

The key differentiator. For each top feature, three layers of medical explanation are generated from the actual raw value:

Example (ST Depression = 2.3 mm):

  • What it means: "Marked ST segment depression: strong objective evidence of myocardial ischaemia."
  • How it affects the heart: "During ischaemia, the subendocardial myocardium depolarises abnormally, shifting the ECG's ST segment downward. Depression ≥2mm is a Class I indication for cardiac investigation."
  • What it leads to: "Predicts multi-vessel coronary disease. Associated with 5-10x increased cardiac event risk vs a negative stress test."

Example (Reversible Thallium Scan Defect, feature thal):

  • What it means: "Thallium scan shows reduced blood flow under stress that recovers at rest: the direct imaging definition of myocardial ischaemia."
  • How it affects the heart: "A critically narrowed coronary artery cannot increase flow during stress. Thallium creates a cold spot in underperfused zones. At rest, flow recovers."
  • What it leads to: "Class I indication for coronary angiography. Tissue at immediate risk of infarction if the causative stenosis is untreated."

Combined Clinical Conclusion

All top factors are synthesised into one conclusive medical verdict:

"The dominant contributors are ST depression and thalassemia status, each independently associated with significant coronary artery disease. Together, this warrants urgent cardiovascular evaluation. Coronary angiography should be strongly considered."

Counterfactual Explanations

Greedy search over modifiable features (cholesterol, BP, heart rate, ST depression) looks for the smallest real-world intervention that lowers the predicted risk:

"If your cholesterol drops from 2.03 to 0.53 (scaled units), predicted risk falls from 30.7% to 14.4%."

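The greedy search described in this section can be sketched as follows. Everything here is illustrative: `toy_model`, its coefficients, the step sizes, and the lower bounds are assumptions, and the feature values are in scaled units like the example above.

```python
import math


def greedy_counterfactual(predict, patient, steps, target, max_moves=20):
    """Repeatedly apply the single allowed step that lowers risk the most.

    `steps` maps each modifiable feature to (delta, lower_bound).
    Returns the modified patient and the list of (feature, old, new) moves.
    """
    current = dict(patient)
    moves = []
    for _ in range(max_moves):
        risk = predict(current)
        if risk <= target:
            break
        best = None
        for feature, (delta, floor) in steps.items():
            candidate = dict(current)
            candidate[feature] = max(floor, current[feature] + delta)
            gain = risk - predict(candidate)
            if gain > 1e-12 and (best is None or gain > best[0]):
                best = (gain, feature, candidate)
        if best is None:  # no remaining step reduces the risk
            break
        _, feature, candidate = best
        moves.append((feature, current[feature], candidate[feature]))
        current = candidate
    return current, moves


# Toy logistic model on scaled features; coefficients are made up.
def toy_model(p):
    z = -0.5 + 0.6 * p["chol"] + 0.9 * p["oldpeak"] + 0.3 * p["trestbps"]
    return 1 / (1 + math.exp(-z))


patient = {"chol": 2.03, "oldpeak": 1.0, "trestbps": 0.5}
steps = {"chol": (-0.5, -1.0), "oldpeak": (-0.5, 0.0), "trestbps": (-0.5, -1.0)}

cf, moves = greedy_counterfactual(toy_model, patient, steps, target=0.5)
print(f"risk {toy_model(patient):.3f} -> {toy_model(cf):.3f} in {len(moves)} moves")
```

A greedy search is fast and easy to explain, but it does not guarantee the globally smallest change; an exhaustive or optimisation-based search would be needed for that.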
Conformal Prediction Intervals

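Split-conformal prediction provides a 90% coverage guarantee that holds as a finite-sample theorem rather than an asymptotic estimate, provided the calibration and test patients are exchangeable. The guarantee is marginal (averaged over patients), not per-patient. It uses 46 validation patients for calibration and requires no Bayesian or distributional assumptions.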
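
A minimal sketch of the split-conformal step with 46 calibration patients follows. The nonconformity score |y - p_hat| and the symmetric interval are assumptions; src/uncertainty.py may use a different score. The calibration scores are synthetic and evenly spaced for readability.

```python
import math


def conformal_quantile(scores, alpha=0.10):
    """Finite-sample quantile used by split-conformal prediction.

    Picks the k-th smallest calibration score, k = ceil((n + 1) * (1 - alpha)).
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(scores)[k - 1]


def interval(p_hat, q):
    """Symmetric interval around the predicted risk, clipped to [0, 1]."""
    return max(0.0, p_hat - q), min(1.0, p_hat + q)


# 46 synthetic calibration scores |y - p_hat|: 0.005, 0.010, ..., 0.230
scores = [i / 200 for i in range(1, 47)]
q = conformal_quantile(scores, alpha=0.10)  # the 43rd smallest score
low, high = interval(0.73, q)
print(f"q={q:.3f} interval=({low:.3f}, {high:.3f})")
```

With n = 46 and alpha = 0.10, k = ceil(47 × 0.9) = 43, so the interval half-width is the 43rd smallest calibration score. The "+1" in the formula is what makes the coverage statement hold for finite samples.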


🗄 MongoDB: Clinical Data Layer

predictions collection (one document per inference):

{
  "patient_id": "P-0001",
  "timestamp": "2025-05-08T14:32:11Z",
  "input_vitals": { "age": 54, "chol": 240, "oldpeak": 2.3 },
  "risk_score": 0.73,
  "risk_tier": "High",
  "shap_values": { "ca": 0.18, "thal": 0.15, "oldpeak": 0.14 },
  "counterfactual": "If cholesterol drops from 2.03 to 0.53...",
  "interval": { "lower": 0.55, "upper": 0.91 }
}

model_registry collection: an in-house, MLflow-lite registry that tracks every trained model.

5 aggregation pipelines:

  • Recent predictions timeline
  • Risk tier distribution chart
  • Top risk drivers across High/Critical cohort ($objectToArray on SHAP sub-document)
  • Weekly volume and mean risk trend
  • Per-patient history search
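
The "top risk drivers" pipeline could look roughly like the sketch below. The stage list uses standard MongoDB aggregation operators and the field names from the sample document above, but ranking by mean absolute SHAP value is an assumption about what mongo/analytics.py computes. The pure-Python function mirrors the pipeline so the logic can be checked without a database; its sample documents are invented.

```python
from collections import defaultdict

# Aggregation stages in the form PyMongo's collection.aggregate() accepts.
TOP_DRIVERS_PIPELINE = [
    {"$match": {"risk_tier": {"$in": ["High", "Critical"]}}},
    # Turn {"ca": 0.18, ...} into [{"k": "ca", "v": 0.18}, ...]
    {"$project": {"shap": {"$objectToArray": "$shap_values"}}},
    {"$unwind": "$shap"},
    {"$group": {
        "_id": "$shap.k",
        "mean_abs_shap": {"$avg": {"$abs": "$shap.v"}},
        "patients": {"$sum": 1},
    }},
    {"$sort": {"mean_abs_shap": -1}},
    {"$limit": 5},
]


def top_drivers_local(docs, limit=5):
    """Pure-Python equivalent of the pipeline, for tests without a database."""
    sums, counts = defaultdict(float), defaultdict(int)
    for doc in docs:
        if doc["risk_tier"] not in ("High", "Critical"):
            continue
        for feature, value in doc["shap_values"].items():
            sums[feature] += abs(value)
            counts[feature] += 1
    means = {f: sums[f] / counts[f] for f in sums}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)[:limit]


# Invented documents following the schema of the predictions collection.
docs = [
    {"risk_tier": "High", "shap_values": {"ca": 0.18, "thal": 0.15, "oldpeak": 0.14}},
    {"risk_tier": "Critical", "shap_values": {"ca": 0.22, "thal": -0.05, "oldpeak": 0.30}},
    {"risk_tier": "Low", "shap_values": {"ca": -0.10, "thal": -0.08, "oldpeak": -0.02}},
]
ranking = top_drivers_local(docs)
print([feature for feature, _ in ranking])  # ['oldpeak', 'ca', 'thal']
```

`$objectToArray` is what makes the grouping possible: SHAP values are stored as a sub-document keyed by feature name, and the stage converts those keys into data that `$unwind` and `$group` can operate on.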

📁 Project Structure

cardiolens/
├── src/
│   ├── data_loader.py    # Load → clean → stratified split → StandardScaler
│   ├── models.py         # 4 candidates + CV champion + isotonic calibration
│   ├── train.py          # End-to-end training entrypoint
│   ├── evaluation.py     # Metrics + ROC + Brier + fairness audit
│   ├── xai.py            # SHAP + LIME + deep clinical reasoning engine
│   │                     # + counterfactuals + combined conclusion
│   ├── uncertainty.py    # Split-conformal prediction
│   ├── risk.py           # 4-tier risk stratification
│   └── reports.py        # ReportLab PDF generator
├── app/
│   └── streamlit_app.py  # 5-tab dashboard with live MongoDB + animations
├── mongo/
│   ├── schemas.py        # Pydantic v2 document schemas
│   ├── client.py         # PyMongo client + index creation
│   └── analytics.py      # 5 aggregation pipelines
├── azure/
│   ├── deploy.py         # Azure ML SDK v2 deployment
│   ├── score.py          # Scoring script with App Insights logging
│   └── blob_upload.py    # Push artifact to Blob Storage
├── data/cleveland.csv    # UCI Cleveland Heart Disease (303 patients)
├── reports/              # Model artifact + metrics + PDFs
├── requirements.txt
└── .env.example

🚀 Run Locally

git clone https://github.com/Aadithyaar22/cardiolens.git
cd cardiolens

# Create environment
conda create -n cardiolens python=3.11 -y
conda activate cardiolens
pip install -r requirements.txt

# Mac M-series only
brew install libomp

# Train the model
python -m src.train

# Launch
streamlit run app/streamlit_app.py

🔧 Environment Variables

Create .env in the project root:

# MongoDB Atlas (free M0 tier)
MONGO_URI=mongodb+srv://<user>:<password>@cluster0.xxxxx.mongodb.net/
CARDIOLENS_DB=cardiolens

# Azure ML (optional)
AZURE_SUBSCRIPTION_ID=
AZURE_RESOURCE_GROUP=cardiolens-rg
AZURE_WORKSPACE_NAME=cardiolens-workspace
AZURE_ENDPOINT_URI=
AZURE_ENDPOINT_KEY=
AZURE_BLOB_CONNECTION_STRING=
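
One way the application could read these variables is sketched below. The `Settings` class and `load_settings` function are hypothetical names, and treating MongoDB as required is an assumption; the README only marks Azure as optional. The sketch also assumes the variables are already present in the process environment, for example exported in the shell or loaded by whatever .env loader the app uses.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    mongo_uri: str
    db_name: str
    azure_enabled: bool


def load_settings(env=os.environ) -> Settings:
    """Read the variables listed above from a mapping (default: os.environ)."""
    mongo_uri = env.get("MONGO_URI", "").strip()
    if not mongo_uri:
        raise RuntimeError("MONGO_URI is not set; copy .env.example to .env and fill it in")
    return Settings(
        mongo_uri=mongo_uri,
        db_name=env.get("CARDIOLENS_DB", "cardiolens"),
        # Azure scoring is only switched on when both endpoint values are present.
        azure_enabled=bool(env.get("AZURE_ENDPOINT_URI") and env.get("AZURE_ENDPOINT_KEY")),
    )


settings = load_settings({"MONGO_URI": "mongodb://localhost:27017/"})
print(settings.db_name, settings.azure_enabled)  # cardiolens False
```

Passing the environment as a mapping keeps the function easy to test and makes a missing connection string fail at startup rather than on the first database write.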

📦 Tech Stack

| Category | Technologies |
| --- | --- |
| ML | scikit-learn, XGBoost, LightGBM, Random Forest |
| Explainability | SHAP (TreeExplainer), LIME, custom clinical reasoning engine |
| Uncertainty | Split-conformal prediction (custom) |
| Dashboard | Streamlit, Plotly, HTML5 Canvas |
| Database | MongoDB Atlas, PyMongo, Pydantic v2 |
| Cloud | Azure ML, Azure Blob Storage, Azure Application Insights |
| Reports | ReportLab |
| Deployment | Streamlit Community Cloud |
| Language | Python 3.11 |

📋 Dataset

UCI Cleveland Heart Disease Dataset

  • 303 patients · 13 features · Collected 1988, Cleveland Clinic Foundation
  • Binary target: 0 = no disease, 1 = disease (any severity)
  • UCI ML Repository
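
The binary target can be derived from the original UCI severity code, which ranges from 0 (no disease) to 4. The sketch below shows that mapping; whether data/cleveland.csv stores the raw 0-4 code or an already binarised column is not stated here, so treat this as an assumption about the preprocessing step.

```python
def binarize_target(num: int) -> int:
    """Map the original 0-4 severity code to 0 (no disease) or 1 (any disease)."""
    if num not in range(5):
        raise ValueError(f"unexpected severity code: {num}")
    return int(num > 0)


print([binarize_target(v) for v in (0, 1, 2, 3, 4)])  # [0, 1, 1, 1, 1]
```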

⚠️ Disclaimer

CardioLens is an educational and research project. It is not validated for clinical use and does not constitute medical advice. All predictions and explanations are for demonstration purposes only. Consult a qualified clinician for any medical decisions.


📄 License

MIT License: free to use, modify, and distribute with attribution.


👥 Built By

Aadithya A R
B.Tech CSE (AI & ML)
Global Academy of Technology, Bengaluru
ML Pipeline · XAI Engine · Clinical Reasoning · Dashboard · MongoDB · Deployment

Yadunandan M Nimbalkar
B.Tech CSE (AI & ML)
Global Academy of Technology, Bengaluru
Co-Builder · Project Architecture · Research · Testing · Validation


⭐ Star this repo if you found it useful

About

Heart disease risk assessment with Explainable AI, MongoDB Atlas, and Azure ML
