siyansimsh/Fintech_Project

FinTech Final Project — Credit Default Prediction

This project builds credit default prediction models using the UCI Default of Credit Card Clients dataset. It focuses on improving defaulter detection with threshold tuning, and provides explainability via SHAP.

What you can do with this repo

  • Train and evaluate a Random Forest model with threshold tuning (targeting Recall ≈ 0.6).
  • Generate evaluation artifacts: confusion matrix, ROC curve / AUC, threshold sensitivity analysis, and SHAP plots.
  • (Optional) Train & save Logistic Regression / Random Forest / XGBoost models.
  • (Optional) Run an interactive Streamlit dashboard for credit risk scoring.

Environment & Dependencies

  • Recommended: Python 3.9+
  • Install dependencies:
pip install -U pip
pip install pandas numpy scikit-learn matplotlib seaborn shap joblib xgboost streamlit plotly

Quick Start

1) Train & evaluate Random Forest (with threshold tuning)

Run from the project root:

python train_rf.py

This script will:

  1. Load uci_default_cleaned.csv
  2. Split train/test sets
  3. Train a Random Forest (class-weighted for imbalance)
  4. Find a threshold targeting Recall ≈ 0.6
  5. Print classification metrics
  6. Export plots and CSV outputs (see below)
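Step 4 above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the code from train_rf.py: the function names `recall_at` and `pick_threshold` are hypothetical, and the scores are assumed to be the predicted default probabilities (for example, the positive-class column of a scikit-learn classifier's `predict_proba`).

```python
from typing import Sequence


def recall_at(y_true: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """Recall for the defaulter class (label 1) when predicting 1 if score >= threshold."""
    positive_scores = [s for y, s in zip(y_true, scores) if y == 1]
    if not positive_scores:
        raise ValueError("y_true contains no positive samples")
    return sum(s >= threshold for s in positive_scores) / len(positive_scores)


def pick_threshold(y_true: Sequence[int], scores: Sequence[float],
                   target_recall: float = 0.6) -> float:
    """Return the highest observed score whose recall reaches target_recall.

    Recall never increases as the threshold rises, so scanning candidates
    from high to low and stopping at the first hit gives the strictest
    threshold that still meets the target.
    """
    for candidate in sorted(set(scores), reverse=True):
        if recall_at(y_true, scores, candidate) >= target_recall:
            return candidate
    raise ValueError("target_recall cannot be reached (is it greater than 1?)")


# Toy data (made up for illustration): 1 = default, 0 = no default.
y_true = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.45, 0.50, 0.62, 0.70, 0.30, 0.80]
threshold = pick_threshold(y_true, scores, target_recall=0.6)
print(threshold, recall_at(y_true, scores, threshold))
```

On this toy data the selected threshold is 0.62, which catches 3 of the 5 defaulters. The actual script may pick its threshold differently, for instance from a fixed grid or from a precision-recall curve.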

2) (Optional) Train & save all models for the dashboard

python "Train and Save All Models.py"

It will generate:

  • lr_model.pkl, rf_model.pkl, xgb_model.pkl
  • feature_names.pkl, reference_data.csv
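One plausible use of feature_names.pkl is to keep dashboard inputs in the same column order the models were trained on. The sketch below assumes the file holds a plain list of column names and uses the standard-library `pickle` module as a stand-in; the actual script may use `joblib` instead, and the helper names and sample feature names here are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_feature_names(names, path):
    """Persist the training column order as a plain list."""
    with open(path, "wb") as f:
        pickle.dump(list(names), f)


def load_feature_names(path):
    """Load the column order saved by save_feature_names."""
    with open(path, "rb") as f:
        return pickle.load(f)


def to_feature_row(record, feature_names):
    """Order one applicant record (a dict) to match the training columns."""
    missing = [name for name in feature_names if name not in record]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [record[name] for name in feature_names]


# Illustrative column names only; the cleaned CSV may use different ones.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "feature_names.pkl"
    save_feature_names(["LIMIT_BAL", "AGE", "PAY_0"], path)
    names = load_feature_names(path)

row = to_feature_row({"AGE": 35, "PAY_0": 0, "LIMIT_BAL": 50000}, names)
print(row)  # values ordered as LIMIT_BAL, AGE, PAY_0
```

Only load pickle files from a source you trust, since unpickling can execute arbitrary code.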

3) (Optional) Run the Streamlit dashboard

streamlit run web_app.py

Output Files

Generated by python train_rf.py (saved to the project root):

File                                  Description
------------------------------------  -------------------------------------
confusion_matrix_final.png            Confusion matrix (threshold-adjusted)
roc_curve_final.png                   ROC curve & AUC
shap_importance_bar.png               SHAP feature importance (bar)
shap_summary_plot.png                 SHAP summary plot
threshold_comparison.png              Recall/Precision/Accuracy vs. threshold
threshold_sensitivity_analysis.csv    Threshold performance table
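
A threshold performance table like threshold_sensitivity_analysis.csv could be produced along these lines. This is a sketch under assumptions: the column names (`threshold`, `recall`, `precision`, `accuracy`) and the helper names are hypothetical, and the real file's layout may differ.

```python
import csv
import io


def threshold_table(y_true, scores, thresholds):
    """Compute recall, precision and accuracy for each decision threshold."""
    rows = []
    for t in thresholds:
        preds = [1 if s >= t else 0 for s in scores]
        tp = sum(p == 1 and y == 1 for p, y in zip(preds, y_true))
        fp = sum(p == 1 and y == 0 for p, y in zip(preds, y_true))
        fn = sum(p == 0 and y == 1 for p, y in zip(preds, y_true))
        tn = sum(p == 0 and y == 0 for p, y in zip(preds, y_true))
        rows.append({
            "threshold": t,
            "recall": tp / (tp + fn) if tp + fn else 0.0,
            "precision": tp / (tp + fp) if tp + fp else 0.0,
            "accuracy": (tp + tn) / len(y_true),
        })
    return rows


def write_table(rows, fileobj):
    """Write the rows as CSV with a header line."""
    writer = csv.DictWriter(
        fileobj, fieldnames=["threshold", "recall", "precision", "accuracy"]
    )
    writer.writeheader()
    writer.writerows(rows)


# Toy data (made up for illustration): 1 = default, 0 = no default.
y_true = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.20, 0.35, 0.40, 0.45, 0.50, 0.62, 0.70, 0.30, 0.80]

buffer = io.StringIO()  # swap for open("some_file.csv", "w", newline="") to write to disk
write_table(threshold_table(y_true, scores, [0.3, 0.5, 0.7]), buffer)
print(buffer.getvalue())
```

On this toy data, raising the threshold from 0.3 to 0.7 drops recall from 1.0 to 0.4 while precision rises from 0.625 to 1.0, which is the trade-off the threshold comparison plot visualizes.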

Repository Layout (high-level)

Key files/folders:

final_project/
├─ Readme.md
├─ train_rf.py
├─ train_rf.ipynb
├─ Train and Save All Models.py
├─ web_app.py
├─ uci_default_cleaned.csv
├─ Dataset/                  # raw + reference CSVs
├─ Random_Forest/            # additional RF experiments/results
├─ Logistic_Regression/      # LR report/code
└─ web_source/               # web app bundle (models + assets)

Notes

  • Threshold tuning is used to prioritize recall for the defaulter class.
  • Model artifacts (*.pkl) are included to make the dashboard runnable without retraining.
