Healthcare – European Cardiovascular Health Consortium (Personal)

Predicting Heart disease with Machine Learning

Fictional

01/10/2025 > 19/11/2025

Context

I was commissioned by the European Cardiovascular Health Consortium (ECHC) (fictional), a non-profit consortium of cardiovascular research centers, to develop a machine learning system for preventive heart disease screening. The project was supervised by Dr. Elena Rossi (Head of Preventive Cardiology).

My role was to:  

Build predictive models to assess cardiovascular disease risk from clinical data 
Optimize model performance while addressing overfitting issues 
Create an interactive Power BI dashboard enabling clinicians to explore “what-if” scenarios 
Translate complex ML predictions into actionable risk categories (Low/Moderate/High)

Datasets

Source:
UCI Heart Disease Dataset 
Creators:
Hungarian Institute of Cardiology (Budapest), University Hospital (Zurich), University Hospital (Basel), V.A. Medical Center (Long Beach & Cleveland Clinic Foundation)

Main dataset:

303 patients from 4 international hospitals 
13 clinical variables: age, sex, chest pain type (cp), resting blood pressure (trestbps), cholesterol (chol), fasting blood sugar (fbs), resting ECG (restecg), max heart rate (thalach), exercise-induced angina (exang), ST depression (oldpeak), ST slope (slope), colored vessels (ca), thalassemia (thal) 
Target variable: presence (1) or absence (0) of heart disease

Data quality: Zero missing values 
Challenge: Original dataset had 5 imbalanced classes (0, 1, 2, 3, 4). Transformed into binary classification (Healthy vs Disease) to achieve balanced classes: 160 healthy (54%) vs 137 diseased (46%). 
Dashboard dataset: 133,056 simulated scenarios generated by systematic combinations of clinical parameters for “What-If” analysis.

Workflow

Environment: Jupyter Notebook (Python 3) using scikit-learn, pandas, numpy, matplotlib, seaborn, joblib 

1. Data Preparation:  

Train-test split: 80% training (237 patients) / 20% test (60 patients) with stratification 
Avoided data snooping: test set locked until final evaluation 
5-Fold Cross-Validation on training set only for hyperparameter tuning 

Preprocessing pipeline:  
Numerical variables: median imputation + StandardScaler (μ=0, σ=1) 
Categorical variables: One-Hot Encoding  

2. Model Development: 

Model 1 – Random Forest:  

Initial GridSearchCV with overfitting detected (AUC gap: 5.1%) 

Regularization strategy applied: 

Limited tree depth (max_depth: 10) 
Increased minimum samples per split (min_samples_split: 5) 
Increased minimum samples per leaf (min_samples_leaf: 4) 
Feature randomization (max_features: ‘sqrt’)

Result after optimization: Accuracy 85%, AUC 0.946, reduced overfitting gap to 3.5%  

Model 2 – Logistic Regression:  
GridSearchCV with L2 regularization (C: 0.01)
Result: Accuracy 86.7%, AUC 0.951, zero overfitting  

Model 3 – Stacking Ensemble:  
Base models: optimized Random Forest + Logistic Regression 
Meta-model: Logistic Regression with 5-fold CV 
Result: Accuracy 83.3%, AUC 0.953 (best discrimination)  

3. Feature Importance Analysis:  

Top 5 predictive variables (62% total importance): chest pain type (15.8%), thalassemia (15.0%), max heart rate (11.1%), colored vessels (10.4%), ST depression (9.8%) 
Surprising finding: cholesterol and fasting blood sugar had weak correlation with target  

4. Dashboard Development:  

Generated 133,056 What-If scenarios through systematic parameter combinations 
Risk categorization: Low (<30%), Moderate (30-70%), High (≥70%) 
Interactive Power BI dashboard with slicers for age, cholesterol, heart rate, chest pain type, thalassemia, ST depression, colored vessels 
Exported as heart_disease_dashboard_stacking.csv

Insights

Model 1: Random Forest (Optimized)

Test Accuracy: 85.0%
Test AUC-ROC: 0.946
Precision (Disease): 0.88
Recall (Disease): 0.79
F1-Score: 0.83
Overfitting gap reduced: From 11.2% to 6.1% (45% improvement)

Model 2: Logistic Regression

Test Accuracy: 86.7%
Test AUC-ROC: 0.951
Precision (Disease): 0.92
Recall (Disease): 0.79
F1-Score: 0.85
Generalization: Excellent (negative gap = test > training)

Model 3: Stacking Ensemble (Selected)

Test Accuracy: 83.3%
Test AUC-ROC: 0.953 ✨ (highest)
Cross-validation AUC: 0.903
Precision (Disease): 0.88
Recall (Disease): 0.75
F1-Score: 0.81
Overfitting: Negligible

Key findings

✅ Stacking achieved best discrimination (AUC = 0.953)
✅ Random Forest had highest sensitivity (79% recall = fewer missed cases)
✅ Logistic Regression had highest precision (92% = fewer false alarms)
✅ All models exceeded 0.94 AUC (excellent clinical performance)

Feature Importance (Top 5)

Chest pain type (cp): 15.8%
Thalassemia (thal): 15.0%
Max heart rate (thalach): 11.1%
Colored vessels (ca): 10.4%
ST depression (oldpeak): 9.8%

Total contribution: 62% of predictive power

Business Impact

For Clinicians: 

Rapid screening tool with 95% discrimination accuracy supporting triage decisions 
Evidence-based prioritization: identifies which patients need immediate cardiological workup vs routine follow-up 
Explainable predictions: feature importance shows which clinical factors drive risk assessment

For Patients: 

Visual risk communication: interactive dashboard translates complex probabilities into clear Low/Moderate/High categories 
Empowerment: patients can see how lifestyle changes (reducing cholesterol, improving fitness) would impact their risk score

For Healthcare System: 

Resource optimization: focuses expensive diagnostic tests (echocardiograms, angiography) on moderate/high-risk patients 
Preventive economics: early detection reduces costly emergency interventions and hospitalizations 
Scalability: model processes unlimited scenarios instantly for population health management

Scientific Contribution: 

Demonstrated that ensemble methods (Stacking) combining linear and non-linear approaches outperform individual models 
Confirmed that diagnostic variables (chest pain characteristics, exercise test results) are more informative than isolated risk factors 
Provided reproducible methodology for small medical datasets (n=303) with rigorous overfitting control

Limitations Acknowledged: