Sepsis Prediction Pipeline
> Overview
The pipeline detects sepsis from patient-level data in five stages:
- Data Handling: Missing values are imputed with the MICE algorithm, followed by categorical encoding and robust scaling.
- Feature Engineering: Automated column dropping, log transformation, and feature interaction analysis.
- Model Training: Random Forest, XGBoost, and Logistic Regression models, with hyperparameters optimized using Optuna.
- Evaluation: AUROC, Precision, Recall, and F1 Score are logged alongside custom visualization reports.
- Deployment: A model registry supports versioning and metadata storage for reproducibility.
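The numeric part of the data handling stage could look roughly like the sketch below. It assumes scikit-learn's `IterativeImputer` as the MICE-style imputer and uses synthetic data; the function name `build_preprocessor`, the step names, and all parameter values are illustrative assumptions, not the project's actual configuration. Categorical encoding is left out for brevity.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (required to expose IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, RobustScaler


def build_preprocessor(random_state=0):
    """Hypothetical numeric preprocessor: impute, log-transform, robust-scale."""
    return Pipeline(steps=[
        # MICE-style iterative imputation; min_value=0 keeps imputed values
        # non-negative so the log transform below stays defined.
        ("impute", IterativeImputer(max_iter=10, min_value=0.0, random_state=random_state)),
        # log1p compresses right-skewed, non-negative measurements.
        ("log", FunctionTransformer(np.log1p)),
        # Centers on the median and scales by the IQR, so outliers matter less.
        ("scale", RobustScaler()),
    ])


# Synthetic stand-in for skewed, non-negative measurements with ~10% missing values.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=1.0, sigma=0.5, size=(200, 4))
X[rng.random(X.shape) < 0.1] = np.nan

X_clean = build_preprocessor().fit_transform(X)
```

In a real run the preprocessor would be fitted on the training patients only and then applied to the held-out patients, so that imputation and scaling statistics do not leak across the split.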
> Key Features
- Patient-level data splitting to ensure no data leakage.
- Comprehensive data preprocessing pipeline with iterative imputation (MICE), log transformation, and robust scaling.
- Automated feature engineering with redundant column removal, categorical encoding, and scaling.
- Advanced model evaluation with metrics logging, calibration plots, and feature importance analysis.
- Automated model registry with versioning, hyperparameter tracking, and artifact storage.
- Dynamic report generation with comprehensive visualizations (e.g., ROC, PR curves).
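Patient-level splitting means every row belonging to a given patient lands on the same side of the train/test boundary. The sketch below shows one minimal way to do that in plain Python; the function `split_by_patient`, the `patient_id` field, and the toy records are hypothetical and only illustrate the idea.

```python
import random


def split_by_patient(records, test_fraction=0.2, seed=42):
    """Split rows so that no patient appears in both train and test.

    `records` is a list of dicts that each carry a 'patient_id' key
    (an assumed field name for this sketch).
    """
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)

    # Choose whole patients for the test set, never individual rows.
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])

    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test


# Toy data: 10 patients with 3 hourly rows each.
records = [
    {"patient_id": pid, "hour": hour, "heart_rate": 70 + pid + hour}
    for pid in range(10)
    for hour in range(3)
]
train_rows, test_rows = split_by_patient(records, test_fraction=0.2)
```

Splitting rows at random instead would let measurements from the same patient appear in both sets, which tends to inflate evaluation scores.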
> Performance Metrics
| Model | AUROC | F1 | Precision | Recall |
| --- | --- | --- | --- | --- |
| Random Forest | 0.9760 | 0.5594 | 0.5280 | 0.5948 |
| XGBoost | 0.9998 | 0.2591 | 0.2399 | 0.8721 |
| Logistic Regression | 0.8955 | 0.7830 | 0.7164 | 0.8858 |
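For reference, the four reported metrics can be computed as in the following plain-Python sketch. The helper names and the toy labels and scores are illustrative assumptions and have no connection to the project's data. AUROC is computed from the raw scores and needs no threshold, while precision, recall, and F1 depend on the decision threshold chosen, which is one reason the two kinds of metric can diverge.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def auroc(y_true, scores):
    """AUROC as the probability that a positive outscores a negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 4 negatives and 2 positives with model scores.
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.6, 0.3, 0.7, 0.4]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hard labels at an assumed 0.5 threshold

print(auroc(y_true, scores))                # 0.875
print(precision_recall_f1(y_true, y_pred))  # (0.5, 0.5, 0.5)
```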
> Visualizations

ROC Curve - XGBoost
Receiver Operating Characteristic (ROC) curve for the XGBoost model, showing near-perfect separation.

Precision-Recall Curve - XGBoost
Precision-Recall curve for the tuned XGBoost model.

ROC Curve - Random Forest
Receiver Operating Characteristic (ROC) curve for the tuned Random Forest model.

Precision-Recall Curve - Random Forest
Precision-Recall curve for the tuned Random Forest model.

ROC Curve - Logistic Regression
Receiver Operating Characteristic (ROC) curve for the tuned Logistic Regression model.

Precision-Recall Curve - Logistic Regression
Precision-Recall curve for the tuned Logistic Regression model.
> Key Learnings
- Handling class imbalance with combined resampling techniques such as SMOTEENN (SMOTE oversampling followed by Edited Nearest Neighbours cleaning).
- Optimizing hyperparameters effectively using Optuna.
- Understanding the trade-offs between model interpretability and performance.
> Team
Jeremy Cleland
Graduate Student
