2023 · Published · Python · Scikit-learn · SHAP

Improvements in Adverse Drug Reaction Prediction

First-author published paper on improving adverse drug reaction (ADR) prediction, comparing Random Forest, Gradient Boosting, and SVM models and explaining each one with SHAP values and feature importance.

Overview

Joined a Berkeley URAP research stream on using ML for pharmacovigilance (drug-safety monitoring). The central question: can we beat the existing Random Forest baseline on ADR prediction while keeping the result interpretable enough for clinicians to trust?

Process

01
EDA & feature selection
Ran exploratory analysis to identify the predictors that actually influenced predictions and dropped the ones that were noise. Reproducibility was non-negotiable: every step was recorded in a notebook and versioned.
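One way to picture the filtering step is a minimal sketch that ranks features by mutual information with the label and keeps the top few. The paper's actual selection criterion is not stated here, so the scoring function, the cutoff `k`, and the synthetic data are all assumptions for illustration only.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif

# Synthetic stand-in for the ADR dataset (hypothetical; not the study data).
X_arr, y = make_classification(
    n_samples=500, n_features=10, n_informative=3, n_redundant=0,
    random_state=0,
)
X = pd.DataFrame(X_arr, columns=[f"feat_{i}" for i in range(10)])

# Score each feature by its estimated mutual information with the label.
scores = pd.Series(
    mutual_info_classif(X, y, random_state=0), index=X.columns
).sort_values(ascending=False)

# Keep the k highest-scoring features; k=3 is an arbitrary choice for this sketch.
k = 3
selected = scores.head(k).index.tolist()
X_selected = X[selected]

print(scores.round(3))
print("kept:", selected)
```

Fixing `random_state` in both the data generation and the scoring call is what makes a step like this repeatable from one notebook run to the next.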
02
Model sweep
Tuned Random Forest, Gradient Boosting, and SVM with grid search and stratified cross-validation. Scikit-learn, Pandas, Matplotlib, Seaborn throughout.
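The sweep can be sketched roughly as below: one `GridSearchCV` per model family, all sharing the same `StratifiedKFold` splitter so every candidate sees identical folds. The parameter grids, the ROC-AUC scoring choice, the scaling step for the SVM, and the synthetic data are assumptions made for this example, not the grids used in the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Imbalanced synthetic data as a stand-in for ADR labels (hypothetical).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)

# Stratified folds preserve the class ratio in every train/validation split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Small illustrative grids; a real sweep would search wider ranges.
candidates = {
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [25, 50], "max_depth": [3, None]},
    ),
    "gradient_boosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [25, 50], "learning_rate": [0.05, 0.1]},
    ),
    "svm": (
        # SVMs are scale-sensitive, so standardize inside the pipeline.
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1, 10], "svc__kernel": ["rbf", "linear"]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=cv)
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)
    print(f"{name}: best CV ROC-AUC={search.best_score_:.3f}, params={search.best_params_}")
```

Putting the scaler inside the pipeline means it is refit on each training fold, so no validation-fold statistics leak into the SVM's preprocessing.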
03
Interpretability layer
Added SHAP values and feature-importance plots on top of every model. This is where the research became trustworthy for clinicians: we could show why a patient-feature combination raised risk, not just that it did.
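To make the idea behind SHAP concrete, here is a hand-rolled sketch that computes exact Shapley values for a single row by enumerating feature subsets. It does not use the `shap` library and is not the paper's code: the helper names `predict_risk` and `shapley_values`, the single mean-row baseline, and the synthetic data are assumptions, and brute-force enumeration is only practical for a handful of features.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Tiny synthetic problem (hypothetical) so exhaustive enumeration stays cheap.
X, y = make_classification(
    n_samples=300, n_features=4, n_informative=3, n_redundant=1,
    random_state=0,
)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)


def predict_risk(rows):
    """Predicted probability of the positive class for each row."""
    return model.predict_proba(rows)[:, 1]


def shapley_values(x, baseline):
    """Exact Shapley values for one row; absent features are set to `baseline`."""
    n = len(x)

    def value(subset):
        # Features in `subset` take the row's values; the rest stay at baseline.
        row = baseline.copy()
        idx = np.array(subset, dtype=int)
        row[idx] = x[idx]
        return predict_risk(row.reshape(1, -1))[0]

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi


baseline = X.mean(axis=0)  # simplification: one "average" reference row
x = X[0]
phi = shapley_values(x, baseline)

risk = predict_risk(x.reshape(1, -1))[0]
base_risk = predict_risk(baseline.reshape(1, -1))[0]
print("per-feature contributions:", phi.round(3))
print("sum of contributions:", round(phi.sum(), 3), "risk minus baseline risk:", round(risk - base_risk, 3))
```

The per-feature contributions add up to the gap between this row's predicted risk and the baseline prediction, which is the property that lets an explanation say how much each feature pushed the risk up or down.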

Result

Published as Wanyu Zhu et al. (2023), "Improvements in Adverse Drug Reaction Prediction," Journal of Physics: Conference Series, 2646(1), 012041. The models showed gains across precision, recall, F1, and ROC-AUC. Peer review was the real teacher: it taught me what good ML practice actually means beyond a leaderboard score.
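For reference, the four reported metrics can be computed on a held-out split along these lines. This is a sketch on synthetic data with an untuned Random Forest, so the numbers it prints say nothing about the published results; the split size and model settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data as a stand-in for ADR labels (hypothetical).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)               # hard labels for precision/recall/F1
y_score = model.predict_proba(X_test)[:, 1]  # probabilities for ROC-AUC

metrics = {
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, y_score),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision, recall, and F1 depend on the decision threshold, while ROC-AUC is computed from the ranked probabilities, which is why the two kinds of model output are kept separate.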

By the numbers

Models tested: 3 (Random Forest, Gradient Boosting, SVM)

Evaluation metrics: 4 (precision, recall, F1, ROC-AUC)

Published: JPCS 2023
