2023 · Published · Python · Scikit-learn · SHAP

Improvements in Adverse Drug Reaction Prediction

First-author published paper on improving adverse drug reaction (ADR) prediction, comparing Random Forest, Gradient Boosting, and SVM models and interpreting each one with SHAP and feature importance.

01

Overview

Joined a Berkeley URAP research stream applying ML to pharmacovigilance (drug-safety monitoring). The lead question: can we beat the existing Random Forest baseline on ADR prediction while keeping the results interpretable enough for clinicians to trust?

02

Process

  1. EDA & feature selection

    Ran exploratory analysis to identify the predictors that actually moved the prediction and dropped the ones that were noise. Reproducibility was non-negotiable: every step was notebooked and versioned.

  2. Model sweep

    Tuned Random Forest, Gradient Boosting, and SVM with grid search and stratified cross-validation. Scikit-learn, Pandas, Matplotlib, Seaborn throughout.

  3. Interpretability layer

    Added SHAP values and feature-importance plots on top of every model. This is where the research became clinically trustworthy: we could show why a combination of patient features raised risk, not just that it did.
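The model sweep and feature-importance steps can be sketched roughly as follows. This is a minimal illustration, not the paper's actual pipeline: the data is synthetic (`make_classification` stands in for the ADR dataset), the parameter grid is hypothetical, and only Random Forest is shown. SHAP is left out; the ranking here uses scikit-learn's built-in impurity-based `feature_importances_` instead.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic stand-in for the ADR dataset (assumption: the real features are not shown here).
X, y = make_classification(
    n_samples=300,
    n_features=8,
    n_informative=3,
    n_redundant=0,
    weights=[0.8, 0.2],  # imbalanced classes, assumed for illustration
    random_state=0,
)

# Stratified folds keep the positive-class ratio roughly constant in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Hypothetical grid; the values are illustrative only.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=cv,
)
search.fit(X, y)

# Impurity-based importances of the best model, ranked from most to least important.
best_model = search.best_estimator_
importances = best_model.feature_importances_
ranking = np.argsort(importances)[::-1]

print("best params:", search.best_params_)
print("mean CV ROC-AUC:", round(search.best_score_, 3))
for idx in ranking:
    print(f"feature_{idx}: {importances[idx]:.3f}")
```

The same `GridSearchCV` call would accept a Gradient Boosting or SVM estimator with its own grid, so the three models can share one set of stratified folds and be compared like for like.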

03

Result

Published as Wanyu Zhu et al. (2023), "Improvements in Adverse Drug Reaction Prediction," Journal of Physics: Conference Series, 2646(1), 012041. The paper reports gains across precision, recall, F1, and ROC-AUC. Peer review was the real teacher: it taught me what good ML practice actually means beyond a leaderboard score.
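For readers less familiar with the four metrics, here is a small plain-Python sketch of how each is defined. The labels and scores are toy values made up for illustration, not results from the paper, and the helper names are hypothetical. In practice `sklearn.metrics` would likely be used instead.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for binary labels (1 = adverse reaction)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: true labels and hypothetical model scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.1, 0.6, 0.7, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(precision, recall, f1, auc)
```

Precision, recall, and F1 depend on the chosen threshold (0.5 here), while ROC-AUC is computed from the raw scores, which is one reason to report all four together.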

By the numbers

Models tested: 3

Evaluation metrics: 4

Published: JPCS 2023
