Improvements in Adverse Drug Reaction Prediction
Published a first-author paper on improving adverse drug reaction (ADR) prediction, pairing tuned Random Forest, Gradient Boosting, and SVM models with interpretability analysis (SHAP, feature importance).
Overview
Joined a Berkeley URAP research stream on applying ML to pharmacovigilance (drug-safety monitoring). The lead question: can we beat the existing Random Forest baseline on ADR prediction while keeping the result interpretable enough for clinicians to trust?
Process
- 01
EDA & feature selection
Ran exploratory analysis to identify the predictors that actually moved the predictions and dropped the ones that were noise. Reproducibility was non-negotiable: every step was notebooked and versioned.
- 02
Model sweep
Tuned Random Forest, Gradient Boosting, and SVM with grid search and stratified cross-validation. Scikit-learn, Pandas, Matplotlib, Seaborn throughout.
- 03
Interpretability layer
Added SHAP values and feature-importance plots on top of every model. This is where the research became clinically trustworthy: we could show why a patient-feature combination raised risk, not just that it did.
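Steps 02 and 03 can be sketched roughly as follows. This is a minimal illustration, not the paper's code. The synthetic dataset, the parameter grids, the fold count, and the ROC-AUC selection metric are all assumptions. Scikit-learn's permutation importance stands in for SHAP here, because the SHAP configuration used in the paper is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, imbalanced stand-in for an ADR dataset (assumption: real data not shown).
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=3, n_redundant=0,
    weights=[0.8, 0.2], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, test_size=0.25, random_state=0
)

# Stratified folds keep the positive-class ratio roughly constant in every split.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

# Hypothetical grids, kept small so the sketch runs quickly.
candidates = {
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [3, None]},
    ),
    "gradient_boosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    ),
    # SVMs are scale-sensitive, so scaling lives inside the pipeline
    # and is refit on each training fold.
    "svm": (
        make_pipeline(StandardScaler(), SVC()),
        {"svc__C": [0.1, 1.0, 10.0], "svc__kernel": ["rbf", "linear"]},
    ),
}

# One grid search per model family, all scored on the same folds.
results = {}
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="roc_auc", cv=cv)
    search.fit(X_train, y_train)
    results[name] = search

best_name = max(results, key=lambda n: results[n].best_score_)
best_model = results[best_name].best_estimator_

# Model-agnostic importance on held-out data: how much ROC-AUC drops
# when a single feature column is shuffled.
importance = permutation_importance(
    best_model, X_test, y_test, scoring="roc_auc", n_repeats=10, random_state=0
)
ranked_features = importance.importances_mean.argsort()[::-1]

for name, search in results.items():
    print(name, round(search.best_score_, 3), search.best_params_)
print("features ranked by importance:", ranked_features.tolist())
```

Permutation importance gives one global ranking per model. SHAP goes further by attributing each individual prediction to its features, which is what lets you explain a single patient's risk score.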
Result
Published as Wanyu Zhu et al. (2023), "Improvements in Adverse Drug Reaction Prediction," Journal of Physics: Conference Series, 2646(1), 012041. Reported gains across precision, recall, F1, and ROC-AUC. Peer review was the real teacher: it taught me what good ML practice actually means beyond a leaderboard score.
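For reference, the four metrics can be computed with scikit-learn as in the sketch below. The labels and scores are toy values made up for illustration, not results from the paper, and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score

# Toy ground-truth labels and model scores (illustrative only).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

# Hard predictions at an assumed 0.5 threshold.
y_pred = [int(score >= 0.5) for score in y_score]

metrics = {
    # Of the cases flagged positive, the fraction that really were positive.
    "precision": precision_score(y_true, y_pred),
    # Of the real positives, the fraction the model caught.
    "recall": recall_score(y_true, y_pred),
    # Harmonic mean of precision and recall.
    "f1": f1_score(y_true, y_pred),
    # Threshold-free ranking quality, computed from scores rather than labels.
    "roc_auc": roc_auc_score(y_true, y_score),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Precision, recall, and F1 depend on the chosen threshold, while ROC-AUC does not. That is one reason to report all four on an imbalanced problem such as ADR detection.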
By the numbers
3 models tested
4 evaluation metrics
Published in JPCS, 2023