2025 · Complete · LSTM · FinBERT · PyTorch · Multi-modal

Multi-Modal Apple Stock Prediction

Fused an LSTM time-series model with FinBERT sentiment embeddings to predict AAPL direction and price, beating a logistic-regression baseline by 15pp in accuracy. IEOR 242B team capstone.


Overview

IEOR 242B final project with a team of six. Traditional stock models rely purely on price and volume and leave sentiment on the table. We built a multi-modal deep-learning architecture that treats the tape (price and volume) and the news as two complementary signals and fuses them at the representation layer.

Process

01
Time-series branch
Pulled five years of AAPL OHLCV data from yfinance and Alpha Vantage. Engineered MACD, RSI, and 5- and 10-day moving averages. Built a hybrid 1D-CNN + bidirectional LSTM, using SMOTE oversampling and Focal Loss to handle the imbalanced up/down/neutral labels.
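A minimal pandas sketch of the indicator features named above. The 12/26/9 MACD spans and the 14-day Wilder-smoothed RSI are common defaults assumed here; the project does not state the windows it used. `add_indicators` and the `Close` column name are hypothetical.

```python
import numpy as np
import pandas as pd


def add_indicators(df: pd.DataFrame, close_col: str = "Close") -> pd.DataFrame:
    """Return a copy of df with SMA-5/10, MACD(12, 26, 9) and RSI-14 columns.

    Hypothetical helper; the MACD and RSI windows are assumed defaults.
    """
    out = df.copy()
    close = out[close_col]

    # Simple moving averages over 5 and 10 trading days
    out["sma_5"] = close.rolling(window=5).mean()
    out["sma_10"] = close.rolling(window=10).mean()

    # MACD: 12-day EMA minus 26-day EMA, with a 9-day EMA signal line
    ema_fast = close.ewm(span=12, adjust=False).mean()
    ema_slow = close.ewm(span=26, adjust=False).mean()
    out["macd"] = ema_fast - ema_slow
    out["macd_signal"] = out["macd"].ewm(span=9, adjust=False).mean()

    # RSI with Wilder smoothing (alpha = 1/14) of average gains and losses
    delta = close.diff()
    avg_gain = delta.clip(lower=0).ewm(alpha=1 / 14, adjust=False, min_periods=14).mean()
    avg_loss = (-delta).clip(lower=0).ewm(alpha=1 / 14, adjust=False, min_periods=14).mean()
    rs = avg_gain / avg_loss  # inf when there were no losses, which maps RSI to 100
    out["rsi_14"] = 100 - 100 / (1 + rs)
    return out


if __name__ == "__main__":
    # Toy, steadily rising price series (not real AAPL data)
    prices = pd.DataFrame({"Close": np.linspace(100.0, 130.0, 40)})
    print(add_indicators(prices).tail(3))
```

Rolling and EWM windows leave leading NaNs (4 rows for SMA-5, 14 for RSI-14), which would need to be dropped before windowing the series into LSTM inputs.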
02
Sentiment branch
Collected date-matched financial news from Kaggle. Built three parallel embedders: FinBERT for contextual, finance-aware vectors; Word2Vec as a fast static baseline; and a custom LSTM encoder for purely sequential learning. Each produced a 128-d vector per day.
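The page does not say how per-article vectors became one vector per day. One simple approach, sketched below under stated assumptions: assign each article to the first trading day on or after its publication date (so weekend news lands on Monday), mean-pool the vectors for each day, and project them to 128 dimensions. The fixed random projection is only a stand-in for whatever learned layer produced the 128-d output, `daily_sentiment_vectors` is a hypothetical helper, and days without news get a zero vector.

```python
import numpy as np
import pandas as pd


def daily_sentiment_vectors(
    article_dates: pd.Series,
    article_embs: np.ndarray,
    trading_days: pd.DatetimeIndex,
    out_dim: int = 128,
    seed: int = 0,
) -> pd.DataFrame:
    """Pool article embeddings (n_articles, d_in) into one out_dim vector per trading day.

    Hypothetical helper: mean pooling plus a fixed random projection are assumptions.
    trading_days must be sorted ascending.
    """
    dates = pd.to_datetime(article_dates).dt.normalize().to_numpy()
    days = trading_days.normalize().to_numpy()

    # Map each article to the first trading day on or after its publication date
    idx = np.searchsorted(days, dates, side="left")
    keep = idx < len(days)  # drop articles dated after the last trading day

    # Mean-pool article vectors per trading day; days without news stay at zero
    d_in = article_embs.shape[1]
    sums = np.zeros((len(days), d_in))
    counts = np.zeros(len(days))
    np.add.at(sums, idx[keep], article_embs[keep])
    np.add.at(counts, idx[keep], 1.0)
    pooled = np.divide(
        sums, counts[:, None], out=np.zeros_like(sums), where=counts[:, None] > 0
    )

    # Fixed random projection d_in -> out_dim (stand-in for a learned linear layer)
    rng = np.random.default_rng(seed)
    proj = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_in, out_dim))
    return pd.DataFrame(pooled @ proj, index=trading_days)
```

`np.add.at` is used instead of `sums[idx] += ...` because plain fancy-index assignment silently drops repeated indices, and several articles usually share a day.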
03
Fusion & evaluation
Concatenated both branches into an MLP head and trained end-to-end for both regression (next-day close) and classification (up/down). Validated against a logistic-regression baseline to isolate the sentiment lift.
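A NumPy forward-pass sketch of the fusion step, written without PyTorch so it runs anywhere. It concatenates the two branch outputs, passes them through a shared hidden layer, and reads off a next-day-close head and an up/down logit head. Beyond "concatenate into an MLP head", everything here is an assumption: the layer sizes, the ReLU, the shared two-head layout, and the `alpha`-weighted sum of MSE and binary cross-entropy. All function names are hypothetical. In a PyTorch version, autograd would backpropagate this loss through both branches, which is what makes the training end-to-end.

```python
import numpy as np


def init_fusion_head(d_ts: int, d_sent: int, d_hidden: int = 64, seed: int = 0) -> dict:
    """He-style random init for a hypothetical two-head fusion MLP."""
    rng = np.random.default_rng(seed)
    d_in = d_ts + d_sent
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / d_in), (d_in, d_hidden)),
        "b1": np.zeros(d_hidden),
        "W_reg": rng.normal(0.0, np.sqrt(1.0 / d_hidden), (d_hidden, 1)),
        "b_reg": np.zeros(1),
        "W_cls": rng.normal(0.0, np.sqrt(1.0 / d_hidden), (d_hidden, 1)),
        "b_cls": np.zeros(1),
    }


def fusion_forward(params: dict, h_ts: np.ndarray, h_sent: np.ndarray):
    """h_ts: (batch, d_ts) time-series features; h_sent: (batch, d_sent) sentiment features."""
    z = np.concatenate([h_ts, h_sent], axis=1)               # fuse at the representation layer
    h = np.maximum(0.0, z @ params["W1"] + params["b1"])     # shared ReLU hidden layer
    price = (h @ params["W_reg"] + params["b_reg"]).ravel()  # next-day close (regression)
    logit = (h @ params["W_cls"] + params["b_cls"]).ravel()  # up/down (classification)
    return price, logit


def joint_loss(price_pred, logit, price_true, up_true, alpha: float = 0.5) -> float:
    """alpha * MSE + (1 - alpha) * binary cross-entropy on logits (numerically stable form)."""
    mse = np.mean((price_pred - price_true) ** 2)
    bce = np.mean(np.maximum(logit, 0.0) - logit * up_true + np.log1p(np.exp(-np.abs(logit))))
    return float(alpha * mse + (1.0 - alpha) * bce)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    params = init_fusion_head(d_ts=32, d_sent=128)  # d_ts = 32 is a made-up branch width
    price, logit = fusion_forward(params, rng.normal(size=(4, 32)), rng.normal(size=(4, 128)))
    print(price.shape, logit.shape)  # (4,) (4,)
    print(joint_loss(price, logit, np.zeros(4), np.array([1, 0, 1, 1])))
```

Sharing one hidden layer lets the price and direction targets regularize each other. Two separate heads with separate losses would also match the description above.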

Result

Regression: MAE 0.6146, RMSE 0.7198. Classification: 53.70% accuracy vs. 38.65% for the baseline, ROC-AUC 0.5838, F1 0.3866. The sentiment branch measurably helped, which is evidence that even noisy public text carries price-relevant signal. The real lesson: at this sample size, label imbalance matters more than model depth.
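For reference, here is how the reported metrics can be computed from scratch. The headline +15pp is 53.70% minus 38.65%, or 15.05 percentage points. This NumPy sketch assumes a 0.5 decision threshold, binary F1 on the "up" class, and the rank (Mann-Whitney) form of ROC-AUC. The page does not say which threshold or F1 averaging was used, and the function names are hypothetical.

```python
import numpy as np


def regression_metrics(y_true, y_pred) -> dict:
    """MAE and RMSE of next-day close predictions."""
    err = np.asarray(y_pred, float) - np.asarray(y_true, float)
    return {"mae": float(np.mean(np.abs(err))), "rmse": float(np.sqrt(np.mean(err ** 2)))}


def binary_metrics(y_true, scores, threshold: float = 0.5) -> dict:
    """Accuracy, binary F1 (positive = 'up') and ROC-AUC from predicted probabilities.

    Assumes both classes are present in y_true; otherwise ROC-AUC is undefined.
    """
    y = np.asarray(y_true).astype(int)
    s = np.asarray(scores, float)
    pred = (s >= threshold).astype(int)

    tp = int(np.sum((pred == 1) & (y == 1)))
    fp = int(np.sum((pred == 1) & (y == 0)))
    fn = int(np.sum((pred == 0) & (y == 1)))
    acc = float(np.mean(pred == y))
    # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) > 0 else 0.0

    # ROC-AUC = P(score of a random positive > score of a random negative), ties count 1/2
    pos, neg = s[y == 1], s[y == 0]
    diff = pos[:, None] - neg[None, :]
    auc = float(np.mean((diff > 0) + 0.5 * (diff == 0)))
    return {"accuracy": acc, "f1": f1, "roc_auc": auc}
```

Because the AUC is threshold-free while accuracy and F1 depend on the 0.5 cut, a model can gain accuracy over a weak baseline yet still have an AUC close to 0.5. That pattern matches the numbers above.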

By the numbers

+15pp

Accuracy lift

0.72

Regression RMSE

6

Team size
