This project explores classical and deep learning approaches to time series forecasting using two real-world datasets:
- Johnson & Johnson (JJ) quarterly earnings per share (EPS)
- Amazon (AMZN) daily closing stock prices
Both case studies follow a similar pipeline:
- Exploratory time series visualization
- Seasonal decomposition
- Stationarity testing (ADF) and differencing
- ACF/PACF analysis
- ARIMA model selection (grid search + auto-ARIMA)
- Final ARIMA model fit and forecasting
- LSTM-based forecasting as a deep learning alternative
- FFT-based frequency-domain analysis
You might have the code organized roughly as:
- part1_jj.ipynb: JJ EPS analysis & forecasting
- part2_amzn.ipynb: Amazon stock analysis & forecasting
- jj.csv: Johnson & Johnson EPS data
- AMZN.csv: Amazon daily stock data
- decomposejj.png: JJ seasonal decomposition plot
- sales_pred.png: JJ ARIMA forecast with CI
- lstmjj.png: JJ LSTM 24-step forecast
- lstmamz.png: AMZN LSTM forecast (≈2 years)
(Names can be adjusted to match your actual files.)
Install the required Python packages:
pip install numpy pandas matplotlib statsmodels scikit-learn pmdarima tqdm tensorflow
Main libraries used:
- Core: numpy, pandas
- Plots: matplotlib
- Time series: statsmodels (ARIMA, SARIMAX, ADF, seasonal_decompose)
- Model selection: pmdarima (auto_arima)
- Metrics & scaling: scikit-learn (MinMaxScaler, mean_squared_error)
- Deep learning: tensorflow.keras (LSTM, Dense, Sequential)
- Progress bars: tqdm
- Frequency analysis: numpy.fft (fft, ifft)
- File: jj.csv
- Columns:
  - date: quarterly dates
  - data: EPS values (Johnson & Johnson)
The series spans 84 quarterly observations.
- Load the data and set date as the DateTime index.
- Plot EPS over time to observe the trend and clear seasonality.
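A minimal loading sketch, assuming jj.csv has exactly the two columns described above (date, data). The CSV text below is a hypothetical four-row excerpt with placeholder values so the snippet runs on its own; pass the real file path instead of the StringIO object to load the full series.

```python
import io

import pandas as pd

# Hypothetical excerpt in the assumed jj.csv layout (columns: date, data).
# Placeholder values for illustration only.
CSV_TEXT = """date,data
1960-01-01,0.71
1960-04-01,0.63
1960-07-01,0.85
1960-10-01,0.44
"""


def load_quarterly_series(source) -> pd.DataFrame:
    """Read a date/value CSV and use the parsed dates as a DatetimeIndex."""
    df = pd.read_csv(source, parse_dates=["date"], index_col="date")
    return df.sort_index()  # make sure observations are in time order


data = load_quarterly_series(io.StringIO(CSV_TEXT))
print(data)
```

After loading, data['data'].plot() gives the EPS-over-time view.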
- Multiplicative decomposition with a quarterly period:
  seasonal_decompose(data['data'], model='multiplicative', period=4)
- Plots:
  - Observed
  - Trend
  - Seasonal
  - Residuals
- Figure saved as decomposejj.png
- ADF test on the original series → non-stationary (p ≈ 1.0)
- Apply a log transform and first difference:
  data['data_log'] = np.log(data['data'])
  data['data_tr_1'] = data['data_log'].diff()
- ADF on the transformed series → stationary (p < 0.001), supporting d = 1 for ARIMA.
- ACF and PACF plots are computed on the differenced log series to guide the AR (p) and MA (q) orders.
- Search over p, q ∈ {0,…,7} with d = 1:
  order_list = [(p, 1, q) for p in range(8) for q in range(8)]
- For each order, fit ARIMA and record the AIC.
- Best model (lowest AIC): ARIMA(6,1,3) with AIC ≈ 115.50
pmdarima.auto_arima is run on a training subset and also identifies (6,1,3) as optimal under its search constraints, reinforcing the grid search result.
- Fitted on the full data['data'] series:
  best_model = ARIMA(data['data'], order=(6,1,3))
  best_model_fit = best_model.fit()
- Residual diagnostics:
  - Ljung–Box test
  - Jarque–Bera test
  - Residual plots
These indicate no major remaining autocorrelation and reasonable residual behavior (though the residuals are not perfectly normal).
- In-sample predictions over the entire historical range.
- Metrics used:
  - MAPE
  - MAE
  - MPE
  - RMSE
  - Correlation
  - MinMax error
Representative results:
- RMSE ≈ 0.401
- MAPE ≈ 8.95%
- Corr ≈ 0.996
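A sketch of a metrics helper matching the list above. The notebooks' exact formulas are not shown, so these are common definitions; in particular, the MinMax error as 1 − mean(min(f, a) / max(f, a)) is an assumption. MAPE and MPE are returned as fractions (multiply by 100 for the percentages quoted).

```python
import numpy as np


def forecast_accuracy(forecast, actual):
    """Common accuracy metrics; assumes `actual` contains no zeros."""
    forecast = np.asarray(forecast, dtype=float)
    actual = np.asarray(actual, dtype=float)
    err = forecast - actual
    mape = np.mean(np.abs(err) / np.abs(actual))  # mean absolute percentage error
    mae = np.mean(np.abs(err))                    # mean absolute error
    mpe = np.mean(err / actual)                   # mean (signed) percentage error
    rmse = np.sqrt(np.mean(err ** 2))             # root mean squared error
    corr = np.corrcoef(forecast, actual)[0, 1]    # Pearson correlation
    pair = np.column_stack([forecast, actual])
    minmax = 1 - np.mean(pair.min(axis=1) / pair.max(axis=1))  # assumed definition
    return {"mape": mape, "mae": mae, "mpe": mpe,
            "rmse": rmse, "corr": corr, "minmax": minmax}


metrics = forecast_accuracy([1.1, 1.9, 3.2], [1.0, 2.0, 3.0])
print(metrics)
```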
- get_forecast(steps=26) for the next 26 quarters:
  - Predicted mean
  - 95% confidence intervals
- The forecast index is built from quarterly DateTime values, and the plots include:
  - Historical EPS
  - Future predictions
  - Confidence interval band
- Figure saved as sales_pred.png
- Raw values reshaped to (n, 1)
- MinMaxScaler used for normalization
- Sliding window: window_size = 24, forecast_steps = 24 (24-step-ahead multi-output forecast).
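The windowing can be sketched like this; make_windows is a hypothetical helper, and an arange placeholder stands in for the EPS values. Each sample pairs 24 past values with the following 24 values as a multi-output target.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler


def make_windows(values, window_size, forecast_steps):
    """Return X of shape (samples, window_size, 1) and y of shape (samples, forecast_steps)."""
    values = np.asarray(values, dtype=float).reshape(-1)
    n_samples = len(values) - window_size - forecast_steps + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window and horizon")
    X = np.stack([values[i:i + window_size] for i in range(n_samples)])
    y = np.stack([values[i + window_size:i + window_size + forecast_steps]
                  for i in range(n_samples)])
    return X[..., np.newaxis], y  # add the feature axis expected by LSTM layers


raw = np.arange(84, dtype=float).reshape(-1, 1)  # placeholder (n, 1) array
scaler = MinMaxScaler()
scaled = scaler.fit_transform(raw)
X, y = make_windows(scaled, window_size=24, forecast_steps=24)
print(X.shape, y.shape)
```

Fitting the scaler on all points, as here, is simplest; fitting it on the training portion only avoids leaking the test period's range into training.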
- Stacked LSTM with Dense layers:
  model = Sequential([
      LSTM(64, return_sequences=True, input_shape=(window_size, 1)),
      LSTM(32),
      Dense(64, activation='relu'),
      Dense(32, activation='relu'),
      Dense(forecast_steps)
  ])
- Trained for 50 epochs with the Adam optimizer.
- Rolling 24-step forecast RMSE ≈ 1.636
- Plots:
  - LSTM forecast vs. actual values for evaluation
  - Extended 24-step (24-quarter) forecast appended to the original series
- Figure saved as lstmjj.png
- FFT applied to the EPS series to inspect dominant frequencies:
  X_jj = fft(jj_array)
- Plots:
  - FFT amplitude spectrum (stem plot)
  - IFFT reconstruction to confirm recovery of the original signal
This helps confirm the presence of strong seasonal components.
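A sketch of the FFT step on a synthetic quarterly signal. Linear detrending before the FFT is an added assumption: without it, a strong trend such as EPS growth piles energy into the lowest-frequency bins and can hide the seasonal peak. fftfreq converts bin indices into cycles per quarter.

```python
import numpy as np
from numpy.fft import fft, fftfreq, ifft

# Synthetic series: linear trend plus a 4-quarter cycle (illustrative)
n = 84
t = np.arange(n)
x = 0.05 * t + np.cos(2 * np.pi * t / 4)

# Remove the linear trend so it does not dominate the low-frequency bins
x_detr = x - np.polyval(np.polyfit(t, x, 1), t)

X = fft(x_detr)
amplitude = np.abs(X) / n
freqs = fftfreq(n, d=1.0)  # cycles per quarter

pos = freqs > 0  # skip the zero-frequency term and the mirrored negative half
dominant_freq = freqs[pos][np.argmax(amplitude[pos])]
dominant_period = 1 / dominant_freq  # in quarters

reconstructed = ifft(X).real  # IFFT should recover the detrended signal
print(dominant_freq, dominant_period)
```

plt.stem(freqs[pos], amplitude[pos]) gives the amplitude-spectrum stem plot; a peak at 0.25 cycles per quarter corresponds to an annual cycle.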
- File: AMZN.csv
- Columns:
  - Date: trading date
  - Close: daily closing stock price
The series contains ~1,258 daily data points.
- Load the AMZN data and keep Date and Close.
- Plot the closing prices to inspect trend, volatility, and structural breaks.
- Multiplicative decomposition with period=12:
  result_amazon = seasonal_decompose(data['Close'], model='multiplicative', period=12)
- The same 4 components are plotted: observed, trend, seasonal, residuals.
- ADF test on Close:
  - Statistic ≈ -1.66
  - p-value ≈ 0.45 → non-stationary
- First difference:
  data['Close_diff'] = data['Close'].diff()
- ADF on Close_diff:
  - Statistic ≈ -36.25
  - p-value = 0.0 → stationary
Thus d = 1 is appropriate.
- ACF and PACF computed on Close_diff to explore AR and MA orders.
- A similar optimize_ARIMA function is used as in the JJ example.
- Candidate orders: p, q ∈ {0,…,7} with d = 1.
- Best model by AIC: ARIMA(2,1,2) with AIC ≈ 6118.42
auto_arima on a training subset selects a simpler ARIMA(0,1,0) model, but the full grid search on the complete data favors ARIMA(2,1,2) by AIC.
- Fit:
  best_model = ARIMA(data['Close'], order=(2,1,2))
  best_model_fit = best_model.fit()
- The summary includes the AR/MA coefficients and diagnostics (Ljung–Box, JB, heteroskedasticity).
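- Predictions over the full historical index.
- Metrics such as MAPE, MAE, RMSE, and correlation are computed:
  - RMSE ≈ 0.3123
  - MAPE ≈ 0.21%
  - Corr ≈ 0.99996
The in-sample fit is extremely tight, largely because in-sample predictions are one-step-ahead: each one uses the previous day's actual price, so it naturally stays close to the observed series.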
- get_forecast(steps=504) (~2 years of trading days).
- Combine the predicted mean with the lower/upper confidence intervals.
- Plot:
  - Historical AMZN close prices
  - Forecasted path
  - Shaded region for the 95% CI
(You can specify a filename in plt.savefig() to store this plot.)
- Use only the Close values.
- Normalize with MinMaxScaler.
- Build sequences with time_steps = 60 (use the past 60 days to predict the next day):
  X, y = create_sequences(scaled_data, time_steps)
  train_size = int(len(X) * 0.9)
  X_train, y_train = X[:train_size], y[:train_size]
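create_sequences is not defined in this README; a hypothetical version consistent with "60 past days predict the next day" might look like this, with a linspace placeholder in place of the scaled closes. The split is chronological (no shuffling), so the last 10% of windows stay out of training.

```python
import numpy as np


def create_sequences(data, time_steps):
    """Hypothetical helper: each sample is `time_steps` consecutive values,
    and the target is the value immediately after the window."""
    data = np.asarray(data, dtype=float).reshape(-1, 1)
    X, y = [], []
    for i in range(len(data) - time_steps):
        X.append(data[i:i + time_steps])  # shape (time_steps, 1)
        y.append(data[i + time_steps])    # shape (1,)
    return np.array(X), np.array(y)


scaled_data = np.linspace(0, 1, 1258).reshape(-1, 1)  # placeholder for scaled closes
time_steps = 60
X, y = create_sequences(scaled_data, time_steps)

train_size = int(len(X) * 0.9)
X_train, y_train = X[:train_size], y[:train_size]
X_test, y_test = X[train_size:], y[train_size:]
print(X.shape, y.shape, X_train.shape, X_test.shape)
```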
- Stacked LSTM:
  model = Sequential([
      LSTM(100, return_sequences=True, activation='relu', input_shape=(time_steps, 1)),
      LSTM(50, activation='relu'),
      Dense(1)
  ])
- Trained for 50 epochs with batch size 32.
- Training loss decreases steadily into the ~1e-3 range.
- Start from the last 60 normalized observations.
- Predict 1 step ahead, append the prediction, slide the window, and repeat.
- future_steps = 504 (≈ 2 years of business days).
- Inverse-transform the predictions back to the price scale.
- Build a forecast index with business-day frequency and combine it with the historical data.
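The recursive loop can be sketched without TensorFlow by passing the model's prediction step as a callable. recursive_forecast and the mean-of-window dummy predictor are hypothetical, and the last trading date is a placeholder; pd.bdate_range skips weekends but not exchange holidays.

```python
import numpy as np
import pandas as pd


def recursive_forecast(predict_one, last_window, future_steps):
    """Iterated one-step forecasting (hypothetical helper).

    predict_one maps an array of shape (1, time_steps, 1) to a one-value
    prediction, e.g. a wrapper around a Keras model's predict().
    """
    window = np.asarray(last_window, dtype=float).reshape(-1).copy()
    preds = []
    for _ in range(future_steps):
        next_val = float(np.ravel(predict_one(window.reshape(1, -1, 1)))[0])
        preds.append(next_val)
        window = np.append(window[1:], next_val)  # drop oldest, append newest
    return np.array(preds)


def dummy_predict(w):
    """Stand-in predictor: mean of the window, shaped like model.predict output."""
    return np.array([[w.mean()]])


preds = recursive_forecast(dummy_predict, np.arange(60, dtype=float), future_steps=5)

last_date = pd.Timestamp("2024-01-31")  # hypothetical last trading date
future_index = pd.bdate_range(last_date + pd.offsets.BDay(1), periods=5)
forecast = pd.Series(preds, index=future_index, name="LSTM forecast")
print(forecast)
```

With a trained Keras model, predict_one could be lambda w: model.predict(w, verbose=0), and scaler.inverse_transform(preds.reshape(-1, 1)) maps the results back to prices. Errors compound over the horizon because each step feeds on earlier predictions.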
Plot:
- Original AMZN Close
- LSTM forecast (dashed, dark orange)
Saved as lstmamz.png
- FFT applied to daily closing prices:
  X = fft(close_array)
- Plots:
  - Amplitude spectrum (stem plot)
  - IFFT reconstruction to confirm signal consistency
This gives a frequency-domain perspective on the stock's price behavior.
- ARIMA
  - Pros: Interpretable coefficients, confidence intervals, well-suited to linear structures and low-frequency series (like JJ EPS).
  - Cons: Limited in capturing complex nonlinear patterns; assumptions may break in highly volatile financial data.
- LSTM
  - Pros: Flexible for nonlinear dynamics and long-range dependencies; can handle complex patterns in stock prices.
  - Cons: Less interpretable, needs more data & tuning, no built-in statistical confidence intervals.
In both JJ and AMZN cases:
- ARIMA produces strong in-sample fits with low RMSE and high correlation.
- LSTM offers an alternative approach, especially for multi-step and longer forecast horizons and for potentially capturing nonlinear behavior, at the cost of interpretability.
- Ensure jj.csv and AMZN.csv are in your working directory.
- Open the notebook/script for each part (e.g. assignment_1.ipynb and assignment_1_part2.ipynb).
- Run all cells in order:
  - Data loading & visualization
  - Decomposition & stationarity tests
  - ARIMA model selection & forecast
  - LSTM training & forecast
  - FFT analysis
- Check the generated plots: decomposejj.png, sales_pred.png, lstmjj.png, lstmamz.png (and any ARIMA forecast plot you save for AMZN).