This project focuses on predicting the next day’s closing stock price using historical market data. A research-oriented approach is applied to explore relationships between stock features and evaluate machine learning models for short-term forecasting.
The analysis combines data exploration, regression modeling, and visualization to derive meaningful insights about stock price behavior.
- Analyze historical stock market data
- Understand relationships between price features
- Build predictive models for short-term forecasting
- Compare performance of linear and non-linear models
- Evaluate predictions using metrics and visualization
- Source: Yahoo Finance (via the `yfinance` API)
- Stock Used: Apple (AAPL) (can be changed)
- Time Period: Last 2 years (daily data)
- Open
- High
- Low
- Volume
- Target: Close of the next trading day (Next Day Closing Price)
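
The list above pairs same-day Open, High, Low, and Volume with the next day's closing price as the target. A minimal sketch of how that target could be built with pandas is shown here; the helper name `add_next_day_target` and the column name `Target` are illustrative assumptions, not names taken from the notebook.

```python
import pandas as pd

FEATURES = ["Open", "High", "Low", "Volume"]


def add_next_day_target(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with a 'Target' column holding the next row's Close."""
    out = df.copy()
    # shift(-1) moves each Close up one row, so row t holds the Close of t+1.
    out["Target"] = out["Close"].shift(-1)
    # The final row has no next day, so its target is NaN and is dropped.
    return out.dropna(subset=["Target"])


# Tiny made-up frame, used only to show the alignment.
prices = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0],
        "High": [10.5, 11.5, 12.5],
        "Low": [9.5, 10.5, 11.5],
        "Volume": [100, 110, 120],
        "Close": [10.2, 11.2, 12.2],
    }
)
labeled = add_next_day_target(prices)
```

With three input rows, `labeled` keeps two: the last day is dropped because its next-day close is not yet known.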
- Retrieved stock data using `yfinance`
- Checked structure, statistics, and consistency
- Verified absence of major missing values
- Selected price-based and volume features
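
The retrieval and checking steps above might look roughly like the following. This is a sketch under assumptions: the README does not say which `yfinance` call the notebook uses, so `Ticker.history` is one plausible choice, and `load_daily_prices` and `summarize_quality` are hypothetical helper names. The download function needs network access, so the demonstration runs the checks on a small made-up frame instead.

```python
import pandas as pd


def load_daily_prices(ticker: str = "AAPL", period: str = "2y") -> pd.DataFrame:
    """Download daily OHLCV data. Requires network access and yfinance."""
    import yfinance as yf  # imported lazily so the checks below run offline

    return yf.Ticker(ticker).history(period=period, interval="1d")


def summarize_quality(df: pd.DataFrame) -> dict:
    """Collect basic structure and consistency checks (hypothetical helper)."""
    return {
        "rows": len(df),
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_dates": int(df.index.duplicated().sum()),
        # A daily bar is inconsistent if its High is below its Low.
        "high_below_low": int((df["High"] < df["Low"]).sum()),
    }


# Offline demonstration on a made-up frame with one missing Volume value.
demo = pd.DataFrame(
    {
        "Open": [10.0, 11.0],
        "High": [10.5, 11.5],
        "Low": [9.5, 10.5],
        "Close": [10.2, 11.2],
        "Volume": [100.0, None],
    },
    index=pd.to_datetime(["2024-01-02", "2024-01-03"]),
)
report = summarize_quality(demo)
```

On real data, `df.describe()` and `df.info()` cover the "structure and statistics" part of the checks.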
- Linear Regression (baseline model)
- Random Forest Regressor (non-linear model)
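
A hedged sketch of fitting the two models with scikit-learn follows. The data here is synthetic (a random walk standing in for prices), and the chronological 80/20 split and the `RandomForestRegressor` settings are assumptions, since the README does not state how the notebook splits the data or configures the models.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for [Open, High, Low, Volume]; not real market data.
n = 200
close = 100 + np.cumsum(rng.normal(0, 1, n + 1))
X = np.column_stack(
    [
        close[:-1] + rng.normal(0, 0.2, n),  # Open-like
        close[:-1] + 1.0,                    # High-like
        close[:-1] - 1.0,                    # Low-like
        rng.integers(1_000, 2_000, n),       # Volume-like
    ]
)
y = close[1:]  # next-step close, aligned with the feature rows

# Chronological split: train on the first 80%, test on the last 20%.
split = int(n * 0.8)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
}
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict(X_test)
```

Splitting by time rather than shuffling keeps future rows out of the training set, which matters when the target is the next day's price.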
- Mean Squared Error (MSE)
- R² Score
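
For reference, both metrics can be written out in a few lines of plain Python. The function names `mse` and `r2` are illustrative; in practice `sklearn.metrics.mean_squared_error` and `sklearn.metrics.r2_score` compute the same quantities.

```python
def mse(actual, predicted):
    """Mean of squared errors; units are squared price units."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """1 - SS_res / SS_tot: share of variance in `actual` that is explained."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Made-up numbers: errors of +1, 0, -1 around a mean of 12.
actual = [10.0, 12.0, 14.0]
predicted = [11.0, 12.0, 13.0]
example_mse = mse(actual, predicted)  # (1 + 0 + 1) / 3
example_r2 = r2(actual, predicted)    # 1 - 2 / 8
```

Lower MSE is better, while R² is better the closer it is to 1; an R² of 0 means the model does no better than always predicting the mean.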
- Compared actual vs predicted prices using line plots
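
The actual-versus-predicted comparison could be drawn with Matplotlib along these lines. The function name `plot_actual_vs_predicted`, the axis labels, and the sample values are assumptions for illustration, not the notebook's own plotting code.

```python
import matplotlib.pyplot as plt


def plot_actual_vs_predicted(actual, predicted, title):
    """Draw both series on one axis so deviations are easy to spot."""
    fig, ax = plt.subplots(figsize=(10, 4))
    ax.plot(actual, label="Actual")
    ax.plot(predicted, label="Predicted", linestyle="--")
    ax.set_xlabel("Test-set day")
    ax.set_ylabel("Closing price")
    ax.set_title(title)
    ax.legend()
    return ax


# Made-up values, used only to exercise the function.
ax = plot_actual_vs_predicted(
    [10.2, 11.2, 12.2], [10.0, 11.5, 12.0], "Linear Regression"
)
```

In a notebook the figure renders automatically; in a script, call `plt.show()` or `ax.figure.savefig(...)` afterwards.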
| Model | MSE ↓ | R² ↑ | Performance |
|---|---|---|---|
| Linear Regression | 2.45 | 0.975 | Excellent |
| Random Forest | 9.81 | 0.902 | Moderate |
- Linear Regression outperformed Random Forest on both metrics (MSE 2.45 vs. 9.81, R² 0.975 vs. 0.902), and its predictions tracked actual prices more closely in the line plots
- Strong linear relationship exists between:
  - Open, High, Low → Close price
- Random Forest showed larger deviations and lower generalization performance
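
One way to check the linear relationship noted above is to look at the Pearson correlation between each feature and the next-day close. The sketch below uses a small made-up frame and an illustrative `Target` column; on the real data, the same two lines would be applied to the downloaded DataFrame.

```python
import pandas as pd

# Made-up prices, used only to show the calculation.
demo = pd.DataFrame(
    {
        "Open": [10.0, 11.0, 12.0, 11.5, 13.0, 12.5],
        "High": [10.6, 11.4, 12.7, 12.0, 13.3, 13.1],
        "Low": [9.7, 10.5, 11.6, 11.1, 12.4, 12.2],
        "Volume": [100, 140, 90, 120, 160, 110],
        "Close": [10.4, 11.2, 12.1, 11.8, 13.1, 12.7],
    }
)

# Next-day close as the target; the last row has none and is dropped.
demo["Target"] = demo["Close"].shift(-1)
features = ["Open", "High", "Low", "Volume"]
corr_with_target = demo.dropna()[features + ["Target"]].corr()["Target"].drop("Target")
```

Values near 1 or -1 indicate a strong linear relationship; `seaborn.heatmap(demo.corr(), annot=True)` gives the same information as a heatmap.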
Although stock markets are often considered complex and non-linear, this study shows that:
Short-term stock price prediction can be effectively modeled using simple linear relationships when features are highly correlated.
This highlights the importance of understanding data before choosing complex models.
- External factors (news, economy, sentiment) not included
- No technical indicators used (e.g., moving averages)
- Limited to short-term prediction
- Models not hyperparameter-tuned
- Add lag features (previous day prices)
- Apply advanced models (LSTM, XGBoost)
- Integrate sentiment analysis from news/social media
- Deploy a real-time prediction system
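
As an illustration of the first item above, lag features can be built by shifting the price column forward. This is a sketch only; `add_lag_features` and the `Close_lag{n}` column names are hypothetical and not part of the current notebook.

```python
import pandas as pd


def add_lag_features(df: pd.DataFrame, column: str = "Close", lags=(1, 2, 3)) -> pd.DataFrame:
    """Add one column per lag holding the value from `lag` rows earlier."""
    out = df.copy()
    lag_columns = []
    for lag in lags:
        name = f"{column}_lag{lag}"
        # shift(lag) pushes values down, so row t sees the value from t - lag.
        out[name] = out[column].shift(lag)
        lag_columns.append(name)
    # The first max(lags) rows lack a full history and are dropped.
    return out.dropna(subset=lag_columns)


# Made-up closing prices, used only to show the alignment.
closes = pd.DataFrame({"Close": [10.0, 11.0, 12.0, 13.0, 14.0]})
lagged = add_lag_features(closes, lags=(1, 2))
```

Because each lag column only looks backward, it can be combined with the next-day target without leaking future information.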
This project demonstrates that model simplicity can outperform complexity when data relationships are inherently linear.
Linear Regression proved to be highly effective for short-term stock prediction, while Random Forest did not provide additional benefits without tuning.
The key takeaway:
Choosing the right model depends more on data characteristics than model complexity.
Predict_Future_Stock_Prices/
│── stock_prediction.ipynb
│── README.md
- Python
- Pandas
- Matplotlib
- Seaborn
- Scikit-learn
- yfinance
Malik Muhammad Mudassir Iqbal
AI/ML Engineering Intern — DevelopersHub Corporation