he-yufeng/DRL-MultiFactorTrading

DRL-MultiFactorTrading

Deep Reinforcement Learning trading strategies combining Double DQN with Transformer Attention and Multi-Factor Models inspired by Fama-French. Features adaptive risk management and volatility targeting.

English | 中文

📊 Performance Visualizations

Excellent adaptability on growth stocks with strong momentum characteristics

Xiaomi Corporation (01810.HK) - DRL Learning in Action

Radical Strategy - Xiaomi

Tencent Holdings (00700.HK) - High Returns, Higher Volatility

Radical Strategy - Tencent

Evaluation Helpers

The repository includes pure Python helpers for reviewing an equity curve before showing results:

from strategy_metrics import drawdown_analysis, summarize_equity_curve

summary = summarize_equity_curve([100_000, 103_000, 98_500, 110_000])
drawdown = drawdown_analysis([100_000, 103_000, 98_500, 110_000])
print(summary["total_return"], summary["max_drawdown"], summary["sharpe"])
print(drawdown["drawdown_duration"], drawdown["recovery_duration"])

This keeps headline metrics such as total return, annualized return, max drawdown, drawdown and recovery duration, underwater episodes, Ulcer Index, annualized volatility, downside deviation, Sharpe, Sortino, and Calmar ratio consistent across the conservative and radical strategies.
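
To make two of these metrics concrete, here is a minimal sketch of total return and max drawdown on the same example curve. It is not the code in strategy_metrics: the function names, and the choice to report drawdown as a positive fraction of the running peak, are assumptions made for illustration.

```python
def total_return(equity):
    """Simple return from the first to the last point of the curve."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)          # running high-water mark
        if peak > 0:
            worst = max(worst, (peak - value) / peak)
    return worst


curve = [100_000, 103_000, 98_500, 110_000]
print(round(total_return(curve), 4))   # 0.1
print(round(max_drawdown(curve), 4))   # 0.0437 (103,000 down to 98,500)
```

The drawdown here comes from the dip after the 103,000 peak, even though the curve ends at a new high.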

Benchmarking against buy-and-hold

A trading strategy is only worth running if it beats a passive hold. benchmark_comparison takes the strategy and benchmark equity curves (same length) and reports excess annualized return, information ratio, tracking error, beta, CAPM-style alpha, and the fraction of periods the strategy beats the benchmark:

from strategy_metrics import benchmark_comparison, summarize_vs_benchmark

strategy = [100_000, 108_000, 104_000, 119_000]
buy_and_hold = [100_000, 103_000, 101_000, 106_000]

rel = benchmark_comparison(strategy, buy_and_hold)
print(rel["excess_annualized_return"], rel["information_ratio"], rel["beta"], rel["alpha"])

# Strategy, benchmark, and relative metrics in one bundle for a side-by-side table.
report = summarize_vs_benchmark(strategy, buy_and_hold)
print(report["strategy"]["sharpe"], report["benchmark"]["sharpe"], report["relative"]["win_rate"])

Beta and alpha fall back to safe values when the benchmark has no variance, and a flat or single-point curve returns zeros instead of raising.
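
The fallback behaviour can be sketched as follows. This is an illustrative reimplementation, not the repository's benchmark_comparison: the helper names, the `tiny` threshold, and the choice of 0.0 as the "safe value" for both beta and alpha are assumptions.

```python
def simple_returns(equity):
    """Per-period simple returns of an equity curve."""
    return [later / earlier - 1.0 for earlier, later in zip(equity, equity[1:])]


def beta_alpha(strategy, benchmark, tiny=1e-18):
    """Return (beta, per-period alpha) of strategy returns vs. benchmark returns."""
    rs, rb = simple_returns(strategy), simple_returns(benchmark)
    if not rs or not rb:
        return 0.0, 0.0  # single-point curve: nothing to regress
    n = min(len(rs), len(rb))
    rs, rb = rs[:n], rb[:n]
    mean_s, mean_b = sum(rs) / n, sum(rb) / n
    var_b = sum((b - mean_b) ** 2 for b in rb) / n
    if var_b <= tiny:
        return 0.0, 0.0  # assumed fallback when the benchmark has no variance
    cov = sum((s - mean_s) * (b - mean_b) for s, b in zip(rs, rb)) / n
    beta = cov / var_b
    return beta, mean_s - beta * mean_b  # CAPM-style alpha with a zero risk-free rate


strategy = [100_000, 108_000, 104_000, 119_000]
buy_and_hold = [100_000, 103_000, 101_000, 106_000]
beta, alpha = beta_alpha(strategy, buy_and_hold)
print(beta > 0, beta_alpha(strategy, [100_000] * 4))
```

Guarding on the benchmark variance avoids a division by zero without forcing callers to wrap every comparison in a try/except.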

Measuring factor predictive power

Equity-curve metrics only tell you how a strategy did after the fact. Before a factor is worth trading it has to predict forward returns, so factor_analysis reports the information coefficient (IC), its stability over time (ICIR), and the spread between factor quantiles:

from factor_analysis import information_coefficient, summarize_factor, factor_quantile_returns

# One rebalance date: factor exposures across the cross-section vs. next-period returns.
rank_ic = information_coefficient(factor_today, forward_returns, method="spearman")

# A panel shaped (periods, assets): one row per rebalance date.
report = summarize_factor(factor_panel, forward_return_panel, method="spearman")
print(report["mean_ic"], report["ic_ir"], report["hit_rate"], report["t_stat"])

# Sanity-check monotonicity: mean forward return from the lowest to highest factor bucket.
buckets = factor_quantile_returns(factor_today, forward_returns, quantiles=5)
print(buckets[-1] - buckets[0])  # top-minus-bottom spread

Rank IC (Spearman) is the default because it is robust to outliers and to monotonic but nonlinear relationships. Non-finite pairs are dropped, thin cross-sections are skipped rather than reported as zero IC, and degenerate inputs return zeros instead of raising.
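
For readers unfamiliar with rank IC, the sketch below computes a Spearman correlation in pure Python by ranking both inputs (ties get their average rank) and then taking the Pearson correlation of the ranks. It is not the repository's information_coefficient: the function names, the `min_obs` threshold, and returning None for a thin cross-section are assumptions.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # degenerate input: no variation to correlate
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def rank_ic(factor, forward_returns, min_obs=3):
    """Spearman IC for one cross-section; None if too few finite pairs remain."""
    pairs = [(f, r) for f, r in zip(factor, forward_returns)
             if math.isfinite(f) and math.isfinite(r)]
    if len(pairs) < min_obs:
        return None
    f, r = zip(*pairs)
    return pearson(average_ranks(f), average_ranks(r))


factor_today = [0.1, 0.4, float("nan"), 0.3, 0.9]
forward_returns = [-0.02, 0.01, 0.05, 0.00, 0.03]
print(rank_ic(factor_today, forward_returns))  # 1.0: ordering matches after the NaN pair is dropped
```

Returning None rather than 0.0 for a thin cross-section keeps skipped dates from dragging the mean IC toward zero.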

📋 Overview

This repository contains two sophisticated algorithmic trading strategies designed for quantitative trading:

| Strategy | Approach | Risk Profile | Key Technology |
|---|---|---|---|
| Conservative | Multi-Factor Model | Low-Medium | Weighted Signal Aggregation |
| Radical | Deep Reinforcement Learning | Medium-High | Double DQN + Transformer |

🏗️ Architecture

Strategy 1: Conservative Multi-Factor Model

┌─────────────────────────────────────────────────────────────┐
│                    SIGNAL GENERATION                        │
├─────────────────────────────────────────────────────────────┤
│  Trend Analysis ──────────────────────────────── 35%        │
│  Momentum Indicators ─────────────────────────── 25%        │
│  RSI (Relative Strength Index) ───────────────── 20%        │
│  MACD (Moving Average Convergence Divergence) ── 15%        │
│  Bollinger Bands ─────────────────────────────── 5%         │
├─────────────────────────────────────────────────────────────┤
│                 WEIGHTED AGGREGATION                        │
│                        ↓                                    │
│              FINAL TRADING SIGNAL                           │
└─────────────────────────────────────────────────────────────┘

Key Features:

  • Volatility Targeting: Dynamically adjusts position size based on 15% annualized volatility target
  • Drawdown Protection: Reduces exposure when drawdown exceeds 10%
  • ATR-based Stops: Stop-loss at 2x ATR, take-profit at 4x ATR
  • Time-based Exit: Maximum holding period of 150 bars

Strategy 2: Radical Deep Reinforcement Learning

┌─────────────────────────────────────────────────────────────┐
│              24-DIMENSIONAL STATE VECTOR                    │
├─────────────────────────────────────────────────────────────┤
│  [1-6]   Multi-timeframe Momentum (2,3,5,8,13,21 periods)   │
│  [7-10]  Moving Average Position (5,10,20,40 periods)       │
│  [11-14] Technical Indicators (Vol, RSI, MACD, CCI)         │
│  [15-18] Volume Features (ratio, trend, correlation, vol)   │
│  [19-21] Breakout & Trend Strength                          │
│  [22-24] Acceleration, Volatility Change, Position PnL      │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│              TRANSFORMER SELF-ATTENTION                     │
│                                                             │
│         Q = X·Wq    K = X·Wk    V = X·Wv                    │
│                                                             │
│         Attention(Q,K,V) = softmax(QK^T/√d)·V               │
│                                                             │
│         Output = X + 0.5 × Attention(Q,K,V)                 │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│                 DOUBLE DQN NETWORK                          │
│                                                             │
│    Input(24) → Dense(128) → Dense(64) → Dense(32) → (9)     │
│                    ↓           ↓           ↓                │
│                  tanh        tanh        tanh               │
│                                                             │
│    Actions: [-4, -3, -2, -1, 0, +1, +2, +3, +4]             │
│             (Short)     (Hold)      (Long)                  │
└─────────────────────────────────────────────────────────────┘
                            ↓
┌─────────────────────────────────────────────────────────────┐
│           PRIORITIZED EXPERIENCE REPLAY                     │
│                                                             │
│    Priority = |TD-error|^α        (α = 0.6)                 │
│    Sampling = Priority / Σ(Priority)                        │
│    IS Weight = (N × P(i))^(-β)    (β → 1.0)                 │
└─────────────────────────────────────────────────────────────┘
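
The three replay formulas in the last box can be sketched in a few lines of pure Python. This is an illustration under stated assumptions, not the code in Radical_strategy_clean.py: the function names are hypothetical, the small `eps` added to each priority and the division by the largest weight follow common practice from Schaul et al. and are not stated in this README.

```python
import random


def per_probabilities(td_errors, alpha=0.6, eps=1e-6):
    """P(i) = priority_i / sum(priority), with priority_i = (|TD-error| + eps)^alpha."""
    priorities = [(abs(e) + eps) ** alpha for e in td_errors]
    total = sum(priorities)
    return [p / total for p in priorities]


def importance_weights(probs, beta):
    """w_i = (N * P(i))^(-beta), scaled so the largest weight is 1.0 (assumed)."""
    n = len(probs)
    weights = [(n * p) ** (-beta) for p in probs]
    largest = max(weights)
    return [w / largest for w in weights]


td_errors = [0.1, 0.5, 2.0]
probs = per_probabilities(td_errors)
weights = importance_weights(probs, beta=0.4)

# Draw a batch of buffer indices in proportion to P(i).
rng = random.Random(0)
batch = rng.choices(range(len(td_errors)), weights=probs, k=4)
print(probs, weights, batch)
```

Transitions with large TD-error are drawn more often, and the importance-sampling weights shrink their gradient contribution to offset that bias. As β approaches 1.0 the correction becomes complete.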

Key Features:

  • Double DQN: Reduces Q-value overestimation by selecting the next action with the online network and evaluating it with a separate target network
  • Transformer Attention: Enhances feature representation with a self-attention mechanism
  • Prioritized Replay: Samples high-TD-error experiences more frequently (α=0.6, β=0.4→1.0)
  • ε-greedy Exploration: Starts at 25%, decays to a 5% minimum
  • Dynamic Trailing Stop: 1.8x ATR with profit lock-in at 70%
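
As a concrete illustration of the Double DQN idea from Van Hasselt et al., the sketch below builds the bootstrap target for one transition. The function name and the example Q-values are hypothetical; only gamma = 0.97 is taken from the parameter table in this README, and the repository's own update may differ in detail.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.97, done=False):
    """Target = r + gamma * Q_target(s', argmax_a Q_online(s', a))."""
    if done:
        return reward  # no bootstrap from a terminal state
    # The online network chooses the action...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...and the target network scores it.
    return reward + gamma * next_q_target[best_action]


next_q_online = [0.2, 0.9, 0.5]   # online net prefers action 1
next_q_target = [0.3, 0.4, 1.2]   # target net's largest value is action 2
target = double_dqn_target(1.0, next_q_online, next_q_target)
vanilla = 1.0 + 0.97 * max(next_q_target)
print(round(target, 3), round(vanilla, 3))  # 1.388 2.164
```

A vanilla DQN target would take the maximum of the target network's own estimates (2.164 here). Decoupling selection from evaluation gives the lower 1.388, which is how the method damps overestimation.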

📁 Project Structure

DRL-MultiFactorTrading/
├── Conservative_strategy_clean.py   # Multi-Factor strategy (streamlined)
├── Radical_strategy_clean.py        # DRL strategy (streamlined)
├── requirements.txt                 # Python dependencies
├── LICENSE                          # MIT License
├── README.md                        # This file
├── .gitignore                       # Git ignore rules
├── .flake8                          # Linting configuration
├── .github/
│   └── workflows/
│       └── ci.yml                   # CI pipeline (Python 3.9-3.12)
│
├── radical-01810HK.png              # Performance: Xiaomi (01810.HK)
└── radical-00700HK.png              # Performance: Tencent (00700.HK)

🚀 Quick Start

Prerequisites

# Install dependencies
pip install -r requirements.txt

Run a backtest

The strategies are event-driven and self-contained: they implement on_marketdatafeed(...) and emit orders, and the bundled engine.BacktestEngine acts as the broker and event loop. No trading account or third-party platform is needed; just feed it OHLCV bars:

python backtest.py                       # Conservative on synthetic data
python backtest.py --strategy radical    # Radical (numpy Double-DQN)
python backtest.py --strategy ensemble   # 50/50 portfolio of both strategies
python backtest.py --csv prices.csv      # your own OHLCV CSV (close/high/low/volume)
python backtest.py --ticker 1810.HK      # real data via yfinance (pip install yfinance)

It prints the trade count plus a full metrics bundle (return, Sharpe, Sortino, max drawdown, Calmar, ...) computed by strategy_metrics. Example on Xiaomi (1810.HK) daily bars over two years:

Strategy : conservative
Trades   : 57
total_return            : 0.2606
sharpe                  : 1.7380
max_drawdown            : 0.0585

Ensemble portfolio

--strategy ensemble splits capital across the stable Conservative model and the aggressive Radical agent and sums their equity curves. Diversifying across the two smooths the combined curve. On Xiaomi, the ensemble keeps most of the upside while holding max drawdown near that of the calmer leg:

  - conservative : return +0.2606  sharpe +1.7380  trades 57
  - radical      : return +0.0956  sharpe +0.5904  trades 111
total_return            : 0.1781
max_drawdown            : 0.0559
sharpe                  : 1.4041
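
The ensemble return above is consistent with a 50/50 capital split: 0.5 × 0.2606 + 0.5 × 0.0956 = 0.1781. The sketch below reproduces that arithmetic by summing weighted equity curves. The function name is hypothetical, and the two-point curves are a simplification built from the reported total returns, not the actual backtest output.

```python
def combine_equity(curves, weights):
    """Sum per-leg equity curves, each scaled by its share of starting capital."""
    length = len(next(iter(curves.values())))
    return [
        sum(weights[name] * curve[t] for name, curve in curves.items())
        for t in range(length)
    ]


capital = 1_000_000
curves = {
    "conservative": [capital, capital * (1 + 0.2606)],  # start and end only
    "radical": [capital, capital * (1 + 0.0956)],
}
portfolio = combine_equity(curves, {"conservative": 0.5, "radical": 0.5})
print(round(portfolio[-1] / portfolio[0] - 1, 4))  # 0.1781
```

Total return averages linearly across the legs, but drawdown and Sharpe do not, which is why the combined max drawdown (0.0559) can sit below that of either leg's simple average.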

Programmatic usage

import backtest
from engine import BacktestEngine
from Conservative_strategy_clean import AlgoEvent

# convenience helper
bars = backtest.synthetic_bars(n=400)            # or backtest.csv_bars("prices.csv")
result = backtest.run_backtest("conservative", bars)
print(result.trades, result.metrics["sharpe"])

# the combined portfolio of both strategies
portfolio, legs = backtest.run_ensemble(bars, weights={"conservative": 0.5, "radical": 0.5})

# or drive a strategy instance directly
result = BacktestEngine(initial_capital=1_000_000).run(AlgoEvent(), bars)

Strategy Parameters

Conservative Strategy

| Parameter | Default | Description |
|---|---|---|
| base_position_pct | 0.35 | Base position size (35% of capital) |
| max_position_pct | 0.55 | Maximum position size |
| target_volatility | 0.15 | Target annualized volatility (15%) |
| stop_loss_atr | 2.0 | Stop-loss in ATR multiples |
| take_profit_atr | 4.0 | Take-profit in ATR multiples |
| min_gap | 8 | Minimum bars between trades |

Radical Strategy

| Parameter | Default | Description |
|---|---|---|
| base_position_pct | 0.40 | Base position size (40% of capital) |
| max_position_pct | 0.70 | Maximum position size |
| epsilon | 0.25 | Initial exploration rate |
| epsilon_min | 0.05 | Minimum exploration rate |
| gamma | 0.97 | Discount factor |
| learning_rate | 0.005 | Network learning rate |
| buffer_size | 2000 | Replay buffer capacity |
| batch_size | 64 | Training batch size |

📊 Signal Generation

Multi-Factor Model (Conservative)

The signal is a weighted sum of five separately computed factors:

Final_Signal = Σ(Factor_i × Weight_i × Strength_i)

where:
  - Trend:     Weight = 0.35, based on MA crossovers (8/20/40)
  - Momentum:  Weight = 0.25, based on 5/10-bar returns
  - RSI:       Weight = 0.20, oversold (<35) / overbought (>65)
  - MACD:      Weight = 0.15, histogram direction
  - Bollinger: Weight = 0.05, band breakouts
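
A worked example of the aggregation, using the weights listed above. The input encoding is an assumption for illustration: each factor is given as a (direction, strength) pair with direction in {-1, 0, +1} and strength in [0, 1], which this README does not specify.

```python
WEIGHTS = {
    "trend": 0.35,
    "momentum": 0.25,
    "rsi": 0.20,
    "macd": 0.15,
    "bollinger": 0.05,
}


def final_signal(factors):
    """Sum of direction * weight * strength over all factors."""
    return sum(
        direction * WEIGHTS[name] * strength
        for name, (direction, strength) in factors.items()
    )


example = {
    "trend": (+1, 1.0),      # moving averages aligned upward
    "momentum": (+1, 0.5),   # mildly positive recent returns
    "rsi": (-1, 1.0),        # overbought reading argues against a long
    "macd": (+1, 1.0),       # histogram rising
    "bollinger": (0, 0.0),   # no band breakout
}
# 0.35 + 0.125 - 0.20 + 0.15 + 0.0 = 0.425
print(round(final_signal(example), 3))
```

Because the weights sum to 1.0, the signal stays within [-1, +1] under this encoding, so a single opposing factor such as RSI reduces conviction without flipping the sign.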

DQN Action Space (Radical)

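| Action | Signal | Strength | Interpretation |
|---|---|---|---|
| 0 | -4 | 0.55 | Strong Short |
| 1 | -3 | 0.45 | Medium Short |
| 2 | -2 | 0.35 | Weak Short |
| 3 | -1 | 0.25 | Very Weak Short |
| 4 | 0 | 0.00 | Hold |
| 5 | +1 | 0.25 | Very Weak Long |
| 6 | +2 | 0.35 | Weak Long |
| 7 | +3 | 0.45 | Medium Long |
| 8 | +4 | 0.55 | Strong Long |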
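
The table is symmetric around action 4, so the mapping can be expressed compactly. The function and constant names below are hypothetical; the numbers come straight from the table.

```python
STRENGTH_BY_MAGNITUDE = {0: 0.00, 1: 0.25, 2: 0.35, 3: 0.45, 4: 0.55}


def decode_action(action):
    """Map a network output index (0-8) to (signal, strength)."""
    if not 0 <= action <= 8:
        raise ValueError(f"action must be in 0..8, got {action}")
    signal = action - 4  # -4 (strong short) .. +4 (strong long)
    return signal, STRENGTH_BY_MAGNITUDE[abs(signal)]


print(decode_action(0))  # (-4, 0.55)
print(decode_action(4))  # (0, 0.0)
print(decode_action(7))  # (3, 0.45)
```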

🛡️ Risk Management

Both strategies implement comprehensive risk controls:

Position Sizing

# Volatility-adjusted position sizing
if realized_volatility > target_volatility:
    position_size *= target_volatility / realized_volatility

# Drawdown protection
if drawdown > 0.10:
    position_size *= (1 - drawdown * 0.6)
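
Wrapping the two rules in a function makes the arithmetic easy to check. The function name and the example inputs are hypothetical; the thresholds and the 0.6 factor are the ones shown in the snippet.

```python
def adjust_position_size(position_size, realized_volatility, drawdown,
                         target_volatility=0.15):
    """Apply the volatility and drawdown scaling rules to a base position size."""
    # Scale down only when realized volatility is above target.
    if realized_volatility > target_volatility:
        position_size *= target_volatility / realized_volatility
    # Trim exposure further once drawdown exceeds 10%.
    if drawdown > 0.10:
        position_size *= 1 - drawdown * 0.6
    return position_size


# Base 35% position, volatility at twice the target, 20% drawdown:
# 0.35 * (0.15 / 0.30) * (1 - 0.20 * 0.6) = 0.35 * 0.5 * 0.88 = 0.154
stressed = adjust_position_size(0.35, realized_volatility=0.30, drawdown=0.20)
calm = adjust_position_size(0.35, realized_volatility=0.12, drawdown=0.03)
print(round(stressed, 3), calm)  # 0.154 0.35
```

Note that the rule as written only ever scales positions down: when realized volatility is below target, the base size is left unchanged rather than levered up.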

Exit Conditions

  1. Stop-Loss: ATR-based dynamic stop (2.0x for Conservative, 1.8x for Radical)
  2. Take-Profit: ATR-based target (4.0x for Conservative, 5.0x for Radical)
  3. Trailing Stop: Locks in 50-70% of maximum profit
  4. Time Stop: Maximum holding period (150 bars Conservative, 60 bars Radical)
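
A minimal sketch of how the stop-loss, take-profit, and time-stop rules might be checked for a long position. The function name, the order in which the rules are tested, and the use of a single ATR value are assumptions. The trailing stop is omitted because this README does not specify how the locked-in profit is computed.

```python
def exit_reason(entry_price, price, atr, bars_held,
                stop_atr=2.0, target_atr=4.0, max_bars=150):
    """Return the first exit rule triggered for a long position, or None."""
    if price <= entry_price - stop_atr * atr:
        return "stop_loss"
    if price >= entry_price + target_atr * atr:
        return "take_profit"
    if bars_held >= max_bars:
        return "time_stop"
    return None


# Conservative defaults: stop at 100 - 2*2 = 96, target at 100 + 4*2 = 108.
print(exit_reason(100.0, 95.9, atr=2.0, bars_held=10))    # stop_loss
print(exit_reason(100.0, 108.0, atr=2.0, bars_held=10))   # take_profit
print(exit_reason(100.0, 101.0, atr=2.0, bars_held=150))  # time_stop

# Radical settings from the list above: 1.8x stop, 5.0x target, 60 bars.
print(exit_reason(100.0, 96.3, atr=2.0, bars_held=5,
                  stop_atr=1.8, target_atr=5.0, max_bars=60))  # stop_loss
```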

🔬 Research Methodology

Development Process

  • 600+ iterations on Conservative Strategy (parameter optimization, factor weight tuning)
  • 400+ experiments on Radical Strategy (network architecture search, hyperparameter tuning)
  • 1000+ total backtests across multiple assets and timeframes
  • 4+ years of historical data (2020-2024) covering multiple market regimes

Testing Period Coverage

  • ✅ COVID-19 crash and recovery (2020)
  • ✅ Bull market conditions (2021)
  • ✅ Bear market stress test (2022)
  • ✅ Recovery rally (2023)
  • ✅ Recent market conditions (2024)

Instruments Tested

  • Hong Kong Equities: Tencent (00700.HK), Xiaomi (01810.HK), Meituan (03690.HK)

📚 References

Academic Papers

  1. Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3-56.

  2. Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.

  3. Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. AAAI Conference on Artificial Intelligence.

  4. Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.

  5. Schaul, T., et al. (2015). Prioritized experience replay. arXiv preprint arXiv:1511.05952.

⚠️ Disclaimer

This software is for educational and research purposes only.

  • Past performance does not guarantee future results
  • Trading involves substantial risk of loss
  • The authors are not responsible for any financial losses
  • Always conduct thorough backtesting before live trading
  • Consult with a qualified financial advisor

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Made with ❤️ for Quantitative Trading Research
