Deep Reinforcement Learning trading strategies combining Double DQN with Transformer Attention and Multi-Factor Models inspired by Fama-French. Features adaptive risk management and volatility targeting.
Excellent adaptability on growth stocks with strong momentum characteristics
The repository includes pure Python helpers for reviewing an equity curve before showing results:
```python
from strategy_metrics import drawdown_analysis, summarize_equity_curve

summary = summarize_equity_curve([100_000, 103_000, 98_500, 110_000])
drawdown = drawdown_analysis([100_000, 103_000, 98_500, 110_000])
print(summary["total_return"], summary["max_drawdown"], summary["sharpe"])
print(drawdown["drawdown_duration"], drawdown["recovery_duration"])
```

This keeps headline metrics such as total return, annualized return, max drawdown, drawdown and recovery duration, underwater episodes, Ulcer Index, annualized volatility, downside deviation, Sharpe, Sortino, and Calmar ratio consistent across the conservative and radical strategies.
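As a rough illustration of what two of these headline numbers measure, the sketch below computes total return and max drawdown directly from an equity curve. It is not the code in `strategy_metrics`: the function names `total_return` and `max_drawdown` are hypothetical, and the sketch assumes a non-empty curve of positive values.

```python
def total_return(equity):
    """Simple return from the first to the last point of the curve."""
    return equity[-1] / equity[0] - 1.0


def max_drawdown(equity):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # running high-water mark
        worst = max(worst, (peak - value) / peak)  # decline from that mark
    return worst


curve = [100_000, 103_000, 98_500, 110_000]
print(round(total_return(curve), 4))  # 0.1
print(round(max_drawdown(curve), 4))  # 0.0437
```

The drawdown here comes from the fall from 103,000 to 98,500; the later recovery to 110,000 does not erase it.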
A trading strategy is only worth running if it beats a passive hold. `benchmark_comparison`
takes the strategy and benchmark equity curves (same length) and reports excess annualized
return, information ratio, tracking error, beta, CAPM-style alpha, and the fraction of periods
in which the strategy beats the benchmark:
```python
from strategy_metrics import benchmark_comparison, summarize_vs_benchmark

strategy = [100_000, 108_000, 104_000, 119_000]
buy_and_hold = [100_000, 103_000, 101_000, 106_000]

rel = benchmark_comparison(strategy, buy_and_hold)
print(rel["excess_annualized_return"], rel["information_ratio"], rel["beta"], rel["alpha"])

# Strategy, benchmark, and relative metrics in one bundle for a side-by-side table.
report = summarize_vs_benchmark(strategy, buy_and_hold)
print(report["strategy"]["sharpe"], report["benchmark"]["sharpe"], report["relative"]["win_rate"])
```

Beta and alpha fall back to safe values when the benchmark has no variance, and a flat or single-point curve returns zeros instead of raising.
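To make the relative metrics concrete, here is a minimal, non-annualized sketch of beta, alpha, tracking error, information ratio, and win rate computed from per-period returns. It is an assumption-laden illustration rather than the repository's `benchmark_comparison`: the name `relative_metrics` is hypothetical, the risk-free rate is taken as zero, and returning 0.0 for degenerate inputs is one possible reading of "safe values".

```python
from statistics import mean, pstdev


def period_returns(equity):
    """Simple per-period returns between consecutive equity points."""
    return [b / a - 1.0 for a, b in zip(equity, equity[1:])]


def relative_metrics(strategy, benchmark):
    """Per-period relative statistics; the zero fallbacks are assumptions."""
    rs, rb = period_returns(strategy), period_returns(benchmark)
    if not rs or len(rs) != len(rb):
        return {"beta": 0.0, "alpha": 0.0, "tracking_error": 0.0,
                "information_ratio": 0.0, "win_rate": 0.0}
    mean_s, mean_b = mean(rs), mean(rb)
    var_b = mean([(b - mean_b) * (b - mean_b) for b in rb])
    cov = mean([(s - mean_s) * (b - mean_b) for s, b in zip(rs, rb)])
    beta = cov / var_b if var_b > 0 else 0.0       # flat benchmark: no beta
    alpha = mean_s - beta * mean_b                 # risk-free rate assumed 0
    active = [s - b for s, b in zip(rs, rb)]       # strategy minus benchmark
    tracking_error = pstdev(active)
    info_ratio = mean(active) / tracking_error if tracking_error > 0 else 0.0
    win_rate = sum(s > b for s, b in zip(rs, rb)) / len(rs)
    return {"beta": beta, "alpha": alpha, "tracking_error": tracking_error,
            "information_ratio": info_ratio, "win_rate": win_rate}


example = relative_metrics([100_000, 108_000, 104_000, 119_000],
                           [100_000, 103_000, 101_000, 106_000])
print(round(example["win_rate"], 4))  # 0.6667
```

In this example the strategy beats the benchmark in two of the three periods, which is where the win rate of about 0.67 comes from.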
Equity-curve metrics only tell you how a strategy did after the fact. Before a factor is
worth trading it has to predict forward returns, so `factor_analysis` reports the
information coefficient (IC), its stability over time (ICIR), and the spread between
factor quantiles:
```python
from factor_analysis import information_coefficient, summarize_factor, factor_quantile_returns

# One rebalance date: factor exposures across the cross-section vs. next-period returns.
rank_ic = information_coefficient(factor_today, forward_returns, method="spearman")

# A panel shaped (periods, assets): one row per rebalance date.
report = summarize_factor(factor_panel, forward_return_panel, method="spearman")
print(report["mean_ic"], report["ic_ir"], report["hit_rate"], report["t_stat"])

# Sanity-check monotonicity: mean forward return from the lowest to highest factor bucket.
buckets = factor_quantile_returns(factor_today, forward_returns, quantiles=5)
print(buckets[-1] - buckets[0])  # top-minus-bottom spread
```

Rank IC (Spearman) is the default because it is robust to outliers and to monotonic but nonlinear relationships. Non-finite pairs are dropped, thin cross-sections are skipped rather than reported as zero IC, and degenerate inputs return zeros instead of raising.
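For readers unfamiliar with rank IC, the following pure-Python sketch shows one way to compute a Spearman correlation for a single cross-section: rank both series (ties share their average rank) and take the Pearson correlation of the ranks. The names `average_ranks`, `pearson`, and `rank_ic` are hypothetical and this is not the `factor_analysis` implementation; returning 0.0 for degenerate input is an assumption.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; 0.0 when either series has no variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def rank_ic(factor, forward_returns):
    """Spearman rank correlation, dropping non-finite pairs first."""
    pairs = [(f, r) for f, r in zip(factor, forward_returns)
             if math.isfinite(f) and math.isfinite(r)]
    if len(pairs) < 2:
        return 0.0
    f_ranks = average_ranks([f for f, _ in pairs])
    r_ranks = average_ranks([r for _, r in pairs])
    return pearson(f_ranks, r_ranks)


ic = rank_ic([0.1, 0.4, 0.2, 0.9, 0.5], [-0.01, 0.02, 0.00, 0.03, 0.04])
print(round(ic, 4))  # 0.9
```

Only the two highest-ranked assets swap places between the factor ordering and the return ordering, so the rank IC stays high at 0.9.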
This repository contains two algorithmic trading strategies:
| Strategy | Approach | Risk Profile | Key Technology |
|---|---|---|---|
| Conservative | Multi-Factor Model | Low-Medium | Weighted Signal Aggregation |
| Radical | Deep Reinforcement Learning | Medium-High | Double DQN + Transformer |
```
┌─────────────────────────────────────────────────────────────┐
│                      SIGNAL GENERATION                      │
├─────────────────────────────────────────────────────────────┤
│  Trend Analysis ───────────────────────────────── 35%       │
│  Momentum Indicators ──────────────────────────── 25%       │
│  RSI (Relative Strength Index) ────────────────── 20%       │
│  MACD (Moving Average Convergence Divergence) ─── 15%       │
│  Bollinger Bands ────────────────────────────────  5%       │
├─────────────────────────────────────────────────────────────┤
│                    WEIGHTED AGGREGATION                     │
│                              ↓                              │
│                    FINAL TRADING SIGNAL                     │
└─────────────────────────────────────────────────────────────┘
```
Key Features:
- Volatility Targeting: Scales position size down when realized volatility exceeds the 15% annualized target
- Drawdown Protection: Reduces exposure when drawdown exceeds 10%
- ATR-based Stops: Stop-loss at 2x ATR, take-profit at 4x ATR
- Time-based Exit: Maximum holding period of 150 bars
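The stops above are expressed in multiples of ATR (Average True Range). As a reminder of what that quantity is, here is a minimal sketch that uses a simple average of the last `period` true ranges. Whether the strategies use a simple average or Wilder's smoothing, and which period they use, is not stated here, so `period=14` and the function names are assumptions.

```python
def true_ranges(highs, lows, closes):
    """True range per bar, starting from the second bar."""
    ranges = []
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        ranges.append(max(highs[i] - lows[i],
                          abs(highs[i] - prev_close),
                          abs(lows[i] - prev_close)))
    return ranges


def atr(highs, lows, closes, period=14):
    """Simple-average ATR over the last `period` bars, or None if too short."""
    ranges = true_ranges(highs, lows, closes)
    if len(ranges) < period:
        return None
    return sum(ranges[-period:]) / period


highs = [10, 11, 12, 11.5]
lows = [9, 10, 10.5, 10]
closes = [9.5, 10.5, 11, 10.5]
print(atr(highs, lows, closes, period=3))  # 1.5
```

With an ATR of 1.5, a 2x ATR stop would sit 3.0 price units from the reference price and a 4x ATR target 6.0 units away.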
```
┌─────────────────────────────────────────────────────────────┐
│                 24-DIMENSIONAL STATE VECTOR                 │
├─────────────────────────────────────────────────────────────┤
│  [1-6]   Multi-timeframe Momentum (2,3,5,8,13,21 periods)   │
│  [7-10]  Moving Average Position (5,10,20,40 periods)       │
│  [11-14] Technical Indicators (Vol, RSI, MACD, CCI)         │
│  [15-18] Volume Features (ratio, trend, correlation, vol)   │
│  [19-21] Breakout & Trend Strength                          │
│  [22-24] Acceleration, Volatility Change, Position PnL      │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                 TRANSFORMER SELF-ATTENTION                  │
│                                                             │
│  Q = X·Wq    K = X·Wk    V = X·Wv                           │
│                                                             │
│  Attention(Q,K,V) = softmax(QK^T/√d)·V                      │
│                                                             │
│  Output = X + 0.5 × Attention(Q,K,V)                        │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                     DOUBLE DQN NETWORK                      │
│                                                             │
│  Input(24) → Dense(128) → Dense(64) → Dense(32) → (9)       │
│                  ↓            ↓           ↓                 │
│                 tanh         tanh        tanh               │
│                                                             │
│  Actions: [-4, -3, -2, -1, 0, +1, +2, +3, +4]               │
│               (Short)    (Hold)   (Long)                    │
└─────────────────────────────────────────────────────────────┘
                               ↓
┌─────────────────────────────────────────────────────────────┐
│                PRIORITIZED EXPERIENCE REPLAY                │
│                                                             │
│  Priority  = |TD-error|^α              (α = 0.6)            │
│  Sampling  = Priority / Σ(Priority)                         │
│  IS Weight = (N × P(i))^(-β)           (β → 1.0)            │
└─────────────────────────────────────────────────────────────┘
```
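The attention stage in the diagram can be sketched in pure Python as follows. This is a minimal illustration of scaled dot-product self-attention with the 0.5-scaled residual, not the repository's code: how the 24 features are arranged into the rows of `X` is not stated here, and the toy 2x2 identity weight matrices are placeholders.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_block(x, wq, wk, wv, residual_scale=0.5):
    """Return X + residual_scale * softmax(Q K^T / sqrt(d)) V."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d = len(q[0])                               # key/query dimension
    k_t = [list(col) for col in zip(*k)]        # transpose of K
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    attended = matmul(weights, v)
    return [[xi + residual_scale * ai for xi, ai in zip(x_row, a_row)]
            for x_row, a_row in zip(x, attended)]


identity = [[1.0, 0.0], [0.0, 1.0]]   # placeholder weights for the demo
tokens = [[1.0, 0.0], [0.0, 1.0]]     # two toy "tokens" of dimension 2
out = attention_block(tokens, identity, identity, identity)
print([[round(value, 4) for value in row] for row in out])
```

Because each softmax row sums to 1 and `V` is the identity here, every output row sums to the input row sum plus 0.5, which is a quick sanity check on the residual scaling.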
Key Features:
- Double DQN: Reduces Q-value overestimation by selecting the next action with the online network and evaluating it with a separate target network
- Transformer Attention: Enhances feature representation with self-attention mechanism
- Prioritized Replay: Samples important experiences more frequently (α=0.6, β=0.4→1.0)
- ε-greedy Exploration: Starts at 25%, decays to 5% minimum
- Dynamic Trailing Stop: 1.8x ATR with profit lock-in at 70%
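The Double DQN idea listed above comes down to how the bootstrap target is formed: the online network picks the next action and the target network scores it. The sketch below shows that rule on plain lists of Q-values. The function name and the example numbers are hypothetical; `gamma=0.97` matches the discount factor quoted in this README.

```python
def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.97, done=False):
    """Bootstrap target: online net chooses the action, target net values it."""
    if done:
        return reward                      # no bootstrap at episode end
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


online = [0.2, 0.9, 0.5]    # online network's Q-values for the next state
target = [0.3, 0.4, 0.8]    # target network's Q-values for the same state
print(round(double_dqn_target(1.0, online, target), 4))  # 1.388
```

A plain DQN target would take the maximum of the target network's own values (0.8 here) and give 1.776; decoupling selection from evaluation is what damps that upward bias.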
```
DRL-MultiFactorTrading/
├── Conservative_strategy_clean.py   # Multi-Factor strategy (streamlined)
├── Radical_strategy_clean.py        # DRL strategy (streamlined)
├── requirements.txt                 # Python dependencies
├── LICENSE                          # MIT License
├── README.md                        # This file
├── .gitignore                       # Git ignore rules
├── .flake8                          # Linting configuration
├── .github/
│   └── workflows/
│       └── ci.yml                   # CI pipeline (Python 3.9-3.12)
│
├── radical-01810HK.png              # Performance: Xiaomi (01810.HK)
└── radical-00700HK.png              # Performance: Tencent (00700.HK)
```
```bash
# Install dependencies
pip install -r requirements.txt
```

The strategies are event-driven and self-contained: they implement
`on_marketdatafeed(...)` and emit orders, and the bundled `engine.BacktestEngine`
acts as the broker and event loop. No trading account or third-party platform is
needed; just feed it OHLCV bars:
```bash
python backtest.py                       # Conservative on synthetic data
python backtest.py --strategy radical    # Radical (numpy Double-DQN)
python backtest.py --strategy ensemble   # 50/50 portfolio of both strategies
python backtest.py --csv prices.csv      # your own OHLCV CSV (close/high/low/volume)
python backtest.py --ticker 1810.HK      # real data via yfinance (pip install yfinance)
```

It prints the trade count plus a full metrics bundle (return, Sharpe, Sortino,
max drawdown, Calmar, ...) computed by `strategy_metrics`. Example on Xiaomi
(1810.HK) daily bars over two years:
```
Strategy     : conservative
Trades       : 57
total_return : 0.2606
sharpe       : 1.7380
max_drawdown : 0.0585
```
`--strategy ensemble` splits capital across the stable Conservative model and the
aggressive Radical agent and sums their equity curves. Diversifying across the
two smooths the combined curve. On Xiaomi the ensemble keeps most of the upside
while holding max drawdown near the calmer leg:
```
  - conservative : return +0.2606  sharpe +1.7380  trades 57
  - radical      : return +0.0956  sharpe +0.5904  trades 111
total_return : 0.1781
max_drawdown : 0.0559
sharpe       : 1.4041
```
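The ensemble total return above is consistent with a 50/50 split: 0.5 × 0.2606 + 0.5 × 0.0956 = 0.1781. The sketch below shows one way such a combination could be formed, by weighting per-leg equity curves and summing them point by point. `combine_equity_curves` is a hypothetical name, not the repository's `run_ensemble`, and the toy curves only match the reported leg returns at their end points; the intermediate values are invented for illustration.

```python
def combine_equity_curves(curves, weights):
    """Weighted point-by-point sum of equal-length equity curves."""
    names = list(curves)
    length = len(curves[names[0]])
    return [sum(weights[name] * curves[name][t] for name in names)
            for t in range(length)]


legs = {
    "conservative": [100.0, 110.0, 126.06],  # ends at +26.06%
    "radical": [100.0, 95.0, 109.56],        # ends at +9.56%
}
combined = combine_equity_curves(legs, {"conservative": 0.5, "radical": 0.5})
print(round(combined[-1] / combined[0] - 1.0, 4))  # 0.1781
```

Max drawdown and Sharpe do not average this way; they depend on how the two curves move together over time, which is why the ensemble has to be evaluated on the summed curve.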
```python
import backtest
from engine import BacktestEngine
from Conservative_strategy_clean import AlgoEvent

# Convenience helper
bars = backtest.synthetic_bars(n=400)  # or backtest.csv_bars("prices.csv")
result = backtest.run_backtest("conservative", bars)
print(result.trades, result.metrics["sharpe"])

# The combined portfolio of both strategies
portfolio, legs = backtest.run_ensemble(bars, weights={"conservative": 0.5, "radical": 0.5})

# Or drive a strategy instance directly
result = BacktestEngine(initial_capital=1_000_000).run(AlgoEvent(), bars)
```

Conservative strategy parameters:

| Parameter | Default | Description |
|---|---|---|
| `base_position_pct` | 0.35 | Base position size (35% of capital) |
| `max_position_pct` | 0.55 | Maximum position size |
| `target_volatility` | 0.15 | Target annualized volatility (15%) |
| `stop_loss_atr` | 2.0 | Stop-loss in ATR multiples |
| `take_profit_atr` | 4.0 | Take-profit in ATR multiples |
| `min_gap` | 8 | Minimum bars between trades |
Radical strategy parameters:

| Parameter | Default | Description |
|---|---|---|
| `base_position_pct` | 0.40 | Base position size (40% of capital) |
| `max_position_pct` | 0.70 | Maximum position size |
| `epsilon` | 0.25 | Initial exploration rate |
| `epsilon_min` | 0.05 | Minimum exploration rate |
| `gamma` | 0.97 | Discount factor |
| `learning_rate` | 0.005 | Network learning rate |
| `buffer_size` | 2000 | Replay buffer capacity |
| `batch_size` | 64 | Training batch size |
The signal is computed as a weighted sum of five independent factors:
```
Final_Signal = Σ(Factor_i × Weight_i × Strength_i)
```
where:
- Trend: Weight = 0.35, based on MA crossovers (8/20/40)
- Momentum: Weight = 0.25, based on 5/10-bar returns
- RSI: Weight = 0.20, oversold (<35) / overbought (>65)
- MACD: Weight = 0.15, histogram direction
- Bollinger: Weight = 0.05, band breakouts
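A minimal sketch of the weighted sum above, under stated assumptions: each factor is encoded as a direction in {-1, 0, +1} with a strength in [0, 1], and the RSI rule treats values below 35 as a long signal and values above 65 as a short signal. The weights come from the list above; the encoding, the function names, and the example inputs are illustrative and not taken from the strategy code.

```python
WEIGHTS = {
    "trend": 0.35,
    "momentum": 0.25,
    "rsi": 0.20,
    "macd": 0.15,
    "bollinger": 0.05,
}


def rsi_direction(rsi, oversold=35, overbought=65):
    """+1 when oversold, -1 when overbought, 0 in between (assumed mapping)."""
    if rsi < oversold:
        return 1
    if rsi > overbought:
        return -1
    return 0


def final_signal(factors):
    """Sum of direction * weight * strength over all factors."""
    return sum(WEIGHTS[name] * direction * strength
               for name, (direction, strength) in factors.items())


example_factors = {
    "trend": (1, 1.0),                    # moving averages aligned upward
    "momentum": (1, 0.5),                 # positive but modest recent returns
    "rsi": (rsi_direction(70), 1.0),      # overbought, so it votes short
    "macd": (1, 1.0),                     # histogram rising
    "bollinger": (0, 0.0),                # price inside the bands
}
print(round(final_signal(example_factors), 3))  # 0.425
```

Because the weights sum to 1, the signal stays within [-1, +1], and a single dissenting factor such as RSI here reduces the signal without flipping it.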
| Action | Signal | Strength | Interpretation |
|---|---|---|---|
| 0 | -4 | 0.55 | Strong Short |
| 1 | -3 | 0.45 | Medium Short |
| 2 | -2 | 0.35 | Weak Short |
| 3 | -1 | 0.25 | Very Weak Short |
| 4 | 0 | 0.00 | Hold |
| 5 | +1 | 0.25 | Very Weak Long |
| 6 | +2 | 0.35 | Weak Long |
| 7 | +3 | 0.45 | Medium Long |
| 8 | +4 | 0.55 | Strong Long |
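The table above maps cleanly onto a small lookup: the signal is the action index minus 4, and the strength depends only on the signal's magnitude. The helper below is a hypothetical illustration of that mapping, not the agent's actual decoding code.

```python
STRENGTH_BY_MAGNITUDE = {0: 0.00, 1: 0.25, 2: 0.35, 3: 0.45, 4: 0.55}


def decode_action(action):
    """Map an action index 0..8 to (signal, direction, strength)."""
    if not 0 <= action <= 8:
        raise ValueError("action must be in 0..8")
    signal = action - 4                            # -4 .. +4
    direction = (signal > 0) - (signal < 0)        # -1 short, 0 hold, +1 long
    return signal, direction, STRENGTH_BY_MAGNITUDE[abs(signal)]


print(decode_action(0))  # (-4, -1, 0.55)
print(decode_action(4))  # (0, 0, 0.0)
print(decode_action(7))  # (3, 1, 0.45)
```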
Both strategies implement comprehensive risk controls:
```python
# Volatility-adjusted position sizing
if realized_volatility > target_volatility:
    position_size *= target_volatility / realized_volatility

# Drawdown protection
if drawdown > 0.10:
    position_size *= (1 - drawdown * 0.6)
```

- Stop-Loss: ATR-based dynamic stop (2.0x for Conservative, 1.8x for Radical)
- Take-Profit: ATR-based target (4.0x for Conservative, 5.0x for Radical)
- Trailing Stop: Locks in 50-70% of maximum profit
- Time Stop: Maximum holding period (150 bars Conservative, 60 bars Radical)
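Putting the sizing rules and the ATR exits together, a runnable sketch might look like the following. The sizing logic mirrors the two rules shown in this section; measuring the stop and target from the entry price, the `side` convention (+1 long, -1 short), and the function names are assumptions made for illustration. The default multiples are the Conservative values.

```python
def adjusted_position_size(base_size, realized_vol, target_vol=0.15, drawdown=0.0):
    """Shrink the base size for high volatility and for deep drawdowns."""
    size = base_size
    if realized_vol > target_vol:
        size *= target_vol / realized_vol       # volatility targeting
    if drawdown > 0.10:
        size *= 1 - drawdown * 0.6              # drawdown protection
    return size


def atr_exit_levels(entry_price, atr, side, stop_mult=2.0, target_mult=4.0):
    """Return (stop, target) prices; side is +1 for long, -1 for short."""
    stop = entry_price - side * stop_mult * atr
    target = entry_price + side * target_mult * atr
    return stop, target


# Base 35% position, realized volatility at twice the target, 15% drawdown.
print(round(adjusted_position_size(0.35, 0.30, drawdown=0.15), 5))  # 0.15925
print(atr_exit_levels(100.0, 2.5, side=1))    # (95.0, 110.0)
print(atr_exit_levels(100.0, 2.5, side=-1))   # (105.0, 90.0)
```

In the first example the volatility rule halves the position to 17.5% and the drawdown rule trims a further 9%, leaving roughly 15.9% of capital.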
- 600+ iterations on Conservative Strategy (parameter optimization, factor weight tuning)
- 400+ experiments on Radical Strategy (network architecture search, hyperparameter tuning)
- 1000+ total backtests across multiple assets and timeframes
- 4+ years of historical data (2020-2024) covering multiple market regimes
- ✅ COVID-19 crash and recovery (2020)
- ✅ Bull market conditions (2021)
- ✅ Bear market stress test (2022)
- ✅ Recovery rally (2023)
- ✅ Recent market conditions (2024)
- Hong Kong Equities: Tencent (00700.HK), Xiaomi (01810.HK), Meituan (03690.HK)
- Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3-56.
- Mnih, V., et al. (2015). Human-level control through deep reinforcement learning. Nature, 518(7540), 529-533.
- Van Hasselt, H., Guez, A., & Silver, D. (2016). Deep reinforcement learning with double Q-learning. AAAI Conference on Artificial Intelligence.
- Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30.
- Schaul, T., et al. (2015). Prioritized experience replay. arXiv preprint arXiv:1511.05952.
This software is for educational and research purposes only.
- Past performance does not guarantee future results
- Trading involves substantial risk of loss
- The authors are not responsible for any financial losses
- Always conduct thorough backtesting before live trading
- Consult with a qualified financial advisor
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Made with ❤️ for Quantitative Trading Research