NLBT — Natural Language Backtesting

Turn plain English into professional backtesting reports in minutes.
Describe your trading strategy in natural language. Get Python code, backtest results, and professional reports. No coding required.

🆕 v0.3.0: Now powered by 8 LLM-driven intelligence features with multilingual support and dramatically improved performance!


🚀 Quick Start

# 1. Install
git clone https://github.com/yourusername/nlbt && cd nlbt
pip install -e .

# 2. Configure LLM
llm keys set openrouter
llm models default openrouter/anthropic/claude-3.5-sonnet

# 3. Run
nlbt

Try it: Type "Buy and hold AAPL in 2024 with $10,000" and press enter.


✨ Key Benefits

| Feature | Benefit |
| --- | --- |
| 💬 Natural Language | Describe strategies in plain English; no coding needed |
| 🧠 LLM-Powered Intelligence | 8 AI features, including smart extraction, validation, and multilingual reports |
| 🌍 Multilingual Support | Generate reports in any language (Spanish, Hindi, etc.) |
| High Performance | Improved strategy execution (24x higher return on one NVDA RSI example; see "Performance Impact") |
| 🔄 Self-Correcting | Auto-retries with intelligent error diagnosis |
| 📊 Professional Reports | Markdown + PDF with metrics, charts, and full code |
| 🔧 Clean Architecture | LLM-first design with 20% less code, more intelligence |

🚀 What's New in v0.3.0

Major Architecture Overhaul: Complete "extreme promptification" with 8 LLM-powered intelligence features:

🧠 LLM-Powered Features

  • Smart Title Generation: Dynamic, context-aware report titles
  • Intelligent Requirement Extraction: Structured parsing from natural language
  • Flexible User Intent Detection: Understands "yes", "go", "proceed" variations
  • Adaptive Result Validation: Evaluates backtest quality intelligently
  • Multilingual Section Naming: Localized headings for any language
  • Smart Column Detection: Automatically finds best DataFrame columns
  • Dynamic Clarification Limits: Stops asking when enough info gathered
  • Targeted Error Diagnosis: Analyzes errors and suggests specific fixes
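
As an illustration of the intent-detection idea, here is a minimal sketch of the kind of simple non-LLM fallback such a check could use. The function name `is_affirmative` and the word list are hypothetical; the README does not show NLBT's actual implementation.

```python
import re

# Hypothetical vocabulary, based on the "yes", "go", "proceed" variations
# mentioned above. Not taken from NLBT's source.
AFFIRMATIVE = {"yes", "y", "go", "proceed"}


def is_affirmative(reply: str) -> bool:
    """Return True only when the reply is a short, unambiguous confirmation."""
    words = re.findall(r"[a-z]+", reply.lower())
    # Every word must be affirmative, so "yes but change the ticker"
    # is treated as further conversation rather than a go-ahead.
    return bool(words) and all(word in AFFIRMATIVE for word in words)


print(is_affirmative("Yes!"))              # confirmation
print(is_affirmative("yes but use TSLA"))  # not a plain confirmation
```

Requiring every word to be affirmative is a deliberately conservative choice: a false "no" only costs one more chat turn, while a false "yes" starts a backtest the user did not ask for.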

📈 Performance Impact

Real-world example: Same NVDA RSI strategy

  • Before v0.3.0: 10% return (1 trade)
  • After v0.3.0: 240% return (multiple trades)
  • 24x higher return on this one example (240% vs 10%)

🏗️ Architecture Improvements

  • 20% less code: Removed 311 lines of redundant logic
  • LLM-first design: Intelligent reasoning replaces hardcoded rules
  • Clean fallbacks: Simple backups instead of complex regex patterns
  • Zero breaking changes: Seamless upgrade path

📥 What You Get

Input → Output

You type:

"NVDA RSI strategy: buy when RSI drops below 30 with larger positions when RSI is lower, sell when RSI goes above 70, use 2023 data with $50000 capital"

You get (in reports/NVDA_2023_<timestamp>/):

📁 See actual example: reports/EXAMPLE_NVDA_2023/
📄 View report: report.md | report.pdf
💻 View code: strategy.py

📊 Professional Report (report.md / report.pdf)

# NVDA 2023 Trading Strategy

Initial Capital: $50,000 → Final Equity: $55,039.29 → Gain: +$5,039.29 (+10.08%)

## Summary
- Test Period: 2023-01-03 to 2023-12-29 (360 days)
- Strategy: RSI Mean Reversion with Dynamic Position Sizing
- Total Return: 10.08% vs Buy & Hold 158.14%
- Risk Metrics: Sharpe 1.54, Max Drawdown -2.80%

## Strategy Implementation
- Entry: Buy when RSI < 30 with position scaling
- Position Size: Larger positions when RSI is lower (1x to 2x)
- Exit: Sell when RSI > 70
- Risk Management: 95% max equity exposure

## Performance Metrics
- Alpha: 7.80% (significant outperformance vs risk)
- Beta: 0.01439 (low market correlation)
- Calmar Ratio: 3.63 (excellent risk-adjusted returns)
- Win Rate: 100% (1 successful trade)

[Full analysis with code implementation]
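
The headline numbers in the sample report can be checked by hand. The sketch below only re-derives the gain and total return from the initial capital and final equity quoted in the report; it is not part of NLBT.

```python
# Figures copied from the sample report above.
initial_capital = 50_000.00
final_equity = 55_039.29
buy_and_hold_return_pct = 158.14

gain = final_equity - initial_capital
total_return_pct = gain / initial_capital * 100
shortfall_pct_points = buy_and_hold_return_pct - total_return_pct

print(f"Gain: ${gain:,.2f}")
print(f"Total return: {total_return_pct:.2f}%")
print(f"Shortfall vs buy & hold: {shortfall_pct_points:.2f} percentage points")
```

In this run the strategy made money but trailed buy and hold by a wide margin, which is worth keeping in mind when reading the risk-adjusted metrics.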

💻 Executable Code (strategy.py)

# Generated by NLBT - NVDA RSI Strategy with Dynamic Position Sizing

from backtesting import Backtest, Strategy
import numpy as np
import pandas as pd

def RSI(array, n=14):
    """Helper for RSI calculation"""
    delta = pd.Series(array).diff()
    gain = (delta.where(delta > 0, 0)).rolling(n).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(n).mean()
    rs = gain / loss
    return (100 - (100 / (1 + rs))).to_numpy()

class MyStrategy(Strategy):
    def init(self):
        self.rsi = self.I(RSI, self.data.Close, 14)
    
    def next(self):
        if not self.position:
            if self.rsi[-1] < 30:
                # Dynamic position sizing based on RSI
                rsi_scale = (30 - self.rsi[-1]) / 30  # 0 to 1 scale
                position_size = 1 + rsi_scale  # 1x to 2x sizing
                
                units = int((self.equity * 0.95 * position_size) / self.data.Close[-1])
                if units > 0:
                    self.buy(size=units)
        
        elif self.position and self.rsi[-1] > 70:
            self.position.close()

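# Execute backtest
# Note: get_ohlcv_data is not defined or imported in this snippet; it is
# assumed to be supplied by NLBT's sandbox at run time.
data = get_ohlcv_data('NVDA', '2023-01-01', '2023-12-31')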
bt = Backtest(data, MyStrategy, cash=50000)
stats = bt.run()
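
To make the sizing rule in the generated strategy easier to follow, here is a small standalone sketch of the same arithmetic. The helper name `position_multiplier` is hypothetical, and the caller is assumed to have already checked that RSI is below the threshold.

```python
def position_multiplier(rsi: float, threshold: float = 30.0) -> float:
    """Same formula as the strategy above: 1x at the threshold, 2x at RSI 0."""
    rsi_scale = (threshold - rsi) / threshold  # 0 at the threshold, 1 at RSI 0
    return 1.0 + rsi_scale


EXPOSURE_CAP = 0.95  # the 95% equity factor used in the strategy

for rsi in (29.0, 25.0, 15.0):
    multiplier = position_multiplier(rsi)
    target = EXPOSURE_CAP * multiplier  # fraction of equity the order asks for
    print(f"RSI {rsi}: multiplier {multiplier:.2f}, target {target:.2f}x equity")
```

Because the 95% factor is multiplied by the sizing multiplier, any multiplier above about 1.05 asks for more than 100% of equity. Whether such an order can be filled depends on the backtest's margin settings, so it is worth checking how many trades were actually executed.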

🔍 Debug & Agent Logs

  • debug.log - Execution trace for troubleshooting
  • agent.log - Full LLM context for iteration (~6-8K words)

⚠️ Important Notes

  • Safety: This tool runs AI-generated Python code locally. Use in trusted environments only.
  • Status: Functional for single-ticker strategies. APIs may change without notice.
  • Limitations: Multi-asset portfolios not yet supported. Works best with clear strategy descriptions.

Requirements

  • Python 3.8+
  • OpenRouter account (recommended) or OpenAI/Anthropic
  • 5 minutes for setup

Install & Setup

1. Clone and install everything

git clone https://github.com/yourusername/nlbt
cd nlbt
pip install -e .

This installs all dependencies, including the llm CLI, backtesting, ta, and more.

2. Set up OpenRouter (recommended)

Why OpenRouter? Cost control, multiple models, spending limits

  1. Create account: Go to https://openrouter.ai/
  2. Get API key: Click "Keys" → "Create Key"
  3. Add credits: Add $5-10 (you'll use <$1 for examples)
  4. Set spending limit: Optional but recommended
  5. Configure locally:
llm keys set openrouter
# Paste your API key when prompted

llm models default openrouter/anthropic/claude-3.5-sonnet

3. Quick test

nlbt

Try: "Buy and hold AAPL in 2024 with $1000"

What you should see:

  • Agent asks clarifying questions (if needed)
  • Shows "Phase 1 - Understanding" → "Phase 2 - Implementation" → "Phase 3 - Reporting"
  • Saves report to reports/<TICKER>_<PERIOD>_<TIMESTAMP>/report.md (+ PDF)
  • Takes 2-3 minutes total
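
If you want to locate or script around the output folder, the naming scheme can be sketched as follows. The helper `report_dir` is hypothetical, and the timestamp format is inferred from the example paths in this README (such as AAPL_2024_20241002_123456), not taken from NLBT's source.

```python
from datetime import datetime
from pathlib import Path


def report_dir(ticker: str, period: str, now: datetime, root: str = "reports") -> Path:
    """Build reports/<TICKER>_<PERIOD>_<TIMESTAMP> (timestamp format is an assumption)."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(root) / f"{ticker.upper()}_{period}_{stamp}"


path = report_dir("AAPL", "2024", datetime(2024, 10, 2, 12, 34, 56))
print((path / "report.md").as_posix())
```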

💬 Usage

nlbt                    # Start interactive session

In-chat commands:

  • info - Show current phase and requirements
  • debug - Show internal state
  • lucky - Quick demo with AAPL
  • exit - Quit

Language preference

  • Set report language: Include lang <language> or language: <language> anywhere in your message to generate the entire report (including the TL;DR) in that language. Defaults to English if omitted.

Example:

💭 You: Buy and hold AAPL in 2024 with $10,000; lang Spanish
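
For illustration, the two documented forms (lang <language> and language: <language>) could be picked out of a message with a small regular expression. This is a sketch only: `extract_report_language` is a hypothetical name, it handles single-word language names, and NLBT itself may rely on the LLM rather than a pattern.

```python
import re

# Matches "language: Spanish" or "lang Spanish" anywhere in the message.
_LANG_RE = re.compile(r"\b(?:language\s*:\s*|lang\s+)([A-Za-z]+)", re.IGNORECASE)


def extract_report_language(message: str, default: str = "English") -> str:
    """Return the requested report language, or the default when none is given."""
    match = _LANG_RE.search(message)
    return match.group(1).capitalize() if match else default


print(extract_report_language("Buy and hold AAPL in 2024 with $10,000; lang Spanish"))
print(extract_report_language("Buy and hold AAPL in 2024 with $10,000"))
```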

🔄 How It Works

NLBT uses a 3-phase agentic workflow with automatic error recovery:

Simple Overview

  1. 🔍 Understanding - Chat with AI to gather requirements (ticker, period, capital, strategy)
  2. ⚙️ Implementation - AI generates Python code, tests it, and auto-retries if needed
  3. 📊 Reporting - AI creates professional analysis with metrics and insights
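
One way to picture the three phases is as a small state machine. The sketch below is an assumption-laden illustration: the `Phase` enum and the event names are hypothetical, and the transitions follow the behavior this README describes (confirmation starts implementation, a passing evaluation starts reporting, exhausted retries return to the conversation).

```python
from enum import Enum


class Phase(Enum):
    UNDERSTANDING = 1
    IMPLEMENTATION = 2
    REPORTING = 3


# Hypothetical event names; unknown (phase, event) pairs leave the phase unchanged.
_TRANSITIONS = {
    (Phase.UNDERSTANDING, "user_confirmed"): Phase.IMPLEMENTATION,
    (Phase.IMPLEMENTATION, "critic_pass"): Phase.REPORTING,
    (Phase.IMPLEMENTATION, "retries_exhausted"): Phase.UNDERSTANDING,
}


def next_phase(current: Phase, event: str) -> Phase:
    """Return the phase that follows `current` when `event` occurs."""
    return _TRANSITIONS.get((current, event), current)


print(next_phase(Phase.UNDERSTANDING, "user_confirmed"))
print(next_phase(Phase.IMPLEMENTATION, "retries_exhausted"))
```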

Visual Workflow

Detailed architecture diagram:

Color Key:

  • Purple = User actions | Yellow = LLM actions | Green = System/sandbox
  • Orange = Decisions | Teal = Phase states | Gray = Outputs

graph TD
    Start([User describes strategy]) --> P1[Phase 1: Understanding]
    P1 --> Extract[Extract requirements from conversation]
    Extract --> Check{Complete &<br/>implementable?}
    
    Check -->|Missing/unclear| Ask[Ask clarifying questions]
    Ask --> P1
    
    Check -->|Complete & valid| Ready[Ready to Implement]
    Ready --> Present[Present plan to user]
    Present --> Response{User response}
    
    Response -->|Anything else| BackToP1[Return to understanding]
    BackToP1 --> P1
    Response -->|Yes/Go| P2[Phase 2: Implementation]
    
    P2 --> Plan[Plan: LLM creates implementation plan]
    Plan --> Code[Producer: Generate Python code]
    Code --> Test[Test: Validate syntax & imports]
    Test --> Execute[Execute: Run in sandbox]
    Execute --> Critic[Critic: Evaluate results]
    Critic --> Decision{Critic decision}
    
    Decision -->|PASS| P3[Phase 3: Reporting]
    Decision -->|RETRY| Count{Attempt < 3?}
    Count -->|Yes| Plan
    Count -->|No| FailBack[Show error & return to understanding]
    FailBack --> P1
    
    P3 --> ReportPlan[Plan: Structure report]
    ReportPlan --> Write[Write: Generate markdown]
    Write --> Refine[Refine: Polish & save]
    Refine --> Done([Report saved])

    %% Role-based styling
    classDef user fill:#d1c4e9,stroke:#7e57c2,color:#4a148c;
    classDef llm fill:#fff9c4,stroke:#fbc02d,color:#6d4c41;
    classDef system fill:#e8f5e9,stroke:#43a047,color:#1b5e20;
    classDef decision fill:#ffccbc,stroke:#e64a19,color:#bf360c;
    classDef userInput fill:#e1bee7,stroke:#8e24aa,color:#4a148c;
    classDef state fill:#b2dfdb,stroke:#00897b,color:#004d40;
    classDef output fill:#eceff1,stroke:#90a4ae,color:#37474f;

    %% Assign roles
    class Start user;
    class P1,P2,P3,Ready state;
    class Extract,Ask,Plan,Code,Critic,ReportPlan,Write,Refine llm;
    class Test,Execute,Present system;
    class Check,Decision,Count decision;
    class Response userInput;
    class Done output;
    class BackToP1,FailBack system;

Key Features

  • Smart Confirmation: Say "yes" (or a variant such as "go" or "proceed") to continue; any other reply returns to the conversation
  • Auto-Retry: Up to 3 attempts with error feedback
  • Error Recovery: After failures, returns to chat with error context
  • Producer-Critic Pattern: Separate AI for generation and evaluation (reduces bias)
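
The auto-retry and producer-critic ideas above can be sketched as a short loop. This is not NLBT's code: `run_with_retries` and its three callables are hypothetical stand-ins for the code generator, the sandbox, and the evaluator, and the demo at the end uses fake functions instead of an LLM.

```python
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 3  # matches the "up to 3 attempts" limit described above


def run_with_retries(
    produce: Callable[[Optional[str]], str],
    execute: Callable[[str], str],
    critique: Callable[[str], Tuple[bool, str]],
) -> Optional[str]:
    """Return the first result the critic accepts, or None after MAX_ATTEMPTS."""
    feedback: Optional[str] = None
    for attempt in range(1, MAX_ATTEMPTS + 1):
        code = produce(feedback)  # the producer sees the previous feedback, if any
        try:
            result = execute(code)  # stand-in for sandboxed execution
        except Exception as exc:
            feedback = f"attempt {attempt} raised: {exc}"
            continue
        passed, feedback = critique(result)
        if passed:
            return result
    return None  # caller returns to the conversation with the last error


# Demo with fakes: the first attempt fails, the second succeeds.
def fake_produce(feedback: Optional[str]) -> str:
    return "fixed code" if feedback else "broken code"


def fake_execute(code: str) -> str:
    if code == "broken code":
        raise ValueError("syntax error")
    return "Return 38.88%"


def fake_critique(result: str) -> Tuple[bool, str]:
    return True, "PASS"


print(run_with_retries(fake_produce, fake_execute, fake_critique))
```

Passing the three roles in as callables keeps generation and evaluation separate, which is the point of the pattern, and makes the loop easy to test without network access.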

Examples (realistic conversations)

Simple Buy & Hold

💭 You: Buy and hold AAPL in 2024 with $10,000

🤖 Perfect! I have everything needed:
   • Ticker: AAPL
   • Period: 2024
   • Capital: $10,000
   • Strategy: Buy and hold
   
   Ready to implement? (yes/no/change/explain)

💭 You: yes

🤖 ⚙️ Phase 2: Implementation
   [Attempt 1/3] Planning implementation...
   [Attempt 1/3] Generating Python code...
   [Attempt 1/3] Testing code...
   [Attempt 1/3] Executing backtest...
   [Attempt 1/3] Results: Return 38.88%, Sharpe 1.25, Max DD -15.26%
   ✅ Critic: PASS - Proceeding to reporting

🤖 📊 Phase 3: Reporting
   Planning report structure...
   Writing comprehensive analysis...
   ✅ Report saved: reports/AAPL_2024_20241002_123456/report.md

RSI Strategy

💭 You: NVDA with RSI: buy when RSI < 30, sell when RSI > 70, 2023, $20K

🤖 STATUS: READY
   • Ticker: NVDA
   • Period: 2023  
   • Capital: $20,000
   • Strategy: RSI mean reversion (buy <30, sell >70)
   
   Proceed? (yes/no/change/explain)

💭 You: yes

🤖 [Generates → Tests → ✓ PASS]
   ✅ Report: reports/NVDA_2023_20241002_143012/report.md

Progressive Disclosure (provide info gradually)

💭 You: Test a moving average crossover

🤖 I'll help you backtest a moving average strategy! I need:
   • Which ticker? (e.g., AAPL, SPY, TSLA)
   • Time period? (e.g., 2024, 2020-2023)
   • Initial capital? (e.g., $10,000)
   • Which MA periods? (e.g., 50/200 day)

💭 You: SPY, 2024, $25K

🤖 Great! Which moving average periods? (e.g., 20/50, 50/200)

💭 You: use 50/200 day crossover

🤖 Perfect! All set. Ready to proceed?

💭 You: yes

🔧 Troubleshooting

Common Issues & Solutions

"Unknown model" error

llm models list                    # See available models
llm models default [model-name]    # Set default

"LLM failed" or timeout

  • Check API key: llm keys list
  • Check OpenRouter credits/limits
  • Try simpler strategy description
  • Use debug command to see internal state

"No data found" error

  • Verify ticker symbol (use Yahoo Finance format)
  • Ensure date range is in the past
  • Try different dates or ticker
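
A quick sanity check on the date range can rule out the most common cause before you retry. The helper below is a hypothetical sketch and not part of NLBT; it only checks ordering and that the range ends in the past.

```python
from datetime import date
from typing import List


def check_date_range(start: date, end: date, today: date) -> List[str]:
    """Return human-readable problems; an empty list means the range looks usable."""
    problems = []
    if start >= end:
        problems.append("start date must be before end date")
    if end > today:
        problems.append("end date is in the future, so data may be missing")
    return problems


print(check_date_range(date(2024, 1, 1), date(2024, 12, 31), today=date(2025, 6, 1)))
print(check_date_range(date(2030, 1, 1), date(2030, 12, 31), today=date(2025, 6, 1)))
```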

Code execution fails

  • Agent will auto-retry up to 3 times
  • If still failing, simplify your strategy
  • Use info to see what requirements were gathered
  • Check for typos in ticker/dates

General debugging

  • Use info command to see current phase
  • Use debug command to see conversation history
  • Check reports/ folder for any partial outputs
  • Quit with exit, restart nlbt, and try again

Alternative LLM Providers

OpenAI:

llm keys set openai
llm models default gpt-4o-mini

Anthropic:

llm keys set anthropic  
llm models default claude-3-5-sonnet-20241022

🤝 Contributing

Contributions welcome! Areas of interest:

  • Multi-asset portfolio backtesting
  • Additional technical indicators
  • Parameter optimization
  • Risk management strategies
  • Interactive visualizations

See issues or open a PR!


📄 License

GPL-3.0 License. See LICENSE.

This is copyleft software - any derivative works must also be open source under GPL-3.0.


🏗️ Technical Details

Project Structure

src/nlbt/
├── cli.py              # Interactive CLI with rich formatting
├── reflection.py       # 3-phase reflection engine
├── llm.py              # LLM wrapper using `llm` CLI
└── sandbox.py          # Safe code execution

reports/                # Generated backtest reports
├── <TICKER>_<PERIOD>_<TIMESTAMP>/
│   ├── report.md       # User: Professional report
│   ├── report.pdf      # User: PDF version
│   ├── strategy.py     # Developer: Executable code
│   ├── debug.log       # Developer: Execution trace
│   └── agent.log       # Agent: Full LLM context
└── EXAMPLE_*/          # Sample outputs

tests/                  # Unit and integration tests

Architecture & Design Patterns

This project implements several Agentic Design Patterns:

  • Reflection Pattern: 3-phase autonomous workflow with LLM controlling transitions
  • Producer-Critic Pattern: Separate models for generation and evaluation (avoids confirmation bias)
  • Planning Pattern: Phase 2 plans before coding; Phase 3 plans before writing
  • Tool Use Pattern: Sandbox execution, data fetching, indicator calculations
  • Prompt Chaining: Phase transitions chain prompts with context
  • Error Recovery: Auto-retry loop (max 3 attempts) with error feedback
  • Checkpoint Pattern: Three-tier output (user/developer/agent) for reproducibility

See cursor_chats/Agentic_Design_Patterns_Complete.md for detailed documentation.