Koronos/K-Pytorch-Schedulers

K-pytorch-schedulers

License: MIT

A collection of custom PyTorch learning rate schedulers designed to improve training convergence and model performance.

Overview

This package provides learning rate schedulers that extend PyTorch's built-in options with warm-up, plateau steps, and waypoint-based cosine schedules. Each scheduler is designed for low per-step overhead and a simple, consistent API.

Requirements

  • Python >= 3.7
  • PyTorch >= 1.4.0

Installation

From PyPI (when published)

pip install k-pytorch-schedulers

Directly from GitHub

pip install git+https://github.com/Koronos/K-Pytorch-Schedulers.git

From source (for development)

git clone https://github.com/Koronos/K-Pytorch-Schedulers.git
cd K-Pytorch-Schedulers
pip install -e .

Development installation with test dependencies

pip install -e ".[dev]"

Available Schedulers

1. Cosine Plateau Scheduler

An advanced scheduler combining cosine annealing, warm-up, and plateau steps for optimal training convergence.

Figure: Complete overview of the Cosine Plateau Scheduler, showing the warmup phase, base LR, min LR, and plateau regions with cosine decay between them.

Key Features:

  • Linear warm-up phase for training stability
  • Cosine annealing decay with independent segments
  • Plateau steps for maintaining constant LR at critical intervals
  • High performance (~0.5μs per step)
  • Resume support with last_epoch

Quick Example:

import torch
import torch.nn as nn
from k_pytorch_schedulers import CosinePlateauScheduler

# Setup model and optimizer
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Create scheduler with all parameters
scheduler = CosinePlateauScheduler(
    optimizer,
    total_steps=10000,
    base_lr=None,  # Use optimizer's LR (default)
    min_lr_ratio=0.1,
    warmup=0.1,  # 10% of total steps (or use absolute: warmup=1000)
    plateau_steps=[(50, 30), (85, 10)],
    lr_scale_factor=1.0,  # Scale all LRs by this factor (default: 1.0)
    last_epoch=-1,  # For resuming training (default: -1)
    verbose=False  # Print LR updates (default: False)
)

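# Training loop (random tensors stand in for real batches)
for step in range(10000):
    data = torch.randn(32, 10)  # dummy batch matching nn.Linear(10, 2)
    loss = model(data).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()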

Parameters:

  • optimizer (Optimizer): PyTorch optimizer to schedule
  • total_steps (int): Total number of training steps
  • base_lr (float, optional): Base learning rate. If None, uses optimizer's LR
  • min_lr_ratio (float): Minimum LR as ratio of base LR (default: 0.0)
  • warmup (Union[int, float]): Warmup configuration (default: 0)
    • If < 1: Fraction of total_steps (e.g., 0.1 = 10% warmup)
    • If >= 1: Absolute number of warmup steps
  • plateau_steps (List[Tuple[float, float]], optional): List of (position%, duration%) tuples
  • lr_scale_factor (float): Scaling factor for all learning rates (default: 1.0) - Useful for distributed training
  • last_epoch (int): Index of last epoch for resuming (default: -1)
  • verbose (bool): Print messages on updates (default: False)
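
The parameter descriptions above leave the exact conversion rules implicit. The sketch below shows one plausible way `warmup` and `plateau_steps` could be turned into step counts. It assumes that `plateau_steps` values are percentages on a 0-100 scale (as in the Quick Example) and that fractional results are rounded to the nearest step. The helper names `resolve_warmup_steps` and `resolve_plateaus` are hypothetical and not part of the package.

```python
from typing import List, Tuple, Union


def resolve_warmup_steps(warmup: Union[int, float], total_steps: int) -> int:
    """Convert a warmup setting to a step count (assumed rule)."""
    if warmup < 1:
        # Values below 1 are read as a fraction of total_steps.
        return round(warmup * total_steps)
    # Values of 1 or more are read as an absolute step count.
    return int(warmup)


def resolve_plateaus(
    plateau_steps: List[Tuple[float, float]], total_steps: int
) -> List[Tuple[int, int]]:
    """Convert (position%, duration%) pairs to (start_step, end_step) ranges.

    Assumes both values are percentages on a 0-100 scale.
    """
    ranges = []
    for position_pct, duration_pct in plateau_steps:
        start = round(position_pct / 100 * total_steps)
        end = start + round(duration_pct / 100 * total_steps)
        ranges.append((start, end))
    return ranges


print(resolve_warmup_steps(0.1, 10000))
print(resolve_plateaus([(50, 30), (85, 10)], 10000))
```

Under these assumptions, the Quick Example's configuration would warm up for 1000 steps and hold the LR constant over steps 5000-8000 and 8500-9500.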

Use Cases:

  • Large-scale training with gradual warm-up
  • Fine-tuning with controlled LR adjustments
  • Long training runs requiring stable convergence periods
  • Transfer learning to prevent catastrophic forgetting

For detailed documentation and examples, see the examples directory.


2. Cosine Waypoint Scheduler

A flexible scheduler that connects waypoints using cosine curves, supporting both LR increases and decreases with smooth transitions.

Figure: Complete overview of the Cosine Waypoint Scheduler, showing its key features: warmup, plateau emulation, mixed modes, and smooth cosine transitions.

Configuration shown in overview image:

# total_steps = 10000
waypoints = [
    {'position': 0, 'lr': 0.0001},      # Step 0 (int): Warmup start
    {'position': 1000, 'lr_ratio': 1.0},# Step 1000 (int): Warmup end
    {'position': 0.25, 'lr_ratio': 0.6},# 25% (float): Drop to 60%
    {'position': 4000, 'lr': 0.0009},   # Step 4000 (int): Plateau start
    {'position': 6000, 'lr': 0.0009},   # Step 6000 (int): Plateau end - flat region!
    {'position': 0.75, 'lr_ratio': 0.4},# 75% (float): Drop to 40%
    {'position': 1.0, 'lr_ratio': 0.05} # 100% (float): Min LR at 5%
]
# Demonstrates: int (absolute steps) and float (percentage) position formats

Key Features:

  • Unified waypoint-based API - All features (warmup, min_lr, plateaus, schedules) via waypoints
  • Flexible position formats - Use int (absolute steps) or float (percentage)
  • Supports both ratio mode (percentage of base_lr) and absolute mode (fixed LR values)
  • Plateau emulation - Two consecutive waypoints with same LR create a plateau
  • High performance - Pre-computed segments, minimal runtime overhead (~2.4 μs, i.e. ~0.0024 ms, per step)
  • Smooth S-curve transitions between waypoints (zero derivatives at boundaries)
  • Works seamlessly for both LR increases and decreases
  • Mix absolute and ratio modes freely for maximum flexibility
  • Backward-compatible tuple format (position, lr_ratio)

Quick Example:

import torch
import torch.nn as nn
from k_pytorch_schedulers import CosineWaypointScheduler

# Setup model and optimizer
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Define waypoints - mix int (steps) and float (percentage) positions
waypoints = [
    {'position': 0, 'lr': 0.0},            # Step 0: Warmup start at LR=0
    {'position': 1000, 'lr_ratio': 1.0},   # Step 1000 (int): Warmup to base_lr
    {'position': 0.3, 'lr': 0.0005},       # 30% (float): Drop to fixed LR of 0.0005
    {'position': 6000, 'lr_ratio': 0.8},   # Step 6000 (int): Rise to 80% of base_lr
    {'position': 0.9, 'lr': 0.0002},       # 90% (float): Drop to fixed LR
    {'position': 1.0, 'lr_ratio': 0.1}     # 100% (float): End at 10% of base_lr (min_lr)
]

# Create scheduler with all parameters
scheduler = CosineWaypointScheduler(
    optimizer,
    total_steps=10000,
    waypoints=waypoints,
    base_lr=None,  # Use optimizer's LR (default)
    lr_scale_factor=1.0,  # Scale all LRs by this factor (default: 1.0)
    last_epoch=-1,  # For resuming training (default: -1)
    verbose=False  # Print LR updates (default: False)
)

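# Training loop (random tensors stand in for real batches)
for step in range(10000):
    data = torch.randn(32, 10)  # dummy batch matching nn.Linear(10, 2)
    loss = model(data).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()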

Position Formats:

Waypoints support flexible position specification:

  • int: Absolute step number (e.g., 1 = step 1, 1000 = step 1000)
  • float ≤ 1.0: Percentage of total steps (e.g., 0.1 = 10%, 1.0 = 100%)
  • float > 1.0: Absolute step number (e.g., 5000.0 = step 5000)

⚠️ Important: To specify 100%, use 1.0 (float), not 1 (int). The value 1 (int) means step 1, while 1.0 (float) means 100% of total steps.
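
The position rules above can be sketched as a small function. This is an illustration of the documented behavior, not the package's actual code: the name `resolve_position` is hypothetical, and rounding fractional positions to the nearest step is an assumption.

```python
from typing import Union


def resolve_position(position: Union[int, float], total_steps: int) -> int:
    """Map a waypoint position to an absolute step, following the rules above."""
    if isinstance(position, int):
        # int: already an absolute step (1 means step 1, not 100%).
        return position
    if position <= 1.0:
        # float <= 1.0: fraction of total_steps (1.0 means the final step).
        return round(position * total_steps)
    # float > 1.0: absolute step written as a float (5000.0 means step 5000).
    return int(position)


total_steps = 10000
for pos in (1, 1.0, 0.25, 5000.0):
    print(repr(pos), "->", resolve_position(pos, total_steps))
```

The type check comes first because `1` and `1.0` compare equal in Python, so only the type tells "step 1" apart from "100%".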

Waypoint Modes:

  1. Ratio Mode: LR as percentage of base_lr

    {'position': 0.5, 'lr_ratio': 0.5}   # 50% progress, 50% of base_lr
    {'position': 5000, 'lr_ratio': 0.8}  # Step 5000, 80% of base_lr
  2. Absolute Mode: Fixed LR value

    {'position': 0.5, 'lr': 0.0005}   # 50% progress, LR = 0.0005 exactly
    {'position': 5000, 'lr': 0.0003}  # Step 5000, LR = 0.0003 exactly
  3. Tuple Format (backward compatible, ratio only):

    (0.5, 0.5)    # 50% progress (float), 50% of base_lr
    (5000, 0.8)   # Step 5000 (int), 80% of base_lr
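
As a rough sketch of how the three formats above could map to a concrete learning rate, the hypothetical helper `resolve_lr` below handles ratio mode, absolute mode, and the tuple format. Rejecting a waypoint that sets both `lr` and `lr_ratio` is an assumption; the package may treat that case differently.

```python
from typing import Dict, Tuple, Union

Waypoint = Union[Dict[str, float], Tuple[float, float]]


def resolve_lr(waypoint: Waypoint, base_lr: float) -> float:
    """Return the target LR for one waypoint (hypothetical helper)."""
    if isinstance(waypoint, tuple):
        # Tuple format is ratio-only: (position, lr_ratio).
        _, lr_ratio = waypoint
        return base_lr * lr_ratio
    if "lr" in waypoint and "lr_ratio" in waypoint:
        # Assumed behavior: treat an ambiguous waypoint as an error.
        raise ValueError("Use either 'lr' or 'lr_ratio', not both")
    if "lr" in waypoint:
        # Absolute mode: the value is used as-is.
        return waypoint["lr"]
    # Ratio mode: the value is a fraction of base_lr.
    return base_lr * waypoint["lr_ratio"]


base_lr = 0.001
print(resolve_lr({"position": 0.5, "lr_ratio": 0.5}, base_lr))
print(resolve_lr({"position": 5000, "lr": 0.0003}, base_lr))
print(resolve_lr((5000, 0.8), base_lr))
```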

Parameters:

  • optimizer (Optimizer): PyTorch optimizer to schedule
  • total_steps (int): Total number of training steps
  • waypoints (List): List of waypoint definitions (dict or tuple)
    • Dict format: {'position': pos, 'lr': value} or {'position': pos, 'lr_ratio': ratio}, where pos is an int step or a float fraction (see Position Formats)
    • Tuple format: (position, lr_ratio) - always ratio mode
  • base_lr (float, optional): Base learning rate for ratio calculations. If None, uses optimizer's LR
  • lr_scale_factor (float): Scaling factor for all learning rates (default: 1.0) - Useful for distributed training
  • last_epoch (int): For resuming training (default: -1)
  • verbose (bool): Print messages on updates (default: False)
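
The README does not say how to choose `lr_scale_factor` for distributed training. One common heuristic is the linear scaling rule, where the LR grows in proportion to the global batch size. The sketch below assumes that rule and a hypothetical helper name `linear_lr_scale_factor`; it is not taken from the package.

```python
def linear_lr_scale_factor(
    per_device_batch: int, world_size: int, reference_batch: int
) -> float:
    """Linear scaling heuristic: factor = global batch / reference batch.

    reference_batch is the batch size the base LR was tuned for.
    """
    global_batch = per_device_batch * world_size
    return global_batch / reference_batch


# Example: base LR tuned at batch size 256, now training on 8 devices x 64.
factor = linear_lr_scale_factor(per_device_batch=64, world_size=8, reference_batch=256)
print(factor)
```

The resulting value could then be passed as `lr_scale_factor=factor` when constructing the scheduler.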

Note: Warmup, min_lr, and other features are now implemented via waypoints, providing maximum flexibility with a unified API.

How It Works:

The scheduler uses waypoints to define the entire learning rate trajectory:

  1. Warmup: Define with early waypoints

    {'position': 0, 'lr': 0.0},          # Start at 0
    {'position': 0.1, 'lr_ratio': 1.0}   # Reach base_lr at 10%
  2. Plateaus: Two consecutive waypoints with same LR

    {'position': 0.4, 'lr_ratio': 0.5},  # Start plateau at 40%
    {'position': 0.6, 'lr_ratio': 0.5}   # End plateau at 60% (flat region)
  3. Min LR: Control with the final waypoint

    {'position': 1.0, 'lr_ratio': 0.1}   # End at 10% of base_lr

All transitions use smooth cosine interpolation for optimal training stability.
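
To make the idea concrete, here is a minimal, self-contained sketch of evaluating such a trajectory. It assumes the waypoints have already been resolved to `(step, lr)` pairs and that the LR is held constant after the last waypoint. The function name `lr_at_step` is hypothetical, and the package's pre-computed segments may be organized differently.

```python
import math
from typing import List, Tuple


def lr_at_step(step: int, points: List[Tuple[int, float]]) -> float:
    """Evaluate a cosine-interpolated schedule at `step`.

    `points` is a list of (step, lr) pairs sorted by step, i.e. waypoints
    whose positions and LR values have already been resolved.
    """
    if step <= points[0][0]:
        return points[0][1]
    for (p1, lr1), (p2, lr2) in zip(points, points[1:]):
        if step <= p2:
            # Progress through the current segment, from 0.0 to 1.0.
            progress = (step - p1) / (p2 - p1)
            return lr1 + (lr2 - lr1) * (1 - math.cos(math.pi * progress)) / 2
    # Past the last waypoint: hold the final LR (assumed behavior).
    return points[-1][1]


# Warmup to 0.001, plateau at 0.0005 between steps 4000 and 6000, end at 0.0001.
points = [(0, 0.0), (1000, 0.001), (4000, 0.0005), (6000, 0.0005), (10000, 0.0001)]
for s in (0, 500, 1000, 5000, 10000):
    print(s, lr_at_step(s, points))
```

Because both plateau waypoints carry the same LR, the cosine term is multiplied by zero and the segment between them stays flat.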

Use Cases:

  • Cyclical training patterns (drop, rise, drop)
  • Plateau emulation without separate plateau scheduler
  • Fine-grained control over LR trajectory with warmup
  • Experiments with LR increases during training
  • Custom training schedules for specific model architectures
  • High-performance training requiring minimal scheduler overhead

The cosine interpolation formula ensures:

  • Smooth transitions without abrupt changes
  • Zero derivative at waypoint boundaries (the LR curve has a continuous first derivative)
  • Predictable and reproducible training dynamics

Mathematical Detail:

For a segment between two waypoints at positions $p_1$ and $p_2$ with LR values $lr_1$ and $lr_2$:

$$lr(t) = lr_1 + (lr_2 - lr_1) \cdot \frac{1 - \cos\left(\pi \cdot \frac{t - p_1}{p_2 - p_1}\right)}{2}$$

This produces a smooth S-curve that:

  • Starts at $lr_1$ when $t = p_1$
  • Ends at $lr_2$ when $t = p_2$
  • Has zero slope at both endpoints (smooth connection)
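
The three properties can be checked numerically. The sketch below implements the segment formula directly and adds its analytic derivative, which is proportional to $\sin(\pi (t - p_1)/(p_2 - p_1))$ and therefore vanishes at both endpoints. The function names are illustrative and not part of the package.

```python
import math


def cosine_interpolate(t: float, p1: float, p2: float, lr1: float, lr2: float) -> float:
    """LR at position t on the cosine segment from (p1, lr1) to (p2, lr2)."""
    progress = (t - p1) / (p2 - p1)
    return lr1 + (lr2 - lr1) * (1 - math.cos(math.pi * progress)) / 2


def slope(t: float, p1: float, p2: float, lr1: float, lr2: float) -> float:
    """Analytic derivative of the segment with respect to t."""
    progress = (t - p1) / (p2 - p1)
    return (lr2 - lr1) * math.pi / (2 * (p2 - p1)) * math.sin(math.pi * progress)


# Segment from step 1000 (LR 0.001) down to step 3000 (LR 0.0005).
segment = (1000, 3000, 0.001, 0.0005)
for t in (1000, 2000, 3000):
    print(t, cosine_interpolate(t, *segment), slope(t, *segment))
```

At the midpoint the LR is halfway between the two values and the slope is steepest. At the endpoints the slope is zero up to floating-point rounding.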

Generate Visualizations:

For documentation-ready overview images:

cd examples
python generate_documentation_graphics.py

For multiple example patterns:

python visualize_waypoint_scheduler.py  # Generates 5 different patterns
python visualize_schedulers.py          # Generates plateau examples

Comparison with PyTorch Built-in Schedulers

Feature CosinePlateauScheduler CosineWaypointScheduler CosineAnnealingLR OneCycleLR
Warm-up ✅ Built-in ✅ Via waypoints
Cosine Decay
LR Increases Limited
Plateau Steps ✅ Explicit ✅ Via waypoints
Waypoint Control
Mixed Ratio/Absolute
Min LR Control ✅ Via waypoints Limited
Independent Segments
Performance ✅ (~0.5μs) ✅ (~0.0024ms)
Unified API

Examples

See the examples directory for:

  • Basic usage examples
  • Visualization tools
  • Integration with training loops
  • Checkpoint/resume patterns

Generate visualizations:

# Cosine Plateau Scheduler
python examples/visualize_schedulers.py

# Cosine Waypoint Scheduler
python examples/visualize_waypoint_scheduler.py

Testing

Run the test suite:

pip install -e ".[dev]"
pytest tests/

Future Schedulers

This package is actively developed with plans to add more schedulers:

  • Custom cyclic schedulers
  • Adaptive schedulers based on loss/metrics
  • Multi-phase training schedulers
  • And more...

Suggestions and contributions are welcome!

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. Whether it's:

  • Adding new schedulers
  • Improving documentation
  • Reporting bugs
  • Suggesting features

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this package in your research, please cite:

@software{k_pytorch_schedulers,
  author = {Koronos},
  title = {K-Pytorch-Schedulers: A Collection of Custom PyTorch Learning Rate Schedulers},
  year = {2025},
  url = {https://github.com/Koronos/K-Pytorch-Schedulers}
}

Support

For issues, questions, or suggestions, please open an issue on GitHub.

Acknowledgments

Inspired by various successful training strategies in deep learning research and community feedback.

About

High-performance PyTorch LR schedulers with cosine annealing, flexible waypoints, plateau steps, and LR scaling. Unified API with pre-computed segments for minimal runtime overhead.
