A collection of custom PyTorch learning rate schedulers designed to improve training convergence and model performance.
This package provides learning rate schedulers that go beyond PyTorch's built-in options, adding warm-up, plateau steps, and waypoint-based cosine schedules with low per-step overhead.
- Python >= 3.7
- PyTorch >= 1.4.0
Install from PyPI:
pip install k-pytorch-schedulers
Install directly from GitHub:
pip install git+https://github.com/Koronos/K-Pytorch-Schedulers.git
Install from source:
git clone https://github.com/Koronos/K-Pytorch-Schedulers.git
cd K-Pytorch-Schedulers
pip install -e .
Install with development dependencies:
pip install -e ".[dev]"
CosinePlateauScheduler is an advanced scheduler combining cosine annealing, warm-up, and plateau steps for stable training convergence.
Figure: complete visualization showing the warmup phase, base LR, min LR, and plateau regions with cosine decay between them.
Key Features:
- Linear warm-up phase for training stability
- Cosine annealing decay with independent segments
- Plateau steps for maintaining constant LR at critical intervals
- High performance (~0.5μs per step)
- Resume support with last_epoch
Quick Example:
import torch
import torch.nn as nn
from k_pytorch_schedulers import CosinePlateauScheduler
# Setup model and optimizer
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Create scheduler with all parameters
scheduler = CosinePlateauScheduler(
    optimizer,
    total_steps=10000,
    base_lr=None,                        # Use optimizer's LR (default)
    min_lr_ratio=0.1,
    warmup=0.1,                          # 10% of total steps (or use absolute: warmup=1000)
    plateau_steps=[(50, 30), (85, 10)],
    lr_scale_factor=1.0,                 # Scale all LRs by this factor (default: 1.0)
    last_epoch=-1,                       # For resuming training (default: -1)
    verbose=False                        # Print LR updates (default: False)
)
# Training loop
data = torch.randn(32, 10)  # Placeholder batch; replace with real data
for step in range(10000):
    loss = model(data).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
Parameters:
- optimizer (Optimizer): PyTorch optimizer to schedule
- total_steps (int): Total number of training steps
- base_lr (float, optional): Base learning rate. If None, uses the optimizer's LR
- min_lr_ratio (float): Minimum LR as a ratio of base LR (default: 0.0)
- warmup (Union[int, float]): Warmup configuration (default: 0)
  - If < 1: Fraction of total_steps (e.g., 0.1 = 10% warmup)
  - If >= 1: Absolute number of warmup steps
- plateau_steps (List[Tuple[float, float]], optional): List of (position%, duration%) tuples
- lr_scale_factor (float): Scaling factor for all learning rates (default: 1.0). Useful for distributed training
- last_epoch (int): Index of the last epoch, used when resuming (default: -1)
- verbose (bool): Print messages on updates (default: False)
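To make the warmup and plateau_steps conventions above concrete, here is a minimal sketch of how such arguments could be turned into step counts. The helper names resolve_warmup_steps and plateau_ranges are hypothetical and not part of the package. The rounding behavior, and the reading of position% as the point where a plateau begins, are assumptions rather than documented behavior.

```python
from typing import List, Tuple, Union


def resolve_warmup_steps(warmup: Union[int, float], total_steps: int) -> int:
    """Hypothetical helper: convert a `warmup` argument into a step count."""
    if warmup < 0:
        raise ValueError("warmup must be non-negative")
    if warmup < 1:
        # Values below 1 are read as a fraction of total_steps.
        return int(round(warmup * total_steps))
    # Values of 1 or more are read as an absolute number of steps.
    return int(warmup)


def plateau_ranges(
    plateau_steps: List[Tuple[float, float]], total_steps: int
) -> List[Tuple[int, int]]:
    """Hypothetical helper: convert (position%, duration%) to (start, end) steps.

    Assumption: position% marks where the plateau begins.
    """
    ranges = []
    for position_pct, duration_pct in plateau_steps:
        start = int(round(position_pct * total_steps / 100))
        end = start + int(round(duration_pct * total_steps / 100))
        # Clamp so a plateau never runs past the end of training.
        ranges.append((start, min(end, total_steps)))
    return ranges


warmup_steps = resolve_warmup_steps(0.1, 10000)
ranges = plateau_ranges([(50, 30), (85, 10)], 10000)
print(warmup_steps, ranges)
```

Under these assumptions, the Quick Example configuration would warm up for the first 10% of training and hold the LR constant over two step ranges later in the run.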
Use Cases:
- Large-scale training with gradual warm-up
- Fine-tuning with controlled LR adjustments
- Long training runs requiring stable convergence periods
- Transfer learning to prevent catastrophic forgetting
For detailed documentation and examples, see the examples directory.
CosineWaypointScheduler is a flexible scheduler that connects waypoints using cosine curves, supporting both LR increases and decreases with smooth transitions.
Figure: complete visualization showing all key features: warmup, plateau emulation, mixed modes, and smooth cosine transitions.
Configuration shown in overview image:
# total_steps = 10000
waypoints = [
    {'position': 0, 'lr': 0.0001},        # Step 0 (int): Warmup start
    {'position': 1000, 'lr_ratio': 1.0},  # Step 1000 (int): Warmup end
    {'position': 0.25, 'lr_ratio': 0.6},  # 25% (float): Drop to 60%
    {'position': 4000, 'lr': 0.0009},     # Step 4000 (int): Plateau start
    {'position': 6000, 'lr': 0.0009},     # Step 6000 (int): Plateau end - flat region!
    {'position': 0.75, 'lr_ratio': 0.4},  # 75% (float): Drop to 40%
    {'position': 1.0, 'lr_ratio': 0.05}   # 100% (float): Min LR at 5%
]
# Demonstrates: int (absolute steps) and float (percentage) position formats
Key Features:
- Unified waypoint-based API - All features (warmup, min_lr, plateaus, schedules) via waypoints
- Flexible position formats - Use int (absolute steps) or float (percentage)
- Supports both ratio mode (percentage of base_lr) and absolute mode (fixed LR values)
- Plateau emulation - Two consecutive waypoints with same LR create a plateau
- High performance - Pre-computed segments, minimal runtime overhead (~2.4 μs per step)
- Smooth S-curve transitions between waypoints (zero derivatives at boundaries)
- Works seamlessly for both LR increases and decreases
- Mix absolute and ratio modes freely for maximum flexibility
- Backward compatible tuple format (position, lr_ratio)
Quick Example:
import torch
import torch.nn as nn
from k_pytorch_schedulers import CosineWaypointScheduler
# Setup model and optimizer
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
# Define waypoints - mix int (steps) and float (percentage) positions
waypoints = [
    {'position': 0, 'lr': 0.0},           # Step 0: Warmup start at LR=0
    {'position': 1000, 'lr_ratio': 1.0},  # Step 1000 (int): Warmup to base_lr
    {'position': 0.3, 'lr': 0.0005},      # 30% (float): Drop to fixed LR of 0.0005
    {'position': 6000, 'lr_ratio': 0.8},  # Step 6000 (int): Rise to 80% of base_lr
    {'position': 0.9, 'lr': 0.0002},      # 90% (float): Drop to fixed LR
    {'position': 1.0, 'lr_ratio': 0.1}    # 100% (float): End at 10% of base_lr (min_lr)
]
# Create scheduler with all parameters
scheduler = CosineWaypointScheduler(
    optimizer,
    total_steps=10000,
    waypoints=waypoints,
    base_lr=None,          # Use optimizer's LR (default)
    lr_scale_factor=1.0,   # Scale all LRs by this factor (default: 1.0)
    last_epoch=-1,         # For resuming training (default: -1)
    verbose=False          # Print LR updates (default: False)
)
# Training loop
data = torch.randn(32, 10)  # Placeholder batch; replace with real data
for step in range(10000):
    loss = model(data).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
Position Formats:
Waypoints support flexible position specification:
- int: Absolute step number (e.g., 1 = step 1, 1000 = step 1000)
- float ≤ 1.0: Percentage of total steps (e.g., 0.1 = 10%, 1.0 = 100%)
- float > 1.0: Absolute step number (e.g., 5000.0 = step 5000)
Note: to specify 100% of total steps, use 1.0 (float), not 1 (int). The value 1 (int) means step 1, while 1.0 (float) means 100% of total steps.
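The position rules above can be sketched as a small function. This is an illustrative reading of the documented rules, not the package's own code: resolve_position is a hypothetical name, and rounding fractional positions to the nearest step is an assumption.

```python
def resolve_position(position, total_steps: int) -> int:
    """Hypothetical helper: map a waypoint position to an absolute step."""
    if isinstance(position, bool):
        # bool is a subclass of int in Python; reject it to avoid surprises.
        raise TypeError("position must be an int or a float")
    if isinstance(position, int):
        # int: already an absolute step number.
        return position
    if position <= 1.0:
        # float <= 1.0: fraction of total_steps (rounding is an assumption).
        return int(round(position * total_steps))
    # float > 1.0: absolute step number written as a float.
    return int(position)


total_steps = 10000
for pos in (1, 1.0, 0.25, 5000.0):
    print(repr(pos), "->", resolve_position(pos, total_steps))
```

The first two values show why the type matters: the int 1 and the float 1.0 resolve to opposite ends of the schedule.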
Waypoint Modes:
- Ratio Mode: LR as percentage of base_lr
  {'position': 0.5, 'lr_ratio': 0.5}   # 50% progress, 50% of base_lr
  {'position': 5000, 'lr_ratio': 0.8}  # Step 5000, 80% of base_lr
- Absolute Mode: Fixed LR value
  {'position': 0.5, 'lr': 0.0005}      # 50% progress, LR = 0.0005 exactly
  {'position': 5000, 'lr': 0.0003}     # Step 5000, LR = 0.0003 exactly
- Tuple Format (backward compatible, ratio only):
  (0.5, 0.5)                           # 50% progress (float), 50% of base_lr
  (5000, 0.8)                          # Step 5000 (int), 80% of base_lr
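As a rough illustration of how the three waypoint forms relate, the sketch below reduces each one to a concrete LR value. The function resolve_lr is hypothetical, and rejecting a dict that carries both or neither of 'lr' and 'lr_ratio' is an assumption about sensible validation, not documented behavior.

```python
def resolve_lr(waypoint, base_lr: float) -> float:
    """Hypothetical helper: compute the target LR of a single waypoint."""
    if isinstance(waypoint, dict):
        has_lr = "lr" in waypoint
        has_ratio = "lr_ratio" in waypoint
        if has_lr == has_ratio:
            # Assumed rule: exactly one of the two keys must be present.
            raise ValueError("waypoint needs exactly one of 'lr' or 'lr_ratio'")
        if has_lr:
            # Absolute mode: the value is used as-is.
            return float(waypoint["lr"])
        # Ratio mode: the value scales base_lr.
        return base_lr * waypoint["lr_ratio"]
    # Tuple format: (position, lr_ratio), always ratio mode.
    _position, lr_ratio = waypoint
    return base_lr * lr_ratio


base_lr = 0.001
ratio_lr = resolve_lr({"position": 0.5, "lr_ratio": 0.5}, base_lr)
absolute_lr = resolve_lr({"position": 5000, "lr": 0.0003}, base_lr)
tuple_lr = resolve_lr((5000, 0.8), base_lr)
print(ratio_lr, absolute_lr, tuple_lr)
```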
Parameters:
- optimizer (Optimizer): PyTorch optimizer to schedule
- total_steps (int): Total number of training steps
- waypoints (List): List of waypoint definitions (dict or tuple)
  - Dict format: {'position': %, 'lr': value} or {'position': %, 'lr_ratio': ratio}
  - Tuple format: (position%, lr_ratio), defaults to ratio mode
- base_lr (float, optional): Base learning rate for ratio calculations. If None, uses the optimizer's LR
- lr_scale_factor (float): Scaling factor for all learning rates (default: 1.0). Useful for distributed training
- last_epoch (int): For resuming training (default: -1)
- verbose (bool): Print messages on updates (default: False)
Note: Warmup, min_lr, and other features are now implemented via waypoints, providing maximum flexibility with a unified API.
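The lr_scale_factor parameter is described as useful for distributed training. One common heuristic for choosing it is the linear scaling rule, where the LR grows in proportion to the global batch size. The sketch below assumes that rule and assumes the factor simply multiplies every LR in the schedule; linear_scale_factor is a hypothetical helper, not part of the package.

```python
def linear_scale_factor(global_batch_size: int, reference_batch_size: int) -> float:
    """Hypothetical helper: linear scaling rule for the learning rate."""
    if reference_batch_size <= 0:
        raise ValueError("reference_batch_size must be positive")
    return global_batch_size / reference_batch_size


# Example: the schedule was tuned on one device with batch size 32,
# and training now runs on 8 devices with the same per-device batch size.
per_device_batch = 32
world_size = 8
factor = linear_scale_factor(per_device_batch * world_size, per_device_batch)

base_lr = 0.001
effective_lr = base_lr * factor  # Assumed effect of lr_scale_factor on base_lr
print(factor, effective_lr)
```

The resulting value would then be passed as lr_scale_factor=factor when constructing the scheduler. Whether linear scaling suits a given optimizer and batch size is a tuning question, not something this sketch settles.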
How It Works:
The scheduler uses waypoints to define the entire learning rate trajectory:
- Warmup: Define with early waypoints
  {'position': 0, 'lr': 0.0},            # Start at 0
  {'position': 0.1, 'lr_ratio': 1.0}     # Reach base_lr at 10%
- Plateaus: Two consecutive waypoints with same LR
  {'position': 0.4, 'lr_ratio': 0.5},    # Start plateau at 40%
  {'position': 0.6, 'lr_ratio': 0.5}     # End plateau at 60% (flat region)
- Min LR: Control with the final waypoint
  {'position': 1.0, 'lr_ratio': 0.1}     # End at 10% of base_lr
All transitions use smooth cosine interpolation for optimal training stability.
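The waypoint mechanism described above can be sketched as a piecewise function over already-resolved (step, lr) pairs. This is a simplified illustration and not the package's implementation: build_schedule is a hypothetical name, and the lookup via bisect is just one way to find the active segment.

```python
import bisect
import math


def build_schedule(points):
    """Hypothetical helper: return lr_at(step) for sorted (step, lr) pairs."""
    steps = [s for s, _ in points]
    lrs = [lr for _, lr in points]
    if any(b <= a for a, b in zip(steps, steps[1:])):
        raise ValueError("waypoint steps must be strictly increasing")

    def lr_at(step):
        # Clamp outside the range covered by the waypoints.
        if step <= steps[0]:
            return lrs[0]
        if step >= steps[-1]:
            return lrs[-1]
        # Locate the segment whose start is the last waypoint <= step.
        i = bisect.bisect_right(steps, step) - 1
        progress = (step - steps[i]) / (steps[i + 1] - steps[i])
        # Cosine interpolation between the two neighbouring waypoints.
        weight = (1 - math.cos(math.pi * progress)) / 2
        return lrs[i] + (lrs[i + 1] - lrs[i]) * weight

    return lr_at


# Warmup to 1e-3, decay to 5e-4, hold a plateau, then decay to 1e-4.
points = [(0, 0.0), (1000, 0.001), (4000, 0.0005), (6000, 0.0005), (10000, 0.0001)]
lr_at = build_schedule(points)
for step in (0, 500, 1000, 5000, 10000):
    print(step, lr_at(step))
```

Because the two plateau waypoints share the same LR, the interpolation term vanishes between them and the LR stays flat, which is the plateau emulation described above.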
Use Cases:
- Cyclical training patterns (drop, rise, drop)
- Plateau emulation without separate plateau scheduler
- Fine-grained control over LR trajectory with warmup
- Experiments with LR increases during training
- Custom training schedules for specific model architectures
- High-performance training requiring minimal scheduler overhead
The cosine interpolation formula ensures:
- Smooth transitions without abrupt changes
- Zero derivative at waypoint boundaries (continuous gradient)
- Predictable and reproducible training dynamics
Mathematical Detail:
For a segment between two waypoints at positions \( p_1 \) and \( p_2 \) with LR values \( \mathrm{lr}_1 \) and \( \mathrm{lr}_2 \):
\[ \mathrm{lr}(t) = \mathrm{lr}_1 + (\mathrm{lr}_2 - \mathrm{lr}_1) \cdot \frac{1 - \cos\left(\pi \cdot \frac{t - p_1}{p_2 - p_1}\right)}{2} \]
This produces a smooth S-curve that:
- Starts at \( \mathrm{lr}_1 \) when \( t = p_1 \)
- Ends at \( \mathrm{lr}_2 \) when \( t = p_2 \)
- Has zero slope at both endpoints (smooth connection)
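The formula can be checked numerically with a direct translation into Python. The function name cosine_interp and the sample values are chosen for illustration only; the slope comparison uses a simple one-step finite difference.

```python
import math


def cosine_interp(t, p1, p2, lr1, lr2):
    """Evaluate the cosine segment between (p1, lr1) and (p2, lr2) at t."""
    progress = (t - p1) / (p2 - p1)
    return lr1 + (lr2 - lr1) * (1 - math.cos(math.pi * progress)) / 2


p1, p2, lr1, lr2 = 1000, 3000, 1e-3, 1e-4

start = cosine_interp(p1, p1, p2, lr1, lr2)
end = cosine_interp(p2, p1, p2, lr1, lr2)
middle = cosine_interp((p1 + p2) / 2, p1, p2, lr1, lr2)

# One-step finite differences: near an endpoint and at the midpoint.
start_slope = cosine_interp(p1 + 1, p1, p2, lr1, lr2) - start
mid_slope = cosine_interp((p1 + p2) / 2 + 1, p1, p2, lr1, lr2) - middle

print(start, middle, end)
print(start_slope, mid_slope)
```

The per-step change near the endpoint is orders of magnitude smaller than at the midpoint, which is the numerical counterpart of the zero-slope property.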
Generate Visualizations:
For documentation-ready overview images:
cd examples
python generate_documentation_graphics.py
For multiple example patterns:
python visualize_waypoint_scheduler.py  # Generates 5 different patterns
python visualize_schedulers.py          # Generates plateau examples

| Feature | CosinePlateauScheduler | CosineWaypointScheduler | CosineAnnealingLR | OneCycleLR |
|---|---|---|---|---|
| Warm-up | ✅ Built-in | ✅ Via waypoints | ❌ | ✅ |
| Cosine Decay | ✅ | ✅ | ✅ | ✅ |
| LR Increases | ❌ | ✅ | ❌ | Limited |
| Plateau Steps | ✅ Explicit | ✅ Via waypoints | ❌ | ❌ |
| Waypoint Control | ❌ | ✅ | ❌ | ❌ |
| Mixed Ratio/Absolute | ❌ | ✅ | ❌ | ❌ |
| Min LR Control | ✅ | ✅ Via waypoints | ✅ | Limited |
| Independent Segments | ✅ | ✅ | ❌ | ❌ |
| Performance | ✅ (~0.5 μs per step) | ✅ (~2.4 μs per step) | ✅ | ✅ |
| Unified API | ❌ | ✅ | ❌ | ❌ |
See the examples directory for:
- Basic usage examples
- Visualization tools
- Integration with training loops
- Checkpoint/resume patterns
Generate visualizations:
# Cosine Plateau Scheduler
python examples/visualize_schedulers.py
# Cosine Waypoint Scheduler
python examples/visualize_waypoint_scheduler.py
Run the test suite:
pip install -e ".[dev]"
pytest tests/
This package is actively developed with plans to add more schedulers:
- Custom cyclic schedulers
- Adaptive schedulers based on loss/metrics
- Multi-phase training schedulers
- And more...
Suggestions and contributions are welcome!
Contributions are welcome! Please feel free to submit a Pull Request, whether it's:
- Adding new schedulers
- Improving documentation
- Reporting bugs
- Suggesting features
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this package in your research, please cite:
@software{k_pytorch_schedulers,
  author = {Koronos},
  title = {K-Pytorch-Schedulers: A Collection of Custom PyTorch Learning Rate Schedulers},
  year = {2025},
  url = {https://github.com/Koronos/K-Pytorch-Schedulers}
}
For issues, questions, or suggestions, please open an issue on GitHub.
Inspired by various successful training strategies in deep learning research and community feedback.