K-pytorch-schedulers

A collection of custom PyTorch learning rate schedulers designed to improve training convergence and model performance.

Overview

This package provides learning rate schedulers that go beyond PyTorch's built-in options. It currently ships two, CosinePlateauScheduler and CosineWaypointScheduler, which add warm-up, plateau, and cosine-decay control with low per-step overhead.

Requirements

Python >= 3.7
PyTorch >= 1.4.0

Installation

From PyPI (when published)

pip install k-pytorch-schedulers

Directly from GitHub

pip install git+https://github.com/Koronos/K-Pytorch-Schedulers.git

From source (for development)

git clone https://github.com/Koronos/K-Pytorch-Schedulers.git
cd K-Pytorch-Schedulers
pip install -e .

Development installation with test dependencies

pip install -e ".[dev]"

Available Schedulers

1. Cosine Plateau Scheduler

An advanced scheduler combining cosine annealing, warm-up, and plateau steps for optimal training convergence.

Figure: Cosine Plateau schedule showing the warmup phase, base LR, min LR, and plateau regions with cosine decay between them.

Key Features:

Linear warm-up phase for training stability
Cosine annealing decay with independent segments
Plateau steps for maintaining constant LR at critical intervals
High performance (~0.5μs per step)
Resume support with last_epoch

Quick Example:

import torch
import torch.nn as nn
from k_pytorch_schedulers import CosinePlateauScheduler

# Setup model and optimizer
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Create scheduler with all parameters
scheduler = CosinePlateauScheduler(
    optimizer,
    total_steps=10000,
    base_lr=None,  # Use optimizer's LR (default)
    min_lr_ratio=0.1,
    warmup=0.1,  # 10% of total steps (or use absolute: warmup=1000)
    plateau_steps=[(50, 30), (85, 10)],
    lr_scale_factor=1.0,  # Scale all LRs by this factor (default: 1.0)
    last_epoch=-1,  # For resuming training (default: -1)
    verbose=False  # Print LR updates (default: False)
)

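# Training loop (a random batch stands in for real data)
for step in range(10000):
    data = torch.randn(32, 10)  # Placeholder input matching nn.Linear(10, 2)
    loss = model(data).sum()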
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()

Parameters:

optimizer (Optimizer): PyTorch optimizer to schedule
total_steps (int): Total number of training steps
base_lr (float, optional): Base learning rate. If None, uses optimizer's LR
min_lr_ratio (float): Minimum LR as ratio of base LR (default: 0.0)
warmup (Union[int, float]): Warmup configuration (default: 0)
- If < 1: Percentage of total_steps (e.g., 0.1 = 10% warmup)
- If >= 1: Absolute number of warmup steps
plateau_steps (List[Tuple[float, float]], optional): List of (position%, duration%) tuples
lr_scale_factor (float): Scaling factor for all learning rates (default: 1.0) - Useful for distributed training
last_epoch (int): Index of last epoch for resuming (default: -1)
verbose (bool): Print messages on updates (default: False)
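
The warmup and plateau_steps conventions above come down to plain step arithmetic, sketched below. This is an illustration only, not the package's code: the helper names resolve_warmup_steps and resolve_plateau_ranges are hypothetical, plateau percentages are assumed to be on a 0-100 scale (as in the Quick Example), and fractional results are assumed to round to the nearest step.

```python
from typing import List, Tuple, Union


def resolve_warmup_steps(warmup: Union[int, float], total_steps: int) -> int:
    """Convert a warmup setting to a step count (hypothetical helper)."""
    if warmup < 0:
        raise ValueError("warmup must be non-negative")
    if warmup < 1:
        # Values below 1 are read as a fraction of total_steps.
        return round(warmup * total_steps)
    # Values of 1 or more are read as an absolute number of steps.
    return int(warmup)


def resolve_plateau_ranges(
    plateau_steps: List[Tuple[float, float]], total_steps: int
) -> List[Tuple[int, int]]:
    """Convert (position%, duration%) tuples to (start_step, end_step) pairs.

    Assumes both values are percentages of total_steps on a 0-100 scale.
    """
    ranges = []
    for position_pct, duration_pct in plateau_steps:
        start = round(position_pct * total_steps / 100)
        end = round((position_pct + duration_pct) * total_steps / 100)
        # Clamp so a plateau never runs past the end of training.
        ranges.append((start, min(end, total_steps)))
    return ranges


print(resolve_warmup_steps(0.1, 10000))
print(resolve_plateau_ranges([(50, 30), (85, 10)], 10000))
```

Under these assumptions, the Quick Example's warmup=0.1 and warmup=1000 describe the same 1000-step warm-up, and its two plateaus cover steps 5000-8000 and 8500-9500.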

Use Cases:

Large-scale training with gradual warm-up
Fine-tuning with controlled LR adjustments
Long training runs requiring stable convergence periods
Transfer learning to prevent catastrophic forgetting

For detailed documentation and examples, see the examples directory.

2. Cosine Waypoint Scheduler

A flexible scheduler that connects waypoints using cosine curves, supporting both LR increases and decreases with smooth transitions.

Figure: Cosine Waypoint schedule showing all key features: warmup, plateau emulation, mixed modes, and smooth cosine transitions.

Configuration shown in overview image:

# total_steps = 10000
waypoints = [
    {'position': 0, 'lr': 0.0001},        # Step 0 (int): Warmup start
    {'position': 1000, 'lr_ratio': 1.0},  # Step 1000 (int): Warmup end
    {'position': 0.25, 'lr_ratio': 0.6},  # 25% (float): Drop to 60%
    {'position': 4000, 'lr': 0.0009},     # Step 4000 (int): Plateau start
    {'position': 6000, 'lr': 0.0009},     # Step 6000 (int): Plateau end - flat region!
    {'position': 0.75, 'lr_ratio': 0.4},  # 75% (float): Drop to 40%
    {'position': 1.0, 'lr_ratio': 0.05}   # 100% (float): Min LR at 5%
]
# Demonstrates: int (absolute steps) and float (percentage) position formats

Key Features:

Unified waypoint-based API - All features (warmup, min_lr, plateaus, schedules) via waypoints
Flexible position formats - Use int (absolute steps) or float (percentage)
Supports both ratio mode (percentage of base_lr) and absolute mode (fixed LR values)
Plateau emulation - Two consecutive waypoints with same LR create a plateau
High performance - Pre-computed segments, minimal runtime overhead (~2.4μs per step)
Smooth S-curve transitions between waypoints (zero derivatives at boundaries)
Works seamlessly for both LR increases and decreases
Mix absolute and ratio modes freely for maximum flexibility
Backward compatible tuple format (position, lr_ratio)

Quick Example:

import torch
import torch.nn as nn
from k_pytorch_schedulers import CosineWaypointScheduler

# Setup model and optimizer
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Define waypoints - mix int (steps) and float (percentage) positions
waypoints = [
    {'position': 0, 'lr': 0.0},            # Step 0: Warmup start at LR=0
    {'position': 1000, 'lr_ratio': 1.0},   # Step 1000 (int): Warmup to base_lr
    {'position': 0.3, 'lr': 0.0005},       # 30% (float): Drop to fixed LR of 0.0005
    {'position': 6000, 'lr_ratio': 0.8},   # Step 6000 (int): Rise to 80% of base_lr
    {'position': 0.9, 'lr': 0.0002},       # 90% (float): Drop to fixed LR
    {'position': 1.0, 'lr_ratio': 0.1}     # 100% (float): End at 10% of base_lr (min_lr)
]

# Create scheduler with all parameters
scheduler = CosineWaypointScheduler(
    optimizer,
    total_steps=10000,
    waypoints=waypoints,
    base_lr=None,  # Use optimizer's LR (default)
    lr_scale_factor=1.0,  # Scale all LRs by this factor (default: 1.0)
    last_epoch=-1,  # For resuming training (default: -1)
    verbose=False  # Print LR updates (default: False)
)

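# Training loop (a random batch stands in for real data)
for step in range(10000):
    data = torch.randn(32, 10)  # Placeholder input matching nn.Linear(10, 2)
    loss = model(data).sum()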
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()

Position Formats:

Waypoints support flexible position specification:

int: Absolute step number (e.g., 1 = step 1, 1000 = step 1000)
float ≤ 1.0: Percentage of total steps (e.g., 0.1 = 10%, 1.0 = 100%)
float > 1.0: Absolute step number (e.g., 5000.0 = step 5000)

⚠️ Important: To specify 100%, use 1.0 (float), not 1 (int). The value 1 (int) means step 1, while 1.0 (float) means 100% of total steps.
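
The int/float distinction can be made concrete with a small resolver. This is a minimal sketch of the rules listed above, not the package's implementation: the name resolve_position is hypothetical, and rounding fractional positions to the nearest step is an assumption.

```python
from typing import Union


def resolve_position(position: Union[int, float], total_steps: int) -> int:
    """Map a waypoint position to an absolute step (hypothetical helper)."""
    if isinstance(position, bool):
        raise TypeError("position must be an int or float, not bool")
    if isinstance(position, int):
        # Ints are always absolute steps, so 1 means step 1.
        return position
    if position <= 1.0:
        # Floats up to 1.0 are fractions of total_steps, so 1.0 means 100%.
        return round(position * total_steps)
    # Floats above 1.0 are absolute steps written as floats.
    return round(position)


print(resolve_position(1, 10000))    # int: step 1
print(resolve_position(1.0, 10000))  # float: last step
```

Checking the type before the value is what keeps 1 and 1.0 apart; a value-only comparison would treat them identically.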

Waypoint Modes:

Ratio Mode: LR as percentage of base_lr

{'position': 0.5, 'lr_ratio': 0.5}   # 50% progress, 50% of base_lr
{'position': 5000, 'lr_ratio': 0.8}  # Step 5000, 80% of base_lr

Absolute Mode: Fixed LR value

{'position': 0.5, 'lr': 0.0005}   # 50% progress, LR = 0.0005 exactly
{'position': 5000, 'lr': 0.0003}  # Step 5000, LR = 0.0003 exactly

Tuple Format (backward compatible, ratio only):

(0.5, 0.5)    # 50% progress (float), 50% of base_lr
(5000, 0.8)   # Step 5000 (int), 80% of base_lr
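
The three waypoint formats all reduce to a single target LR once base_lr is known. The sketch below shows one way to do that reduction. The name resolve_waypoint_lr is hypothetical, and requiring exactly one of 'lr' or 'lr_ratio' per dict is an assumption, since the text above does not say how a dict with both keys is handled.

```python
from typing import Dict, Tuple, Union

Waypoint = Union[Dict[str, float], Tuple[float, float]]


def resolve_waypoint_lr(waypoint: Waypoint, base_lr: float) -> float:
    """Return the target LR of one waypoint (hypothetical helper)."""
    if isinstance(waypoint, tuple):
        # Tuple format is ratio-only: (position, lr_ratio).
        _, lr_ratio = waypoint
        return base_lr * lr_ratio
    has_lr = "lr" in waypoint
    has_ratio = "lr_ratio" in waypoint
    if has_lr == has_ratio:
        # Assumption: a dict must carry exactly one of the two keys.
        raise ValueError("waypoint needs exactly one of 'lr' or 'lr_ratio'")
    if has_lr:
        # Absolute mode: the value is used as-is, independent of base_lr.
        return waypoint["lr"]
    # Ratio mode: the value scales base_lr.
    return base_lr * waypoint["lr_ratio"]


base_lr = 0.001
print(resolve_waypoint_lr({"position": 0.5, "lr_ratio": 0.5}, base_lr))
print(resolve_waypoint_lr({"position": 5000, "lr": 0.0003}, base_lr))
print(resolve_waypoint_lr((5000, 0.8), base_lr))
```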

Parameters:

optimizer (Optimizer): PyTorch optimizer to schedule
total_steps (int): Total number of training steps
waypoints (List): List of waypoint definitions (dict or tuple)
- Dict format: {'position': pos, 'lr': value} or {'position': pos, 'lr_ratio': ratio}
- Tuple format: (pos, lr_ratio) - ratio mode only
- pos is an int (absolute step) or a float (fraction of total_steps); see Position Formats
base_lr (float, optional): Base learning rate for ratio calculations. If None, uses optimizer's LR
lr_scale_factor (float): Scaling factor for all learning rates (default: 1.0) - Useful for distributed training
last_epoch (int): For resuming training (default: -1)
verbose (bool): Print messages on updates (default: False)
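
For the distributed-training case, one common heuristic for choosing lr_scale_factor is the linear scaling rule: scale the LR in proportion to the global batch size. The sketch below is an illustration of that heuristic under stated assumptions, not something the package prescribes. The helper name and the reference batch size are hypothetical, and whether linear scaling suits a given model is an empirical question.

```python
def linear_lr_scale_factor(
    per_device_batch: int, world_size: int, reference_batch: int
) -> float:
    """Ratio of the global batch size to the batch size the base LR was tuned for.

    Hypothetical helper illustrating the linear scaling heuristic.
    """
    if reference_batch <= 0:
        raise ValueError("reference_batch must be positive")
    global_batch = per_device_batch * world_size
    return global_batch / reference_batch


# Base LR tuned at batch size 256, now training on 8 devices with 64 samples each.
factor = linear_lr_scale_factor(per_device_batch=64, world_size=8, reference_batch=256)
print(factor)
```

The resulting value would then be passed as lr_scale_factor when constructing the scheduler.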

Note: Warmup, min_lr, and other features are now implemented via waypoints, providing maximum flexibility with a unified API.

How It Works:

The scheduler uses waypoints to define the entire learning rate trajectory:

Warmup: Define with early waypoints

{'position': 0, 'lr': 0.0},          # Start at 0
{'position': 0.1, 'lr_ratio': 1.0}   # Reach base_lr at 10%

Plateaus: Two consecutive waypoints with same LR

{'position': 0.4, 'lr_ratio': 0.5},  # Start plateau at 40%
{'position': 0.6, 'lr_ratio': 0.5}   # End plateau at 60% (flat region)

Min LR: Control with the final waypoint

{'position': 1.0, 'lr_ratio': 0.1}   # End at 10% of base_lr

All transitions use smooth cosine interpolation for optimal training stability.
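
To make the trajectory concrete, here is a minimal standalone sketch of evaluating such a schedule at a given step. It is not the package's implementation: the function name lr_at_step is hypothetical, and waypoints are assumed to be already resolved to sorted (step, lr) pairs with absolute values.

```python
import bisect
import math
from typing import List, Tuple


def lr_at_step(step: int, waypoints: List[Tuple[int, float]]) -> float:
    """Evaluate a cosine-interpolated waypoint schedule at `step`.

    `waypoints` holds (step, lr) pairs sorted by step.
    """
    steps = [s for s, _ in waypoints]
    # Outside the waypoint range, hold the first or last LR.
    if step <= steps[0]:
        return waypoints[0][1]
    if step >= steps[-1]:
        return waypoints[-1][1]
    # Find the segment with steps[i] <= step < steps[i + 1].
    i = bisect.bisect_right(steps, step) - 1
    (p1, lr1), (p2, lr2) = waypoints[i], waypoints[i + 1]
    progress = (step - p1) / (p2 - p1)
    # Half-cosine blend from lr1 to lr2; equal LRs give a flat plateau.
    return lr1 + (lr2 - lr1) * (1 - math.cos(math.pi * progress)) / 2


# Warmup to 1e-3, plateau at 5e-4 between steps 4000 and 6000, end at 1e-4.
schedule = [(0, 0.0), (1000, 1e-3), (4000, 5e-4), (6000, 5e-4), (10000, 1e-4)]
for s in (0, 500, 1000, 5000, 10000):
    print(s, lr_at_step(s, schedule))
```

Because the plateau is just a segment whose endpoints share an LR, no special case is needed for it, which is the point of the unified waypoint API.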

Use Cases:

Cyclical training patterns (drop, rise, drop)
Plateau emulation without separate plateau scheduler
Fine-grained control over LR trajectory with warmup
Experiments with LR increases during training
Custom training schedules for specific model architectures
High-performance training requiring minimal scheduler overhead

The cosine interpolation formula ensures:

Smooth transitions without abrupt changes
Zero derivative at waypoint boundaries (the LR curve has a continuous first derivative)
Predictable and reproducible training dynamics

Mathematical Detail:

For a segment between two waypoints at positions \( p_1 \) and \( p_2 \) with LR values \( lr_1 \) and \( lr_2 \):

\[ lr(t) = lr_1 + (lr_2 - lr_1) \cdot \frac{1 - \cos\left(\pi \cdot \frac{t - p_1}{p_2 - p_1}\right)}{2} \]

This produces a smooth S-curve that:

Starts at \( lr_1 \) when \( t = p_1 \)
Ends at \( lr_2 \) when \( t = p_2 \)
Has zero slope at both endpoints (smooth connection)
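
The zero-slope property can be checked by differentiating the segment formula with respect to t. The short derivation below uses the same symbols as the formula above and is a worked check, not text from the package:

```latex
\[
\frac{d\,lr}{dt}
  = (lr_2 - lr_1) \cdot \frac{\pi}{2\,(p_2 - p_1)}
    \cdot \sin\!\left(\pi \cdot \frac{t - p_1}{p_2 - p_1}\right)
\]
\[
t = p_1 \;\Rightarrow\; \sin(0) = 0,
\qquad
t = p_2 \;\Rightarrow\; \sin(\pi) = 0
\]
```

The slope is largest in magnitude at the midpoint of the segment, where the sine term equals 1 and the derivative is \( (lr_2 - lr_1) \cdot \frac{\pi}{2\,(p_2 - p_1)} \).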

Generate Visualizations:

For documentation-ready overview images:

cd examples
python generate_documentation_graphics.py

For multiple example patterns:

python visualize_waypoint_scheduler.py  # Generates 5 different patterns
python visualize_schedulers.py          # Generates plateau examples

Comparison with PyTorch Built-in Schedulers

Feature	CosinePlateauScheduler	CosineWaypointScheduler	CosineAnnealingLR	OneCycleLR
Warm-up	✅ Built-in	✅ Via waypoints	❌	✅
Cosine Decay	✅	✅	✅	✅
LR Increases	❌	✅	❌	Limited
Plateau Steps	✅ Explicit	✅ Via waypoints	❌	❌
Waypoint Control	❌	✅	❌	❌
Mixed Ratio/Absolute	❌	✅	❌	❌
Min LR Control	✅	✅ Via waypoints	✅	Limited
Independent Segments	✅	✅	❌	❌
Performance	✅ (~0.5μs)	✅ (~2.4μs)	✅	✅
Unified API	❌	✅	❌	❌

Examples

See the examples directory for:

Basic usage examples
Visualization tools
Integration with training loops
Checkpoint/resume patterns

Generate visualizations:

# Cosine Plateau Scheduler
python examples/visualize_schedulers.py

# Cosine Waypoint Scheduler
python examples/visualize_waypoint_scheduler.py

Testing

Run the test suite:

pip install -e ".[dev]"
pytest tests/

Future Schedulers

This package is actively developed with plans to add more schedulers:

Custom cyclic schedulers
Adaptive schedulers based on loss/metrics
Multi-phase training schedulers
And more...

Suggestions and contributions are welcome!

Contributing

Contributions are welcome! Feel free to submit a Pull Request for any of the following:

Adding new schedulers
Improving documentation
Reporting bugs
Suggesting features

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this package in your research, please cite:

@software{k_pytorch_schedulers,
  author = {Koronos},
  title = {K-Pytorch-Schedulers: A Collection of Custom PyTorch Learning Rate Schedulers},
  year = {2025},
  url = {https://github.com/Koronos/K-Pytorch-Schedulers}
}

Support

For issues, questions, or suggestions, please open an issue on GitHub.

Acknowledgments

Inspired by various successful training strategies in deep learning research and community feedback.
