NicolasLM

nicolas-lm is a small research-oriented language modeling lab. It focuses on character-level autoregressive models and on the tools needed to train, evaluate, and analyze them in a controlled setting.

The current codebase includes:

a character tokenizer and corpus utilities;
a bigram language model;
a decoder-only Transformer;
a LLaMA-style decoder with RMSNorm, RoPE, and SwiGLU;
corpus statistics and evaluation scripts;
experiment helpers for corpus building, training, sampling, and result summarization.
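
To make the first item concrete, here is a minimal sketch of a character tokenizer. The class name `CharTokenizer` and its methods are hypothetical and chosen for illustration only; the repository's own tokenizer may expose a different interface.

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer (illustrative, not the repo's class)."""

    def __init__(self, text: str) -> None:
        # Sort the unique characters so that ids are deterministic across runs.
        self.itos = sorted(set(text))
        self.stoi = {ch: i for i, ch in enumerate(self.itos)}

    @property
    def vocab_size(self) -> int:
        return len(self.itos)

    def encode(self, text: str) -> list[int]:
        # Raises KeyError for characters that were not seen at construction time.
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)


tokenizer = CharTokenizer("hello world")
ids = tokenizer.encode("hello")
assert tokenizer.decode(ids) == "hello"
```

Each distinct character gets one integer id, so the vocabulary stays small and encoding is lossless for any text drawn from the same character set.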

Installation

The project targets Python 3.10+.

python -m pip install -r requirements.txt
python -m pip install -e .

Project layout

src/nicolasm/        Core library code
scripts/             Training, evaluation, and analysis entry points
tests/               Unit tests for tokenizer, models, and metrics
configs/             Model and experiment configuration files
data/                Local corpora, tokenizers, and PDF references
experiments/         Run artifacts, plots, and reports
docs/                Project documentation and reading notes

Typical workflow

Build corpora and tokenizers:

PYTHONPATH=src python scripts/build_corpora.py
PYTHONPATH=src python scripts/train.py

Train or evaluate a model:

PYTHONPATH=src python scripts/train.py
PYTHONPATH=src python scripts/evaluate.py

Analyze a corpus or summarize results:

PYTHONPATH=src python scripts/analyze_corpus.py
PYTHONPATH=src python scripts/run_effective_tokens.py
PYTHONPATH=src python scripts/summarize_results.py

Run the tests:

pytest

Models

The repository currently centers on three language-model families:

BigramLanguageModel: a first-order categorical model over tokens.
TinyTransformerLanguageModel: a causal Transformer decoder.
LLaMAStyleLanguageModel: a Transformer variant with LLaMA-style design choices.
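
As a rough illustration of what "first-order categorical model" means, the sketch below estimates P(next character | current character) from counts. It is a count-based stand-in written for this README: `fit_bigram`, `bits_per_char`, and the add-alpha smoothing are assumptions, and the actual `BigramLanguageModel` may instead be a trainable module fitted by gradient descent.

```python
import math
from collections import Counter, defaultdict


def fit_bigram(text: str, alpha: float = 1.0) -> dict[str, dict[str, float]]:
    """Estimate P(next_char | current_char) with add-alpha smoothing."""
    vocab = sorted(set(text))
    counts: dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    probs: dict[str, dict[str, float]] = {}
    for prev in vocab:
        # Smoothing keeps every transition probability strictly positive.
        total = sum(counts[prev].values()) + alpha * len(vocab)
        probs[prev] = {c: (counts[prev][c] + alpha) / total for c in vocab}
    return probs


def bits_per_char(probs: dict[str, dict[str, float]], text: str) -> float:
    """Average negative log2-likelihood of each character given its predecessor."""
    pairs = list(zip(text, text[1:]))
    return -sum(math.log2(probs[p][n]) for p, n in pairs) / len(pairs)


probs = fit_bigram("abababab")
score = bits_per_char(probs, "abab")
```

Each row of `probs` is one categorical distribution over the vocabulary, conditioned only on the previous character. The Transformer models condition on a longer context window instead.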

Each model is implemented in src/nicolasm/models/ and built from reusable modules in src/nicolasm/modules/.
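
One of the reusable modules named above is RMSNorm. As a sketch of the idea, the function below normalizes a single feature vector by its root mean square and applies a per-feature gain. It is plain Python for readability; the function name `rms_norm` and the `eps` default are assumptions, and the repository's module presumably operates on tensors.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-6) -> list[float]:
    """Scale x by the reciprocal of its root mean square, then apply a gain."""
    mean_square = sum(v * v for v in x) / len(x)
    # eps guards against division by zero for an all-zero input.
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


y = rms_norm([3.0, 4.0], weight=[1.0, 1.0])
```

Unlike LayerNorm, this does not subtract the mean or add a bias: with unit gain, the output simply has a mean square of approximately 1.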

