🧠 Python Code Autocomplete LLM (From Scratch)

A GPT-style decoder-only Transformer trained entirely from scratch for Python code autocompletion.

This project now uses a true causal GPT decoder architecture with KV-cache support, enabling faster incremental generation and more stable autoregressive behavior.

The entire system was built and trained locally on CPU, with no external LLM APIs.

🚀 What’s New (Decoder Upgrade)

✅ Replaced TransformerEncoder with true GPT-style decoder blocks
✅ Implemented custom Causal Self-Attention
✅ Added KV-cache for incremental decoding
✅ Resume-safe training (Ctrl+C supported)
✅ Dual inference modes (Autocomplete / Creative)
✅ Cleaned & curated training dataset pipeline

This is now a proper autoregressive language model architecture.

🧩 System Overview

1️⃣ Data Pipeline

Raw repositories collected
Hardened cleaning removes:
- tests
- build files
- compiled artifacts
- duplicate files
Curated alignment patterns added (balanced, non-repetitive)
Train / validation split
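
The cleaning rules above could be sketched roughly as follows. The directory names, file-name patterns, and the `keep_file` / `clean` helpers are illustrative assumptions, not the contents of `tools/hardened_clean.py`. Duplicates are detected here by hashing file contents, which is one common approach.

```python
import hashlib
from pathlib import PurePosixPath

# Hypothetical filter rules; the real pipeline may use different ones.
SKIP_DIRS = {"tests", "test", "build", "dist", "__pycache__"}
SKIP_NAMES = {"setup.py", "conftest.py"}


def keep_file(path):
    """Return True if a path looks like ordinary Python source worth training on."""
    p = PurePosixPath(path)
    if p.suffix != ".py":  # drops compiled artifacts such as .pyc
        return False
    if any(part in SKIP_DIRS for part in p.parts[:-1]):  # tests and build dirs
        return False
    if p.name in SKIP_NAMES or p.name.startswith("test_"):
        return False
    return True


def clean(files):
    """Filter (path, text) pairs and drop files whose contents were already seen."""
    seen = set()
    kept = []
    for path, text in files:
        if not keep_file(path):
            continue
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if digest in seen:  # exact duplicate of an earlier file
            continue
        seen.add(digest)
        kept.append((path, text))
    return kept
```

Hashing whole files only catches exact duplicates; near-duplicates (for example, vendored copies with changed headers) would need a fuzzier comparison.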

2️⃣ Tokenizer

Custom BPE tokenizer (vocab size: 8000)
Trained on processed corpus
Python-aware tokenization
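
For readers new to BPE, the core training loop can be illustrated with a character-level toy. This is a minimal sketch of the general technique, not the code in `tokenizer/train_tokenizer.py`; the function names are hypothetical, and details such as pre-tokenization and the Python-aware rules are not covered.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Return the adjacent symbol pair that occurs most often, or None."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_bpe(texts, num_merges):
    """Learn up to `num_merges` merge rules, starting from single characters."""
    sequences = [list(t) for t in texts]
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        merges.append(pair)
        sequences = [merge_pair(s, pair) for s in sequences]
    return merges, sequences


merges, tokenized = train_bpe(["def f(x):", "def g(x):"], num_merges=3)
```

In a full tokenizer the vocabulary size (8000 here) is roughly the number of base symbols plus the number of learned merges.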

3️⃣ Model Architecture

Decoder-only GPT-style architecture:

Component	Value
Layers	8
Attention Heads	8
Embedding Size	512
Context Length	256
Parameters	~33.5M
Attention	Causal Self-Attention
KV Cache	✅ Supported

Total Parameters: 33,551,168
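
The total can be reproduced with a short calculation. The breakdown below is an assumption, since per-module sizes are not listed here: a GPT-2-style block with a 4x MLP, biases on all linear layers, learned positional embeddings, a final LayerNorm, and an untied output head with bias. Under those assumptions the count comes to 33,551,168.

```python
def count_parameters(vocab=8000, d_model=512, n_layers=8, context=256, mlp_ratio=4):
    """Count parameters for an assumed GPT-2-style decoder layout."""
    tok_emb = vocab * d_model                      # token embedding table
    pos_emb = context * d_model                    # learned positional embeddings
    layer_norm = 2 * d_model                       # scale + shift
    attn = (d_model * 3 * d_model + 3 * d_model)   # fused Q, K, V projection
    attn += d_model * d_model + d_model            # attention output projection
    hidden = mlp_ratio * d_model
    mlp = (d_model * hidden + hidden) + (hidden * d_model + d_model)
    block = 2 * layer_norm + attn + mlp            # one decoder block
    head = d_model * vocab + vocab                 # untied output head with bias
    return tok_emb + pos_emb + n_layers * block + layer_norm + head


total = count_parameters()
print(f"{total:,}")
```

The number of attention heads does not appear in the formula: splitting the 512-dimensional projection into 8 heads changes how it is reshaped, not how many weights it has.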

⚡ KV-Cache Support

Generation now uses incremental decoding:

First forward pass processes full prompt
Subsequent tokens reuse stored key/value tensors
No full-sequence recomputation

Result:

Faster inference
Lower latency
True GPT-style decoding behavior
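
The idea can be shown with a single attention head in plain Python. This is a toy sketch, not the implementation in `model/ai.py`: queries, keys, and values are passed in directly (the projection layers are skipped), and the `KVCache` class and function names are hypothetical.

```python
import math


def attend(q, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # numerically stable softmax
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


class KVCache:
    """Stores keys and values of all positions processed so far."""

    def __init__(self):
        self.keys = []
        self.values = []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)


def full_causal_attention(qs, ks, vs):
    """Recompute every position from scratch; position t sees positions 0..t."""
    return [attend(qs[t], ks[: t + 1], vs[: t + 1]) for t in range(len(qs))]


def incremental_step(cache, q, k, v):
    """Process one new token: extend the cache, then attend over it."""
    cache.append(k, v)
    return attend(q, cache.keys, cache.values)


qs = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
ks = [[0.2, 0.1], [0.4, -0.3], [-0.1, 0.7]]
vs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

cache = KVCache()
incremental = [incremental_step(cache, q, k, v) for q, k, v in zip(qs, ks, vs)]
assert incremental == full_causal_attention(qs, ks, vs)
```

Because each new token only needs its own query against the cached keys and values, the per-token cost grows linearly with sequence length instead of recomputing attention for every earlier position.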

🎯 Training Setup

Optimizer: AdamW
Loss: Cross-Entropy
Gradient Accumulation Supported
Resume-safe checkpointing

Training can be interrupted safely:

Ctrl+C

Restarting resumes automatically from the last checkpoint.
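
A minimal sketch of the resume pattern, using only the standard library so it runs anywhere. The state layout, the JSON format, and the `train` / `step_fn` names are assumptions for illustration; a PyTorch training script would typically store the model and optimizer `state_dict()` with `torch.save` instead.

```python
import json
import os


def save_checkpoint(path, state):
    """Write to a temp file, then rename, so an interrupted write cannot
    corrupt the previous checkpoint."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    return {"step": 0, "loss_history": []}


def train(path, total_steps, step_fn, save_every=1):
    """Run (or resume) training; Ctrl+C saves progress before returning."""
    state = load_checkpoint(path)
    try:
        while state["step"] < total_steps:
            loss = step_fn(state["step"])  # one optimizer step (hypothetical)
            state["loss_history"].append(loss)
            state["step"] += 1
            if state["step"] % save_every == 0:
                save_checkpoint(path, state)
    except KeyboardInterrupt:
        save_checkpoint(path, state)  # keep whatever was completed
    return state
```

Calling `train` again with the same path picks up at the saved step, which is the behavior described above.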

📊 Performance Benchmarks

Dataset

~5.1M training tokens
~0.5M validation tokens
Balanced curated alignment

Final Metrics (Decoder Architecture)

Metric	Value
Epochs	2
Final Validation Loss	2.84
Perplexity	17.20

Perplexity is the exponential of the validation loss. For a ~33.5M-parameter model trained on CPU for 2 epochs, these numbers indicate stable training.
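
For reference, the two metrics in the table are linked by perplexity = exp(mean cross-entropy in nats). The helper names below are hypothetical. Note that exp(2.84) is about 17.12, so the reported 17.20 presumably comes from the unrounded loss (about 2.845).

```python
import math


def mean_cross_entropy(target_probs):
    """Average negative log-likelihood (nats) of the correct next tokens,
    given the probability the model assigned to each of them."""
    return -sum(math.log(p) for p in target_probs) / len(target_probs)


def perplexity(mean_loss):
    """Convert a mean cross-entropy loss in nats to perplexity."""
    return math.exp(mean_loss)


print(round(perplexity(2.84), 2))
print(round(math.log(17.20), 4))  # loss that corresponds to perplexity 17.20
```

Intuitively, a perplexity near 17 means the model is on average about as uncertain as if it were choosing uniformly among 17 tokens at each step, out of a vocabulary of 8000.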

✨ Example Outputs

DFS

Input:

def dfs(graph, node, visited):

Output:

stack = [start]
while stack:
    node = stack.pop()
    if node not in visited:
        visited.add(node)
        stack.extend(graph.get(node, []))

Stack

Input:

class Stack:
    def push(self, item):

Output:

self._items.append(item)

Binary Search

Input:

def binary_search(arr, target):

Output:

lo, hi = 0, len(arr)
while lo < hi:
    mid = (lo + hi) // 2
    if arr[mid] == target:
        return mid
    if arr[mid] < target:
        lo = mid + 1
    else:
        hi = mid
return -1

🧠 Inference Modes

Autocomplete Mode (default)

temperature = 0.2
top_k = 10
Near-deterministic (low temperature, small candidate set)
Code-focused

Creative Mode

temperature = 0.8
top_k = 50
More diverse
Useful for code generation

Run with:

python inference/run_model.py \
  -c model/checkpoints/latest_checkpoint.pth \
  -p "def dfs(graph, node, visited):" \
  --mode autocomplete
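
The two modes differ only in their sampling settings. A minimal sketch of temperature plus top-k sampling over a list of logits follows; the `MODES` table mirrors the values above, while the function name and the plain-Python implementation are assumptions rather than the code in `inference/run_model.py`.

```python
import math
import random

MODES = {
    "autocomplete": {"temperature": 0.2, "top_k": 10},
    "creative": {"temperature": 0.8, "top_k": 50},
}


def sample_next_token(logits, temperature, top_k, rng=random):
    """Sample a token id from the top_k highest logits after temperature scaling."""
    # Keep only the indices of the top_k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Lower temperature sharpens the distribution toward the best candidate.
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalized softmax
    return rng.choices(top, weights=weights, k=1)[0]


logits = [0.0, 5.0, 1.0, -2.0]
rng = random.Random(0)
token = sample_next_token(logits, rng=rng, **MODES["autocomplete"])
```

At temperature 0.2 a gap of a few logits already makes the top token overwhelmingly likely, which is why autocomplete mode behaves almost like greedy decoding while still being a sampling procedure.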

🖥️ CLI Usage (Simple Wrapper)

You can build a simple CLI wrapper:

python codellm.py autocomplete "def binary_search(arr, target):"
python codellm.py creative "Write a Python LRU cache implementation"
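
One possible shape for such a wrapper is sketched below. It simply forwards to `inference/run_model.py` with the `-c`, `-p`, and `--mode` flags shown earlier. Treat it as an assumption-laden sketch: `codellm.py` is not part of the project structure listed in this README, and passing `creative` as a `--mode` value is inferred from the mode names rather than documented.

```python
import argparse
import subprocess
import sys

DEFAULT_CHECKPOINT = "model/checkpoints/latest_checkpoint.pth"


def build_command(mode, prompt, checkpoint=DEFAULT_CHECKPOINT):
    """Build the run_model.py invocation for the given mode and prompt."""
    return [
        sys.executable, "inference/run_model.py",
        "-c", checkpoint,
        "-p", prompt,
        "--mode", mode,
    ]


def parse_args(argv=None):
    """Parse `codellm.py <mode> <prompt> [-c CHECKPOINT]`."""
    parser = argparse.ArgumentParser(prog="codellm.py")
    parser.add_argument("mode", choices=["autocomplete", "creative"])
    parser.add_argument("prompt")
    parser.add_argument("-c", "--checkpoint", default=DEFAULT_CHECKPOINT)
    return parser.parse_args(argv)


def main(argv=None):
    """Run the inference script and return its exit code."""
    args = parse_args(argv)
    return subprocess.call(build_command(args.mode, args.prompt, args.checkpoint))
```

To use it as a script, save it as `codellm.py` and call `sys.exit(main())` under the usual `if __name__ == "__main__":` guard.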

📂 Project Structure

AutoComplete-LLm/
│
├── model/
│   ├── ai.py
│   └── checkpoints/
│
├── tokenizer/
│   ├── tokenizer.json
│   └── train_tokenizer.py
│
├── training/
│   ├── dataset.py
│   └── train.py
│
├── inference/
│   └── run_model.py
│
├── tools/
│   ├── hardened_clean.py
│   ├── build_train_file.py
│   ├── evaluate_model.py
│   └── generate_alignment_pack_v3.py
│
├── data/
│   ├── raw/
│   ├── cleaned/
│   └── processed/
│
└── README.md

🎓 What This Project Demonstrates

Full LLM lifecycle from scratch
Decoder-only Transformer implementation
Custom causal attention
KV-cache integration
Dataset curation & alignment engineering
CPU-only training of a ~33.5M-parameter model
Practical engineering for small-scale LLM systems

📜 License

MIT License

🚀 Status

This model is:

Stable
Usable for Python autocomplete
Structurally aligned
KV-cache enabled
Trained with resume-safe checkpointing

Further scaling would require:

Larger dataset (20–50M tokens)
GPU acceleration
60–120M parameter scale

At its current scale, though, this is a functional local Python code LLM.
