GitHub Repository Summarizer

A FastAPI service that accepts a GitHub repository URL and returns an LLM-generated summary: what the project does, what technologies it uses, and how it's structured.

Requirements

Python 3.10+

Setup

1. Clone the repository

git clone <your-repo-url>
cd <repo-folder>

2. Create and activate a virtual environment

python -m venv venv
source venv/bin/activate       # macOS/Linux
venv\Scripts\activate          # Windows

3. Install dependencies

pip install -r requirements.txt

4. Set your API key

macOS/Linux:

export NEBIUS_API_KEY=your_key_here

Windows (Command Prompt):

set NEBIUS_API_KEY=your_key_here

Optionally, set a GitHub token to raise the GitHub API rate limit from 60 to 5,000 requests per hour:

export GITHUB_TOKEN=your_github_token
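
As a rough illustration of how the optional token might be picked up, the sketch below builds GitHub request headers from the environment. The function name `github_headers` and the `User-Agent` value are hypothetical and not taken from this project's code.

```python
import os


def github_headers(env=None):
    """Build request headers for the GitHub REST API.

    `env` defaults to os.environ; passing a plain dict makes this easy to test.
    """
    env = os.environ if env is None else env
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "repo-summarizer",  # hypothetical client name
    }
    token = env.get("GITHUB_TOKEN", "").strip()
    if token:
        # Authenticated requests are subject to the higher rate limit.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

Without `GITHUB_TOKEN` the headers carry no `Authorization` entry, so requests fall back to the unauthenticated limit.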

5. Start the server

uvicorn main:app --reload

The server starts at http://localhost:8000.

Usage

curl -X POST http://localhost:8000/summarize \
  -H "Content-Type: application/json" \
  -d '{"github_url": "https://github.com/psf/requests"}'

Expected response:

{
  "summary": "...",
  "technologies": ["Python", "..."],
  "structure": "..."
}
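
The same request can be made from Python. This is a minimal standard-library sketch that assumes the server is running locally as described above; the helper names `build_summarize_request` and `summarize` and the 120-second timeout are illustrative choices, not part of the service.

```python
import json
import urllib.request


def build_summarize_request(github_url, base_url="http://localhost:8000"):
    """Build the POST /summarize request without sending it."""
    body = json.dumps({"github_url": github_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/summarize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(github_url, base_url="http://localhost:8000", timeout=120):
    """Send the request and return the decoded JSON response as a dict."""
    request = build_summarize_request(github_url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `summarize("https://github.com/psf/requests")` should return a dict with the `summary`, `technologies`, and `structure` keys shown above.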

Submitting

Create a zip of the project, excluding .env, venv/, .git/, and __pycache__/:

zip -r solution.zip . --exclude ".env" --exclude "venv/*" --exclude ".git/*" --exclude "__pycache__/*"

Model

Model used: deepseek-ai/DeepSeek-V3-0324-fast via Nebius Token Factory.

DeepSeek-V3 is the default model on Nebius Token Factory, with strong reasoning ability and reliable structured JSON output. The -fast variant gives lower latency without meaningful quality loss for summarization tasks.
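
Even with a model that usually returns clean JSON, it can help to validate the output before returning it to the client. The sketch below is one way to do that, under the assumption that the model is asked for a JSON object with the three keys shown under Usage; `parse_summary` is a hypothetical helper, not necessarily how llm.py handles it.

```python
import json

FENCE = "`" * 3  # Markdown code-fence marker some models wrap around JSON


def parse_summary(raw):
    """Parse model output into the response shape shown under Usage."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (and its language tag), then the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    data = json.loads(text)  # raises ValueError on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"summary", "technologies", "structure"} - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not isinstance(data["technologies"], list):
        raise ValueError("technologies must be a list")
    return {
        "summary": str(data["summary"]),
        "technologies": [str(item) for item in data["technologies"]],
        "structure": str(data["structure"]),
    }
```

A `ValueError` here could be mapped to a retry or to an error response, depending on how strict the service should be.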

Approach to Repository Content

Context Management: Instead of sending all files, we fetch the full recursive directory tree first. This gives the LLM a structural "map" of the project with minimal tokens.
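
To make the tree step concrete, here is a sketch of the pieces around it: parsing the repository URL, building a request URL for GitHub's Git Trees endpoint with `recursive=1`, and rendering the returned paths as a compact outline. The function names and the outline format are assumptions for illustration; github.py may do this differently.

```python
from urllib.parse import urlparse


def parse_github_url(url):
    """Return (owner, repo) from a URL like https://github.com/psf/requests."""
    parsed = urlparse(url)
    if parsed.netloc.lower() not in {"github.com", "www.github.com"}:
        raise ValueError(f"not a GitHub URL: {url}")
    parts = [part for part in parsed.path.split("/") if part]
    if len(parts) < 2:
        raise ValueError(f"expected /owner/repo in: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]
    return owner, repo


def tree_api_url(owner, repo, ref):
    # `ref` is a tree SHA or a branch/tag name; recursive=1 returns nested entries.
    return f"https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"


def render_tree(paths):
    """Render file paths as an indented outline, one name per line."""
    lines, seen = [], set()
    for path in sorted(paths):
        parts = path.split("/")
        for depth, name in enumerate(parts):
            key = "/".join(parts[: depth + 1])
            if key in seen:
                continue  # directory already printed for an earlier path
            seen.add(key)
            suffix = "/" if depth < len(parts) - 1 else ""
            lines.append("  " * depth + name + suffix)
    return "\n".join(lines)
```

The outline costs only a few tokens per file, which is what makes it usable as a map even for large repositories.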

Heuristic Scoring: Files are ranked based on a tier system:

Tier 0–1: README and dependency manifests (highest architectural signal)
Tier 2–3: Entry points and source code
Tier 4–5: Config files and everything else

Within each tier, shallower files rank higher — main.py beats src/api/v2/main.py.
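
The tier-then-depth ordering can be sketched as a sort key. The name and extension sets below are hypothetical stand-ins; the actual lists used by the project are not shown in this README and may differ.

```python
import posixpath

# Hypothetical tier tables, for illustration only.
MANIFESTS = {"requirements.txt", "pyproject.toml", "package.json", "Cargo.toml", "go.mod"}
ENTRY_POINTS = {"main.py", "app.py", "index.js", "index.ts", "main.go", "main.rs"}
SOURCE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".java"}
CONFIG_EXTS = {".toml", ".yaml", ".yml", ".ini", ".cfg", ".json"}


def tier(path):
    """Return 0 (highest priority) through 5 (lowest) for a repository path."""
    name = posixpath.basename(path)
    ext = posixpath.splitext(name)[1].lower()
    if name.lower().startswith("readme"):
        return 0
    if name in MANIFESTS:
        return 1
    if name in ENTRY_POINTS:
        return 2
    if ext in SOURCE_EXTS:
        return 3
    if ext in CONFIG_EXTS:
        return 4
    return 5


def rank(paths):
    # Sort by tier, then by depth (fewer slashes first), then by path for a stable order.
    return sorted(paths, key=lambda p: (tier(p), p.count("/"), p))
```

With this key, `main.py` and `src/api/v2/main.py` share a tier, and the depth component places the root-level file first.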

Smart Filtering: Binaries, lock files, generated/minified files, and noisy directories (node_modules, venv, .git) are excluded entirely before any ranking happens.
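
A filter of this kind might look like the following sketch. The directory, extension, and lock-file sets are illustrative assumptions, not the exact lists in filters.py.

```python
import posixpath

# Hypothetical exclusion lists, for illustration only.
NOISY_DIRS = {"node_modules", "venv", ".git", "__pycache__"}
BINARY_EXTS = {".png", ".jpg", ".gif", ".ico", ".pdf", ".zip", ".exe", ".so"}
LOCK_FILES = {"package-lock.json", "yarn.lock", "poetry.lock", "Cargo.lock"}
GENERATED_SUFFIXES = (".min.js", ".min.css", ".map")


def is_excluded(path):
    """Return True if the path should be dropped before ranking."""
    parts = path.split("/")
    name = parts[-1]
    # Any ancestor directory on the noisy list excludes the file.
    if any(part in NOISY_DIRS for part in parts[:-1]):
        return True
    if name in LOCK_FILES:
        return True
    lowered = name.lower()
    if lowered.endswith(GENERATED_SUFFIXES):
        return True
    return posixpath.splitext(lowered)[1] in BINARY_EXTS


def filter_paths(paths):
    return [path for path in paths if not is_excluded(path)]
```

Directory names are compared per path component, so a file such as `src/venv_utils.py` is not caught by the `venv` rule.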

Concurrency: Selected files are fetched in parallel using an async semaphore (max 5 at a time) — faster than sequential fetching while staying within GitHub's rate limits.
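
The bounded-concurrency pattern can be sketched with `asyncio.Semaphore`. Here `fetch_one` is a placeholder for whatever coroutine downloads a single file; it is passed in as a parameter so the sketch stays independent of any HTTP client.

```python
import asyncio


async def fetch_all(paths, fetch_one, limit=5):
    """Fetch every path concurrently, with at most `limit` fetches in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(path):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return path, await fetch_one(path)

    results = await asyncio.gather(*(bounded(path) for path in paths))
    return dict(results)
```

All tasks are created up front, but only `limit` of them hold the semaphore at any moment, so the rest queue without issuing requests.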
