nivco360/GitHub_Repository_Summarizer

GitHub Repository Summarizer

A FastAPI service that accepts a GitHub repository URL and returns an LLM-generated summary: what the project does, what technologies it uses, and how it's structured.

Requirements

  • Python 3.10+

Setup

1. Clone the repository

git clone <your-repo-url>
cd <repo-folder>

2. Create and activate a virtual environment

python -m venv venv
source venv/bin/activate       # macOS/Linux
venv\Scripts\activate          # Windows

3. Install dependencies

pip install -r requirements.txt

4. Set your API key

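# macOS/Linux
export NEBIUS_API_KEY=your_key_here
# Windows (cmd.exe): run only the next line; cmd does not treat "#" as a comment and would store it in the value
set NEBIUS_API_KEY=your_key_here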

Optionally, set a GitHub token to raise the GitHub API rate limit from 60 requests/hour (unauthenticated) to 5000 requests/hour (authenticated):

export GITHUB_TOKEN=your_github_token

5. Start the server

uvicorn main:app --reload

The server starts at http://localhost:8000.

Usage

curl -X POST http://localhost:8000/summarize \
  -H "Content-Type: application/json" \
  -d '{"github_url": "https://github.com/psf/requests"}'

Expected response:

{
  "summary": "...",
  "technologies": ["Python", "..."],
  "structure": "..."
}
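The same request can be sent from Python. The following is a minimal sketch using only the standard library; the helper names `build_request` and `summarize` are illustrative and not part of the project, and the server is assumed to be running at the default address from the setup steps.

```python
import json
import urllib.request

API_URL = "http://localhost:8000/summarize"  # default address from the setup steps


def build_request(github_url: str, api_url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request that the curl command sends (hypothetical helper)."""
    body = json.dumps({"github_url": github_url}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarize(github_url: str) -> dict:
    """Send the request and decode the JSON response; requires a running server."""
    with urllib.request.urlopen(build_request(github_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `summarize("https://github.com/psf/requests")` should return a dict with the `summary`, `technologies`, and `structure` keys shown in the expected response.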

Submitting

Create a zip of the project, excluding .env, venv/, .git/, and __pycache__/:

zip -r solution.zip . --exclude ".env" --exclude "venv/*" --exclude ".git/*" --exclude "__pycache__/*"

Model

Model used: deepseek-ai/DeepSeek-V3-0324-fast via Nebius Token Factory.

DeepSeek-V3 is the default model on Nebius Token Factory, with strong reasoning ability and reliable structured JSON output. The -fast variant gives lower latency without meaningful quality loss for summarization tasks.

Approach to Repository Content

Context Management: Instead of sending all files, we fetch the full recursive directory tree first. This gives the LLM a structural "map" of the project with minimal tokens.
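One way to build such a map is GitHub's Git Trees endpoint, which returns every path in a single response when `recursive=1` is set. The sketch below is an assumption about how this step could look, not the project's actual code: `tree_url` and `render_tree` are hypothetical names, and the default branch is assumed to be `main`. The entries passed to `render_tree` follow the shape of the `tree` array in the API response (`path` and `type` fields).

```python
def tree_url(owner: str, repo: str, ref: str = "main") -> str:
    """URL of the recursive tree listing for a branch, tag, or tree SHA."""
    return f"https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"


def render_tree(entries: list[dict]) -> str:
    """Turn tree entries into a compact indented outline, one name per line."""
    lines = []
    # Sorting on path components keeps each directory directly above its children.
    for entry in sorted(entries, key=lambda e: e["path"].split("/")):
        depth = entry["path"].count("/")
        name = entry["path"].rsplit("/", 1)[-1]
        suffix = "/" if entry["type"] == "tree" else ""  # mark directories
        lines.append("  " * depth + name + suffix)
    return "\n".join(lines)


sample = [
    {"path": "src/main.py", "type": "blob"},
    {"path": "README.md", "type": "blob"},
    {"path": "src", "type": "tree"},
]
print(render_tree(sample))
```

The outline carries only names and nesting, so even a large repository costs relatively few tokens compared with sending file contents.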

Heuristic Scoring: Files are ranked based on a tier system:

  • Tier 0–1: README and dependency manifests (highest architectural signal)
  • Tier 2–3: Entry points and source code
  • Tier 4–5: Config files and everything else

Within each tier, shallower files rank higher — main.py beats src/api/v2/main.py.
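The ranking above can be sketched as a sort on a `(tier, depth)` key. The file-name and extension sets below are illustrative assumptions; the project's real lists are not shown here, and `tier` and `rank` are hypothetical names.

```python
from pathlib import PurePosixPath

# Illustrative sets only; the actual lists used by the project may differ.
MANIFESTS = {"requirements.txt", "pyproject.toml", "package.json", "Cargo.toml", "go.mod"}
ENTRY_POINTS = {"main.py", "app.py", "index.js", "main.go"}
SOURCE_EXTS = {".py", ".js", ".ts", ".go", ".rs", ".java"}
CONFIG_EXTS = {".yml", ".yaml", ".toml", ".ini", ".cfg", ".json"}


def tier(path: str) -> int:
    """Map a path to a tier from 0 (most informative) to 5 (least)."""
    p = PurePosixPath(path)
    if p.name.lower().startswith("readme"):
        return 0
    if p.name in MANIFESTS:
        return 1
    if p.name in ENTRY_POINTS:
        return 2
    if p.suffix in SOURCE_EXTS:
        return 3
    if p.suffix in CONFIG_EXTS:
        return 4
    return 5


def rank(paths: list[str]) -> list[str]:
    """Sort by tier first, then by directory depth, then by name for stability."""
    return sorted(paths, key=lambda p: (tier(p), p.count("/"), p))


print(rank(["src/api/v2/main.py", "main.py", "README.md", "LICENSE"]))
```

Because the depth is the second element of the sort key, `main.py` sorts ahead of `src/api/v2/main.py` even though both fall in the same tier.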

Smart Filtering: Binaries, lock files, generated/minified files, and noisy directories (node_modules, venv, .git) are excluded entirely before any ranking happens.
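A filter of this kind might look like the sketch below. Only the three directory names come from the description above; the extension, lock-file, and suffix lists are assumptions chosen for illustration, and `is_noise` and `keep` are hypothetical names.

```python
from pathlib import PurePosixPath

SKIP_DIRS = {"node_modules", "venv", ".git"}  # directories named in the text
# The remaining lists are illustrative assumptions, not the project's actual lists.
BINARY_EXTS = {".png", ".jpg", ".gif", ".pdf", ".zip", ".exe", ".so"}
LOCK_FILES = {"package-lock.json", "yarn.lock", "poetry.lock", "Cargo.lock"}
GENERATED_SUFFIXES = (".min.js", ".min.css", ".map")


def is_noise(path: str) -> bool:
    """Return True for paths that should never reach the ranking step."""
    p = PurePosixPath(path)
    # Check every directory component, not the file name itself.
    if any(part in SKIP_DIRS for part in p.parts[:-1]):
        return True
    if p.name in LOCK_FILES:
        return True
    if p.suffix.lower() in BINARY_EXTS:
        return True
    return p.name.lower().endswith(GENERATED_SUFFIXES)


def keep(paths: list[str]) -> list[str]:
    """Drop noisy paths while preserving the original order."""
    return [p for p in paths if not is_noise(p)]
```

Matching on whole path components avoids false positives such as `src/venv_utils.py`, which contains the string `venv` but is not inside a `venv` directory.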

Concurrency: Selected files are fetched concurrently, with an async semaphore capping the number of in-flight requests at 5. This is faster than sequential fetching while keeping the load on GitHub's API modest.
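The pattern can be sketched with `asyncio.Semaphore` and `asyncio.gather`. This is an illustration under assumptions rather than the project's code: `fetch_all` is a hypothetical name, and `fake_fetch` stands in for a real HTTP call so the example runs without network access.

```python
import asyncio

MAX_CONCURRENT = 5  # the cap described in the text


async def fetch_all(paths, fetch_one, limit=MAX_CONCURRENT):
    """Fetch every path concurrently with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(path):
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return path, await fetch_one(path)

    results = await asyncio.gather(*(guarded(p) for p in paths))
    return dict(results)


async def fake_fetch(path):
    """Stand-in for a real HTTP request; sleeps briefly and returns dummy text."""
    await asyncio.sleep(0.01)
    return f"contents of {path}"


files = asyncio.run(fetch_all([f"file{i}.py" for i in range(12)], fake_fetch))
print(len(files))
```

All tasks are created up front, but the semaphore lets only five enter `fetch_one` at once; the rest proceed as slots are released.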
