Is your feature request related to a problem?
Yes. The dashboard currently calls the GitHub API directly on every page load. To avoid hitting the strict hourly API rate limits and causing application timeouts, we are forced to cap data fetches at the 100 most recent records.
Because of this protective cap, our analytical modules (like tab_trends) cannot visualize project history further back than a few months (e.g., prior to April 2026 for highly active repositories). Furthermore, every concurrent user triggers fresh API calls against the same token, so the quota will be exhausted quickly as traffic scales.
Describe the solution you'd like
We should transition our data layer from an On-Demand API Fetching pattern to a persistent Sync & Serve Architecture using an embedded SQLite database (cache.db).
This architecture decouples the Streamlit user interface from the live GitHub endpoints entirely:
- Local Persistent Storage: Introduce a lightweight SQLite database file embedded right into the project directory structure.
- Background Sync Mechanism: Write an incremental update utility that runs periodically, fetches only the records changed since the last recorded sync timestamp, and upserts them into our local tables.
- Decoupled Client Reading Layer: Refactor src/github_client.py to stop using direct web requests during UI execution. Instead, it will use Pandas' native pd.read_sql() functionality to serve structured DataFrames directly from the local database file in milliseconds.
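To make the sync step more concrete, here is a minimal sketch of the upsert side, using only the standard library. The table layout (`issues`, `sync_state`), the column names, and the helper names `init_db`, `last_synced`, and `upsert_issues` are all assumptions for illustration, not an agreed schema. It also assumes the bundled SQLite is 3.24 or newer, since it relies on `ON CONFLICT ... DO UPDATE`. Fetching from GitHub is deliberately left out; the caller would pass in already-parsed records.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema: one table for cached issues, one for sync bookkeeping.
SCHEMA = """
CREATE TABLE IF NOT EXISTS issues (
    id         INTEGER PRIMARY KEY,
    number     INTEGER NOT NULL,
    title      TEXT    NOT NULL,
    state      TEXT    NOT NULL,
    updated_at TEXT    NOT NULL
);
CREATE TABLE IF NOT EXISTS sync_state (
    resource    TEXT PRIMARY KEY,
    last_synced TEXT NOT NULL
);
"""


def init_db(conn):
    """Create the cache tables if they do not exist yet."""
    conn.executescript(SCHEMA)


def last_synced(conn, resource):
    """Return the stored sync timestamp for a resource, or None on first run."""
    row = conn.execute(
        "SELECT last_synced FROM sync_state WHERE resource = ?", (resource,)
    ).fetchone()
    return row[0] if row else None


def upsert_issues(conn, issues, resource="issues"):
    """Insert new issues, update existing ones, and record the sync time.

    `issues` is a list of dicts with keys id, number, title, state, updated_at.
    Everything runs in one transaction, so a failure leaves the cache unchanged.
    """
    with conn:
        conn.executemany(
            """
            INSERT INTO issues (id, number, title, state, updated_at)
            VALUES (:id, :number, :title, :state, :updated_at)
            ON CONFLICT(id) DO UPDATE SET
                number     = excluded.number,
                title      = excluded.title,
                state      = excluded.state,
                updated_at = excluded.updated_at
            """,
            issues,
        )
        conn.execute(
            """
            INSERT INTO sync_state (resource, last_synced) VALUES (?, ?)
            ON CONFLICT(resource) DO UPDATE SET last_synced = excluded.last_synced
            """,
            (resource, datetime.now(timezone.utc).isoformat()),
        )
```

A sync worker could read `last_synced(conn, "issues")` and pass it as the `since` parameter of GitHub's list-issues endpoint, so each run only pulls records updated after the previous one. On the first run the value is `None`, which would correspond to a full backfill.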
+------------+      Sync Worker      +------------+      Fast Query       +-----------+
| GitHub API | ====================> |  Database  | ====================> | Streamlit |
|  (Source)  |  (Incremental Pull)   |  (SQLite)  |     (Reads Local)     |   (UI)    |
+------------+                       +------------+                       +-----------+
Advantages of this approach
- Full-History Retention: We can fetch and store thousands of historical issues going back to day one without a meaningful impact on dashboard loading performance. The only retention limit is local disk space.
- Fast Dashboard Loading Times: Querying a local SQLite database typically takes milliseconds and removes the API network round trip from the page-load path entirely.
- Rate Limit Isolation for the UI: If hundreds of users browse the dashboard at once, the UI will make zero calls to GitHub. Only the sync worker consumes our GITHUB_TOKEN quota, and it does so at a rate we control.
Additional context
SQLite requires no separate server or infrastructure setup, since the sqlite3 module ships with the Python standard library (import sqlite3). Pandas can write to and read from a sqlite3 connection directly via df.to_sql() and pd.read_sql().
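As a rough illustration of the read path, the sketch below round-trips a small DataFrame through an in-memory SQLite database. The `issues` table, its columns, and the `load_issues` helper are assumptions made up for this example. The real code would open cache.db instead of `:memory:`, and the table would be filled by the sync worker rather than by `to_sql()`.

```python
import sqlite3

import pandas as pd


def load_issues(conn, state=None):
    """Return cached issues as a DataFrame, optionally filtered by state."""
    query = "SELECT number, title, state, updated_at FROM issues"
    params = None
    if state is not None:
        # Parameterized query: the value is bound by sqlite3, not interpolated.
        query += " WHERE state = ?"
        params = (state,)
    query += " ORDER BY updated_at"
    return pd.read_sql(query, conn, params=params, parse_dates=["updated_at"])


# In-memory database standing in for cache.db in this sketch.
conn = sqlite3.connect(":memory:")

# Made-up sample rows, written with df.to_sql() purely to seed the example.
sample = pd.DataFrame(
    {
        "number": [1, 2, 3],
        "title": ["First", "Second", "Third"],
        "state": ["closed", "open", "open"],
        "updated_at": [
            "2026-01-03T00:00:00Z",
            "2026-01-02T00:00:00Z",
            "2026-01-01T00:00:00Z",
        ],
    }
)
sample.to_sql("issues", conn, index=False, if_exists="replace")

# No network access involved: this reads only from the local database.
open_issues = load_issues(conn, state="open")
```

Keeping the SQL inside one helper like this would let the tabs (tab_trends and others) keep receiving DataFrames, so the switch from API calls to local reads stays contained in src/github_client.py.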
We should schedule this implementation immediately after merging our ongoing frontend plotting features to unlock full-history tracking capability safely.