OpenJobsEU is an open-source, compliance-first project focused on aggregating legally accessible, EU-wide remote job offers.
The project is backend-first and infrastructure-oriented. It leverages a modern Serverless stack on Google Cloud to provide a zero-maintenance, zero-compute public feed, while the FastAPI runtime itself stays private behind Cloud Run IAM.
- Compliance First: Deterministic policy engine grading jobs by remote purity and EU geo-restrictions.
- Zero-Compute Public Feed: The public frontend is 100% static. `feed.json` is refreshed by the runtime maintenance pipeline, while `frontend/index.html`, `frontend/style.css`, and `frontend/feed.js` are published separately by CI after a successful production deploy. Both are served from Google Cloud Storage/CDN, and runtime endpoints such as `/jobs`, `/companies`, and `/jobs/stats/compliance-7d` remain private Cloud Run interfaces in production rather than public internet APIs.
- Modular Monolith: Cleanly separated domains (Ingestion, Compliance, Operations) within a single Python FastAPI application.
- Robust Async Processing: Leverages Google Cloud Tasks and Cloud Scheduler for time-budgeted, idempotent, and heavily retried worker execution.
- Strict Security: Endpoints split between UI (Session-based via Google OAuth) and M2M routes (OIDC tokens with strict Audience validation). For local development (`APP_RUNTIME=local`), the system falls back to dummy placeholders to ensure low friction.
- High Performance Data: Scalable PostgreSQL database design with GIN Trigram indexing for fuzzy search and `GROUPING SETS` for real-time audit aggregations.
- Current Database Runtime: Both `dev` and `prod` now run in `DB_MODE=standard` against Aiven PostgreSQL via `DATABASE_URL`.
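The "time-budgeted, idempotent" worker execution mentioned in the list above can be illustrated with a minimal sketch. It is not taken from the repository: `run_time_budgeted`, the in-memory `done` set, and the injectable `clock` are hypothetical names, and a real worker would presumably record completion in PostgreSQL rather than in memory.

```python
import time
from typing import Callable, Iterable, List, Set


def run_time_budgeted(
    task_ids: Iterable[str],
    handler: Callable[[str], None],
    done: Set[str],
    budget_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> List[str]:
    """Process tasks until the time budget is spent; return the IDs left over.

    Hypothetical helper: `done` stands in for a durable completion record.
    """
    deadline = clock() + budget_seconds
    remaining: List[str] = []
    for task_id in task_ids:
        if task_id in done:
            # Idempotency: a retried delivery of finished work is a no-op.
            continue
        if clock() >= deadline:
            # Out of budget: hand the task back so a later tick can resume.
            remaining.append(task_id)
            continue
        handler(task_id)
        done.add(task_id)
    return remaining
```

Because completed IDs are skipped, a retry can safely replay the whole batch, and returning the leftovers lets the next scheduled tick pick up where this one stopped.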
Documentation detailing the design decisions and data flows is located in the `docs/` directory:
- System Architecture
- Aiven PostgreSQL Migration Notes
- Looker Studio Audit Dashboard
- System Map
- Canonical Model
- Compliance & Data Usage
- Job Lifecycle
- Data Sources
- Roadmap
Note: OpenJobsEU does not engage in scraping closed/protected platforms, nor does it automate applications.
- Merge gatekeeper: `ci.yml` runs the full `pytest` suite for pull requests targeting `main` and `develop`. This is the required status check that should block merges until green.
- Additional PR quality checks: `pre-commit.yml` continues to validate pre-commit hooks and Commitizen commit-message compliance on pull requests.
- Infra PRs: `terraform-plan.yml` runs `terraform validate` and `terraform plan` on pull requests that touch `infra/**`.
- Deploy only: `dev_flow.yml` deploys after pushes to `develop` (typically after merge), and `prod_flow.yml` handles release/deploy steps after pushes to `main`. Neither workflow runs on pull requests.
- No duplicated full flow: feature-branch pushes do not trigger the deploy workflows, while PRs trigger only the check workflows (`ci.yml`, `pre-commit.yml`, and, for `infra/**` changes, `terraform-plan.yml`). The deploy workflows run only after the merge commit lands on `develop` or `main`.
- Branch-maintenance automation: `sync_main_to_develop.yml` keeps `develop` aligned with release changes merged to `main`, and `protect_develop.yml` recreates `develop` if the branch is deleted.
- Branch protection: configure GitHub branch protection for both `main` and `develop` so the required status check includes `CI / pytest` before merge.
- Trigger: regular maintenance/runtime pipeline ticks.
- Publisher: `app/workers/frontend_exporter.py` executed from the private backend pipeline.
- Artifact: only `feed.json`.
- Cadence: frequent, operational refreshes as jobs change.
- Cache policy: `Cache-Control: public, max-age=300`.
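As a rough illustration of the runtime publication described above, the sketch below builds the `feed.json` payload together with the stated cache policy. The function name `build_feed_upload` and the `{"jobs": [...]}` feed shape are assumptions made for the example; the actual exporter in `app/workers/frontend_exporter.py` may structure both differently.

```python
import json
from typing import Any, Dict, List, Tuple

# Cache policy as stated in the list above.
FEED_CACHE_CONTROL = "public, max-age=300"


def build_feed_upload(jobs: List[Dict[str, Any]]) -> Tuple[bytes, Dict[str, str]]:
    """Return the feed body and the HTTP metadata to attach on upload.

    Hypothetical helper: the {"jobs": [...]} envelope is an assumed shape.
    """
    # Sorted keys and compact separators keep the output deterministic.
    body = json.dumps({"jobs": jobs}, sort_keys=True, separators=(",", ":")).encode("utf-8")
    headers = {
        "Content-Type": "application/json; charset=utf-8",
        "Cache-Control": FEED_CACHE_CONTROL,
    }
    return body, headers
```

A five-minute `max-age` means CDN and browser caches may serve a feed that is up to 300 seconds old, which fits frequent but not instantaneous refreshes.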
- Trigger: GitHub Actions `prod_flow.yml`, only after the production deploy job finishes successfully.
- Publisher: `scripts/publish_frontend_assets.py`, which reuses `run_frontend_export(..., sync_assets=True)` from `app/workers/frontend_exporter.py`.
- Artifacts: `frontend/index.html`, `frontend/style.css`, `frontend/feed.js`.
- Cadence: only on releases / explicit production deploys, not on each runtime tick.
- `frontend/index.html` is published by CI/CD and remains the release-controlled entrypoint.
- `frontend/style.css` and `frontend/feed.js` are also published by CI/CD, not by the runtime worker.
- During publish, CI injects `?v=<release tag or commit SHA>` into the `index.html` references to `style.css` and `feed.js`, which provides simple cache busting for frontend changes without coupling asset deploys to `feed.json` refreshes.
- Runtime IAM can stay limited to `feed.json`, while asset publication can use a separate deploy credential.
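The `?v=` injection described above could look roughly like the following. This is a sketch under assumptions, not the repository's implementation: `inject_version` is a hypothetical helper, and it assumes `index.html` references the assets through double-quoted `href`/`src` attributes.

```python
import re
from urllib.parse import quote

# Matches href="...style.css" or src="...feed.js", with or without an
# existing query string. References to feed.json do not match.
_ASSET_REF = re.compile(
    r'(?P<attr>\b(?:href|src)=")'
    r'(?P<path>[^"?]*\b(?:style\.css|feed\.js))'
    r'(?:\?[^"]*)?"'
)


def inject_version(html: str, version: str) -> str:
    """Append ?v=<version> to the style.css and feed.js references."""
    suffix = quote(version, safe="")
    return _ASSET_REF.sub(
        lambda m: f'{m.group("attr")}{m.group("path")}?v={suffix}"',
        html,
    )
```

Replacing any existing query string keeps the operation repeatable: running it again with a newer tag swaps the version instead of stacking parameters.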
The repository now assumes OIDC-based federation between GitHub Actions and Google Cloud instead of long-lived JSON keys.
- GitHub Actions requests a short-lived OIDC token from `token.actions.githubusercontent.com`.
- GCP Workload Identity Federation verifies the token against a dedicated provider in each project (`dev-openjobseu` and `openjobseu`).
- The provider trusts only this repository: `aergaroth/openjobseu`.
- `dev` trust accepts only:
  - `push` / `workflow_dispatch` runs from `refs/heads/develop`
  - `pull_request` runs whose `base_ref` is `develop`
- `prod` trust accepts only:
  - `push` / `workflow_dispatch` runs from `refs/heads/main`
  - `pull_request` runs whose `base_ref` is `main`
- Each workflow step then impersonates a dedicated Google service account with the minimum role set for its purpose.
- `github-deploy` (dev/prod): builds and pushes the container image, runs `terraform apply`, and therefore needs Artifact Registry write access, Terraform state bucket object access, Cloud Run/Scheduler/Tasks admin, Secret Manager admin, project IAM admin, plus `iam.serviceAccountUser` on the runtime and scheduler identities.
- `github-terraform-plan` (dev/prod): used only by PR plans; it gets read-only access to the Terraform state bucket and project metadata (`roles/viewer` + `roles/secretmanager.viewer`).
- `github-assets-publish` (prod): used only after a successful production deploy to publish `frontend/index.html`, `frontend/style.css`, and `frontend/feed.js`; it only receives `roles/storage.objectAdmin` on the public bucket.
- The Cloud Run runtime account remains separate and still owns only runtime responsibilities (for example the conditional write access to `feed.json`).
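The trust rules above can be modeled in a few lines of Python to make the accepted combinations explicit. This is only an illustrative model: in Google Cloud the check is enforced by the Workload Identity Federation provider configuration, not by application code, and `is_trusted` is a hypothetical name. The claim keys (`repository`, `event_name`, `ref`, `base_ref`) follow the names GitHub uses in its OIDC tokens.

```python
from typing import Mapping

REPOSITORY = "aergaroth/openjobseu"
# Environment -> branch mapping taken from the trust rules above.
BRANCH_BY_ENV = {"dev": "develop", "prod": "main"}


def is_trusted(env: str, claims: Mapping[str, str]) -> bool:
    """Return True if the token claims satisfy the rules for `env`."""
    branch = BRANCH_BY_ENV[env]
    if claims.get("repository") != REPOSITORY:
        return False
    event = claims.get("event_name")
    if event in ("push", "workflow_dispatch"):
        # Direct runs must originate from the environment's own branch.
        return claims.get("ref") == f"refs/heads/{branch}"
    if event == "pull_request":
        # PR runs are accepted only when they target that branch.
        return claims.get("base_ref") == branch
    return False
```

For example, a `push` to `refs/heads/develop` satisfies the `dev` rules but not the `prod` rules, and any token issued for a different repository is rejected outright.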
Create/update GitHub repository variables (not secrets) with the Terraform outputs from each environment:
- `GCP_WIF_PROVIDER_DEV`
- `GCP_SERVICE_ACCOUNT_DEV`
- `GCP_SERVICE_ACCOUNT_TERRAFORM_PLAN_DEV`
- `GCP_WIF_PROVIDER_PROD`
- `GCP_SERVICE_ACCOUNT_PROD`
- `GCP_SERVICE_ACCOUNT_TERRAFORM_PLAN_PROD`
- `GCP_SERVICE_ACCOUNT_ASSETS_PROD`
The remaining application secrets (DEV_GOOGLE_CLIENT_SECRET, PROD_GOOGLE_API_KEY, etc.) stay in GitHub Secrets because they are application data, not cloud authentication material.
- Run `terraform apply` in `infra/gcp/dev` and `infra/gcp/prod` to create the workload identity pools/providers, service accounts, and IAM bindings.
- Copy the Terraform outputs into the GitHub repository variables listed above.
- Trigger `terraform-plan.yml`, `dev_flow.yml`, and `prod_flow.yml` once to confirm OIDC authentication works end to end.
- After successful verification, delete the legacy GitHub Secrets `GCP_SA_KEY_DEV` and `GCP_SA_KEY_PROD`.
Most tests expect PostgreSQL. CI starts postgres:16 and uses:
- `DB_MODE=standard`
- `DATABASE_URL=postgresql+psycopg://postgres:postgres@localhost:5432/testdb`
Note: The test suite explicitly blocks external HTTP requests to prevent accidental hangs or timeouts. Fast DELETE sweeps are used over TRUNCATE CASCADE for near-instant teardowns.
Local pattern:

    # start postgres
    docker run --rm --name openjobspg -e POSTGRES_PASSWORD=postgres \
      -e POSTGRES_DB=testdb -p 5432:5432 -d postgres:16

`make check` still runs compile and `pre-commit --all-files`, but when the local PostgreSQL test database is unavailable it now marks the pytest hook as Skipped instead of letting it appear as Passed after a fully skipped test session.
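
One way to detect an unavailable local test database before running the suite is a plain TCP probe against the host and port in `DATABASE_URL`. The sketch below is an assumption about how such a check could work, not a description of what `make check` actually does; `db_endpoint` and `postgres_reachable` are hypothetical helpers.

```python
import socket
from typing import Tuple
from urllib.parse import urlsplit


def db_endpoint(database_url: str) -> Tuple[str, int]:
    """Extract (host, port) from a SQLAlchemy-style database URL."""
    parts = urlsplit(database_url)
    # Fall back to PostgreSQL defaults when the URL omits host or port.
    return parts.hostname or "localhost", parts.port or 5432


def postgres_reachable(database_url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the database endpoint succeeds."""
    host, port = db_endpoint(database_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful TCP connection only shows that something is listening on the port; it does not verify credentials or that the `testdb` database exists.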