This repository contains modular Terraform that stands up a small Azure footprint tailored for Azure for Students subscriptions: one Linux VM that runs your containerized workload and a managed PostgreSQL Flexible Server, with an Azure Key Vault for application secrets, Application Insights / Log Analytics for observability, and automation to keep costs predictable. The accompanying GitHub Actions pipeline deploys it secretlessly (Azure login via OpenID Connect, no long-lived credentials) and the VM reads its runtime configuration from Key Vault using its own managed identity.
Everything lives in a single resource group whose name is derived from `environment_name` (normalized and truncated to 45 characters). The Terraform is split into reusable modules under `modules/`:
| Module | Resources |
|---|---|
| network | /16 virtual network with one VM subnet, NSG (HTTP/HTTPS open, SSH limited to `allowed_admin_cidrs`), a static Standard-SKU public IP, and the VM NIC. |
| compute | Ubuntu 22.04 LTS VM with a 64 GB Premium SSD OS disk, SSH-key auth only, and a system-assigned managed identity. |
| database | PostgreSQL Flexible Server (Burstable B1ms, 32 GB, auto-grow off by default to stay in the free tier) and the default `postgres` database. The endpoint is public but firewalled to the VM's static public IP only. |
| storage | Optional Azure Storage Account for blobs (`blob_storage_enabled`), Standard LRS, TLS 1.2 only, public access disabled, with blob versioning and 7-day soft delete for recoverability. |
| automation | Azure Automation account + runbooks, created only when at least one automation feature is enabled (VM start/stop schedules, ad-hoc snapshots, snapshot cleanup, on-demand PostgreSQL backups). |
| keyvault | Azure Key Vault holding the application secrets (see Secret management below), with access policies for the pipeline (read/write) and the VM identity (read). |
| monitoring | Log Analytics workspace (with a daily ingestion cap to keep costs modest), workspace-based Application Insights, diagnostic settings that route PostgreSQL and Key Vault logs/metrics to the workspace, and a free observability workbook (requests / exceptions / traces over KQL). |
The root module wires the modules together, owns the resource group, stores the Application Insights connection string in Key Vault, and exposes connection details (SSH command, VM IP, database FQDN/connection string, storage account name, Key Vault name) as outputs.
- The VM exposes only HTTP/HTTPS publicly; SSH is restricted to `allowed_admin_cidrs`, and the pipeline opens a temporary, run-scoped SSH rule that is always removed afterwards.
- PostgreSQL keeps a public endpoint but is firewalled to a single IP: the VM's static public IP. The default `vm_public_ip_static = true` keeps that IP stable across VM stop/start, so the allow-list rule stays valid. (A full private endpoint / VNet-integrated server is intentionally not used: on a Flexible Server the networking model is fixed at creation and switching it is a destructive, data-migrating operation.)
The pipeline authenticates to Azure through workload identity federation (OpenID Connect) — there is no Service Principal secret stored anywhere:
- `azure/login` exchanges a short-lived GitHub OIDC token for an Azure access token, using the `AZURE_CLIENT_ID`, `AZURE_TENANT_ID`, and `AZURE_SUBSCRIPTION_ID` repository variables (these are identifiers, not secrets).
- The deployment job runs in the `production` GitHub environment, so the OIDC token's subject matches a federated credential registered on the Azure app registration.
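
For a job bound to a GitHub environment, the token's subject claim has the form `repo:<owner>/<repo>:environment:<name>`, and the federated credential has to list exactly that string. The helper below is only an illustration of that format; the function name and the `my-org/my-infra` repository are hypothetical.

```shell
# Build the OIDC subject claim GitHub issues for a job that runs in an environment.
# Format: repo:<owner>/<repo>:environment:<environment-name>
github_oidc_subject() {
  owner_repo=$1     # hypothetical example: "my-org/my-infra"
  environment=$2    # the GitHub environment the job runs in
  printf 'repo:%s:environment:%s\n' "$owner_repo" "$environment"
}

github_oidc_subject "my-org/my-infra" "production"
# prints repo:my-org/my-infra:environment:production
```

If the deployment job were moved to a different environment, the subject would change and the existing federated credential would no longer match.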
Application secrets live in Azure Key Vault rather than in the application repository:
- The pipeline assembles the full application environment (static base + the live database connection and, when enabled, the storage account credentials) and publishes it to Key Vault as the `app-env` secret. The static base is seeded once from `APP_ENV_VARS_B64` and thereafter stored as `app-env-base`.
- The database connection string and the storage account name/key are also stored as individual Key Vault secrets.
- At deploy time the VM fetches `app-env` from Key Vault using its managed identity (via the instance metadata service) and writes `app.env`; the container then starts with `--env-file`. No application secret is copied over SSH.
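
The fetch on the VM can be sketched roughly as follows. This is not the repository's actual deploy script: the function names, the `example-kv` vault name, the API versions, and the use of `python3` for JSON parsing are all assumptions made for illustration.

```shell
# Hypothetical vault name; the real one is exposed as the key_vault_name output.
VAULT_NAME="${VAULT_NAME:-example-kv}"

# URL of the instance metadata service (IMDS) token endpoint, scoped to Key Vault.
imds_token_url() {
  printf 'http://169.254.169.254/metadata/identity/oauth2/token?api-version=2018-02-01&resource=%s\n' \
    'https%3A%2F%2Fvault.azure.net'
}

# URL of a named secret in a given vault (assumed API version 7.4).
secret_url() {
  printf 'https://%s.vault.azure.net/secrets/%s?api-version=7.4\n' "$1" "$2"
}

# Get a token for the VM's managed identity, read app-env, and write app.env.
fetch_app_env() {
  token=$(curl -fsS -H 'Metadata: true' "$(imds_token_url)" |
    python3 -c 'import json,sys; print(json.load(sys.stdin)["access_token"])') || return 1
  curl -fsS -H "Authorization: Bearer $token" "$(secret_url "$VAULT_NAME" app-env)" |
    python3 -c 'import json,sys; sys.stdout.write(json.load(sys.stdin)["value"])' > app.env || return 1
  chmod 600 app.env   # the file holds secrets, so keep it owner-readable only
}
```

On the VM, `fetch_app_env` would run before the container is started with `--env-file app.env`. Because the token comes from the metadata service, no credential has to be stored on disk or passed over SSH.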
Telemetry and resource logs land in a single Log Analytics workspace, with cost kept predictable by design:
- Application Insights is workspace-based; its connection string is stored in Key Vault (`appinsights-connection-string`) and injected into the application environment, so the app reports requests, dependencies, exceptions, and traces.
- Diagnostic settings forward PostgreSQL and Key Vault logs/metrics into the same workspace.
- A free Application Insights workbook (`<prefix> observability`) charts requests, top exceptions, and recent traces with ready-made KQL queries.
- A daily ingestion cap keeps the bill modest: `log_max_total_gb` (default `3`) is enforced as `daily_quota_gb = log_max_total_gb / retention_days`. Set it to `-1` to disable the cap. (Azure's minimum workspace retention is 30 days, so the cap limits ingestion rate rather than deleting old data row by row.)
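
As a worked example of the cap formula, the sketch below computes the daily quota for the default budget. It assumes a retention of 30 days (the minimum mentioned above) and assumes that `-1` is passed through unchanged to mean "no cap"; the function name is hypothetical and the real calculation lives in the Terraform code.

```shell
# daily_quota_gb = log_max_total_gb / retention_days, with -1 meaning "no cap".
daily_quota_gb() {
  total_gb=$1
  retention_days=$2
  if [ "$total_gb" = "-1" ]; then
    printf '%s\n' "-1"    # assumed pass-through: cap disabled
    return 0
  fi
  awk -v t="$total_gb" -v d="$retention_days" 'BEGIN { printf "%.3f\n", t / d }'
}

daily_quota_gb 3 30     # prints 0.100
daily_quota_gb -1 30    # prints -1
```

With the defaults, the workspace accepts about 0.1 GB per day, which adds up to roughly 3 GB over the 30-day retention window.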
Terraform state is stored remotely in Azure Storage so that it survives ephemeral CI runners, is shared across machines, and is protected against concurrent writes (the azurerm backend takes a blob lease, which stops two terraform apply runs from corrupting the state at the same time).
The backend is bootstrapped automatically by the pipeline — no manual setup, no extra file or secret:
- `versions.tf` declares an empty `backend "azurerm" {}`; the concrete settings are injected at `init` time via `-backend-config`.
- On every run the workflow ensures the backing resources exist (create-if-missing, idempotent):
  - Resource group `tfstate-rg`
  - Storage account named `<environment_name><suffix>`, where the suffix is derived deterministically from the subscription ID (globally unique yet stable across runs)
  - Container `tfstate`, state stored under the key `docker2azure.tfstate`
- The state storage account has blob versioning and soft delete (7 days) enabled, so a bad apply can be recovered.
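
One way to get a suffix that is unique per subscription yet identical on every run is to hash the subscription ID. The sketch below illustrates that idea only: the hash, the 6-character suffix length, the 18-character truncation, and the function name are assumptions, not the workflow's actual derivation. It relies on `sha256sum` from GNU coreutils.

```shell
# Derive a stable storage account name from environment_name and the subscription ID.
state_account_name() {
  environment_name=$1
  subscription_id=$2
  # Hypothetical suffix: first 6 hex characters of the SHA-256 of the subscription ID.
  suffix=$(printf '%s' "$subscription_id" | sha256sum | cut -c1-6)
  # Storage account names allow only lowercase letters and digits, 3 to 24 characters,
  # so strip everything else and leave room for the suffix.
  base=$(printf '%s' "$environment_name" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9' | cut -c1-18)
  printf '%s%s\n' "$base" "$suffix"
}

state_account_name "My-Env" "00000000-0000-0000-0000-000000000000"
```

The same inputs always produce the same name, so a create-if-missing step finds the existing account on later runs instead of creating a new one.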
To work against the same remote state locally, initialise with the matching backend settings:
```shell
terraform init \
  -backend-config=resource_group_name=tfstate-rg \
  -backend-config=storage_account_name=<the-account-name> \
  -backend-config=container_name=tfstate \
  -backend-config=key=docker2azure.tfstate
```

| Feature | Variables | What it does |
|---|---|---|
| VM daily schedule | `vm_schedule_enabled`, `vm_schedule_start_time`, `vm_schedule_stop_time`, `vm_schedule_timezone` | Automation runbooks + schedules that start/stop the VM daily to save credits. |
| Manual VM snapshot | `vm_snapshot_runbook_enabled` | Deploys the `*-snapshot` runbook for on-demand OS-disk snapshots. |
| Snapshot cleanup | `vm_snapshot_cleanup_enabled`, `vm_snapshot_retention_days`, `vm_snapshot_cleanup_time`, `vm_snapshot_cleanup_timezone` | Scheduled runbook that deletes snapshots older than the retention window. |
| PostgreSQL on-demand backup | `db_backup_enabled`, `db_backup_time`, `db_backup_timezone` | Runbook + schedule that calls the Flexible Server REST API for an extra daily backup. |
Set the boolean flags to false when you do not need a capability; Terraform skips the related Automation modules, runbooks, schedules, and job bindings.
You do not need Terraform or the Azure CLI installed to use this — the pipeline provisions the remote state, the infrastructure, and the application end to end. The only requirements are:
- An Azure subscription with the deployment identity configured (OIDC federated credentials) and the repository's deployment variables/secrets set.
- An SSH public key (ed25519 or RSA), which becomes the only authentication method for the VM.
Everything else (state bootstrap, resource creation, secret distribution, container deploy) happens automatically on each run.
The pipeline already does all of this; you only need the steps below if you want to drive Terraform yourself. They require Terraform >= 1.5 and an az login session.
```shell
# 1) provide values
cp terraform.tfvars.example terraform.tfvars   # then edit: environment_name, location,
                                               # admin_ssh_public_key, db_admin_password, ...

# 2) initialise against the shared remote state (see "Remote state" above) and apply
terraform init \
  -backend-config=resource_group_name=tfstate-rg \
  -backend-config=storage_account_name=<the-account-name> \
  -backend-config=container_name=tfstate \
  -backend-config=key=docker2azure.tfstate
terraform plan
terraform apply
```

The root module exposes the following outputs:

- `resource_group_name` – Scope Azure CLI commands after deployment.
- `vm_public_ip` / `ssh_connection_string` – Connect to the VM.
- `database_fqdn` / `database_connection_string` – Configure your application. The connection string uses TLS (`sslmode=require`).
- `storage_account_name` – Available only when `blob_storage_enabled = true`.
- `key_vault_name` – The Key Vault that holds the application secrets (including the Application Insights connection string).
Almost everything is automatic or configuration-driven — there are no manual post-deploy steps:
- VM scheduling, snapshot cleanup, and PostgreSQL backups run on their own once enabled via the automation toggles above.
- Changing the infrastructure (firewall CIDRs, VM size, automation toggles) means editing the Terraform variables; the next deploy reconciles everything, including the matching database firewall rule.
- The only operator-initiated actions are taking an on-demand VM snapshot via the `*-snapshot` runbook (when enabled) and triggering a rollback of the container or the infrastructure via the rollback workflow (see Rollback below).
The sync/... branches used by deployment automation are temporary delivery branches, not feature branches. Before any important infrastructure change, update the affected README or .md files (Terraform variables, deployment flow, required secrets, operational runbooks).
Every pull request targeting main runs .github/workflows/pr-validation.yml:
- `terraform fmt -check` and `terraform validate` always run (no cloud credentials required).
- When Azure access is configured, it also runs `terraform plan` against the live remote state.
- The validation output (and the plan, when produced) is published as a build artifact.
.github/workflows/security-scan.yml runs Trivy on every push and pull request:
- IaC misconfiguration scan of the Terraform (fails the job on `CRITICAL`/`HIGH` findings; an accepted baseline is documented in `.trivyignore`).
- Secret scan of the working tree.
- Results are also uploaded as SARIF to the GitHub Security tab where Advanced Security is available.
.github/workflows/deploy-from-sync.yml runs on a short-lived sync/... branch that carries a sync-bundle/ directory with the application artifacts and Dockerfile. The job:
- Logs in to Azure via OIDC and ensures the remote state backend exists.
- On a brand-new environment, adopts any pre-existing Azure resources into state; on a populated state this step is skipped.
- Runs `terraform plan -out=tfplan`, publishes the plan as an artifact, and applies exactly that plan.
- Builds and pushes the container image, publishes the assembled `app-env` to Key Vault, and has the VM load it via its managed identity.
- Redeploys the container over SSH and always deletes the temporary NSG rule and the `sync/...` branch when it finishes.
- Records two pointers in Key Vault, `app-image-current` and `app-image-previous`, so a rollback always knows the last known-good image (see Rollback below).
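
The plan-then-apply step in the list above can be sketched as follows. This is a minimal sketch rather than the workflow's actual script: the function name `plan_and_apply` is hypothetical, and `-input=false` is an assumed choice for non-interactive CI runs.

```shell
# Write the plan to a file, then apply that exact file.
plan_and_apply() {
  terraform plan -input=false -out=tfplan || return 1
  # Applying a saved plan file runs only the changes recorded in it;
  # Terraform rejects the file if the state changed after it was written.
  terraform apply -input=false tfplan
}
```

Calling `plan_and_apply` after `terraform init` means the changes that get applied are the ones published in the plan artifact, not a fresh plan computed at apply time.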
.github/workflows/rollback.yml is a manual (workflow_dispatch) workflow for incident response. It has a target (either container or terraform) and an apply switch so you can preview first and apply only after review:
- Container: redeploys a previous, immutable image tag on the VM. The container registry already stores every image; by default the rollback uses `app-image-previous` from Key Vault (the last known-good tag), or you can pass an explicit `image_tag`. The application configuration on the VM is left untouched; only the image is swapped.
- Terraform: re-applies the infrastructure code from a known-good `git_ref` (SHA, tag or branch) against the live remote state. A saved plan is deliberately not used for rollback because it goes stale as soon as the state changes; the canonical, reproducible description of the infrastructure is the git commit, and the state account's versioning + soft delete are the safety net.
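
The tag selection for a container rollback reduces to "explicit input first, stored pointer second". The sketch below shows that precedence under stated assumptions: the function name and the example tags are hypothetical, and the real workflow reads `app-image-previous` from Key Vault rather than taking it as an argument.

```shell
# Pick the image tag to roll back to.
resolve_rollback_tag() {
  explicit_tag=$1    # value of the image_tag input, may be empty
  previous_tag=$2    # value of the app-image-previous secret, may be empty
  if [ -n "$explicit_tag" ]; then
    printf '%s\n' "$explicit_tag"
  elif [ -n "$previous_tag" ]; then
    printf '%s\n' "$previous_tag"
  else
    echo "no rollback target: image_tag is empty and app-image-previous is unset" >&2
    return 1
  fi
}

resolve_rollback_tag "" "build-41"          # prints build-41
resolve_rollback_tag "build-37" "build-41"  # prints build-37
```

Failing when neither value is available stops the rollback before it touches the running container.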
Refer to AUTOMATION.md for the full automation playbook, including required secrets/variables and how the application and infrastructure repositories coordinate.
```
.
├── main.tf                      # Root module: resource group + module wiring + Key Vault secrets
├── variables.tf                 # Input variables with defaults and docs
├── locals.tf                    # Naming helpers
├── outputs.tf                   # Connection details for operators and CI
├── moved.tf                     # State `moved` blocks mapping resources to their module addresses
├── providers.tf / versions.tf   # Providers + remote azurerm backend declaration
├── modules/                     # network, compute, database, automation, storage, keyvault, monitoring
├── .github/workflows/           # pr-validation.yml, security-scan.yml, deploy-from-sync.yml, rollback.yml
├── scripts/tfvars_meta.py       # Utility used by CI to read tfvars metadata
├── .trivyignore                 # Accepted security-scan baseline
├── terraform.tfvars.example
├── README.md
└── AUTOMATION.md
```