This guide defines how to contribute changes to sof while keeping architecture, reliability, and release standards intact.
This repository is a Rust workspace with the main crate at crates/sof-observer (published as sof).
Primary references:
- Project overview and run commands: `README.md`
- Architecture index: `docs/architecture/README.md`
- Operations docs index: `docs/operations/README.md`

- Rust stable toolchain
- `cargo-make` (used by contributor quality gates)

Quick smoke check:
`cargo check -p sof`

- Make focused changes in the owning slice/module.
- Add or update tests with the behavior change.
- Run contributor quality gates locally.
- Open a PR with a clear summary, risk notes, and test evidence.
- Commits are brief and atomic: one logical change, easy to scan.
- Pull requests carry the full context: why, scope, testing, and impact.
Use one of:
- `<type>: <subject>`
- `<type>(<scope>): <subject>` (optional scope, e.g. `app`, `repair`, `ci`, `docs`)
Examples:
- `feat: add repair peer ranking telemetry`
- `fix: prevent duplicate repair request reservation`
- `chore(ci): tighten release preflight checks`
- `docs(operations): clarify SOF_TVU_SOCKETS defaults`
- `refactor(app): extract gossip switch guard logic`
Supported types:
- `docs`: documentation
- `feat`: new feature
- `fix`: bug fix
- `refactor`: code restructuring
- `test`: test changes
- `chore`: build, CI, dependencies, tooling
- `perf`: performance changes
Subject rules:
- Imperative mood (`add`, `fix`, `refactor`)
- Lowercase start (except proper nouns)
- No trailing period
- Aim for <= 50 characters
- Be specific, not generic
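
The mechanical parts of the format and subject rules above can be checked in code. The following is a minimal sketch, not a tool that ships with this repository: `check_commit_title`, `SUPPORTED_TYPES`, and `SUBJECT_SOFT_LIMIT` are hypothetical names, and imperative mood, proper nouns, and specificity still need human review.

```rust
/// Commit types listed in this guide.
const SUPPORTED_TYPES: [&str; 7] = ["docs", "feat", "fix", "refactor", "test", "chore", "perf"];

/// Soft limit for the subject length ("aim for <= 50 characters").
const SUBJECT_SOFT_LIMIT: usize = 50;

/// Hypothetical helper: returns the problems found in a commit title.
/// An empty list means the title passes the mechanical checks only.
fn check_commit_title(title: &str) -> Vec<String> {
    let mut problems = Vec::new();

    // Split "<type>[(<scope>)]: <subject>" at the first ": ".
    let Some((prefix, subject)) = title.split_once(": ") else {
        problems.push("expected `<type>: <subject>`".to_string());
        return problems;
    };

    // Separate the optional "(<scope>)" part from the type.
    let commit_type = match prefix.split_once('(') {
        Some((ty, rest)) => {
            match rest.strip_suffix(')') {
                Some(scope) if !scope.is_empty() => {}
                _ => problems.push("malformed scope".to_string()),
            }
            ty
        }
        None => prefix,
    };

    if !SUPPORTED_TYPES.contains(&commit_type) {
        problems.push(format!("unsupported type `{commit_type}`"));
    }
    if subject.is_empty() {
        problems.push("empty subject".to_string());
    }
    // Proper nouns are a valid exception, so treat this as a hint to review.
    if subject.chars().next().is_some_and(char::is_uppercase) {
        problems.push("subject starts with an uppercase letter".to_string());
    }
    if subject.ends_with('.') {
        problems.push("subject has a trailing period".to_string());
    }
    if subject.chars().count() > SUBJECT_SOFT_LIMIT {
        problems.push(format!("subject is longer than {SUBJECT_SOFT_LIMIT} characters"));
    }

    problems
}
```

Under these assumptions, `check_commit_title("chore(ci): tighten release preflight checks")` returns an empty list, while `check_commit_title("Fix: Added stuff.")` reports the unsupported type, the uppercase start, and the trailing period.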
Key rule:
- Keep commit titles short and scannable.
- Put detailed rationale and impact in the PR description, not in commit titles.
Use:
- `<type>: <subject> — <impact or scope>`
- Optional scope is allowed: `<type>(<scope>): <subject> — <impact or scope>`
Examples:
- `feat(app): add gossip runtime switch guardrails — reduces false-positive failovers`
- `fix(repair): avoid stale outstanding reservation reuse — improves recovery reliability`
- `docs: clarify advanced env defaults — reduces operator misconfiguration`
PRs must include:
- Description: what changed and why it matters.
- Changes: concrete list of file/module changes.
- Motivation: business and technical reasons; alternatives considered if relevant.
- Scope and impact: affected slices, compatibility, operational impact.
- Testing: what was run and what scenarios were validated.
- Related docs/issues: ADR/ARD links, operational doc updates, issue references.
For cross-slice or architecture-impacting changes:
- Explicitly list affected slices (`ingest`, `shred`, `reassembly`, `app/runtime`).
- Explain cross-slice interaction changes.
- Link relevant ADR/ARD docs.
- Include migration/rollback notes if applicable.
Reference template:
.github/pull_request_template.md
Run:
`cargo make ci`

This includes:
- formatting check
- docs build (`docs/gitbook`)
- architecture boundary checks
- vendored `sof-solana-gossip` library/bin compile check
- clippy matrix (`all-features`, `no-default-features`)
- test matrix (default + all-features)
For dependency-policy checks too:
`cargo make ci-full`

Follow the current ARD/ADR constraints:
- Keep slices isolated: `ingest`, `shred`, `reassembly` do not import each other's internals.
- Cross-slice orchestration belongs in infra/app/runtime composition layers.
- Keep `mod.rs` files declaration/re-export only.
- Prefer type-driven modeling (newtypes and validated constructors for invariants).
- Use typed enum errors (`thiserror`) instead of stringly-typed error categories.
- Replace semantic magic numbers with documented named constants.

Enforcement:
`cargo make arch-check` validates slice boundaries and `mod.rs` policy.
Reference docs:
- `docs/architecture/ard/0001-project-structure-and-code-goals.md`
- `docs/architecture/ard/0003-slice-dependency-contracts.md`
- `docs/architecture/ard/0004-error-taxonomy-and-failure-handling.md`
- `docs/architecture/ard/0005-type-system-and-newtype-guidelines.md`
- `docs/architecture/ard/0007-infrastructure-composition-and-runtime-model.md`
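
As a rough illustration of the type-driven constraints above (newtype with a validated constructor, typed enum error, named constant), here is a minimal sketch. `ShredIndex`, `ShredIndexError`, and `MAX_SHREDS_PER_SET` are hypothetical names with an illustrative limit, not types taken from the SOF codebase. The `Display` and `Error` impls are written by hand so the snippet has no dependencies; in the workspace a `thiserror` derive would presumably replace them.

```rust
use std::fmt;

/// Named constant instead of a bare `32` at call sites.
/// The value is illustrative only, not a real SOF limit.
const MAX_SHREDS_PER_SET: u32 = 32;

/// Typed error enum: callers match on variants instead of parsing strings.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ShredIndexError {
    OutOfRange { index: u32, max: u32 },
}

impl fmt::Display for ShredIndexError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::OutOfRange { index, max } => {
                write!(f, "shred index {index} is out of range (must be < {max})")
            }
        }
    }
}

impl std::error::Error for ShredIndexError {}

/// Newtype whose only constructor enforces `index < MAX_SHREDS_PER_SET`,
/// so code holding a `ShredIndex` never needs to re-check the bound.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct ShredIndex(u32);

impl ShredIndex {
    /// Validated constructor: the invariant is checked exactly once, here.
    fn new(index: u32) -> Result<Self, ShredIndexError> {
        if index < MAX_SHREDS_PER_SET {
            Ok(Self(index))
        } else {
            Err(ShredIndexError::OutOfRange {
                index,
                max: MAX_SHREDS_PER_SET,
            })
        }
    }

    /// Returns the raw value for arithmetic or serialization.
    fn get(self) -> u32 {
        self.0
    }
}
```

Keeping the inner field private to the defining module is what makes the invariant hold: other modules can only obtain a `ShredIndex` through `new`.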
- For behavior changes, update or add tests that fail before the fix and pass after.
- Every bug fix should include a regression test.
- Keep tests deterministic and fast.
- If you touch parser/reassembly invariants, expand edge-case coverage (including fuzz-oriented cases where applicable).
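
A regression test in this spirit might look like the sketch below. The parser is a made-up stand-in (`parse_frame` and `FrameError` are hypothetical, not SOF APIs): it reads a 2-byte big-endian length prefix followed by the payload. The imagined bug is a panic on a truncated payload; the test pins the fixed behavior with a fixed input, so it is deterministic and fast.

```rust
/// Hypothetical error type for the illustrative parser below.
#[derive(Debug, PartialEq, Eq)]
enum FrameError {
    /// Fewer than two bytes, so no length prefix is present.
    TooShort,
    /// The prefix declares more payload bytes than the input carries.
    Truncated { declared: usize, available: usize },
}

/// Parses `[len_hi, len_lo, payload...]` and returns exactly `len` payload bytes.
/// Uses checked slicing so malformed input yields an error instead of a panic.
fn parse_frame(input: &[u8]) -> Result<&[u8], FrameError> {
    let [hi, lo, rest @ ..] = input else {
        return Err(FrameError::TooShort);
    };
    let declared = usize::from(u16::from_be_bytes([*hi, *lo]));
    rest.get(..declared).ok_or(FrameError::Truncated {
        declared,
        available: rest.len(),
    })
}

#[cfg(test)]
mod tests {
    use super::*;

    /// Regression: a truncated payload must be rejected, not cause a panic.
    #[test]
    fn truncated_payload_is_rejected() {
        // Declares 4 payload bytes but carries only 1.
        let input = [0x00, 0x04, 0xAA];
        assert_eq!(
            parse_frame(&input),
            Err(FrameError::Truncated { declared: 4, available: 1 })
        );
    }

    /// Edge case: an input too short to hold the length prefix.
    #[test]
    fn missing_prefix_is_rejected() {
        assert_eq!(parse_frame(&[0x00]), Err(FrameError::TooShort));
    }
}
```

When a fuzz crash is the source of the bug, the crashing bytes can be pasted into the test as a literal array so the reproduction does not depend on the fuzzer.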
Runtime hardening workflow for public-host changes:
- Use `cargo make vps-observer-restart-loop` when changing shutdown, restart, gossip handoff, or queue/backpressure behavior.
- Use `cargo make vps-derived-state-restart-check` when changing derived-state checkpoint, replay, or shutdown behavior.
- Use `cargo make vps-derived-state-crash-recovery-check` when changing derived-state crash semantics, replay recovery, or checkpoint durability expectations.
- These commands now run through the Rust `public_host_soak` harness and are intended to be executed directly on the target host from a cloned SOF repo.
- The harness builds the required release examples locally on that host, stores logs under `logs/soak-validation/`, stores derived-state working data under `demo-state/`, and serializes runs with `.soak-lock/`.
- Include the validated host class and the observed drop/replay results in the PR notes when these scripts are part of validation.

Benchmark workflow for hot-path changes:
- Use `cargo bench -p sof --bench hot_paths --no-run` as the fast compile-only smoke check for the Criterion harness.
- Run `cargo bench -p sof --bench hot_paths` when changing relay, reassembly, or derived-state dispatch hot paths.
- When local host noise is too high, validate the same bench binary on the reference VPS and include the command and host class in the PR notes.
- Treat benchmark deltas as evidence, not proof by themselves: explain the workload shape and any limits of the measurement.
Fuzzing workflow:
- Fuzz targets and corpora are in `crates/sof-observer/fuzz/`.
- Run bounded fuzz smoke locally with `cargo make fuzz-smoke`.
- For deeper campaigns, run `cd crates/sof-observer && cargo +nightly fuzz run <target>`.
- When fuzz finds a crash, reproduce it from `artifacts/<target>/crash-*`, then add:
  - a deterministic regression test in the owning module, and
  - a minimized corpus seed in `crates/sof-observer/fuzz/corpus/<target>/`.
- For UDP receive-strategy changes, run `cargo make vps-busy-poll-compare` and include the 5-minute VPS queue/freshness deltas in the PR.

Reference:
docs/architecture/ard/0002-testing-strategy-and-quality-gates.md
For operationally significant changes:
- Add/update structured logs, metrics, and traces with stable fields.
- Keep telemetry lightweight on hot paths.
- Update operational docs/runbooks when behavior or config changes.
References:
- `docs/architecture/ard/0008-observability-and-operability-standards.md`
- `docs/operations/README.md`
- `docs/operations/advanced-env.md`

Add an ADR when changing architecture-level constraints, including:
- slice boundaries/dependency direction
- infra orchestration patterns
- error model/type-system/dispatch strategy
- material latency/throughput tradeoffs
Reference:
docs/architecture/ard/0009-adr-process-and-governance.md
- PRs and pushes to `main` run CI via `.github/workflows/ci.yml`.
- Release checks and publish are handled by `.github/workflows/release-crates.yml`.
- Manual release preflight is available via `workflow_dispatch` (runs checks, does not publish).

Before requesting review, confirm:
- `cargo make ci` passes locally
- tests cover new behavior and regressions
- architecture boundaries are preserved
- docs are updated if behavior/config/operations changed
- ADR added/updated when required