Create a KPI dashboard for quality numbers #448
a6a2efd to
7380bb8
Compare
| env: | ||
| ANDROID_HOME: "" | ||
| ANDROID_SDK_ROOT: "" | ||
| FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true |
There was a problem hiding this comment.
The release workflow should not differ from the other workflows. So we either add this everywhere or nowhere.
Can you please also state in the commit message why this change is needed?
There was a problem hiding this comment.
Node 20 will be deprecated on GitHub Actions runners next month. I can add this to the commit message:
https://github.blog/changelog/2025-09-19-deprecation-of-node-20-on-github-actions-runners/
| os.system( | ||
| f"{code_ql_path} database analyze -j=0 {database_location} --format=sarifv2.1.0 --output={output_base}/codeql.sarif") | ||
| os.system( | ||
| f"{code_ql_path} database analyze -j=0 {database_location} --format=csv --output={output_base}/codeql.csv") |
There was a problem hiding this comment.
We should keep the CSV output for direct human readability.
| f"{code_ql_path} database analyze -j=0 {database_location} --format=csv --output={output_base}/codeql.csv") | ||
| # Analyze: run MISRA/AUTOSAR queries and produce SARIF. | ||
| # --ram: cap at 5 GB to prevent swap thrashing on GitHub runners (7 GB total) |
There was a problem hiding this comment.
We should not make this the default. We should add an extra option for GitHub runners and enable it only in the CI, where these parameters need to change.
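One way to read this suggestion is an opt-in switch on the analysis script. The following is only a minimal sketch: the flag name `--github-runner` and the helper `build_analyze_command` are hypothetical, and the limits are the ones quoted in this PR's commit message (`--ram 5000 --timeout 20 -j 2`), not tuned values.

```python
import argparse


def build_analyze_command(code_ql_path, database_location, output_base,
                          constrained=False):
    """Build the argument list for `codeql database analyze`.

    `constrained` is a hypothetical opt-in for resource-limited CI runners.
    """
    cmd = [code_ql_path, "database", "analyze"]
    if constrained:
        # Limits quoted from the commit message in this PR.
        cmd += ["--ram", "5000", "--timeout", "20", "-j", "2"]
    else:
        # Default behaviour from the original script: one thread per core.
        cmd += ["-j=0"]
    cmd += [
        database_location,
        "--format=sarifv2.1.0",
        f"--output={output_base}/codeql.sarif",
    ]
    return cmd


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run CodeQL analysis")
    # Hypothetical flag name; only the CI workflow would pass it.
    parser.add_argument(
        "--github-runner",
        action="store_true",
        help="apply RAM/thread/timeout limits for GitHub-hosted runners",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(build_analyze_command("codeql", "codeql_db", "out",
                                constrained=args.github_runner))
```

With this shape, local runs keep the existing defaults and only the CI invocation opts into the limits.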
| - name: Set conclusion | ||
| id: set-conclusion | ||
| run: | | ||
| if [[ "${{ steps.run-coverage.outcome }}" == "success" ]]; then | ||
| echo "conclusion=success" >> $GITHUB_OUTPUT | ||
| else | ||
| echo "conclusion=failure" >> $GITHUB_OUTPUT | ||
| fi | ||
There was a problem hiding this comment.
Why is this needed? Can we try to remove this again, please?
There was a problem hiding this comment.
Because the coverage step has continue-on-error: true, the job keeps running even if bazel coverage fails. Without the "Set conclusion" step, the caller nightly_quality.yml has no way to know whether coverage actually passed or failed; it only sees the job as successful because continue-on-error suppresses the failure. In short, continue-on-error hides the failure from GitHub's job status, and "Set conclusion" un-hides it for the dashboard. For a nightly quality pipeline, partial data is actually useful: if coverage fails at night, you want something to look at the next morning rather than an empty artifact.
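For illustration, the mapping performed by the "Set conclusion" step can be written as a small Python sketch. The helper name `write_conclusion` is hypothetical; the `GITHUB_OUTPUT` file and the `conclusion=` key are taken from the workflow snippet above.

```python
import os


def write_conclusion(step_outcome: str, output_path: str) -> str:
    """Append `conclusion=<value>` to a GitHub Actions output file.

    Any outcome other than "success" (for example "failure" or "skipped")
    is reported as "failure", mirroring the shell step in the workflow.
    """
    conclusion = "success" if step_outcome == "success" else "failure"
    with open(output_path, "a", encoding="utf-8") as handle:
        handle.write(f"conclusion={conclusion}\n")
    return conclusion


if __name__ == "__main__":
    # On a runner, GITHUB_OUTPUT points at the step's output file.
    # STEP_OUTCOME is a hypothetical variable the workflow would set from
    # steps.run-coverage.outcome.
    path = os.environ.get("GITHUB_OUTPUT", "github_output.txt")
    write_conclusion(os.environ.get("STEP_OUTCOME", "failure"), path)
```

The caller can then read the job's `conclusion` output even though `continue-on-error` keeps the job status green.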
| @@ -0,0 +1,179 @@ | |||
| # ******************************************************************************* | |||
There was a problem hiding this comment.
Can we first just take care of the code coverage, please, to reduce the scope of the PR?
There was a problem hiding this comment.
During the discussion we were always talking about three jobs, not only coverage, and I have already implemented that.
If you insist, I can add a commit on top that removes them, so I can simply revert that commit in another PR.
| # Restore KPI history from the previous gh-pages deployment so the | ||
| # dashboard can show delta badges and trend sparklines across runs. | ||
| - name: Restore KPI history |
There was a problem hiding this comment.
I thought we agreed that we do not want history at the moment?
| # Deploy to GitHub Pages | ||
| # ------------------------------------------------------------------ | ||
| - name: Deploy quality reports to GitHub Pages | ||
| uses: peaceiris/actions-gh-pages@v4 |
There was a problem hiding this comment.
This way it is not integrated into our Sphinx build. Maybe you can talk with Jochen about that.
There was a problem hiding this comment.
Yes, that step deployed the quality reports directly to gh-pages as a separate, uncoordinated publish. I changed nightly_quality.yml to upload the quality reports as an artifact instead of deploying, and docs.yml now triggers on its completion and deploys everything in one shot.
bff61c4 to
bb988f3
Compare
There was a problem hiding this comment.
Instead of this Jinja template, maybe generate an RST file which we can include in the Sphinx build?
There was a problem hiding this comment.
That is technically possible, but we would lose the visual dashboard (we agreed during the discussions to have one). All we would get is summary tables like the ones here: https://github.com/ahmed0mousa/communication/actions/runs/26016646593
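To make the RST alternative concrete, here is a minimal sketch of what such a generator could look like. The function name `kpis_to_rst` and the KPI names in the usage example are hypothetical; the output uses the standard docutils `list-table` directive, so it renders as a plain table rather than the interactive dashboard.

```python
def kpis_to_rst(title, kpis):
    """Render a mapping of KPI name -> value as an RST section with a table."""
    lines = [
        title,
        "=" * len(title),
        "",
        ".. list-table::",
        "   :header-rows: 1",
        "",
        "   * - KPI",
        "     - Value",
    ]
    for name, value in kpis.items():
        lines.append(f"   * - {name}")
        lines.append(f"     - {value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(kpis_to_rst("Quality KPIs", {"Line coverage": "75.0 %",
                                       "CodeQL findings": 12}))
```

A file written this way could be pulled into the Sphinx build with an `include` directive or a toctree entry, which is the trade-off discussed above: integration into the docs, at the cost of the visual dashboard.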
| f"{code_ql_path} database analyze" | ||
| f"{_analyze_flags}" | ||
| f" {database_location}" | ||
| f" --format=sarifv2.1.0 --output={output_base}/codeql.sarif", |
There was a problem hiding this comment.
Should we maybe upload the SARIF file if we already have it? It could be used for local development.
| env: | ||
| DOCS_VERSION: ${{ steps.vars.outputs.version }} | ||
| DOCS_BASE_URL: ${{ steps.vars.outputs.base_url }} | ||
| run: sphinx-build docs/sphinx _sphinx_output |
There was a problem hiding this comment.
Is there a reason why you run without bazel?
There was a problem hiding this comment.
Yes, the sphinx_doc Bazel target is marked testonly = True, meaning it's for CI testing (lint, link-checking), not for producing the deploy artifact. There are also two practical reasons for running sphinx-build directly:
- Environment variables: conf.py reads DOCS_VERSION and DOCS_BASE_URL to configure the version switcher navbar. Passing env vars into Bazel's sandbox requires --action_env flags and is fragile in CI.
- Output location: Bazel writes HTML under bazel-out/…/sphinx_doc/…, making it awkward to reference in the subsequent deploy step. sphinx-build … _sphinx_output puts it exactly where we need it.
The two-phase approach in the workflow is intentional: Bazel handles the complex artifact generation (Doxygen XML, generated RST files), then sphinx-build is called directly for the final HTML output.
| touch _deploy/.nojekyll | ||
| # Generate versions.json for the pydata-sphinx-theme version switcher | ||
| python3 docs/sphinx/utils/update_versions_json.py \ |
There was a problem hiding this comment.
This again mixes Python and Bazel.
There was a problem hiding this comment.
There's no Bazel target for update_versions_json.py; it's a plain utility script with no BUILD rule, so python3 is the correct call here. Also, Bazel's sandbox blocks all network access by default, so this can only run outside Bazel. It also needs GH_TOKEN injected from the runner environment, which Bazel would strip.
Use Node24 as Node20 will be deprecated
… only
- Use subprocess.run instead of os.system for all CodeQL commands so errors are properly captured and logged.
- Add --ram 5000 --timeout 20 -j 2 to database analyze to prevent OOM and hung queries on GitHub runners.
- Remove CSV output; SARIF is sufficient for the CI quality report.
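The switch from os.system to subprocess.run described in this commit could look roughly like the sketch below. The wrapper name `run_command` and the logger setup are assumptions, not the code from the PR; the point is that the exit code and stderr become visible instead of being silently dropped.

```python
import logging
import subprocess
import sys

logger = logging.getLogger(__name__)


def run_command(args):
    """Run a command given as an argument list and log failures.

    Unlike os.system, this avoids the shell, captures stdout/stderr as text,
    and returns a CompletedProcess so the caller can inspect the exit code.
    """
    result = subprocess.run(args, capture_output=True, text=True, check=False)
    if result.returncode != 0:
        logger.error(
            "Command %s failed with exit code %d:\n%s",
            args, result.returncode, result.stderr,
        )
    return result


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # Stand-in for a CodeQL invocation: a child process that exits non-zero.
    failed = run_command([sys.executable, "-c", "import sys; sys.exit(3)"])
    print("exit code:", failed.returncode)
```

Passing `check=True` instead would raise `subprocess.CalledProcessError` on failure, which may suit scripts that should abort immediately.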
- Add FORCE_JAVASCRIPT_ACTIONS_TO_NODE24 env var.
- Add 'conclusion' output (success/failure) to match the interface of clang_tidy.yml and codeql.yml.
- Add id and continue-on-error to the bazel coverage step so the job can report a conclusion even on test failures.
- Gate genhtml and archive steps on run-coverage outcome so they are skipped cleanly when coverage fails.
- Fix cache-save condition to also fire on scheduled (nightly) runs.
- Include raw LCOV .dat file in the artifact so the quality dashboard can read coverage percentages without re-running genhtml.
Both workflows are triggered only via workflow_call from nightly_quality.yml. Each exposes artifact-name and conclusion outputs so the caller can conditionally download reports and build a unified dashboard.
clang_tidy.yml:
- Runs 'bazel test --config=clang-tidy //...' with continue-on-error.
- Collects per-target *.AspectRulesLintClangTidy.out files and generates an HTML summary with error/warning counts and a findings table.
codeql.yml:
- Runs 'bazel run --config=codeql //quality/static_analysis:codeql_lint'.
- Collects SARIF output and generates an HTML summary from it.
- Sets a 180-minute job timeout to guard against hung analyses.
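The error/warning counts mentioned for clang_tidy.yml could be derived along these lines. This is a sketch that assumes the *.AspectRulesLintClangTidy.out files contain ordinary clang-tidy diagnostics of the form `file:line:col: severity: message [check-name]`; the function name `parse_clang_tidy` is hypothetical.

```python
import re
from collections import Counter

# Matches standard clang-tidy diagnostic lines; notes are ignored on purpose.
DIAGNOSTIC = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+): "
    r"(?P<severity>warning|error): (?P<message>.*?) \[(?P<check>[^\]]+)\]$",
    re.MULTILINE,
)


def parse_clang_tidy(text):
    """Return (findings, counts) for one clang-tidy output file's text."""
    findings = [match.groupdict() for match in DIAGNOSTIC.finditer(text)]
    counts = Counter(finding["severity"] for finding in findings)
    return findings, counts


if __name__ == "__main__":
    sample = (
        "src/a.cpp:10:5: warning: use nullptr [modernize-use-nullptr]\n"
        "src/b.cpp:3:1: error: unknown type name 'foo' [clang-diagnostic-error]\n"
    )
    findings, counts = parse_clang_tidy(sample)
    print(counts["warning"], counts["error"])
```

The list of findings maps directly onto rows of a findings table, and the counter onto the summary numbers.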
Runs every night at midnight UTC (and on workflow_dispatch). Executes coverage, codeql, and clang-tidy in parallel as reusable workflow calls, then deploys all reports plus a unified KPI dashboard to GitHub Pages.
bb988f3 to
3b8e61b
Compare
quality/dashboard/generate_dashboard.py:
- Parses CodeQL SARIF files, clang-tidy *.AspectRulesLintClangTidy.out files, and LCOV .dat data into a single Jinja2-rendered HTML page.
- Maintains a quality_history.json for KPI trend tracking across runs.
- Writes a GitHub Actions step summary with markdown KPI tables.
quality/dashboard/dashboard.html.j2:
- Dark-themed single-page dashboard with tabbed panels for CodeQL, Clang-Tidy and Coverage.
- Sortable/filterable findings tables, coverage progress bars, and a run-history table with trend indicators.
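As a rough illustration of the parsing described for generate_dashboard.py, the sketch below extracts two KPIs: line coverage from LCOV data (summing the `LF`/`LH` records) and finding counts per level from a SARIF document. The function names are hypothetical and this is not the script from the PR; treating a missing `level` as "warning" follows the SARIF default and is an assumption about how the real script should behave.

```python
import json
from collections import Counter


def lcov_line_coverage(lcov_text):
    """Return line coverage in percent from LCOV tracefile text."""
    found = hit = 0
    for line in lcov_text.splitlines():
        if line.startswith("LF:"):    # lines found (instrumented) in a file
            found += int(line[3:])
        elif line.startswith("LH:"):  # lines hit in a file
            hit += int(line[3:])
    return 100.0 * hit / found if found else 0.0


def sarif_level_counts(sarif_text):
    """Count SARIF results per level across all runs."""
    sarif = json.loads(sarif_text)
    counts = Counter()
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # Assumption: a result without an explicit level is a warning.
            counts[result.get("level", "warning")] += 1
    return counts


if __name__ == "__main__":
    lcov = "SF:a.cpp\nLF:10\nLH:5\nend_of_record\n"
    print(f"{lcov_line_coverage(lcov):.1f} %")
```

Values like these are what a template, a step summary, or a history file would consume downstream.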
3b8e61b to
b7f8297
Compare
Add a nightly CI pipeline that runs three quality jobs in parallel (coverage, CodeQL, and clang-tidy) and publishes all results to GitHub Pages under latest/quality/. A Jinja2-based dashboard aggregates the findings into a single page with KPI trend tracking across runs. The Sphinx documentation is extended with a dedicated quality reports page and a version switcher navbar, and on every push to main the docs automatically pull the latest nightly KPI numbers so they stay current without waiting for another nightly run.
Issue: SWP-262453