Skip to content

fix(logging): emit valid JSON for messages containing quotes or newlines#597

Merged
bheesham merged 3 commits into
mozilla-iam:masterfrom
VegasNative:vegasnative/fix-json-log-escaping
Jun 17, 2026
Merged

fix(logging): emit valid JSON for messages containing quotes or newlines#597
bheesham merged 3 commits into
mozilla-iam:masterfrom
VegasNative:vegasnative/fix-json-log-escaping

Conversation

@VegasNative

Copy link
Copy Markdown
Contributor

What changed

Replaces the brittle string-template [formatter_json] in dashboard/logging.ini with a small JsonFormatter class (dashboard/jsonlog.py) that uses json.dumps to escape field values. Output shape is unchanged: every record still emits time, level, process_id, message, and a compound name of filename:logger:funcName:lineno.

Also adds a regression test (tests/test_jsonlog.py) asserting that a message containing both quotes and newlines round-trips through json.loads cleanly.

Why

The previous formatter interpolated %(message)s directly into a JSON-shaped template string. Any log message containing a " or a \n produced invalid JSON. This is not hypothetical: dashboard/app.py (the global exception handler around handle_exception) joins tracebacks into the message field with \n, so under the old formatter every exception log was emitted as malformed JSON. Downstream log consumers (Cloud Logging on Cloud Run, anything else parsing these as JSON) silently drop or text-fallback those records.

Using json.dumps makes the bug class disappear: every field is escaped at serialization time.

Why not python-json-logger

I considered pulling in python-json-logger, but adding a runtime dependency to a critical production system isn't worth it for ~14 lines of stdlib code. The historical JSON shape (custom keys like process_id, the compound name field) would have required a subclass to preserve regardless, so the dep would not have eliminated the code.

Blast radius

Output shape is preserved, so any consumer that worked before continues to work. The one behavioral change worth flagging: parsers that were silently dropping malformed records (every exception log) will now successfully parse them. Expect the visible volume of structured-log records to increase after this deploys, particularly in the error/warning tiers. Anyone watching log-volume dashboards or alerts on this service should be aware that the bump is the bug being fixed, not a regression.

No public-facing routes, no auth flow, no apps.yml interaction.

Cross-repo coordination

None.

Test plan

  • tests/test_jsonlog.py — new test asserts a LogRecord whose message contains both " and \n produces output that round-trips through json.loads. Regression guard for the specific bug.
  • tox passes locally (lint + types + tests + style).
  • Manual sanity: instantiated JsonFormatter, fed it a LogRecord shaped like what app.py's exception handler produces (a string concatenated with a traceback containing newlines), confirmed valid JSON output and the historical field shape.

Caveats

  • Tracebacks are still embedded in the message field as a \n-joined string, same as before. Splitting them into a separate exc_info field would be a behavior change worth doing later but is out of scope here.
  • Source-location fields (filename, funcName, lineno, logger name) are still collapsed into the single compound name field. Cloud Logging would prefer them as separate top-level keys for easier filtering, but again, scope creep — a separate PR if anyone wants it.

The string-template formatter in dashboard/logging.ini interpolated
%(message)s raw into a JSON-shaped string, producing invalid JSON
whenever a message contained a quote or newline. In practice this
breaks every exception log, since dashboard/app.py joins tracebacks
into the message with newlines.

Replace the template with a JsonFormatter class that uses json.dumps
to serialize. Output shape is preserved.
@VegasNative VegasNative requested a review from bheesham May 21, 2026 22:48

@bheesham bheesham left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that this removes a hack. There are some cases it doesn't handle, but we don't make use of any of those.

Comment thread dashboard/jsonlog.py Outdated
Calls to logger.exception(...) or logger.error(..., exc_info=True)
attach a traceback to record.exc_info. The first JsonFormatter
commit dropped that field, so although the JSON was valid the
traceback was lost from the output. Affected callers in this repo:
yaml_loader, vanity, tile, user, and app's TokenError handler.

Surface the traceback as a separate exc_info field (and stack_info
when present), serialized via json.dumps so newlines and quotes
remain escaped.
@VegasNative

Copy link
Copy Markdown
Contributor Author

Bhee — you were right. Caught it: 7 callsites in this repo use logger.exception(...), which populates record.exc_info. The original commit on this branch dropped that field, so we'd have fixed JSON validity but lost tracebacks in the output.

Pushed a follow-up commit (c23d791) that surfaces the traceback as a separate exc_info field (and stack_info when set), still through json.dumps so escaping is preserved. New regression test covers it.

Updated test plan:

  • test_format_produces_valid_json_with_quotes_and_newlines — original bug
  • test_format_includes_exc_info_when_record_has_traceback — this bug

Affected callsites in this repo (all now keep their tracebacks in the output):

  • dashboard/op/yaml_loader.py (YAML load failures)
  • dashboard/app.py (TokenError handler)
  • dashboard/vanity.py (route registration errors)
  • dashboard/models/tile.py (3 sites — etag, config fetch, config load)
  • dashboard/models/user.py (userinfo fallback path)

Cross-org search: no other mozilla-iam or mozilla-it repo copies this [formatter_json] template or parses these log fields, so no downstream impact to coordinate.

@VegasNative VegasNative requested a review from bheesham May 22, 2026 00:48
Two asymmetries flagged in review:

- The implementation handled record.stack_info but no caller in this repo
  produces it (and nothing in the runtime deps or other mozilla-iam /
  mozilla-it Python code does either). Drop the branch; if a stack_info
  caller ever appears, a test can land alongside it.
- Test 1 passed args=() while asserting against a literal message string,
  so the args parameter was dead weight. Use a %s-formatted msg with real
  args so the test actually covers record.getMessage()'s % interpolation
  alongside the JSON escaping.
@VegasNative

VegasNative commented May 22, 2026

Copy link
Copy Markdown
Contributor Author

Both asymmetries addressed in 1644ab7:

  • stack_info dropped from the impl. Before doing this I checked three places for any caller: nothing in this repo, nothing in the runtime deps (Flask, gunicorn, gevent, redis, flask_pyoidc, urllib3, requests, josepy, etc.), nothing in the rest of mozilla-iam Python code. So removing it doesn't lose any in-use diagnostic path. If a stack_info=True caller ever shows up, the handling + test can land with it.
  • Args path now actually exercised by test 1. Switched from args=() with a literal message to a %s-formatted msg with real args, so the test now covers record.getMessage()'s % interpolation alongside the JSON escaping.

@VegasNative VegasNative requested a review from bheesham May 22, 2026 02:14
@bheesham bheesham merged commit 63b2abe into mozilla-iam:master Jun 17, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants