chore(schema): add field descriptions for braze_derived #9455
chelseyklein wants to merge 9 commits into
This PR adds description fields to nine schema.yaml files in braze_derived (180 fields total: fxa_win10_users_historical_v1, newsletters_v1, products_v1, subscriptions_map_v1, subscriptions_map_v2, subscriptions_v1, user_profiles_v1, users_v1, waitlists_v1). It is documentation-only — no query.sql, metadata.yaml, or table-shape changes — so it cannot affect downstream consumers' results, only the visible BigQuery field documentation. The user_profiles_v1 diff also strips the previously-existing blank lines between fields as a side effect of the regeneration.
Overall the descriptions are clear, consistent in voice, and read accurately against the source SQL I spot-checked (locale filter in fxa_win10_users_historical_v1, subscription_state mapping in subscriptions_v1, LOWER() on country_code in products_v1, hardcoded NULL first-touch fields in products_v1). Most of my inline notes are about description durability rather than correctness:
- A few descriptions encode point-in-time observations (percentage distributions, "Currently always null...", explicit locale lists) that will drift as the underlying data or query changes. Phrasing them as semantic facts about the column would survive better.
- users_v1 and user_profiles_v1 describe the same email_subscribe column with slightly different wording; worth aligning.
- subscriptions_map_v2 got the same descriptions as v1 even though it is loaded from a Google Sheet rather than the hardcoded script.sql in v1; the field semantics are the same, so this is informational only.
- The PR description's own checklist ("Descriptions reviewed for accuracy", "Global.yaml candidates verified", "Contradictions investigated") is unchecked. Since the descriptions are agent-generated and the checklist is part of the rollout protocol, an explicit owner sign-off in the PR before merge seems prudent.
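The durability concern in the first bullet could be checked mechanically. Below is a minimal sketch, not existing tooling: the helper name `find_drift_prone_phrases` and the regex patterns are hypothetical and would need tuning against the rollout protocol.

```python
import re

# Hypothetical patterns for point-in-time wording in field descriptions.
DRIFT_PATTERNS = {
    "percentage": re.compile(r"~?\d+(?:\.\d+)?\s?%"),
    "currently": re.compile(r"\bcurrently\b", re.IGNORECASE),
    "not_yet": re.compile(r"\bnot yet\b", re.IGNORECASE),
}


def find_drift_prone_phrases(description: str) -> list[str]:
    """Return the names of the patterns that match the description text."""
    # Collapse YAML line folding so phrases split across lines still match.
    text = " ".join(description.split())
    return [name for name, pattern in DRIFT_PATTERNS.items() if pattern.search(text)]
```

Running this over each `description` value in a schema.yaml would surface the percentage distributions and "Currently always null" phrasing noted above; it does not catch enumerated value lists such as the locale list.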
Posting as COMMENT; a human reviewer should make the final call on accept/request-changes.
description: The user's overall email subscription status in Braze. Values are 'opted_in'
  (double opt-in confirmed, ~99%) or 'subscribed' (single opt-in, ~1%).
issue: The email_subscribe CASE expression in query.sql can also produce the literal 'unsubscribed' via its ELSE branch (e.g. when emails.double_opt_in IS NULL). The WHERE has_opted_out_of_email = FALSE filter only suppresses the first branch, not the ELSE. The description here claims only 'opted_in' or 'subscribed' exist, which may be true observationally today but is not what the SQL guarantees. Consider either widening the description to allow 'unsubscribed' as a rare third value, or anchoring the percentages with a date so the snapshot context is clear.
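To make the gap concrete, here is a small Python model of the CASE expression as this comment describes it. The branch order and conditions are assumptions reconstructed from the comment, not copied from query.sql, and the function names are hypothetical.

```python
from typing import Optional


def email_subscribe(has_opted_out_of_email: bool, double_opt_in: Optional[bool]) -> str:
    """Model of the CASE expression; the branch layout is assumed, not taken from query.sql."""
    if has_opted_out_of_email:
        return "unsubscribed"  # first branch, suppressed by the WHERE filter
    if double_opt_in is True:
        return "opted_in"
    if double_opt_in is False:
        return "subscribed"
    return "unsubscribed"  # ELSE branch, reachable when double_opt_in IS NULL


def reachable_values() -> set[str]:
    """Values that survive WHERE has_opted_out_of_email = FALSE."""
    return {email_subscribe(False, value) for value in (True, False, None)}
```

Under these assumptions, `reachable_values()` still contains 'unsubscribed', which is the third value the description omits.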
description: The user's overall email subscription status in Braze. Values are 'opted_in'
  (double opt-in confirmed) or 'subscribed' (single opt-in, not yet double-confirmed).
nitpick: This email_subscribe description ("...or 'subscribed' (single opt-in, not yet double-confirmed)") differs from the one on the same column in users_v1/schema.yaml ("...~99%... ~1%"). Since user_profiles_v1 pulls this column from users_v1, keeping the wording aligned across the two schemas would avoid the appearance of two different definitions.
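One way to catch this kind of divergence is to compare descriptions for same-named fields across two parsed schemas. This is a sketch only: it assumes each schema has already been loaded into a list of field dicts with `name` and `description` keys, and `mismatched_descriptions` is a hypothetical helper.

```python
def mismatched_descriptions(
    schema_a: list[dict], schema_b: list[dict]
) -> dict[str, tuple[str, str]]:
    """Map each shared field name to its two descriptions when they differ."""

    def normalize(text: str) -> str:
        # Ignore differences that come only from YAML line folding.
        return " ".join(text.split())

    by_name_b = {f["name"]: normalize(f.get("description", "")) for f in schema_b}
    mismatches = {}
    for field in schema_a:
        name = field["name"]
        desc_a = normalize(field.get("description", ""))
        if name in by_name_b and desc_a != by_name_b[name]:
            mismatches[name] = (desc_a, by_name_b[name])
    return mismatches
```

Fed the users_v1 and user_profiles_v1 field lists, this would report email_subscribe; fields that exist in only one schema are ignored.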
  name: first_touch_impression_at
  type: TIMESTAMP
  description: Timestamp of the user's first attribution touch impression for this
    subscription. Currently always null as first-touch attribution is not yet populated.
- mode: NULLABLE
  name: first_touch_entrypoint
  type: STRING
  description: The FxA entrypoint slug recorded at the user's first attribution
    touch for this subscription. Currently always null as first-touch attribution
    is not yet populated.
- mode: NULLABLE
  name: first_touch_entrypoint_experiment
  type: STRING
  description: The experiment slug at the user's first attribution touch for this
    subscription. Currently always null as first-touch attribution is not yet populated.
- mode: NULLABLE
  name: first_touch_entrypoint_variation
  type: STRING
  description: The experiment variation at the user's first attribution touch for
    this subscription. Currently always null as first-touch attribution is not yet
    populated.
- mode: NULLABLE
  name: first_touch_utm_campaign
  type: STRING
  description: The UTM campaign parameter at the user's first attribution touch
    for this subscription. Currently always null as first-touch attribution is not
    yet populated.
- mode: NULLABLE
  name: first_touch_utm_content
  type: STRING
  description: The UTM content parameter at the user's first attribution touch for
    this subscription. Currently always null as first-touch attribution is not yet
    populated.
- mode: NULLABLE
  name: first_touch_utm_medium
  type: STRING
  description: The UTM medium parameter at the user's first attribution touch for
    this subscription. Currently always null as first-touch attribution is not yet
    populated.
- mode: NULLABLE
  name: first_touch_utm_source
  type: STRING
  description: The UTM source parameter at the user's first attribution touch for
    this subscription. Currently always null as first-touch attribution is not yet
    populated.
- mode: NULLABLE
  name: first_touch_utm_term
  type: STRING
  description: The UTM term parameter at the user's first attribution touch for
    this subscription. Currently always null as first-touch attribution is not yet
    populated.
suggestion: The nine first_touch_* descriptions all end with "Currently always null as first-touch attribution is not yet populated." That matches the SQL today (the query hardcodes CAST(NULL AS ...) for these), but the phrasing will rot silently as soon as the column starts being populated — schema descriptions tend to outlive the conditions they describe.
Consider a more durable phrasing, e.g. "First-touch attribution is not yet wired up in products_v1/query.sql; the column is reserved for the equivalent of last_touch_* once available." Or just describe what the field semantically is and drop the "currently" claim. The same applies to the duplicates in user_profiles_v1/schema.yaml.
description: The ISO 4217 currency code used for this subscription's billing (e.g.
  'usd', 'eur'), sourced from plan_currency.
question: The examples here ('usd', 'eur') imply lowercase, but unlike subscription_country_code (which is wrapped in LOWER() in query.sql), subscription_currency is selected as-is from subscription_platform.logical_subscriptions.plan_currency. Worth confirming the upstream casing — if plan_currency is stored uppercase (USD, EUR), the examples in this description will mislead consumers.
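Once a sample of distinct plan_currency values is available, the comparison against the description's examples can be done with a check like the one below. This is a sketch under assumptions: `casing_mismatches` is a hypothetical helper, the three-letter pattern is tailored to currency codes, and the observed values must come from the real upstream table rather than from this comment.

```python
import re


def casing_mismatches(description: str, observed_values: set[str]) -> list[str]:
    """Return quoted three-letter examples that match observed values only when case is ignored."""
    # Restricting to exactly three letters avoids matching apostrophes in prose.
    examples = re.findall(r"'([A-Za-z]{3})'", description)
    observed_lower = {value.lower() for value in observed_values}
    return [
        example
        for example in examples
        if example not in observed_values and example.lower() in observed_lower
    ]
```

An empty result means the examples agree with upstream casing; a non-empty result lists the examples that should be re-cased in the description.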
note: The descriptions added here are byte-identical to those added to subscriptions_map_v1/schema.yaml, even though the two tables are loaded differently (v1 is populated by a hardcoded script.sql, v2 is backed by a Google Sheet via the external_data config in its metadata.yaml). The columns mean the same thing so this is probably fine, but if v2 is intended to supersede v1, it may help future readers to mention that this table is the Google-Sheet-backed equivalent in either the field description or the table-level metadata.yaml description.
    or Firefox Accounts record, used to target this re-engagement campaign.
- name: locale
  type: STRING
nitpick: Hard-coding the supported-locale list into the column description duplicates the daily.locale IN (...) filter in query.sql. If that list ever changes, the description will silently drift. Consider phrasing it as "one of the locales supported by the re-engagement campaign (see the locale filter in query.sql)" rather than enumerating the values inline.
Integration report for "chore(schema): add field descriptions for braze_derived"
Schema Update: moz-fx-data-shared-prod.braze_derived
Summary
How Descriptions Were Generated
Fields Updated by Table
- fxa_win10_users_historical_v1
- newsletters_v1
- products_v1
- subscriptions_map_v1
- subscriptions_map_v2
- subscriptions_v1
- user_profiles_v1
- users_v1
- waitlists_v1
Global.yaml Candidates
Fields routed to global.yaml (consistent meaning across datasets):
None
Contradictions Found
None
Skipped
Checklist