Add histogram-based numerical initialization to DP Synth #54

Merged
copybara-service[bot] merged 1 commit into main from cl/937156127 on Jun 25, 2026

copybara-service[bot] commented on Jun 24, 2026

Add histogram-based numerical initialization to DP Synth

Introduces sufficient_statistics.py in local_mode/, providing differentially
private (DP) quantile computation from pre-aggregated sparse histograms. This enables
numerical attribute initialization without access to raw data, supporting
the sufficient-statistics pipeline.

New public functions:

  • quantiles_from_histogram: recursive median splits via the discrete
    exponential mechanism, matching the DPQuantiles budget allocation.
  • column_measurement_from_histogram: convenience wrapper that chains
    quantile computation with edge post-processing to produce a
    ColumnMeasurement.

Also extracts _edges_to_column_measurement from NumericalInitializer
as a private helper shared between data-based and histogram-based paths.
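
The recursive median-split idea behind quantiles_from_histogram can be illustrated with a minimal sketch. This is not the code in sufficient_statistics.py: the function names, the integer value domain, the utility function, and the equal per-level budget split are all assumptions made for illustration (the description only says the allocation matches DPQuantiles, without spelling it out).

```python
import math
import random


def dp_median_from_histogram(hist, lo, hi, epsilon, rng):
    """Sample a noisy median over the integer range [lo, hi] (hypothetical helper).

    `hist` is a sparse histogram mapping integer values to counts. Every
    integer in the range is a candidate, so the candidate set does not
    depend on which values happen to be present in the data.
    """
    candidates = range(lo, hi + 1)
    total = sum(c for v, c in hist.items() if lo <= v <= hi)
    # Utility: how close the mass at or below the candidate is to half the total.
    scores, running = [], 0
    for v in candidates:
        running += hist.get(v, 0)
        scores.append(-abs(running - total / 2))
    # Shift by the best score so the exponentials cannot overflow.
    best = max(scores)
    # Exponential mechanism weights, assuming the utility has sensitivity 1.
    weights = [math.exp(epsilon * (s - best) / 2) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def quantile_edges_sketch(hist, lo, hi, depth, epsilon, seed=None):
    """Return sorted split points from `depth` levels of recursive median splits."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    rng = random.Random(seed)
    eps_per_level = epsilon / depth  # assumption: budget split equally across levels
    edges = []

    def split(a, b, level):
        if level >= depth or a > b:
            return
        m = dp_median_from_histogram(hist, a, b, eps_per_level, rng)
        split(a, m - 1, level + 1)   # left half, strictly below the split point
        edges.append(m)              # in-order append keeps the edges sorted
        split(m + 1, b, level + 1)   # right half, strictly above the split point

    split(lo, hi, 0)
    return edges


if __name__ == "__main__":
    sparse_hist = {10: 1, 20: 5, 30: 2}
    print(quantile_edges_sketch(sparse_hist, lo=0, hi=40, depth=2, epsilon=1.0, seed=0))
```

Because the sketch only reads counts from the histogram, it never touches raw records, which is the property the sufficient-statistics pipeline relies on. A real implementation would likely avoid enumerating every integer in the range when the domain is large.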

copybara-service[bot] force-pushed the cl/937156127 branch 5 times, most recently from 95db7f6 to 23b6173, on June 25, 2026 13:39
copybara-service[bot] changed the title from "Extract edges_to_column_measurement from NumericalInitializer" to "Add histogram-based numerical initialization to DP Synth" on Jun 25, 2026
copybara-service[bot] force-pushed the cl/937156127 branch 3 times, most recently from c299051 to 6e3dc38, on June 25, 2026 16:40
PiperOrigin-RevId: 938035670
copybara-service[bot] merged commit 4e45c8a into main on Jun 25, 2026
copybara-service[bot] deleted the cl/937156127 branch on June 25, 2026 16:44