theta_sketch_* UDFs hang until query timeout on estimation-mode sketches (started ~mid-June 2026, europe-west1) #544

Description

@hubert-seiki

We have theta sketch queries that ran fine for about a year. Around 15-16 June 2026 they started hanging until the 10-minute query timeout. The problem reproduces with a short standalone query that doesn't touch any of our data, so I'm fairly sure this is a regression on the BigQuery side rather than in the UDF code.

Repro (region europe-west1, dataset bqutil.datasketches_europe_west1):

-- hangs until the query timeout. 100k distinct values, so estimation mode (theta < 1)
SELECT `bqutil.datasketches_europe_west1.theta_sketch_get_estimate`(
  (SELECT `bqutil.datasketches_europe_west1.theta_sketch_agg_int64`(x)
   FROM UNNEST(GENERATE_ARRAY(1, 100000)) x));

-- returns instantly. 10 distinct values, so exact mode (theta = 1)
SELECT `bqutil.datasketches_europe_west1.theta_sketch_get_estimate`(
  (SELECT `bqutil.datasketches_europe_west1.theta_sketch_agg_int64`(x)
   FROM UNNEST(GENERATE_ARRAY(1, 10)) x));

The cutoff is the exact/estimation transition. GENERATE_ARRAY(1, 4096) already hangs, anything smaller comes back straight away.
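To check that the last working size is still in exact mode, a query along these lines could be run. It is only a sketch: it assumes theta_sketch_get_theta takes the sketch bytes as its single argument, and that 4095 is the largest input that still returns, as observed above. If the cutoff really is the mode transition, theta should come back as 1.0.

```sql
-- Assumption: 4095 values still returns promptly (largest size below the observed cutoff).
-- Builds one sketch and reads both theta and the estimate from it.
SELECT
  `bqutil.datasketches_europe_west1.theta_sketch_get_theta`(sketch) AS theta,
  `bqutil.datasketches_europe_west1.theta_sketch_get_estimate`(sketch) AS estimate
FROM (
  SELECT `bqutil.datasketches_europe_west1.theta_sketch_agg_int64`(x) AS sketch
  FROM UNNEST(GENERATE_ARRAY(1, 4095)) AS x
);
```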

What's affected: any call that builds or merges a sketch hangs once the sketch goes past the nominal k (4096). That covers theta_sketch_agg_int64 and theta_sketch_agg_string (build), theta_sketch_union, theta_sketch_intersection, and theta_sketch_agg_union. Read-only calls on an already-built estimation-mode sketch are fine: theta_sketch_get_estimate, theta_sketch_get_theta and theta_sketch_to_string all return in a few seconds.

What a hung job looks like: it gets cancelled with "Job execution was cancelled: Job timed out after 10 min 0 sec". The inputs are a few KB, so it isn't data volume. In the execution timeline the stage does a couple of seconds of real work and then sits there for the remaining ~595s with completed units flat while still consuming slot time, which suggests it's spinning in a loop inside the UDF worker rather than waiting on anything. That would fit the sketch's compaction step getting stuck, since compaction only runs once retained entries exceed k (estimation mode).
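For anyone who wants to see whether their own project is hit, something like the following against the jobs view might help. It is a rough sketch, not a definitive check: the start date and the LIKE patterns are my assumptions, and it assumes the timeout text quoted above ends up in error_result.message.

```sql
-- Lists query jobs in this project that were cancelled on timeout
-- and whose SQL text mentions a theta sketch function.
-- Assumptions: the date filter and both LIKE patterns are illustrative.
SELECT
  job_id,
  creation_time,
  TIMESTAMP_DIFF(end_time, start_time, SECOND) AS elapsed_s,
  total_slot_ms,
  total_bytes_processed,
  error_result.message AS error_message
FROM `region-europe-west1`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP '2026-06-01'
  AND job_type = 'QUERY'
  AND error_result.message LIKE '%timed out%'
  AND query LIKE '%theta_sketch%'
ORDER BY creation_time;
```

A high total_slot_ms next to a tiny total_bytes_processed would match the pattern described above.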

Why I don't think it's the UDF code: nothing in the stack changed. In bqutil.datasketches_europe_west1 the SQL wrappers and the JS implementations (theta_sketch_*_seed / _lgk_seed) all report lastModifiedTime of 2025-06-03, and the imported library files at gs://bqutil-lib-europe-west1/datasketches/theta_sketch.{js,mjs,wasm} were last modified on the same date. Our SQL hasn't changed either. The only thing that moved is the runtime the UDFs run in.
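One way to double-check the routine timestamps from SQL is the dataset's routines view. This is a sketch under the assumption that your credentials are allowed to read INFORMATION_SCHEMA.ROUTINES on the public bqutil dataset, which I have not confirmed for every setup. The last_altered column should correspond to the lastModifiedTime mentioned above.

```sql
-- Shows creation and last-modified timestamps for the theta sketch routines.
-- Assumption: read access to this INFORMATION_SCHEMA view on the bqutil dataset.
SELECT
  routine_name,
  routine_type,
  external_language,
  created,
  last_altered
FROM `bqutil.datasketches_europe_west1`.INFORMATION_SCHEMA.ROUTINES
WHERE routine_name LIKE 'theta_sketch%'
ORDER BY last_altered DESC;
```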

Impact: real cardinalities are almost always above 4096, so estimation mode is the normal case. This breaks any theta-based cardinality, intersection or reach query, and it shows up as a 10-minute timeout rather than an error, which makes it easy to misdiagnose as a slow query.

Questions:

  1. Did a JS UDF runtime change roll out around mid-June 2026?
  2. Can the theta wasm library be rebuilt against a compatible toolchain, or the runtime change reverted/guarded?

Possibly related but a different symptom: #453.
