dqt Detector Benchmarks

Auto-generated by benchmarks/run_benchmarks.py. Do not edit by hand; re-run the script to update.

Methodology

  • Trials: 30 independent runs (seeds 0-29)
  • Sample size: N=2,000 per fixture per trial
  • Fixtures: 8 synthetic scenarios (normal mean-shift, lognormal tail-shift, 5% outlier injection, 10% null injection, variance explosion, gradual ramp drift, combined drift+nulls, heavy-tail contamination)
  • Confidence intervals: 95% via normal approximation (mean +/- 1.96 x std / sqrt(n_trials))
  • Anomaly rate: 50% (8 clean / 8 anomalous per trial)
  • Interpretation: Detectors are grouped by intended use case. Do not compare across families (an outlier detector is not competing with a distribution drift detector).
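The confidence-interval formula above can be checked against the result tables. The following is a minimal sketch, assuming the reported "F1 std" is the standard deviation of F1 across the 30 trials; the helper name `ci_from_summary` is hypothetical and is not taken from run_benchmarks.py.

```python
import math

def ci_from_summary(mean, std, n_trials=30, z=1.96):
    """Normal-approximation CI: mean +/- z * std / sqrt(n_trials)."""
    half_width = z * std / math.sqrt(n_trials)
    return mean - half_width, mean + half_width

# Summary values copied from the zscore_outlier_fraction row.
low, high = ci_from_summary(0.877, 0.011)
print(f"[{low:.3f}, {high:.3f}]")  # prints [0.873, 0.881]
```

Because the published mean and std are already rounded to three decimals, a bound recomputed this way can differ from the table in the last digit.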

Fixture Descriptions

| ID | Signal type | Difficulty |
| --- | --- | --- |
| normal_mean_shift | N(50,10) to N(80,10) | Easy |
| lognormal_tail_shift | LN(5.0,0.5) to LN(5.5,0.5) | Moderate |
| outliers_injected_5pct | 5% extreme point anomalies | Moderate |
| nulls_injected_10pct | 10% null injection | Easy |
| variance_explosion | N(50,10) to N(50,20) | Moderate |
| gradual_drift | Ramp drift +20 over batch | Hard |
| mixed_drift_and_nulls | Mean shift + 10% nulls | Moderate |
| heavy_tail_switch | 20% contamination at 4x spread | Hard |
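To make the fixture definitions concrete, here is a rough sketch of how two of them could be generated with the standard library. It is not the generator used by run_benchmarks.py: the function names are hypothetical, and it assumes that the second parameter of N(50,10) is a standard deviation and that nulls are represented as `None`.

```python
import random

def normal_mean_shift(seed, n=2000):
    """Clean batch from N(50, 10) and anomalous batch from N(80, 10).

    Assumption: the second parameter is the standard deviation.
    """
    rng = random.Random(seed)
    clean = [rng.gauss(50, 10) for _ in range(n)]
    shifted = [rng.gauss(80, 10) for _ in range(n)]
    return clean, shifted

def inject_nulls(values, rate, seed):
    """Replace a fixed fraction of values with None at random positions."""
    rng = random.Random(seed)
    k = round(rate * len(values))
    null_positions = set(rng.sample(range(len(values)), k))
    return [None if i in null_positions else v for i, v in enumerate(values)]

clean, shifted = normal_mean_shift(seed=0)
with_nulls = inject_nulls(clean, rate=0.10, seed=0)
print(len(clean), sum(v is None for v in with_nulls))  # prints 2000 200
```

Seeding a dedicated `random.Random` instance per call keeps each trial reproducible, which matches the seeds 0-29 setup described under Methodology.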

Baselines

A well-calibrated detector should beat both _always_alert (F1 = 0.667) and _random_50pct (measured F1 = 0.486, about 0.500 in expectation).

| Detector | Description | F1 mean | Recall | FPR |
| --- | --- | --- | --- | --- |
| _always_alert | Always fires; trivial baseline at 50% anomaly rate (recall ceiling) | 0.667 | 1.000 | 1.000 |
| _never_alert | Never fires; lower bound | 0.000 | 0.000 | 0.000 |
| _random_50pct | 50% random alerting | 0.486 | 0.500 | 0.508 |
| _naive_zscore | Batch-mean z-score > 3 threshold | 0.141 | 0.079 | 0.000 |
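The baseline scores follow directly from the 8 clean / 8 anomalous split. A small sketch of the arithmetic, using a hypothetical `batch_metrics` helper and assuming that undefined ratios (zero denominators) are reported as 0.0:

```python
def batch_metrics(tp, fp, fn, tn):
    """Precision, recall, FPR and F1 from per-trial confusion counts.

    Assumption: a ratio with a zero denominator is reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "fpr": fpr, "f1": f1}

# 8 anomalous and 8 clean batches per trial.
always = batch_metrics(tp=8, fp=8, fn=0, tn=0)  # fires on every batch
never = batch_metrics(tp=0, fp=0, fn=8, tn=8)   # never fires
print(round(always["f1"], 3), round(never["f1"], 3))  # prints 0.667 0.0
```

The same arithmetic explains any row with F1 = 0.667, recall 1.000 and FPR 1.000: such a detector fired on every batch and is indistinguishable from _always_alert on these fixtures.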

Outlier Detectors

| Detector | F1 mean | F1 std | 95% CI | Recall | Precision | FPR |
| --- | --- | --- | --- | --- | --- | --- |
| auto_outlier | 0.926 | 0.023 | [0.917, 0.934] | 0.863 | 1.000 | 0.000 |
| zscore_outlier_fraction | 0.877 | 0.011 | [0.873, 0.881] | 0.875 | 0.879 | 0.121 |
| adjusted_boxplot_fraction | 0.860 | 0.052 | [0.842, 0.879] | 0.758 | 1.000 | 0.000 |
| iqr_fence | 0.841 | 0.036 | [0.828, 0.854] | 0.738 | 0.980 | 0.017 |
| double_mad_outlier_fraction | 0.536 | 0.037 | [0.523, 0.549] | 0.367 | 1.000 | 0.000 |
| grubbs | 0.526 | 0.078 | [0.498, 0.554] | 0.421 | 0.711 | 0.179 |
| generalized_esd | 0.398 | 0.064 | [0.375, 0.420] | 0.254 | 0.958 | 0.017 |
| mad_outlier_fraction | 0.222 | 0.000 | [0.222, 0.222] | 0.125 | 1.000 | 0.000 |

Distribution Drift Detectors

| Detector | F1 mean | F1 std | 95% CI | Recall | Precision | FPR |
| --- | --- | --- | --- | --- | --- | --- |
| wasserstein_1 | 0.933 | 0.000 | [0.933, 0.933] | 0.875 | 1.000 | 0.000 |
| ks_pvalue | 0.920 | 0.033 | [0.908, 0.932] | 0.879 | 0.968 | 0.033 |
| js_divergence | 0.778 | 0.027 | [0.768, 0.788] | 0.637 | 1.000 | 0.000 |
| psi | 0.775 | 0.022 | [0.767, 0.783] | 0.633 | 1.000 | 0.000 |
| kl_divergence | 0.769 | 0.000 | [0.769, 0.769] | 0.625 | 1.000 | 0.000 |
| mmd | 0.708 | 0.051 | [0.689, 0.726] | 0.550 | 1.000 | 0.000 |

Time-Series Detectors

| Detector | F1 mean | F1 std | 95% CI | Recall | Precision | FPR |
| --- | --- | --- | --- | --- | --- | --- |
| holt_winters | 0.933 | 0.000 | [0.933, 0.933] | 0.875 | 1.000 | 0.000 |
| cusum | 0.884 | 0.043 | [0.868, 0.899] | 0.800 | 0.990 | 0.008 |
| page_hinkley | 0.776 | 0.088 | [0.744, 0.807] | 0.792 | 0.771 | 0.254 |
| monotonicity | 0.667 | 0.000 | [0.667, 0.667] | 1.000 | 0.500 | 1.000 |
| stl_residual_zscore | 0.545 | 0.000 | [0.545, 0.545] | 0.750 | 0.429 | 1.000 |

Distribution Diagnostic Detectors

| Detector | F1 mean | F1 std | 95% CI | Recall | Precision | FPR |
| --- | --- | --- | --- | --- | --- | --- |
| benford_law_fit | 0.667 | 0.000 | [0.667, 0.667] | 1.000 | 0.500 | 1.000 |

Raw results (with full CI columns): examples/benchmarks/results.csv