Based on the current GitHub repo (lzxmerengues-dev/randominzed_final_project, commit as of 2026-04-19). Companion to PROJECT_PLAN.md.
randominzed_final_project/
├── fd.py # patched — 2ℓ buffer, doubling
├── sfd.py # rewritten — nnz batching + real VerifySpectral
├── adaptive.py # NEW — per-batch FD/SFD selector
├── calibrate.py # NEW — hardware cost-model calibration
├── metrics.py # NEW — covariance_error, relative_error, ρ_eff
├── datasets.py # patched — MovieLens 1M loader + mixed-stream
├── benchmark.py # refactored — orchestrator only
├── experiments/
│ ├── exp1_density_sweep.py
│ ├── exp2_rho_eff_trajectory.py # D
│ ├── exp3_mixed_stream.py # A
│ ├── exp4_real_data.py
│ └── exp5_cpu_vs_gpu.py # A + D on GPU
├── figures/ # generated
├── results/ # JSON dumps, reproducible
├── tests/
│ ├── test_fd.py
│ ├── test_sfd.py
│ ├── test_adaptive.py
│ └── test_metrics.py
├── PROJECT_PLAN.md
├── IMPLEMENTATION.md
└── README.md
Everything in single-module files except experiments are split per figure. Each experiment dumps its raw results to results/expN.json so figures can be regenerated without re-running.
Current bug (fd.py:26-41): buffer size is ℓ, forcing SVD every ℓ rows. Liberty's original uses 2ℓ buffer with doubling — this halves the SVD count, which matters for fair comparison against SFD.
Target signature — unchanged:
def frequent_directions(A, ell: int) -> np.ndarray: ...Changes:
- Buffer shape
(2*ell, d), trackzero_rowup to2*ell. - SVD fires when
zero_row == 2*ell. - After shrink,
zero_row = ell(top ℓ rows hold the shrunk sketch; bottom ℓ rows are free). - Keep the
sp.issparse → toarrayconversion — FD is supposed to be density-oblivious.
Acceptance test (tests/test_fd.py):
def test_fd_covariance_guarantee():
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 80))
ell, k = 20, 5
B = frequent_directions(A, ell)
lhs = np.linalg.norm(A.T @ A - B.T @ B, 2)
_, s, _ = np.linalg.svd(A, full_matrices=False)
tail = float(np.sum(s[k:] ** 2))
assert lhs <= tail / (ell - k) + 1e-6 # Liberty boundThree concurrent fixes:
(a) nnz-based buffer filling. Current (sfd.py:101-104) fills ℓ rows. Paper fills until nnz(buffer) ≥ ℓd. Implementation:
def _collect_batch(A_csr, start: int, ell: int, d: int) -> tuple[sp.csr_matrix, int]:
"""Accumulate rows from A_csr[start:] until nnz reaches ell*d (or end)."""
target = ell * d
nnz_so_far, end = 0, start
row_ptr = A_csr.indptr
n = A_csr.shape[0]
while end < n and nnz_so_far < target:
nnz_so_far += row_ptr[end + 1] - row_ptr[end]
end += 1
return A_csr[start:end], end(b) Real VerifySpectral + BoostedSparseShrink. Paper 6.1 — estimate ‖M⊤M − shrunk⊤·shrunk‖_2 via randomized power iteration, compare to tolerance, retry if above.
def _verify_spectral(
dense_sketch: np.ndarray,
sparse_batch: sp.spmatrix,
B_candidate: np.ndarray,
tol: float,
n_probes: int = 10,
) -> bool:
"""
True iff ||M^T M - B^T B||_2 ≤ tol, estimated via block power iteration.
M = [dense_sketch; sparse_batch] implicit — never materialized.
"""
d = B_candidate.shape[1]
Omega = np.random.standard_normal((d, n_probes))
# y_M = M^T M @ Omega
y_dense = dense_sketch @ Omega
y_sparse = sparse_batch @ Omega
MtM_Om = dense_sketch.T @ y_dense + sparse_batch.T @ y_sparse
BtB_Om = B_candidate.T @ (B_candidate @ Omega)
diff = np.linalg.norm(MtM_Om - BtB_Om, 2)
# Block power lower-bounds the true spectral norm; multiply by safety factor
return diff <= tol * 2.0
def _boosted_shrink(dense_sketch, sparse_batch, ell, n_iter, delta_prob):
"""Retry _simultaneous_iteration_mixed until VerifySpectral passes."""
max_retries = max(1, math.ceil(math.log(1.0 / max(delta_prob, 1e-10))))
# Tolerance from paper: O(sigma_ell^2)
for attempt in range(max_retries):
s, Vt = _simultaneous_iteration_mixed(dense_sketch, sparse_batch, ell, n_iter)
B_cand, zero_row = _shrink(s, Vt, ell)
tol = float(s[ell-1]**2) if len(s) >= ell else 0.0
if _verify_spectral(dense_sketch, sparse_batch, B_cand, tol):
return B_cand, zero_row, attempt + 1
return B_cand, zero_row, max_retries # accept last attempt on budget exhaustionReturn the number of attempts — this is what the ρ̂(δ) theory predicts.
(c) Cleaner _simultaneous_iteration_mixed. The current one (sfd.py:7-49) is correct in spirit but does an extra QR that isn't needed. Replace with standard randomized SVD (Halko-Martinsson-Tropp):
def _simultaneous_iteration_mixed(dense_sketch, sparse_batch, k, n_iter):
d = dense_sketch.shape[1]
k_eff = min(k, d)
Omega = np.random.standard_normal((d, k_eff))
Y = dense_sketch @ Omega + sparse_batch @ Omega # no vstack needed for M^T M
for _ in range(n_iter):
# power step on M^T M = A_d^T A_d + A_s^T A_s
Q, _ = np.linalg.qr(Y)
Y = dense_sketch @ (dense_sketch.T @ Q) + sparse_batch @ (sparse_batch.T @ Q)
Q, _ = np.linalg.qr(Y)
# Project and small SVD
B_small = np.vstack([dense_sketch @ Q, sparse_batch @ Q]) # (rows × k_eff)
U, s, Wt = np.linalg.svd(B_small, full_matrices=False)
Vt = Wt @ Q.T
return s, VtThe power iteration is on M⊤M which is d×d — never formed, always applied as A_d⊤(A_d·) + A_s⊤(A_s·). Complexity: O((sketch_rows·d + nnz(batch)) · k · n_iter).
(d) Top-level loop. Replace sparse_frequent_directions body with:
def sparse_frequent_directions(A, ell, n_iter=4, delta_prob=0.1, instrument=False):
A = sp.csr_matrix(A) if not sp.issparse(A) else A.tocsr()
n, d = A.shape
B = np.zeros((ell, d))
sketch_rows, start = 0, 0
log = [] if instrument else None
while start < n:
batch, end = _collect_batch(A, start, ell, d)
dense_sketch = B[:sketch_rows]
B, sketch_rows, attempts = _boosted_shrink(
dense_sketch, batch, ell, n_iter, delta_prob
)
if log is not None:
log.append(dict(
t=len(log), start=start, end=end,
batch_rows=batch.shape[0], batch_nnz=batch.nnz,
sketch_rows=sketch_rows, attempts=attempts,
rho_eff=(ell * d + batch.nnz) / ((ell + batch.shape[0]) * d),
))
start = end
return (B, log) if instrument else Binstrument=True returns per-shrink trajectory — this is the raw data for Experiment 2 (ρ_eff).
Acceptance test (tests/test_sfd.py):
def test_sfd_bound():
rng = np.random.default_rng(0)
A = sp.random(500, 80, density=0.05, random_state=rng, format="csr")
ell, k = 20, 5
B = sparse_frequent_directions(A, ell)
A_dense = A.toarray()
lhs = np.linalg.norm(A_dense.T @ A_dense - B.T @ B, 2)
_, s, _ = np.linalg.svd(A_dense, full_matrices=False)
tail = float(np.sum(s[k:] ** 2))
alpha = 6.0 / 41.0
assert lhs <= tail / (alpha * ell - k) + 1e-6 # SFD boundFit (α_fd, β_sfd) such that wall-time ≈ α_fd · (rows · d · ell) for FD and β_sfd · (nnz · k · n_iter + rows · ell²) for SFD. One-time, ~30 s on startup.
def calibrate(d: int = 1000, ell: int = 20, n_iter: int = 4, seed: int = 0) -> dict:
"""
Returns {'alpha_fd': float, 'beta_sfd': float, 'device': str}.
Fits coefficients via small grid of (rows, density).
"""
rng = np.random.default_rng(seed)
grid = [(200, 0.5), (200, 0.05), (500, 0.5), (500, 0.05), (1000, 0.01)]
fd_data, sfd_data = [], []
for rows, rho in grid:
A = sp.random(rows, d, density=rho, random_state=rng, format="csr")
# FD on dense
A_d = A.toarray()
t = _time(lambda: _fd_one_shrink(A_d, ell))
fd_data.append((rows * d * ell, t))
# SFD on sparse
t = _time(lambda: _sfd_one_shrink(A, ell, n_iter))
sfd_data.append((A.nnz * ell * n_iter + rows * ell**2, t))
alpha_fd = _fit_slope(fd_data)
beta_sfd = _fit_slope(sfd_data)
return {"alpha_fd": alpha_fd, "beta_sfd": beta_sfd, "device": "cpu"}_fit_slope is simple least-squares through origin: slope = sum(x·y) / sum(x²). No intercept — we care about scaling behavior, and hardware noise floors are captured downstream.
Cache result to ~/.cache/sfd_calibration.json keyed by (platform, d, ell) so subsequent runs skip calibration.
def adaptive_frequent_directions(
A, ell, calib: dict, n_iter: int = 4, delta_prob: float = 0.1,
instrument: bool = False,
) -> np.ndarray | tuple:
"""
Per-batch FD/SFD selector using hardware-calibrated cost model.
Guarantee: correctness identical to FD/SFD (returns valid FD-invariant sketch).
Cost: approximately min(T_FD, T_SFD) per batch, up to calibration error.
"""
A = sp.csr_matrix(A) if not sp.issparse(A) else A.tocsr()
n, d = A.shape
alpha_fd, beta_sfd = calib["alpha_fd"], calib["beta_sfd"]
B = np.zeros((ell, d)); sketch_rows, start = 0, 0
log = [] if instrument else None
while start < n:
batch, end = _collect_batch(A, start, ell, d) # same nnz-based
rows_M = sketch_rows + batch.shape[0]
nnz_M = ell * d + batch.nnz # dense sketch contributes ell*d
c_fd = alpha_fd * rows_M * d * ell
c_sfd = beta_sfd * (nnz_M * ell * n_iter + rows_M * ell**2)
if c_fd < c_sfd:
# FD path: materialize and exact-SVD
M = np.vstack([B[:sketch_rows], batch.toarray()])
s, Vt = _exact_svd_shrink(M, ell)
choice = "fd"; attempts = 1
else:
# SFD path: boosted shrink
_, _, attempts = _boosted_shrink(B[:sketch_rows], batch, ell, n_iter, delta_prob)
choice = "sfd"
B, sketch_rows = _shrink(s, Vt, ell)
if log is not None:
log.append(dict(t=len(log), choice=choice, c_fd=c_fd, c_sfd=c_sfd,
attempts=attempts, rho_eff=nnz_M / (rows_M * d)))
start = end
return (B, log) if instrument else BCorrectness claim: because both subroutines return a valid rank-ℓ sketch satisfying the FD shrink invariant (‖M‖²_F − ‖B‖²_F = ℓ · Δ), the resulting sketch is valid regardless of per-batch choice. Bound degrades to 1 / (α_mix · ℓ − k) with α_mix ∈ [6/41, 1].
Acceptance tests (tests/test_adaptive.py):
- On homogeneous low-density input, Adaptive picks SFD ≥ 80% of batches.
- On homogeneous high-density input, Adaptive picks FD ≥ 80%.
- Adaptive wall-time ≤ 1.1 × min(FD_time, SFD_time) on fixed seed.
def covariance_error(A_dense, B): ...
def relative_error(A_dense, B, k: int): ...
def rho_eff(batch_nnz, ell, d, batch_rows): ...
def theoretical_flops_adaptive(log): ... # sum over per-batch choicesDelete from benchmark.py; import from here.
-
Rename
load_movielens(100k)→load_movielens_100k. -
Add
load_movielens_1mtargetingml-1m/ratings.dat(delimiter::, 6040 × 3706, ρ ≈ 4.5%). -
Add:
def make_mixed_stream(n1, n2, d, rho1, rho2, seed=0) -> sp.csr_matrix: """Stream: n1 rows at density rho1, then n2 rows at density rho2."""
For Experiment 3. The row order is what triggers Adaptive's switch.
-
Add
make_synthetic_lowrank(n, d, rank, rho, noise=0.1, seed=0)that generates a true approximate-low-rank matrix (currentmake_syntheticinbenchmark.py:65-80breaks low-rank structure via elementwise mask — replace with row-sparse or column-block sparse pattern so rank survives).
Reduce to:
run_once(A, ell, algo, seed)→ dict of metrics + logrun_seeds(A, ell, algo, n_seeds)→ dict of {median, IQR}- CLI dispatcher that calls experiments.
No more plotting in this file — figures are per-experiment.
Each experiment is a standalone script under experiments/. Each: (1) loads/generates data, (2) runs with ≥5 seeds, (3) dumps results/expN.json, (4) produces figure to figures/expN.pdf.
Basically the existing experiment but correct:
- n = 10⁴, d = 10³, ℓ = 20.
- ρ ∈ {0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 0.5}.
- 5 seeds each.
- Compare FD, SFD, Adaptive wall-time; report median ± IQR.
- Plot with paper-ρ*, drift-ρ̂, empirical crossover lines.
- Run SFD with
instrument=Trueon three inputs: ρ ∈ {0.005, 0.05, 0.2}. - Plot ρ_eff(t) vs. t (shrink index) for each.
- Annotate the ℓ/(ℓ + r_t) lower bound.
- Annotate input ρ as dashed horizontal.
- Expected visualization: ρ_eff trajectories converge upward regardless of input ρ — the theoretical punchline.
- Input:
make_mixed_stream(5000, 5000, 1000, 0.01, 0.3)— sparse phase then dense. - Run all three algorithms; record
logfrom Adaptive. - Figure: left = wall-time bar chart (FD/SFD/Adaptive), right = Adaptive's per-batch choice over time (stacked sparse/dense phases visible).
- Expected: Adaptive close to min at every phase; pure algorithms wasteful on the mismatched phase.
- MovieLens 1M, 20 Newsgroups.
- ℓ ∈ {10, 30, 100}.
- 5 seeds (only Adaptive and SFD are randomized; FD deterministic).
- Table output: wall-time, relative covariance error, Adaptive's FD/SFD mix ratio.
- Requires cupy (pip install cupy-cuda11x; run in Colab if no local GPU).
- Port FD and SFD to cupy (
cupyx.scipy.sparse,cupy.linalg.svd). - Re-run exp1 on GPU.
- Calibrate α_fd, β_sfd separately on GPU — expect α_fd·d·ℓ ≪ CPU (near-free dense GEMM) and β_sfd only slightly reduced (sparse matvec is memory-bound).
- Figure: two panels (CPU, GPU), each showing empirical ρ̂. Expected result: GPU ρ̂ ≪ CPU ρ̂, possibly near 0 (never use SFD on GPU).
Five paper figures correspond 1:1 to experiments. All using matplotlib with consistent style (figures/style.mplstyle).
Paper outline:
- Intro, FD/SFD recap
- §3 Sketch-drift effective density (theoretical)
- §4 Adaptive FD/SFD (algorithmic)
- §5 Experiments
- §6 Hardware discussion (CPU vs GPU)
- §7 Related work, conclusion
6 pages is enough; not trying to pad.
Strict dependency order — each task unlocks the next:
| # | Task | File | Blocker for |
|---|---|---|---|
| 1 | Pull out metrics into metrics.py |
metrics.py | everything |
| 2 | Patch fd.py 2ℓ buffer |
fd.py | exp1 |
| 3 | Rewrite sfd.py (nnz batch + BoostedShrink) |
sfd.py | exp1, exp2 |
| 4 | Write tests for FD/SFD | tests/ | merge confidence |
| 5 | Upgrade datasets.py (MovieLens 1M + mixed-stream + lowrank) |
datasets.py | exp1, exp3, exp4 |
| 6 | Build calibrate.py |
calibrate.py | adaptive.py |
| 7 | Build adaptive.py |
adaptive.py | exp3, exp4, exp5 |
| 8 | Tests for adaptive | tests/ | merge |
| 9 | exp1 density sweep | experiments/ | figure 1 |
| 10 | exp2 ρ_eff trajectory | experiments/ | figure 2 |
| 11 | exp3 mixed stream | experiments/ | figure 3 |
| 12 | exp4 real data | experiments/ | figure 4 |
| 13 | GPU port + exp5 | experiments/ | figure 5 |
| 14 | Write paper | — | submission |
Tasks 1–5 in parallel with 6–7 (adaptive doesn't block FD/SFD fixes). Days 1–3 can do 1+2+3 in one session.
Before calling any milestone done:
- All tests pass (
pytest tests/). - No random-seed flakiness: every experiment script runs
numpy.random.default_rng(seed)andscipy.sparse.random(..., random_state=rng)— no implicit global state. - Results are reproducible:
python experiments/expN.pyregeneratesresults/expN.jsonbit-identical on the same seed. - Every plot traces back to raw data: no figure is generated from in-memory arrays; always from
results/expN.json. - Bounds hold in tests, not just in the paper.
- Calibration cache is portable: explicit platform key, no reliance on cpuinfo quirks.
- One-command reproduction:
make all(orbash run_all.sh) reproduces every figure from scratch in ≤ 4 hours on a laptop.
If any of the above fails, the milestone is not done — iterate.