This repository contains the Computer Vision 2025.2 project on sparse-view indoor 3D reconstruction with MV-DUSt3R+. It includes the proposal slide deck, Kaggle experiment scripts, ablation notes, and the supervised extension plan for occlusion-aware reliability and repeated-structure disambiguation.
This is my main computer vision research project. The goal is to make sparse-view indoor 3D reconstruction more reliable when only a few views are available and scenes contain occlusion, repeated structures, or weak overlap.
The repository is organized as a staged experiment log rather than a single notebook dump. Runs 1-27 cover baseline evaluation, confidence thresholding, view selection, fusion ablations, occlusion filtering, repeated-structure analysis, supervised reliability proxies, and hard-case mining.
- Built a reproducible Kaggle experiment sequence around MV-DUSt3R+.
- Compared confidence filtering, view selection, and fusion strategies.
- Tested occlusion-aware and repeated-structure reliability ideas through ablation scripts.
- Added supervised proxy experiments for OARH/RSDH-style reliability heads.
- Documented negative results where learned or heuristic filters hurt reconstruction quality.
- Designed the staged run order and experiment tracking structure.
- Implemented the Kaggle scripts for baseline, ablation, validation-gated, and supervised extension runs.
- Wrote the project documentation, method notes, and slide/report assets.
- Reported limitations explicitly instead of presenting every learned extension as a final improvement.
| Area | Finding |
|---|---|
| Baseline reliability | Fixed confidence thresholding remains the safest final policy. |
| View selection | Diversity/overlap-aware selection is useful for sparse views. |
| OARH proxy | Helps selected larger-view cases but can regress 2/3-view cases. |
| RSDH proxy | Run 24 beats image-only match baselines, but Runs 25-26 show it should stay out of reconstruction because the geometry gain is matched by non-learned candidate retention. |
| Final direction | Use validation-gated learned reliability instead of unconditional learned filtering. |
.
+-- docs/
| +-- experiments/ # run order, Kaggle guide, result summaries
| +-- method/ # supervised OARH/RSDH extension notes
| +-- proposal/ # original proposal sources and team PDF
+-- notebooks/ # notebook-based sanity checks
+-- pdf/ # LaTeX Beamer source and generated main.pdf
+-- scripts/
| +-- kaggle/ # reproducible staged Kaggle experiments
+-- tools/ # local maintenance helpers
Generated Kaggle submission folders, downloaded outputs, local credentials, and
the cloned upstream mvdust3r/ repository are intentionally ignored.
Local Hugging Face token files such as HF.json and .env are also ignored;
use Kaggle Secrets (HF_TOKEN) for runs that download from the Hub.
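As a minimal sketch of the token handling described above, the helper below reads HF_TOKEN from Kaggle Secrets when the kaggle_secrets module is importable. The helper name get_hf_token and the environment-variable fallback for local runs are assumptions for illustration, not part of the repository scripts.

```python
import os


def get_hf_token(secret_name="HF_TOKEN"):
    """Return the Hub token from Kaggle Secrets, else from the environment."""
    try:
        # kaggle_secrets is only importable inside a Kaggle kernel.
        from kaggle_secrets import UserSecretsClient

        return UserSecretsClient().get_secret(secret_name)
    except Exception:
        # Assumed fallback for local runs: an environment variable, so that
        # no token file ever needs to be written into the repository.
        return os.environ.get(secret_name)
```

Returning None when nothing is configured lets the caller decide whether a run can proceed without Hub access.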
The staged Kaggle scripts live in scripts/kaggle/:
- kaggle_run1_run2_eval_baseline.py: evaluator smoke test and B0 baseline
- kaggle_run3_confidence_sweep.py: confidence threshold sweep
- kaggle_run4_view_selection.py: random/diversity/overlap/hybrid view ablation
- kaggle_run5_basic_fusion.py: F0/F1/F2/F3 fusion ablation
- kaggle_run6_occlusion_fusion.py: occlusion-aware filtering ablation
- kaggle_run7_repeated_structure_filtering.py: repeated-structure ablation
- kaggle_run8_full_pipeline.py: B0/B1/V/F/O/A/Full comparison
- kaggle_run9_final_stress_test.py: case-specific stress test
- kaggle_run10_sensitivity_visualization.py: confidence sensitivity figures
- kaggle_run11_final_validation_3seeds.py: fixed-threshold B0 vs Final over 3 seeds, with a P100 Torch compatibility fallback
- kaggle_run12_supervised_reliability.py: frozen-backbone OARH proxy training
- kaggle_run13_match_disambiguation.py: RSDH proxy match-validity training
- kaggle_run14_validation_gated_learned_pipeline.py: validation-gated OARH fallback policy
- kaggle_run15_mast3r_reciprocal_features.py: MASt3R reciprocal match feature extraction with explicit fallback logging
- kaggle_run16_rsdh_descriptor_cycle.py: RSDH descriptor/cycle-feature training from Run 15 kernel-source features
- kaggle_run17_light_finetune_decision.py: validation-based fine-tune gate
- kaggle_run18_learned_full_evaluation_summary.py: final learned-extension summary
- kaggle_run19_supervised_label_cache.py: scalable visibility/occlusion label cache for OARH v2 and RSDH v2
- kaggle_run20_occlusion_ambiguity_subset_mining.py: mines occlusion-heavy, low-overlap, and hard-negative subsets from Run 19
- kaggle_run21_oarh_v2_multitask.py: trains an OARH v2 keep/visibility/depth-residual multitask head from Run 20 labels
- kaggle_run22_oarh_v2_reconstruction_integration.py: evaluates whether Run 21 OARH v2 improves reconstruction F-score on Run 20 final-eval groups
- kaggle_run23_reconstruction_candidate_calibration.py: retrains reliability on actual MV-DUSt3R reconstruction candidates after Run 22 exposed proxy-to-reconstruction domain shift
- kaggle_run24_rsdh_v2_image_only.py: trains an image-only RSDH v2 match-validity head from Run 20 hard-negative labels
- kaggle_run25_rsdh_v2_reconstruction_integration.py: integrates the Run 24 RSDH v2 checkpoint into reconstruction candidate scoring and gates it against fixed confidence
- kaggle_run26_rsdh_v2_diagnostic_gate.py: reruns the RSDH integration with all-candidate and confidence top-k baselines plus exact top-k tie handling
- kaggle_run27_joint_candidate_acceptance.py: trains a joint candidate acceptance head on actual MV-DUSt3R candidates using confidence, self-geometry support, and Run 24 image-only RSDH scores
The notebook sanity check is in notebooks/kaggle_run0_mvdust3r_sanity.ipynb.
The strongest verified pipeline is view selection plus fixed confidence thresholding and baseline fusion. Heuristic occlusion and ambiguity filters were kept out of the final pipeline because they reduced F-score/completeness in the ablations.
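To make the trade-off between fixed confidence thresholding and F-score/completeness concrete, here is a small sketch. It assumes the common point-cloud definition of F-score (precision and recall at a distance threshold tau); the exact evaluator in the Kaggle scripts may differ, and the names fscore and filter_by_confidence are hypothetical.

```python
import math


def filter_by_confidence(points, conf, threshold):
    """Keep only the points whose confidence reaches a fixed threshold."""
    return [p for p, c in zip(points, conf) if c >= threshold]


def fscore(pred, gt, tau):
    """Return (precision, recall, F-score) at distance threshold tau.

    Brute-force nearest neighbour, fine for a toy example only.
    """
    def frac_within(src, dst):
        if not src or not dst:
            return 0.0
        hits = sum(1 for p in src if min(math.dist(p, q) for q in dst) <= tau)
        return hits / len(src)

    precision = frac_within(pred, gt)  # accuracy of predicted points
    recall = frac_within(gt, pred)     # completeness of the reconstruction
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data: one accurate point and one low-confidence outlier.
gt = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
pred = [(0.0, 0.0, 0.01), (5.0, 5.0, 5.0)]
conf = [0.9, 0.1]

print(fscore(pred, gt, tau=0.05))  # (0.5, 0.5, 0.5)
kept = filter_by_confidence(pred, conf, threshold=0.5)
print(fscore(kept, gt, tau=0.05))  # precision 1.0, recall 0.5
```

In this toy case thresholding raises precision without changing recall. A filter that also removes correct points lowers recall, which is the failure mode the ablations reported for the heuristic occlusion and ambiguity filters.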
Run 12 shows that the frozen-backbone OARH proxy is not safe as an unconditional
replacement for confidence thresholding: it helps only some larger-view cases
and hurts 2/3-view reconstruction. Run 13 shows strong proxy match-validity
learning, but it should be reported as a supervised proxy result rather than a
full MASt3R-based repeated-structure solution. Run 14 therefore tests a
validation-gated learned policy that falls back to confidence-only filtering
when the learned reliability head does not win on validation. The first Run 14
log shows that gating avoids the large 2/3/5-view OARH regressions but still
slightly overfits at 4 views, so the final tested reconstruction policy remains
fixed confidence thresholding rather than unconditional learned reliability.
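The validation-gated fallback can be sketched as follows. This is an illustration of the gating idea, not the Run 14 implementation: the function name select_policy, the policy keys, and the margin parameter are assumptions. The two scores in the example are the Run 22 validation F-scores quoted later in this README.

```python
def select_policy(val_scores, baseline="fixed_confidence", margin=0.0):
    """Return the best learned policy only if it beats the baseline.

    val_scores maps policy name -> validation reconstruction F-score.
    A learned policy must exceed the baseline by more than `margin`;
    otherwise the gate falls back to the baseline.
    """
    base = val_scores[baseline]
    learned = {k: v for k, v in val_scores.items() if k != baseline}
    if not learned:
        return baseline
    best = max(learned, key=learned.get)
    return best if learned[best] - base > margin else baseline


# Run 22 validation F-scores: fixed confidence vs. best learned OARH variant.
scores = {"fixed_confidence": 0.6716, "learned_oarh": 0.2160}
print(select_policy(scores))  # fixed_confidence
```

A nonzero margin (for example the 0.005 gate margin mentioned for Run 25) makes the gate reject gains that are too small to distinguish from noise.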
Run 15 successfully used the MASt3R backend for reciprocal match extraction.
Run 16 then trained RSDH from the Run 15 kernel-source features and reached
near-perfect held-out proxy match F1, but this should be reported as a
GT-depth-assisted proxy/upper-bound result because the features include direct
3D disagreement derived from available depth. The next phase starts with Run 19,
which builds a scene-level supervised label cache for visibility, occlusion, and
wrong-depth candidates before retraining OARH/RSDH with stronger held-out
evidence.
Run 20 consumes that cache and produces balanced OARH v2/RSDH v2 manifests so
the next training runs can target occlusion-heavy and hard-negative cases
instead of sampling generic points.
Run 21 trains the OARH v2 multitask head from the Run 20 balanced label file
and reports split-level plus group-level metrics for occlusion-heavy held-out
groups. It intentionally excludes direct label-leakage inputs such as
candidate_type and visibility_label; reconstruction-level proof is still
reserved for the follow-up integration run.
The Run 21 log confirms strong held-out proxy metrics, with test
keep_f1 near 0.9996 and test occluded_f1 near 0.9795. Run 22 therefore
integrates the Run 21 checkpoint into reconstruction candidate filtering and
compares it with the fixed-confidence final policy on the Run 20 final-eval
groups.
Run 22 showed that the proxy head does not transfer: validation F-score fell
from 0.6716 with fixed confidence to 0.2160 for the best learned OARH
variant, and test F-score fell from 0.6033 to 0.1936. Run 23 therefore
trains on actual MV-DUSt3R reconstruction candidates and selects any learned
ranking policy only through validation reconstruction F-score.
Run 23 approaches the fixed-confidence policy only when it keeps 99.5% of
points, and the validation gate still selects fixed confidence (0.6674
versus 0.6618). Run 24 therefore moves to the remaining repeated-structure
limit by training RSDH v2 from image-only patch/coordinate features and Run 20
hard-negative match labels. Run 24 passes its validation gate: the image-only
RSDH MLP reaches validation match F1 0.6954 versus the best image-only patch
baseline 0.6212, and test F1 0.6596 versus 0.5517. Run 25 then tests
that match-validity signal on actual MV-DUSt3R candidate points, but its
validation gain is only +0.0035, below the 0.005 gate margin. Run 26
confirms the apparent reconstruction gain is a candidate-retention effect:
validation selects all_candidates with F-score 0.1463, while the best
learned RSDH policy ties that score only at selected_ratio = 1.0. Therefore
the final reconstruction policy should keep fixed confidence / candidate
retention baselines and report RSDH v2 as a useful proxy result, not a solved
image-only repeated-structure module.
Run 27 starts the new attempt to solve the two remaining limits directly: it
trains a joint candidate acceptance head on actual reconstruction candidates and
gates it against all_candidates plus confidence top-k baselines, so it can
only pass if it improves geometry beyond candidate retention.
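A confidence top-k baseline at a matched retention ratio can be sketched like this. It shows one reading of "exact top-k tie handling": always keep exactly k candidates, even when several share the cutoff score. The function name and the index-based tie-break are assumptions, not the Run 26 code.

```python
def topk_indices(scores, ratio):
    """Return exactly k = round(ratio * n) indices with the highest scores.

    Ties at the cutoff are broken by original index, so the selection is
    deterministic and never exceeds k (unlike a `score >= cutoff` filter).
    """
    n = len(scores)
    k = max(0, min(n, round(ratio * n)))
    order = sorted(range(n), key=lambda i: (-scores[i], i))
    return sorted(order[:k])


conf = [0.9, 0.5, 0.5, 0.5, 0.1]
print(topk_indices(conf, 0.4))  # [0, 1]
print(topk_indices(conf, 1.0))  # [0, 1, 2, 3, 4]
```

With these scores a threshold filter at 0.5 would keep four candidates instead of two, which makes a learned policy and a baseline incomparable at the same selected_ratio. At ratio 1.0 every ranking collapses to all_candidates, so a gain that only appears there cannot be credited to the ranking itself.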
See docs/experiments/experiment_results_summary.md and
docs/method/supervised_extension_run_order.md.
- Read scripts/kaggle/README.md and docs/experiments/experiment_run_order.md.
- Run the staged Kaggle scripts in order, starting from the baseline and confidence sweep.
- Compare the generated summaries with docs/experiments/experiment_results_summary.md.
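When listing the staged scripts programmatically, a plain alphabetical sort puts run10 before run3. The sketch below orders file names by the first run number in the name; the helper names are hypothetical and the authoritative order remains the one in docs/experiments/experiment_run_order.md.

```python
import re


def run_number(name):
    """Extract the first run number from a kaggle_runN_*.py file name."""
    match = re.search(r"kaggle_run(\d+)", name)
    # Files without a run number sort last.
    return int(match.group(1)) if match else float("inf")


def staged_order(names):
    """Sort script names by run number, then by name for stability."""
    return sorted(names, key=lambda n: (run_number(n), n))


names = [
    "kaggle_run10_sensitivity_visualization.py",
    "kaggle_run3_confidence_sweep.py",
    "kaggle_run1_run2_eval_baseline.py",
]
for name in staged_order(names):
    print(name)
# kaggle_run1_run2_eval_baseline.py
# kaggle_run3_confidence_sweep.py
# kaggle_run10_sensitivity_visualization.py
```

The same key function can be applied to the result of pathlib.Path("scripts/kaggle").glob("kaggle_run*.py") when working from a local checkout.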
The project expects the upstream model/data setup to be handled in Kaggle. Local credentials and downloaded model artifacts are intentionally ignored.
Latest Kaggle kernels:
- Run 12 Supervised Reliability
- Run 13 Match Disambiguation
- Run 14 Validation-Gated Learned Pipeline
- Run 15 MASt3R Reciprocal Features
- Run 16 RSDH Descriptor Cycle
- Run 17 Light Finetune Decision
- Run 18 Learned Full Evaluation Summary
- Run 19 Supervised Label Cache
- Run 20 Occlusion Ambiguity Subset Mining
- Run 21 OARH v2 Multitask
- Run 22 OARH v2 Reconstruction Integration
- Run 23 Reconstruction Candidate Calibration
- Run 24 RSDH v2 Image Only
- Run 25 RSDH v2 Reconstruction Integration
- Run 26 RSDH v2 Diagnostic Gate
- Run 27 Joint Candidate Acceptance
From pdf/, use the required cleanup workflow:
latexmk -pdf main.tex
latexmk -c main.tex
find . -maxdepth 1 -type f \( \
-name "*.aux" -o -name "*.log" -o -name "*.out" -o -name "*.toc" -o \
-name "*.fls" -o -name "*.fdb_latexmk" -o -name "*.synctex.gz" -o \
-name "*.bbl" -o -name "*.blg" \
\) -delete
The generated deck is pdf/main.pdf.
On Windows PowerShell, the same workflow is wrapped by:
.\tools\build_pdf.ps1
For local maintenance, tools/push_to_github.ps1 stages the curated project
files, verifies that LaTeX intermediate files are absent, commits with a
Conventional Commit message, and pushes to origin main.