From 4667533595b90eccb9d2a82d82e7c625467cabf9 Mon Sep 17 00:00:00 2001
From: xuewei cao <36172337+xueweic@users.noreply.github.com>
Date: Sun, 24 May 2026 11:18:15 -0500
Subject: [PATCH 1/5] [Important update] Design Shared pecotmr QC and
 ColocBoost Analysis Work

Final Report

1. Interface Separation

Rationale:
Direct ColocBoost users and xQTL protocol users need different entrypoints. Mixing both workflows in one function made the API confusing and encouraged duplicated QC logic.

Key changes:
- Added/standardized colocboost_analysis() as the direct user-facing wrapper.
  - Accepts the same core inputs as colocboost() through ....
  - Adds optional QC/imputation parameters.
  - If no QC is requested, it behaves as a minimal call-through to colocboost().
  - If QC is requested, it prepares cleaned inputs first, then calls colocboost().
- Added/standardized colocboost_pipeline() as the protocol-oriented wrapper.
  - Starts from region_data produced by load_multitask_regional_data().
  - Designed for xQTL-protocol workflows.
  - Runs QC once, then reuses cleaned inputs across xQTL-only, joint-GWAS, and separate-GWAS analyses.

2. RegionalData Conversion Layer

Rationale:
Loaded regional data should be converted into reusable analysis inputs without reloading files or duplicating conversion logic.

Key changes:
- Added reusable adapters:
  - region_data_to_rss_input() for summary-stat/RSS workflows.
  - region_data_to_ind_input() for individual-level workflows.
  - region_data_to_colocboost_input() for ColocBoost-specific assembly.
- Preserved shared heavy objects such as X, LD, and X_ref.
- Used dictionaries such as dict_YX and dict_sumstatLD to avoid duplicated matrix storage.

3. Shared QC Refactor

Rationale:
QC logic should be reusable across ColocBoost, SuSiE RSS, and individual-level SuSiE workflows instead of being embedded separately in each pipeline.

Key changes:
- Added qc_individual_data() for individual-level QC.
- Expanded summary_stats_qc() to support combined RSS/ColocBoost summary-stat QC while preserving its historical LD-mismatch QC behavior.
- Clarified qc_method behavior:
  - NULL, "none", and "rss_qc" mean basic summary-stat harmonization only.
  - "slalom" and "dentist" add optional LD-mismatch outlier QC.
- Added QC progress messages for stage tracking, retained counts, skips, and warnings.

4. X_ref-Based Imputation

Rationale:
When genotype reference X_ref is available, imputation should use the genotype reference directly rather than deriving LD blocks, because LD block partitioning can alter imputation behavior.

Key changes:
- Final behavior uses whole-region scaled X_ref with RAISS genotype/SVD imputation.
- The temporary scaled matrix is used only inside imputation.
- The original X_ref remains the reference passed into the final colocboost() call.
- The implementation avoids full LD construction for X_ref imputation.
- Added tests showing:
  - raw X_ref SVD can differ from LD-based RAISS;
  - scaled X_ref SVD matches LD scale in a controlled single-matrix case;
  - colocboost_analysis(..., X_ref=..., impute=TRUE) uses scaled genotype/SVD and does not call compute_LD().

Important report observation:
- Old LD-block RAISS retained fewer variants for ENSG00000155304: 166 -> 7 and 167 -> 7.
- Whole-region scaled X_ref SVD retained more: 166 -> 16 and 167 -> 13.
- This difference is expected and should be highlighted as evidence that LD block partitioning changes imputation behavior.

5. Protocol Compatibility and Minimal Loader Fixes

Rationale:
The implementation should support the uploaded chr21 test data and xQTL-protocol use case while keeping loader changes minimal.

Key changes:
- Kept all changes within pecotmr; no changes were made to the core colocboost() function.
- Made only minimal loader-related fixes required by smoke testing:
  - multi-genotype-source loading via match_geno_pheno;
  - header handling for tabix-backed summary-stat files;
  - whitespace-padded covariate numeric parsing.
- Updated protocol-facing naming to use colocboost_pipeline().

6. Documentation and Tests

Rationale:
The new interfaces, QC behavior, and imputation decision need clear user-facing documentation and regression coverage so future changes do not reintroduce ambiguity.

Key changes:
- Updated documentation/man pages and vignette references for the new ColocBoost interfaces.
- Added chr21 smoke-test deliverable notes emphasizing the X_ref imputation decision.
- Added focused regression tests for:
  - colocboost_analysis() direct-input behavior;
  - colocboost_pipeline() protocol-level behavior;
  - reusable summary-stat QC behavior;
  - RAISS raw/scaled genotype-reference behavior;
  - X_ref-backed imputation avoiding compute_LD().
- Focused tests passed:
  - test_raiss.R
  - test_sumstats_qc.R
  - test_colocboost_pipeline.R
- Chr21 smoke tests completed on two genes across six scenarios, covering no-AD xQTL-only, basic QC, imputation, PIP screening, SLALOM QC, and combined modes.
---
 NAMESPACE                                     |    7 +-
 R/colocboost_pipeline.R                       | 1560 ++++++++++++-----
 R/file_utils.R                                |  271 ++-
 R/sumstats_qc.R                               |  437 ++++-
 R/univariate_pipeline.R                       |  112 +-
 man/colocboost_analysis.Rd                    |  114 ++
 ...sis_pipeline.Rd => colocboost_pipeline.Rd} |   32 +-
 man/qc_individual_data.Rd                     |   40 +
 man/region_data_to_colocboost_input.Rd        |   18 +
 man/region_data_to_ind_input.Rd               |   18 +
 man/region_data_to_rss_input.Rd               |   17 +
 man/rss_analysis_pipeline.Rd                  |    9 +-
 man/rss_basic_qc.Rd                           |   13 +-
 man/summary_stats_qc.Rd                       |   84 +-
 tests/testthat/test_colocboost_pipeline.R     | 1341 ++++++++++++--
 tests/testthat/test_file_utils.R              |  161 +-
 tests/testthat/test_raiss.R                   |   64 +-
 tests/testthat/test_sumstats_qc.R             |  117 ++
 tests/testthat/test_univariate_pipeline.R     |   22 +-
 vignettes/colocboost-pipeline.Rmd             |   14 +-
 20 files changed, 3760 insertions(+), 691 deletions(-)
 create mode 100644 man/colocboost_analysis.Rd
 rename man/{colocboost_analysis_pipeline.Rd => colocboost_pipeline.Rd} (60%)
 create mode 100644 man/qc_individual_data.Rd
 create mode 100644 man/region_data_to_colocboost_input.Rd
 create mode 100644 man/region_data_to_ind_input.Rd
 create mode 100644 man/region_data_to_rss_input.Rd

diff --git a/NAMESPACE b/NAMESPACE
index 369bf50a..513ba822 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -22,7 +22,8 @@ export(check_ld)
 export(clean_context_names)
 export(coloc_post_processor)
 export(coloc_wrapper)
-export(colocboost_analysis_pipeline)
+export(colocboost_analysis)
+export(colocboost_pipeline)
 export(compute_LD)
 export(compute_qtl_enrichment)
 export(compute_sldsc_M_ref)
@@ -107,9 +108,13 @@ export(parse_variant_id)
 export(postprocess_finemapping_fits)
 export(prs_cs)
 export(prs_cs_weights)
+export(qc_individual_data)
 export(raiss)
 export(read_afreq)
 export(read_sldsc_trait)
+export(region_data_to_colocboost_input)
+export(region_data_to_ind_input)
+export(region_data_to_rss_input)
 export(region_to_df)
 export(rss_analysis_pipeline)
 export(rss_basic_qc)
diff --git a/R/colocboost_pipeline.R b/R/colocboost_pipeline.R
index 9d921a7d..61c4874d 100644
--- a/R/colocboost_pipeline.R
+++ b/R/colocboost_pipeline.R
@@ -19,7 +19,7 @@ build_ld_args <- function(ld_list, subset = NULL) {
 .run_colocboost <- function(label, ...) {
   t1 <- Sys.time()
   res <- tryCatch(
-    colocboost(...),
+    colocboost_analysis(...),
     error = function(e) {
       message(label, " failed: ", conditionMessage(e))
       NULL
@@ -28,9 +28,313 @@ build_ld_args <- function(ld_list, subset = NULL) {
   list(result = res, time = Sys.time() - t1)
 }
 
-#' Multi-trait colocalization analysis pipeline
+.cb_call_colocboost <- function(args, dots) {
+  if (!requireNamespace("colocboost", quietly = TRUE)) {
+    stop("The colocboost package is required for colocboost_analysis().")
+  }
+  do.call(colocboost::colocboost, c(args, dots))
+}
+
+#' Convert loaded regional data to ColocBoost inputs
+#'
+#' @param region_data A list returned by \code{load_multitask_regional_data()}.
+#' @return A structured list containing \code{colocboost_input},
+#'   \code{qc_input}, and \code{source_info}.
+#' @export
+region_data_to_colocboost_input <- function(region_data) {
+  ind_records_from_input <- function(input) {
+    X <- .cb_as_named_list(input$X, "individual")
+    Y <- .cb_as_named_list(input$Y, "individual")
+    contexts <- intersect(names(X), names(Y))
+    records <- list()
+    for (context in contexts) {
+      if (is.null(X[[context]]) || .cb_y_ncol(Y[[context]]) == 0) next
+      records[[context]] <- list(
+        X = X[[context]],
+        Y = Y[[context]],
+        maf = .cb_list_value(input$maf, context),
+        X_variance = .cb_list_value(input$X_variance, context)
+      )
+    }
+    records
+  }
+
+  ind_input <- region_data_to_ind_input(region_data)
+  rss_input <- region_data_to_rss_input(region_data)
+
+  ind_records <- ind_records_from_input(ind_input)
+  ind_args <- .cb_format_individual(ind_records)
+
+  sumstat_records <- lapply(names(rss_input$rss_input), function(study) {
+    list(rss_input = rss_input$rss_input[[study]],
+         LD_matrix = rss_input$LD_data[[study]]$LD_matrix)
+  })
+  names(sumstat_records) <- names(rss_input$rss_input)
+  sumstat_args <- .cb_format_sumstat(sumstat_records)
+
+  outcome_names <- c(ind_args$outcome_names, names(sumstat_args$sumstat))
+  ind_args$outcome_names <- NULL
+  colocboost_input <- .cb_merge_args(ind_args, sumstat_args)
+  if (length(outcome_names) > 0) colocboost_input$outcome_names <- outcome_names
+
+  list(
+    colocboost_input = Filter(Negate(is.null), colocboost_input),
+    qc_input = list(
+      individual = ind_input[c("X", "Y", "maf", "X_variance")],
+      sumstat = rss_input[c("rss_input", "LD_data")]
+    ),
+    source_info = list(individual = ind_input$source_info,
+                       sumstat = rss_input$source_info)
+  )
+}
+
+#' ColocBoost analysis with optional pipeline QC
+#'
+#' This wrapper keeps the direct \code{colocboost()} argument surface. All
+#' ColocBoost inputs and model parameters are supplied through \code{...}. When
+#' no QC options are requested, the call is passed directly to
+#' \code{colocboost::colocboost()}. When QC options are requested, the wrapper
+#' inspects named \code{X}/\code{Y} and/or \code{sumstat}/\code{LD}/\code{X_ref}
+#' arguments in \code{...}, runs the relevant reusable QC step, and then calls
+#' ColocBoost on the cleaned inputs. If the required named inputs are not
+#' available, QC is skipped with a warning and the original ColocBoost call is
+#' used.
+#'
+#' @details
+#' Use \code{colocboost_analysis()} the same way you would use
+#' \code{colocboost::colocboost()}: pass the native ColocBoost arguments by
+#' name, for example \code{X}, \code{Y}, \code{sumstat}, \code{LD},
+#' \code{X_ref}, \code{dict_YX}, \code{dict_sumstatLD},
+#' \code{outcome_names}, \code{focal_outcome_idx}, \code{effect_est},
+#' \code{effect_se}, \code{effect_n}, \code{M}, and other ColocBoost model or
+#' post-processing options. These arguments are forwarded unchanged unless one
+#' or more QC controls are requested.
+#'
+#' Individual-level QC is only attempted when at least one individual QC control
+#' is non-\code{NULL} and named \code{X} and \code{Y} inputs are available in
+#' \code{...}. Summary-statistic QC is only attempted when \code{qc_method},
+#' \code{pip_cutoff_to_skip_sumstat}, \code{impute = TRUE}, or
+#' \code{LD_reference_info} is supplied and named \code{sumstat} plus either
+#' \code{LD}, \code{X_ref}, or \code{LD_reference_info} are available.
+#' \code{qc_method = "none"} and \code{qc_method = "rss_qc"} mean run basic
+#' allele/variant harmonization only; they do not run SLALOM/DENTIST
+#' LD-mismatch QC. RAISS imputation is controlled separately by
+#' \code{impute = TRUE}.
+#'
+#' If no QC controls are supplied, this function is a thin direct call to
+#' \code{colocboost::colocboost(...)}.
+#' When QC removes outcomes, \code{outcome_names} and \code{focal_outcome_idx}
+#' are updated to match the post-QC outcome order. If the requested focal outcome
+#' is removed by QC, \code{focal_outcome_idx} is set to \code{NULL} with a
+#' warning.
 #'
-#' This function perform a multi-trait colocalization using ColocBoost
+#' @param ... Arguments passed to \code{colocboost::colocboost()}, including
+#'   data inputs such as \code{X}, \code{Y}, \code{sumstat}, \code{LD},
+#'   \code{X_ref}, \code{dict_YX}, \code{dict_sumstatLD},
+#'   \code{outcome_names}, and all ColocBoost model/post-processing options.
+#'   QC can only inspect inputs that are supplied by name.
+#' @param missing_rate_thresh,maf_cutoff,xvar_cutoff,ld_reference_meta_file,pip_cutoff_to_skip_ind
+#'   Individual-level QC controls. If all are \code{NULL}, individual-level QC
+#'   is not run.
+#' @param keep_indel,pip_cutoff_to_skip_sumstat,qc_method,impute,impute_opts
+#'   Summary-statistic QC controls. \code{qc_method = "none"} and
+#'   \code{qc_method = "rss_qc"} run basic allele harmonization without
+#'   LD-mismatch outlier detection. Imputation is only run when
+#'   \code{impute = TRUE}.
+#' @param LD_reference_info Optional LD reference information for
+#'   summary-statistic QC. This is only needed when the native \code{LD} matrix
+#'   row/column names or \code{X_ref} column names are missing or are not
+#'   parseable genomic variant IDs. It can be a .bim/.pvar/.pvar.zst file path,
+#'   a data.frame with variant metadata, or a \code{load_LD_matrix()} result.
+#'   This is a QC-only argument and is not passed to
+#'   \code{colocboost::colocboost()}.
+#' @param variant_convention Allele order used by native ColocBoost-style
+#'   \code{sumstat$variant} and LD/X_ref names when deriving QC inputs:
+#'   \code{"A2_A1"} for pecotmr canonical \code{chr:pos:A2:A1}, or
+#'   \code{"A1_A2"} for \code{chr:pos:A1:A2}.
+#' @return The object returned by \code{colocboost::colocboost()}.
+#' @examples
+#' \dontrun{
+#' # Direct ColocBoost call without QC.
+#' fit <- colocboost_analysis(X = X, Y = Y, M = 500)
+#'
+#' # Summary-statistic input with basic allele/variant harmonization only.
+#' fit <- colocboost_analysis(sumstat = sumstat, LD = LD,
+#'                            qc_method = "none", M = 500)
+#'
+#' # Summary-statistic input with LD-mismatch QC and RAISS imputation.
+#' fit <- colocboost_analysis(sumstat = sumstat, LD = LD,
+#'                            qc_method = "slalom", impute = TRUE)
+#'
+#' # Use richer LD metadata from load_LD_matrix() for QC, while still passing
+#' # ColocBoost's native LD input.
+#' ld_data <- load_LD_matrix(ld_meta_file, region)
+#' fit <- colocboost_analysis(sumstat = sumstat, LD = ld_data$LD_matrix,
+#'                            LD_reference_info = ld_data, qc_method = "none")
+#'
+#' # Individual-level input with explicit genotype QC thresholds.
+#' fit <- colocboost_analysis(X = X, Y = Y,
+#'                            missing_rate_thresh = 0.1,
+#'                            maf_cutoff = 0.0005)
+#' }
+#' @export
+colocboost_analysis <- function(...,
+                                # individual QC
+                                missing_rate_thresh = NULL,
+                                maf_cutoff = NULL,
+                                xvar_cutoff = NULL,
+                                ld_reference_meta_file = NULL,
+                                pip_cutoff_to_skip_ind = NULL,
+                                # sumstat QC
+                                keep_indel = TRUE,
+                                pip_cutoff_to_skip_sumstat = NULL,
+                                qc_method = NULL,
+                                impute = FALSE,
+                                impute_opts = list(rcond = 0.01, R2_threshold = 0.6,
+                                                   minimum_ld = 5, lamb = 0.01),
+                                LD_reference_info = NULL,
+                                variant_convention = c("A2_A1", "A1_A2")) {
+  variant_convention <- match.arg(variant_convention)
+  direct_args <- list(...)
+  pre_qc_data_outcomes <- .cb_colocboost_outcome_names(direct_args, prefer_supplied = FALSE)
+  pre_qc_display_outcomes <- .cb_colocboost_outcome_names(direct_args, prefer_supplied = TRUE)
+  if (!is.null(qc_method)) qc_method <- .resolve_summary_qc_method(qc_method)
+
+  individual_qc_requested <- !is.null(missing_rate_thresh) ||
+    !is.null(maf_cutoff) || !is.null(xvar_cutoff) ||
+    !is.null(ld_reference_meta_file) || !is.null(pip_cutoff_to_skip_ind)
+  sumstat_qc_requested <- !is.null(qc_method) || isTRUE(impute) ||
+    !is.null(pip_cutoff_to_skip_sumstat) || !is.null(LD_reference_info)
+  qc_requested <- individual_qc_requested || sumstat_qc_requested
+  if (!qc_requested) {
+    return(.cb_call_colocboost(direct_args, list()))
+  }
+
+  X <- direct_args$X
+  Y <- direct_args$Y
+  sumstat <- direct_args$sumstat
+  LD <- direct_args$LD
+  X_ref <- direct_args$X_ref
+  dict_YX <- direct_args$dict_YX
+  dict_sumstatLD <- direct_args$dict_sumstatLD
+
+  qc_skip_messages <- character()
+  individual_qc_input <- NULL
+  if (individual_qc_requested) {
+    if (!is.null(X) && !is.null(Y)) {
+      individual_qc_input <- .cb_individual_qc_input_from_colocboost(X, Y, dict_YX)
+    } else {
+      qc_skip_messages <- c(qc_skip_messages,
+                            "Individual-level QC requested but named X and Y were not both supplied.")
+    }
+  }
+
+  sumstat_qc_input <- NULL
+  if (sumstat_qc_requested) {
+    if (!is.null(sumstat) && (!is.null(LD) || !is.null(X_ref) || !is.null(LD_reference_info))) {
+      sumstat_qc_input <- tryCatch(
+        .cb_sumstat_qc_input_from_colocboost(
+          sumstat, LD, X_ref, dict_sumstatLD,
+          LD_reference_info = LD_reference_info,
+          variant_convention = variant_convention
+        ),
+        error = function(e) {
+          qc_skip_messages <<- c(
+            qc_skip_messages,
+            paste("Summary-statistic QC input could not be prepared:", conditionMessage(e))
+          )
+          NULL
+        }
+      )
+      if (!is.null(sumstat_qc_input)) {
+        if (length(sumstat_qc_input$skip_reasons) > 0) {
+          qc_skip_messages <- c(qc_skip_messages, sumstat_qc_input$skip_reasons)
+        }
+        if (length(sumstat_qc_input$rss_input) == 0) {
+          sumstat_qc_input <- NULL
+        }
+      }
+    } else {
+      qc_skip_messages <- c(qc_skip_messages,
+                            "Summary-statistic QC requested but named sumstat plus LD, X_ref, or LD_reference_info were not supplied.")
+    }
+  }
+  if (is.null(individual_qc_input) && is.null(sumstat_qc_input)) {
+    warning("QC requested but required QC inputs are unavailable. Calling colocboost() directly. ",
+            paste(qc_skip_messages, collapse = " "))
+    return(.cb_call_colocboost(direct_args, list()))
+  }
+  if (length(qc_skip_messages) > 0) {
+    warning(paste(qc_skip_messages, collapse = " "), " Skipping unavailable QC branch.")
+  }
+
+  qc_args <- tryCatch({
+    args <- list()
+    if (!is.null(individual_qc_input)) {
+      message("QC track: processing individual-level inputs before ColocBoost.")
+      ind <- qc_individual_data(
+        X = individual_qc_input$X,
+        Y = individual_qc_input$Y,
+        maf = individual_qc_input$maf,
+        X_variance = individual_qc_input$X_variance,
+        missing_rate_thresh = missing_rate_thresh,
+        maf_cutoff = maf_cutoff,
+        xvar_cutoff = .cb_default(xvar_cutoff, 0),
+        ld_reference_meta_file = ld_reference_meta_file,
+        keep_indel = keep_indel,
+        pip_cutoff_to_skip = .cb_default(pip_cutoff_to_skip_ind, 0)
+      )
+      args <- .cb_merge_args(args, .cb_format_individual(ind))
+    }
+    if (!is.null(sumstat_qc_input) && length(sumstat_qc_input$rss_input) > 0) {
+      message("QC track: processing summary-statistic inputs before ColocBoost.")
+      sumstat_qc <- summary_stats_qc(
+        rss_input = sumstat_qc_input$rss_input,
+        LD_data = sumstat_qc_input$LD_data,
+        keep_indel = keep_indel,
+        pip_cutoff_to_skip = .cb_default(pip_cutoff_to_skip_sumstat, 0),
+        qc_method = if (is.null(qc_method)) "none" else qc_method,
+        impute = impute,
+        impute_opts = impute_opts
+      )
+      args <- .cb_merge_args(args, .cb_format_sumstat(sumstat_qc))
+    }
+    args
+  }, error = function(e) {
+    warning("QC requested but skipped: ", conditionMessage(e),
+            ". Calling colocboost() directly.")
+    NULL
+  })
+
+  if (is.null(qc_args) || length(qc_args) == 0) {
+    return(.cb_call_colocboost(direct_args, list()))
+  }
+  merged_args <- .cb_merge_args(direct_args, qc_args)
+  if (!is.null(qc_args$LD)) merged_args$X_ref <- NULL
+  if (!is.null(qc_args$X_ref)) merged_args$LD <- NULL
+  post_qc_data_outcomes <- .cb_colocboost_outcome_names(merged_args, prefer_supplied = FALSE)
+  if (length(post_qc_data_outcomes) > 0) {
+    merged_args$outcome_names <- .cb_resolve_qc_outcome_names(
+      pre_qc_data_outcomes,
+      pre_qc_display_outcomes,
+      post_qc_data_outcomes
+    )
+    merged_args$focal_outcome_idx <- .cb_remap_focal_outcome_idx(
+      focal_outcome_idx = direct_args$focal_outcome_idx,
+      pre_qc_data_outcomes = pre_qc_data_outcomes,
+      pre_qc_display_outcomes = pre_qc_display_outcomes,
+      post_qc_data_outcomes = post_qc_data_outcomes,
+      post_qc_display_outcomes = merged_args$outcome_names
+    )
+  }
+  .cb_call_colocboost(Filter(Negate(is.null), merged_args), list())
+}
+
+#' Multi-trait colocalization analysis protocol pipeline
+#'
+#' This function performs protocol-level multi-trait colocalization using
+#' ColocBoost. It accepts loaded regional data, performs QC once, then runs the
+#' requested xQTL-only, joint GWAS, and separate GWAS analyses.
 #'
 #' @param region_data A region data loaded from \code{load_regional_data}.
 #' @param focal_trait Name of trait if perform focaled ColocBoost
@@ -38,7 +342,9 @@ build_ld_args <- function(ld_list, subset = NULL) {
 #' @param maf_cutoff A scalar to remove variants with maf < maf_cutoff, dafault is 0.005.
 #' @param pip_cutoff_to_skip_ind A vector of cutoff values for skipping analysis based on PIP values for each context. Default is 0.
 #' @param pip_cutoff_to_skip_sumstat A vector of cutoff values for skipping analysis based on PIP values for each sumstat Default is 0.
-#' @param qc_method Quality control method to use. Options are "slalom" or "dentist" (default: "slalom").
+#' @param qc_method Quality control method to use. Options are "none",
+#'   "rss_qc", "slalom", or "dentist". \code{NULL} and \code{"rss_qc"}
+#'   are treated as \code{"none"} for basic-only summary-stat preprocessing.
 #' @param impute Logical; if TRUE, performs imputation for outliers identified in the analysis (default: TRUE).
 #' @param impute_opts A list of imputation options including rcond, R2_threshold, and minimum_ld (default: list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5)).
 #'
@@ -57,23 +363,28 @@ build_ld_args <- function(ld_list, subset = NULL) {
 #'
 #' @importFrom susieR susie_rss
 #' @export
-colocboost_analysis_pipeline <- function(region_data,
-                                         focal_trait = NULL,
-                                         event_filters = NULL,
-                                         # - analysis
-                                         xqtl_coloc = TRUE,
-                                         joint_gwas = FALSE,
-                                         separate_gwas = FALSE,
-                                         # - individual QC
-                                         maf_cutoff = 0.0005,
-                                         pip_cutoff_to_skip_ind = 0,
-                                         # - sumstat QC
-                                         keep_indel = TRUE,
-                                         pip_cutoff_to_skip_sumstat = 0,
-                                         qc_method = c("slalom", "dentist", "none"),
-                                         impute = TRUE,
-                                         impute_opts = list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01),
-                                         ...) {
+colocboost_pipeline <- function(
+  region_data,
+  focal_trait = NULL,
+  event_filters = NULL,
+  # - analysis
+  xqtl_coloc = TRUE,
+  joint_gwas = FALSE,
+  separate_gwas = FALSE,
+  # - individual QC
+  maf_cutoff = 0.0005,
+  pip_cutoff_to_skip_ind = 0,
+  # - sumstat QC
+  keep_indel = TRUE,
+  pip_cutoff_to_skip_sumstat = 0,
+  qc_method = NULL,
+  impute = TRUE,
+  impute_opts = list(
+    rcond = 0.01, R2_threshold = 0.6,
+    minimum_ld = 5, lamb = 0.01
+  ),
+  ...
+) {
   # - internal function by filtering events based on event_filters
   filter_events <- function(events, filters, condition) {
     # filters is a list of filter specifications
@@ -173,14 +484,23 @@ colocboost_analysis_pipeline <- function(region_data,
         }
       }
     } else {
-      message(if (is.null(phenotypes_init)) "No sumstat data in this region!" else "No sumstat data pass QC.")
+      if (is.null(phenotypes_init)) {
+        message("No sumstat data in this region!")
+      } else if (!is.null(phenotypes_init$sumstat_studies)) {
+        message(paste(
+          "Skipping follow-up analysis for sumstat studies",
+          paste(phenotypes_init$sumstat_studies, collapse = ";"), "after QC."
+        ))
+      } else {
+        message("No sumstat data pass QC.")
+      }
     }
 
     return(phenotypes)
   }
 
   ####### ========= resolve defaults ======== #######
-  qc_method <- match.arg(qc_method)
+  qc_method <- .resolve_summary_qc_method(qc_method)
 
   ####### ========= initial output results before QC ======== #######
   analysis_results <- list("xqtl_coloc" = NULL, "joint_gwas" = NULL, "separate_gwas" = NULL)
@@ -234,78 +554,22 @@ colocboost_analysis_pipeline <- function(region_data,
     impute_opts = impute_opts
   )
   phenotypes_QC <- extract_contexts_studies(region_data, phenotypes_init = phenotypes_init)
+  if (!is.null(phenotypes_init$sumstat_studies) &&
+      is.null(phenotypes_QC$sumstat_studies)) {
+    message("No valid summary statistic studies remain after validation.")
+  }
   t02 <- Sys.time()
   analysis_results$computing_time$QC <- t02 - t01
 
-  ####### ========= organize individual level data ======== ########
-  individual_data <- region_data$individual_data
-  if (!is.null(individual_data)) {
-    X <- individual_data$X
-    Y <- individual_data$Y
-    null_Y <- which(sapply(Y, is.null))
-    if (length(null_Y) != 0 & length(null_Y) != length(Y)) {
-      X <- X[-null_Y]
-      Y <- Y[-null_Y]
-    } else if (length(null_Y) == length(Y)) {
-      X <- NULL
-      Y <- NULL
-    }
-    if (!is.null(Y)) {
-      Y_split <- purrr::imap(Y, function(y, i) {
-        purrr::map(seq_len(ncol(y)), function(j) setNames(y[, j, drop = FALSE], colnames(y)[j]))
-      })
-      dict_YX <- cbind(seq_along(unlist(Y_split, recursive = FALSE)), rep(seq_along(Y_split), purrr::map_int(Y_split, length)))
-      Y <- unlist(Y_split, recursive = FALSE)
-      names(Y) <- sapply(Y, colnames)
-    }
-  } else {
-    X <- Y <- dict_YX <- NULL
-  }
-
-  ####### ========= organize summary statistics ======== ########
-  sumstat_data <- region_data$sumstat_data
-  if (!is.null(sumstat_data$sumstats)) {
-    sumstats <- lapply(sumstat_data$sumstats, function(ss) {
-      z <- ss$sumstats$z
-      # Normalize variant IDs to canonical format (with chr prefix)
-      variant <- normalize_variant_id(as.character(ss$sumstats$variant_id))
-      n <- ss$n
-
-      # Filter out NA values from z-scores and corresponding variants
-      na_mask <- !is.na(z)
-      if (sum(na_mask) == 0) {
-        message("Warning: All z-scores are NA for this summary statistic dataset")
-        return(data.frame("z" = numeric(0), "n" = numeric(0), "variant" = character(0)))
-      }
-
-      data.frame("z" = z[na_mask], "n" = n, "variant" = variant[na_mask])
-    })
-    names(sumstats) <- names(sumstat_data$sumstats)
-    LD_mat <- lapply(sumstat_data$LD_mat, function(ld) {
-      # Normalize LD dimnames to canonical format (with chr prefix)
-      if (!is.null(colnames(ld))) {
-        colnames(ld) <- normalize_variant_id(as.character(colnames(ld)))
-      }
-      if (!is.null(rownames(ld))) {
-        rownames(ld) <- normalize_variant_id(as.character(rownames(ld)))
-      }
-      return(ld)
-    })
-    LD_match <- sumstat_data$LD_match
-    dict_sumstatLD <- cbind(seq_along(sumstats), match(LD_match, names(sumstat_data$LD_mat)))
-    # Validate and filter sumstat entries before analysis
-    filtered <- filter_valid_sumstats(sumstats, LD_mat, LD_match)
-    if (is.null(filtered)) {
-      sumstats <- LD_mat <- dict_sumstatLD <- NULL
-    } else {
-      sumstats <- filtered$sumstats
-      LD_mat <- filtered$LD_mat
-      LD_match <- filtered$LD_match
-      dict_sumstatLD <- filtered$dict_sumstatLD
-    }
-  } else {
-    sumstats <- LD_mat <- dict_sumstatLD <- NULL
-  }
+  ####### ========= convert QC'd regional data to ColocBoost input ======== ########
+  colocboost_input <- region_data_to_colocboost_input(region_data)$colocboost_input
+  X <- colocboost_input$X
+  Y <- colocboost_input$Y
+  dict_YX <- colocboost_input$dict_YX
+  sumstats <- colocboost_input$sumstat
+  dict_sumstatLD <- colocboost_input$dict_sumstatLD
+  LD_mat <- colocboost_input$LD
+  if (is.null(LD_mat)) LD_mat <- colocboost_input$X_ref
 
 
   ####### ========= streamline three types of analyses ======== ########
@@ -323,7 +587,7 @@ colocboost_analysis_pipeline <- function(region_data,
       outcome_names = traits, focal_outcome_idx = focal_outcome_idx,
       output_level = 2, ...
     )
-    analysis_results$xqtl_coloc <- cb_res$result
+    analysis_results["xqtl_coloc"] <- list(cb_res$result)
     analysis_results$computing_time$Analysis$xqtl_coloc <- cb_res$time
   }
   # - run joint GWAS no focaled version of ColocBoost
@@ -336,7 +600,7 @@ colocboost_analysis_pipeline <- function(region_data,
       dict_YX = dict_YX, dict_sumstatLD = dict_sumstatLD,
       outcome_names = traits, focal_outcome_idx = NULL,
       output_level = 2), ld_args, list(...)))
-    analysis_results$joint_gwas <- cb_res$result
+    analysis_results["joint_gwas"] <- list(cb_res$result)
     analysis_results$computing_time$Analysis$joint_gwas <- cb_res$time
   }
   # - run focaled version of ColocBoost for each GWAS
@@ -344,9 +608,9 @@ colocboost_analysis_pipeline <- function(region_data,
     t31 <- Sys.time()
     res_gwas_separate <- analysis_results$separate_gwas
     for (i_gwas in 1:nrow(dict_sumstatLD)) {
-      current_study <- names(sumstats)[i_gwas]
-      message(paste("====== Performing focaled version GWAS-xQTL ColocBoost on", length(Y), "contexts and ", current_study, "GWAS. ====="))
       dict <- dict_sumstatLD[i_gwas, ]
+      current_study <- names(sumstats)[dict[1]]
+      message(paste("====== Performing focaled version GWAS-xQTL ColocBoost on", length(Y), "contexts and ", current_study, "GWAS. ====="))
       traits <- c(names(Y), current_study)
       ld_args_sep <- build_ld_args(LD_mat, subset = dict[2])
       cb_res <- do.call(.run_colocboost, c(
@@ -355,7 +619,7 @@ colocboost_analysis_pipeline <- function(region_data,
           dict_YX = dict_YX,
           outcome_names = traits, focal_outcome_idx = length(traits),
           output_level = 2), ld_args_sep, list(...)))
-      res_gwas_separate[[current_study]] <- cb_res$result
+      res_gwas_separate[current_study] <- list(cb_res$result)
     }
     t32 <- Sys.time()
     analysis_results$separate_gwas <- res_gwas_separate
@@ -365,86 +629,22 @@ colocboost_analysis_pipeline <- function(region_data,
   return(analysis_results)
 }
 
-
-#' Validate a single summary statistics entry
-#'
-#' Checks whether a sumstat data frame (with columns z, n, variant) has
-#' sufficient data for colocboost analysis.
-#'
-#' @param ss_df A data.frame with columns "z", "n", and "variant" as produced
-#'   by the sumstat processing block in \code{colocboost_analysis_pipeline}.
-#' @param min_variants Minimum number of non-NA z-score variants required.
-#'   Default is 2.
-#' @return TRUE if the entry is valid; FALSE otherwise.
-#' @noRd
-is_valid_sumstat_entry <- function(ss_df, min_variants = 2) {
-  if (is.null(ss_df) || !is.data.frame(ss_df)) return(FALSE)
-  if (nrow(ss_df) < min_variants) return(FALSE)
-  if (all(is.na(ss_df$z))) return(FALSE)
-  if (all(ss_df$n <= 0 | is.na(ss_df$n))) return(FALSE)
-  return(TRUE)
-}
-
-
-#' Filter summary statistics to retain only valid entries
+#' Initial QC for the region data loaded from \code{load_regional_data}
 #'
-#' Applies \code{is_valid_sumstat_entry} to each element of the sumstats list
-#' and removes invalid entries. Also updates the LD match mapping and
-#' dict_sumstatLD accordingly.
+#' This compatibility wrapper converts loaded regional data to reusable individual
+#' and RSS inputs, runs the shared QC helpers once, and returns the historical
+#' post-QC structure consumed by \code{colocboost_pipeline()}.
 #'
-#' @param sumstats Named list of summary statistic data frames.
-#' @param LD_mat List of LD matrices.
-#' @param LD_match Character vector mapping each sumstat to an LD matrix name.
-#' @param min_variants Minimum variant count passed to
-#'   \code{is_valid_sumstat_entry}.
-#' @return A list with filtered \code{sumstats}, \code{LD_mat}, \code{LD_match},
-#'   and \code{dict_sumstatLD}, or NULL if no valid entries remain.
-#' @noRd
-filter_valid_sumstats <- function(sumstats, LD_mat, LD_match, min_variants = 2) {
-  valid_idx <- vapply(sumstats, is_valid_sumstat_entry, logical(1),
-                       min_variants = min_variants)
-  if (!any(valid_idx)) {
-    message("No valid summary statistic studies remain after validation.")
-    return(NULL)
-  }
-  removed <- names(sumstats)[!valid_idx]
-  if (length(removed) > 0) {
-    message("Removed invalid sumstat studies: ", paste(removed, collapse = ", "))
-  }
-  sumstats <- sumstats[valid_idx]
-  LD_match <- LD_match[valid_idx]
-  dict_sumstatLD <- cbind(seq_along(sumstats), match(LD_match, names(LD_mat)))
-  list(sumstats = sumstats, LD_mat = LD_mat, LD_match = LD_match,
-       dict_sumstatLD = dict_sumstatLD)
-}
-
-
-#' Initial QC for the region data loading from \code{load_regional_data}
-#'
-#' This function do the initial QC including: check PIP; check maf for individual_data; check QC and impute for sumstat_data
-#'
-#' @section Loading individual level data from multiple corhorts
 #' @param region_data A region data loaded from \code{load_regional_data}.
-#' @param maf_cutoff A scalar to remove variants with maf < maf_cutoff, dafault is 0.005.
-#' @param pip_cutoff_to_skip_ind A vector of cutoff values for skipping analysis based on PIP values for each context. Default is 0.
-#' @param pip_cutoff_to_skip_sumstat A vector of cutoff values for skipping analysis based on PIP values for each sumstat Default is 0.
-#' @param qc_method Quality control method to use. Options are "slalom" or "dentist" (default: "slalom").
-#' @param impute Logical; if TRUE, performs imputation for outliers identified in the analysis (default: TRUE).
-#' @param impute_opts A list of imputation options including rcond, R2_threshold, and minimum_ld (default: list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5)).
-#'
-#'
-#' @return A list containing the individual_data and sumstat_data after QC:
-#' individual_data contains the following components if exist
-#' \itemize{
-#'   \item Y: A list of residualized phenotype values for all tasks.
-#'   \item X: A list of residualized genotype matrices all tasks.
-#' }
-#' sumstat_data contains the following components if exist
-#' \itemize{
-#'   \item sumstats: A list of summary statistics for the matched LD_info, each sublist contains sumstats, n, var_y from \code{load_rss_data}.
-#'   \item LD_info: A list of LD information, each sublist contains LD_variants, LD_matrix, ref_panel  \code{load_LD_matrix}.
-#' }
-#'
+#' @param maf_cutoff A scalar to remove variants with maf < maf_cutoff.
+#' @param pip_cutoff_to_skip_ind A vector of cutoff values for skipping individual contexts.
+#' @param pip_cutoff_to_skip_sumstat A vector of cutoff values for skipping summary-stat studies.
+#' @param qc_method Quality control method to use. Options are "none",
+#'   "rss_qc", "slalom", or "dentist". \code{NULL} and \code{"rss_qc"}
+#'   are treated as \code{"none"} for basic-only summary-stat preprocessing.
+#' @param impute Logical; if TRUE, performs imputation when required metadata are available.
+#' @param impute_opts A list of imputation options.
+#' @return A list containing post-QC \code{individual_data} and \code{sumstat_data}.
 #' @noRd
 qc_regional_data <- function(region_data,
                              # - individual
@@ -453,285 +653,815 @@ qc_regional_data <- function(region_data,
                              # - sumstat
                              keep_indel = TRUE,
                              pip_cutoff_to_skip_sumstat = 0,
-                             qc_method = c("slalom", "dentist", "none"),
+                             qc_method = NULL,
                              impute = TRUE,
                              impute_opts = list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01)) {
-  qc_method <- match.arg(qc_method)
-
-  # Validate and recycle pip_cutoff_to_skip_ind: scalar -> named vector for all contexts
-  if (!is.null(region_data$individual_data)) {
-    ind_context_names <- names(region_data$individual_data$residual_Y)
-    n_ind_contexts <- length(ind_context_names)
-    if (length(pip_cutoff_to_skip_ind) == 1 && is.null(names(pip_cutoff_to_skip_ind))) {
-      pip_cutoff_to_skip_ind <- setNames(rep(pip_cutoff_to_skip_ind, n_ind_contexts), ind_context_names)
-    } else if (!is.null(names(pip_cutoff_to_skip_ind))) {
-      # Named vector: fill missing contexts with 0
-      missing <- setdiff(ind_context_names, names(pip_cutoff_to_skip_ind))
-      if (length(missing) > 0) {
-        pip_cutoff_to_skip_ind <- c(pip_cutoff_to_skip_ind, setNames(rep(0, length(missing)), missing))
+  qc_method <- .resolve_summary_qc_method(qc_method)
+  qced_individual_to_region_data <- function(ind_qc) {
+    if (is.null(ind_qc) || length(ind_qc) == 0) return(NULL)
+    list(
+      X = lapply(ind_qc, `[[`, "X"),
+      Y = lapply(ind_qc, `[[`, "Y")
+    )
+  }
+  qced_sumstat_to_region_data <- function(sumstat_qc) {
+    if (is.null(sumstat_qc) || length(sumstat_qc) == 0) return(NULL)
+    if (!is.null(sumstat_qc$rss_input) && !is.null(sumstat_qc$LD_matrix)) {
+      sumstat_qc <- list(study1 = sumstat_qc)
+    }
+    sumstats <- lapply(sumstat_qc, `[[`, "rss_input")
+    LD_mat <- list()
+    LD_match <- character()
+    ld_variant_index <- list()
+    for (study in names(sumstat_qc)) {
+      ld <- sumstat_qc[[study]]$LD_matrix
+      variant_key <- paste(colnames(ld), collapse = ",")
+      if (variant_key %in% names(ld_variant_index)) {
+        LD_match <- c(LD_match, ld_variant_index[[variant_key]])
+      } else {
+        LD_mat[[study]] <- ld
+        ld_variant_index[[variant_key]] <- study
+        LD_match <- c(LD_match, study)
       }
-    } else if (length(pip_cutoff_to_skip_ind) == n_ind_contexts) {
-      names(pip_cutoff_to_skip_ind) <- ind_context_names
-    } else {
-      stop("pip_cutoff_to_skip_ind must be a scalar, a named vector, or a vector with length equal to the number of individual contexts (", n_ind_contexts, ").")
     }
+    list(sumstats = sumstats, LD_mat = LD_mat, LD_match = LD_match)
   }
 
-  # Validate pip_cutoff_to_skip_sumstat: scalar -> named vector for all studies
-  if (!is.null(region_data$sumstat_data)) {
-    all_study_names <- unlist(lapply(region_data$sumstat_data$sumstats, names))
-    if (length(pip_cutoff_to_skip_sumstat) == 1 && is.null(names(pip_cutoff_to_skip_sumstat))) {
-      pip_cutoff_to_skip_sumstat <- setNames(rep(pip_cutoff_to_skip_sumstat, length(all_study_names)), all_study_names)
-    } else if (!is.null(names(pip_cutoff_to_skip_sumstat))) {
-      # Named vector: fill missing studies with 0
-      missing <- setdiff(all_study_names, names(pip_cutoff_to_skip_sumstat))
-      if (length(missing) > 0) {
-        pip_cutoff_to_skip_sumstat <- c(pip_cutoff_to_skip_sumstat, setNames(rep(0, length(missing)), missing))
-      }
-    }
+  individual_data <- NULL
+  ind_input <- region_data_to_ind_input(region_data)
+  if (isTRUE(ind_input$source_info$has_individual)) {
+    ind_qc <- qc_individual_data(
+      X = ind_input$X,
+      Y = ind_input$Y,
+      maf = ind_input$maf,
+      X_variance = ind_input$X_variance,
+      maf_cutoff = maf_cutoff,
+      pip_cutoff_to_skip = pip_cutoff_to_skip_ind
+    )
+    individual_data <- qced_individual_to_region_data(ind_qc)
   }
 
-  #### related internal functions
-  # Add context names to colname of Y if missing
-  add_context_to_Y <- function(res_Y) {
-    res <- lapply(seq_along(res_Y), function(iy) {
-      y <- res_Y[[iy]]
-      if (is.null(y)) {
-        return(NULL)
-      }
-      if (is.null(colnames(y))) {
-        colnames(y) <- names(res_Y)[iy]
-      } else {
-        colnames(y) <- paste0(names(res_Y)[iy], "_", colnames(y))
-      }
-      return(y)
-    })
-    names(res) <- names(res_Y)
-    return(res)
-  }
-
-  # Initial PIP check for individual level data
-  filter_resY_pip <- function(res_X, res_Y, pip_cutoff = 0, context = NULL) {
-    # Initial PIP check
-    if (pip_cutoff != 0) {
-      if (pip_cutoff < 0) {
-        # automatically determine the cutoff to use
-        pip_cutoff <- 3 * 1 / ncol(res_X)
+  sumstat_data <- NULL
+  rss_input <- region_data_to_rss_input(region_data)
+  if (isTRUE(rss_input$source_info$has_sumstat)) {
+    sumstat_qc <- summary_stats_qc(
+      rss_input = rss_input$rss_input,
+      LD_data = rss_input$LD_data,
+      keep_indel = keep_indel,
+      pip_cutoff_to_skip = pip_cutoff_to_skip_sumstat,
+      qc_method = qc_method,
+      impute = impute,
+      impute_opts = impute_opts
+    )
+    sumstat_data <- qced_sumstat_to_region_data(sumstat_qc)
+  }
+
+  list(individual_data = individual_data, sumstat_data = sumstat_data)
+}
+
+#' Run reusable individual-level QC
+#'
+#' @param X Genotype matrix or named list of genotype matrices.
+#' @param Y Phenotype vector/matrix or named list of phenotype matrices.
+#' @param maf Optional MAF vector or named list.
+#' @param X_variance Optional variant variance vector or named list.
+#' @param missing_rate_thresh Maximum missing genotype rate.
+#' @param maf_cutoff Minimum MAF cutoff.
+#' @param xvar_cutoff Minimum genotype variance cutoff.
+#' @param ld_reference_meta_file Optional LD reference metadata file.
+#' @param keep_indel Whether indel variants are kept during LD-reference
+#'   filtering.
+#' @param pip_cutoff_to_skip Optional single-effect PIP cutoff.
+#' @return A named list of cleaned context-level \code{X}/\code{Y} records, or
+#'   one cleaned record for matrix inputs.
+#' @export
+qc_individual_data <- function(X, Y, maf = NULL, X_variance = NULL,
+                               missing_rate_thresh = NULL,
+                               maf_cutoff = 0.0005,
+                               xvar_cutoff = 0,
+                               ld_reference_meta_file = NULL,
+                               keep_indel = TRUE,
+                               pip_cutoff_to_skip = 0,
+                               context = NULL) {
+  qc_one <- function(X, Y, maf = NULL, X_variance = NULL, context = NULL,
+                     pip_cutoff_to_skip = 0) {
+    if (is.null(X) || is.null(Y)) return(NULL)
+    if (is.null(colnames(X))) stop("X must have variant colnames for individual-level QC.")
+    if (is.vector(Y)) Y <- matrix(Y, ncol = 1)
+    if (is.null(colnames(Y))) colnames(Y) <- .cb_default(context, paste0("outcome", seq_len(ncol(Y))))
+    if (!is.null(context)) colnames(Y) <- paste0(context, "_", colnames(Y))
+
+    message("QC track: starting individual-level QC for ", .cb_default(context, "individual data"), ".")
+    original_variants <- colnames(X)
+    if (!is.null(names(maf)) || length(maf) == length(original_variants)) {
+      if (is.null(names(maf))) names(maf) <- original_variants
+    }
+    if (!is.null(names(X_variance)) || length(X_variance) == length(original_variants)) {
+      if (is.null(names(X_variance))) names(X_variance) <- original_variants
+    }
+    if (!is.null(ld_reference_meta_file)) {
+      reference_filter <- filter_variants_by_ld_reference(original_variants, ld_reference_meta_file,
+                                                          keep_indel = keep_indel)
+      X <- X[, reference_filter$data, drop = FALSE]
+      if (!is.null(names(maf))) maf <- maf[colnames(X)]
+      if (!is.null(names(X_variance))) X_variance <- X_variance[colnames(X)]
+    }
+    X <- filter_X(X, missing_rate_thresh = missing_rate_thresh,
+                  maf_thresh = maf_cutoff, var_thresh = xvar_cutoff,
+                  maf = maf, X_variance = X_variance)
+    if (!is.null(names(maf))) maf <- maf[colnames(X)]
+    if (!is.null(names(X_variance))) X_variance <- X_variance[colnames(X)]
+
+    if (!is.null(pip_cutoff_to_skip) && pip_cutoff_to_skip != 0) {
+      cutoff <- pip_cutoff_to_skip
+      if (cutoff < 0) cutoff <- 3 / ncol(X)
+      keep_y <- logical(ncol(Y))
+      for (j in seq_len(ncol(Y))) {
+        observed <- !is.na(Y[, j])
+        if (sum(observed) < 2) next
+        pip <- susieR::susie(X[observed, , drop = FALSE], Y[observed, j],
+                             L = 1, max_iter = 100)$pip
+        keep_y[j] <- any(pip > cutoff)
       }
-      top_model_pip <- lapply(1:ncol(res_Y), function(i) susieR::susie(res_X, res_Y[, i], L = 1)$pip)
-      check_model_pip <- sapply(top_model_pip, function(pip) any(pip > pip_cutoff))
-      include_idx <- which(check_model_pip)
-      if (length(include_idx) == 0) {
-        message(paste(
-          "Skipping follow-up analysis for individual-context", context,
-          ". No signals above PIP threshold", pip_cutoff, "in initial model screening."
-        ))
+      if (!any(keep_y)) {
+        message("QC track: skipping individual context ", context,
+                ". No outcomes passed PIP threshold ", cutoff, ".")
         return(NULL)
-      } else if (length(include_idx) == ncol(res_Y)) {
-        message(paste("Keep all individual-phenotypes in context", context, "."))
-      } else {
-        exclude_idx <- setdiff(1:ncol(res_Y), include_idx)
-        exclude_pheno <- paste(colnames(res_Y)[exclude_idx], collapse = ";")
-        message(paste(
-          "Skipping follow-up analysis for individual-phenotypes", exclude_pheno, "in context", context,
-          ". No signals above PIP threshold", pip_cutoff, "in initial model screening."
-        ))
-        res_Y <- res_Y[, include_idx, drop = FALSE] %>% .[, order(colnames(.)), drop = FALSE]
       }
+      Y <- Y[, keep_y, drop = FALSE]
+    }
+    message("QC track: retained ", ncol(X), " variants and ", ncol(Y),
+            " outcome(s) for individual context ", .cb_default(context, "input"), ".")
+    list(X = X, Y = Y, maf = maf, X_variance = X_variance)
+  }
+
+  if (is.list(X) && !is.matrix(X) && !is.data.frame(X)) {
+    X <- .cb_as_named_list(X, "individual")
+    Y <- .cb_as_named_list(Y, "individual")
+    contexts <- intersect(names(X), names(Y))
+    if (length(contexts) == 0) stop("No matched X/Y contexts for individual-level QC.")
+    cutoffs <- .cb_named_cutoff(pip_cutoff_to_skip, contexts, "pip_cutoff_to_skip_ind")
+    out <- list()
+    for (context in contexts) {
+      res <- qc_one(
+        X[[context]], Y[[context]],
+        maf = .cb_list_value(maf, context),
+        X_variance = .cb_list_value(X_variance, context),
+        context = context,
+        pip_cutoff_to_skip = cutoffs[[context]]
+      )
+      if (!is.null(res)) out[[context]] <- res
     }
-    return(res_Y)
-  }
-
-  # Initial check for all contexts with individual-level data
-  data_initial_screen_individual <- function(X,
-                                             Y,
-                                             MAF,
-                                             maf_cutoff = 0.0005,
-                                             pip_cutoff_to_skip_ind = 0) {
-    # - add context to colname of Y
-    Y <- add_context_to_Y(Y)
-    results <- purrr::imap(X, function(resX, context) {
-      resY <- Y[[context]]
-      maf <- MAF[[context]]
-      pip_cutoff <- if (context %in% names(pip_cutoff_to_skip_ind)) {
-        pip_cutoff_to_skip_ind[[context]]
+    return(out)
+  }
+  qc_one(
+    X = X, Y = Y, maf = maf, X_variance = X_variance,
+    pip_cutoff_to_skip = pip_cutoff_to_skip,
+    context = context
+  )
+}
+
+
+##### Generic ColocBoost helper functions #####
+
+.cb_default <- function(x, y) if (is.null(x)) y else x
+
+.cb_merge_args <- function(x, y) {
+  for (nm in names(y)) {
+    if (!is.null(y[[nm]])) x[[nm]] <- y[[nm]]
+  }
+  x
+}
+
+.cb_as_named_list <- function(x, default_name) {
+  if (is.null(x)) return(NULL)
+  if (is.list(x) && !is.matrix(x) && !is.data.frame(x)) return(x)
+  stats::setNames(list(x), default_name)
+}
+
+.cb_list_value <- function(x, name, default = NULL) {
+  if (is.null(x)) return(default)
+  if (is.list(x) && !is.matrix(x) && !is.data.frame(x)) {
+    if (name %in% names(x)) return(x[[name]])
+    return(default)
+  }
+  x
+}
+
+.cb_named_cutoff <- function(x, names_to_fill, arg_name) {
+  if (length(x) == 1 && is.null(names(x))) {
+    return(stats::setNames(rep(x, length(names_to_fill)), names_to_fill))
+  }
+  if (!is.null(names(x))) {
+    missing <- setdiff(names_to_fill, names(x))
+    if (length(missing) > 0) x <- c(x, stats::setNames(rep(0, length(missing)), missing))
+    return(x[names_to_fill])
+  }
+  if (length(x) == length(names_to_fill)) {
+    return(stats::setNames(x, names_to_fill))
+  }
+  stop(arg_name, " must be a scalar, named vector, or match the number of inputs.")
+}
+
+.cb_y_ncol <- function(y) {
+  if (is.null(y)) return(0L)
+  if (is.null(dim(y))) return(as.integer(length(y) > 0))
+  ncol(y)
+}
+
+##### Outcome helper functions #####
+
+.cb_colocboost_outcome_names <- function(args, prefer_supplied = TRUE) {
+  if (isTRUE(prefer_supplied) && !is.null(args$outcome_names)) {
+    return(as.character(args$outcome_names))
+  }
+  Y <- args$Y
+  y_outcomes <- character()
+  if (!is.null(Y)) {
+    if (is.data.frame(Y)) Y <- as.matrix(Y)
+    if (is.matrix(Y)) {
+      y_outcomes <- colnames(Y)
+      if (is.null(y_outcomes)) y_outcomes <- paste0("Y", seq_len(ncol(Y)))
+    } else if (is.atomic(Y) && !is.list(Y)) {
+      y_outcomes <- "Y1"
+    } else {
+      y_names <- names(Y)
+      y_cols <- unlist(Map(function(y, nm) {
+        if (is.null(y)) return(character())
+        if (is.data.frame(y)) y <- as.matrix(y)
+        if (!is.null(dim(y))) {
+          cn <- colnames(y)
+          if (is.null(cn) || any(is.na(cn) | cn == "")) {
+            if (!is.null(nm) && !is.na(nm) && nm != "" && ncol(y) == 1) return(nm)
+            return(paste0(.cb_default(nm, "Y"), "_", seq_len(ncol(y))))
+          }
+          return(as.character(cn))
+        }
+        if (!is.null(nm) && !is.na(nm) && nm != "") return(nm)
+        "Y"
+      }, Y, .cb_default(y_names, rep("", length(Y)))), use.names = FALSE)
+      if (length(y_cols) != length(Y) || anyDuplicated(y_cols) == 0) {
+        y_outcomes <- y_cols
       } else {
-        0
+        if (is.null(y_names) || any(is.na(y_names) | y_names == "")) {
+          y_names <- paste0("Y", seq_along(Y))
+        }
+        y_outcomes <- y_names
       }
-      if (is.null(resY)) return(NULL)
-      resX <- filter_X(resX, missing_rate_thresh = NULL, maf_thresh = maf_cutoff, maf = maf)
-      resY <- filter_resY_pip(resX, resY, pip_cutoff = pip_cutoff, context = context)
-      if (is.null(resY)) return(NULL)
-      list(X = resX, Y = resY)
-    }) %>% purrr::compact()
-
-    if (length(results) == 0) {
-      message("Skipping follow-up analysis for all contexts.")
-      return(NULL)
     }
-    keep_contexts <- names(results)
-    message(paste("Region includes the following contexts after inital screening:", paste(keep_contexts, collapse = ";"), "."))
-    list(
-      X = purrr::map(results, "X"),
-      Y = purrr::map(results, "Y")
-    )
   }
 
-  # - individual level data QC
-  individual_data <- region_data$individual_data
-  if (!is.null(individual_data)) {
-    X <- individual_data$residual_X
-    Y <- individual_data$residual_Y
-    MAF <- individual_data$maf
-    # 1. remove maf < maf_cutoff
-    # 2. initial check PIP
-    individual_data <- data_initial_screen_individual(
-      X = X, Y = Y, MAF = MAF,
-      maf_cutoff = maf_cutoff,
-      pip_cutoff_to_skip_ind = pip_cutoff_to_skip_ind
+  sumstat <- args$sumstat
+  effect_est <- args$effect_est
+  sumstat_outcomes <- character()
+  if (!is.null(sumstat)) {
+    if (is.data.frame(sumstat)) {
+      sumstat_outcomes <- "sumstat1"
+    } else {
+      ss_names <- names(sumstat)
+      if (is.null(ss_names) || any(is.na(ss_names) | ss_names == "")) {
+        ss_names <- paste0("sumstat", seq_along(sumstat))
+      }
+      sumstat_outcomes <- as.character(ss_names)
+    }
+  } else if (!is.null(effect_est)) {
+    effect_est <- as.matrix(effect_est)
+    effect_names <- colnames(effect_est)
+    if (is.null(effect_names)) effect_names <- paste0("sumstat", seq_len(ncol(effect_est)))
+    sumstat_outcomes <- as.character(effect_names)
+  }
+  c(as.character(y_outcomes), sumstat_outcomes)
+}
+
+.cb_resolve_qc_outcome_names <- function(pre_qc_data_outcomes,
+                                         pre_qc_display_outcomes,
+                                         post_qc_data_outcomes) {
+  if (length(post_qc_data_outcomes) == 0) return(character())
+  if (length(pre_qc_data_outcomes) == length(pre_qc_display_outcomes) &&
+      length(pre_qc_data_outcomes) > 0 &&
+      !anyDuplicated(pre_qc_data_outcomes)) {
+    idx <- match(post_qc_data_outcomes, pre_qc_data_outcomes)
+    if (all(!is.na(idx))) {
+      return(pre_qc_display_outcomes[idx])
+    }
+  }
+  post_qc_data_outcomes
+}
+
+.cb_remap_focal_outcome_idx <- function(focal_outcome_idx,
+                                        pre_qc_data_outcomes,
+                                        pre_qc_display_outcomes,
+                                        post_qc_data_outcomes,
+                                        post_qc_display_outcomes) {
+  if (is.null(focal_outcome_idx)) return(NULL)
+  if (length(focal_outcome_idx) != 1 || is.na(focal_outcome_idx)) {
+    warning("focal_outcome_idx must be a single non-missing index. Passing it through unchanged.")
+    return(focal_outcome_idx)
+  }
+  focal_outcome_idx <- as.integer(focal_outcome_idx)
+  if (length(post_qc_data_outcomes) == 0) return(NULL)
+  if (focal_outcome_idx < 1 ||
+      focal_outcome_idx > max(length(pre_qc_display_outcomes), length(pre_qc_data_outcomes))) {
+    warning("focal_outcome_idx is outside the pre-QC outcome range. Passing it through unchanged.")
+    return(focal_outcome_idx)
+  }
+
+  focal_display <- if (focal_outcome_idx <= length(pre_qc_display_outcomes)) {
+    pre_qc_display_outcomes[[focal_outcome_idx]]
+  } else {
+    NULL
+  }
+  focal_data <- if (focal_outcome_idx <= length(pre_qc_data_outcomes)) {
+    pre_qc_data_outcomes[[focal_outcome_idx]]
+  } else {
+    NULL
+  }
+
+  candidates <- integer()
+  for (needle in unique(c(focal_display, focal_data))) {
+    if (is.null(needle) || is.na(needle) || needle == "") next
+    candidates <- c(
+      candidates,
+      which(post_qc_display_outcomes == needle),
+      which(post_qc_data_outcomes == needle)
     )
+    suffix_pattern <- paste0("_", needle)
+    candidates <- c(
+      candidates,
+      which(endsWith(post_qc_display_outcomes, suffix_pattern)),
+      which(endsWith(post_qc_data_outcomes, suffix_pattern))
+    )
+  }
+  candidates <- unique(candidates)
+  if (length(candidates) > 0) {
+    return(candidates[[1]])
+  }
+  if (length(pre_qc_data_outcomes) == length(post_qc_data_outcomes) ||
+      length(pre_qc_display_outcomes) == length(post_qc_display_outcomes)) {
+    return(focal_outcome_idx)
   }
 
+  focal_label <- .cb_default(focal_display, .cb_default(focal_data, focal_outcome_idx))
+  warning("focal_outcome_idx refers to outcome ", focal_label,
+          ", which is not present after QC. Setting focal_outcome_idx to NULL.")
+  NULL
+}
 
-  # sumstat_data QC, imputation using raiss
-  # return A list of sumstat_data after initial checking and QC:
-  # \itemize{
-  #   \item sumstats: A list of summary statistics and ready to do analysis.
-  #   \item LD_mat: A list of LD matrix and ready to do analysis.
-  #   \item LD_match: A vector of strings to indicating sumstats and LD matching (save space since multiple sumstats may link to the same LD matrix).
-  # }
-  summary_stats_qc_multitask <- function(sumstat_data,
-                                         keep_indel = TRUE,
-                                         pip_cutoff_to_skip_sumstat = 0,
-                                         qc_method = c("slalom", "dentist", "none"),
-                                         impute = TRUE,
-                                         impute_opts = list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01)) {
-    n_LD <- length(sumstat_data$LD_info)
-    # Collect results into lists and flatten at the end
-    collected_sumstats <- list()
-    collected_LD <- list()
-    collected_LD_match <- character()
-    # Track LD matrices by variant signature for O(1) deduplication
-    ld_variant_index <- list()
+##### Individual-level ColocBoost helper functions #####
 
-    for (i in 1:n_LD) {
-      LD_data <- sumstat_data$LD_info[[i]]
-      sumstats <- sumstat_data$sumstats[[i]]
-      has_genotype <- isTRUE(LD_data$is_genotype)
-
-      # When source is genotype X, derive R only where needed (QC, imputation).
-      # Keep X as primary for colocboost X_ref.
-      LD_data_for_qc <- LD_data
-      if (has_genotype) {
-        LD_data_for_qc$LD_matrix <- compute_LD(LD_data$LD_matrix, method = "sample")
-        LD_data_for_qc$is_genotype <- FALSE
+.cb_individual_qc_input_from_colocboost <- function(X, Y, dict_YX = NULL) {
+  bind_y_for_qc <- function(Y_list, y_names = NULL) {
+    if (is.null(y_names)) y_names <- names(Y_list)
+    if (is.null(y_names)) y_names <- rep("", length(Y_list))
+    mats <- Map(function(y, nm) {
+      if (is.null(dim(y))) y <- matrix(y, ncol = 1)
+      if (is.null(colnames(y))) {
+        colnames(y) <- if (!is.null(nm) && !is.na(nm) && nm != "") nm else paste0("Y", seq_len(ncol(y)))
       }
+      y
+    }, Y_list, y_names)
+    do.call(cbind, mats)
+  }
 
-      # Pre-compute LD partition once per block (shared across all GWAS studies)
-      if (impute) {
-        LD_matrix_partitioned <- partition_LD_matrix(LD_data_for_qc)
+  X_list <- .cb_as_named_list(X, "X1")
+  Y_list <- .cb_as_named_list(Y, "Y1")
+  matched <- intersect(names(X_list), names(Y_list))
+  if (length(matched) > 0 && is.null(dict_YX)) {
+    return(list(X = X_list[matched], Y = Y_list[matched]))
+  }
+
+  if (!is.null(dict_YX)) {
+    dict <- as.matrix(dict_YX)
+    if (ncol(dict) < 2) stop("dict_YX must have at least two columns.")
+    X_qc <- list()
+    Y_qc <- list()
+    for (x_idx in unique(dict[, 2])) {
+      if (is.na(x_idx) || x_idx < 1 || x_idx > length(X_list)) next
+      y_idx <- dict[dict[, 2] == x_idx, 1]
+      y_idx <- y_idx[!is.na(y_idx) & y_idx >= 1 & y_idx <= length(Y_list)]
+      if (length(y_idx) == 0) next
+      context <- names(X_list)[x_idx]
+      if (is.null(context) || is.na(context) || context == "") context <- paste0("X", x_idx)
+      X_qc[[context]] <- X_list[[x_idx]]
+      Y_qc[[context]] <- bind_y_for_qc(Y_list[y_idx], names(Y_list)[y_idx])
+    }
+    if (length(X_qc) > 0) return(list(X = X_qc, Y = Y_qc))
+  }
+
+  if (length(X_list) == 1 && length(Y_list) > 0) {
+    context <- names(X_list)[1]
+    if (is.null(context) || is.na(context) || context == "") context <- "X1"
+    return(list(
+      X = stats::setNames(list(X_list[[1]]), context),
+      Y = stats::setNames(list(bind_y_for_qc(Y_list, names(Y_list))), context)
+    ))
+  }
+
+  if (length(X_list) == length(Y_list)) {
+    if (is.null(names(X_list))) names(X_list) <- paste0("X", seq_along(X_list))
+    names(Y_list) <- names(X_list)
+    return(list(X = X_list, Y = Y_list))
+  }
+
+  list(X = X_list, Y = Y_list)
+}
+
+.cb_format_individual <- function(ind) {
+  if (length(ind) == 0) return(list())
+  if (!is.null(ind$X) && !is.null(ind$Y)) {
+    ind <- list(individual = ind)
+  }
+  ind <- Filter(function(x) {
+    !is.null(x$X) && .cb_y_ncol(x$Y) > 0
+  }, ind)
+  if (length(ind) == 0) return(list())
+  X <- lapply(ind, `[[`, "X")
+  Y <- lapply(ind, `[[`, "Y")
+  unique_X <- list()
+  X_match <- integer(length(X))
+  for (i in seq_along(X)) {
+    matched <- names(unique_X)[vapply(unique_X, identical, logical(1), X[[i]])]
+    if (length(matched) > 0) {
+      X_match[[i]] <- match(matched[[1]], names(unique_X))
+    } else {
+      unique_X[[names(ind)[i]]] <- X[[i]]
+      X_match[[i]] <- length(unique_X)
+    }
+  }
+  Y_split <- list()
+  dict <- matrix(nrow = 0, ncol = 2)
+  y_index <- 0L
+  all_outcome_names <- unlist(lapply(Y, colnames), use.names = FALSE)
+  duplicated_outcome_names <- unique(all_outcome_names[
+    duplicated(all_outcome_names) | duplicated(all_outcome_names, fromLast = TRUE)
+  ])
+  for (i in seq_along(Y)) {
+    y <- Y[[i]]
+    if (is.null(dim(y))) y <- matrix(y, ncol = 1)
+    for (j in seq_len(ncol(y))) {
+      y_index <- y_index + 1L
+      outcome <- colnames(y)[j]
+      if (is.null(outcome) || is.na(outcome) || outcome == "") {
+        outcome <- paste0("outcome", y_index)
+      }
+      context <- names(ind)[i]
+      if (outcome %in% duplicated_outcome_names &&
+          !is.null(context) && !is.na(context) && context != "") {
+        outcome <- paste0(context, "_", outcome)
       }
+      if (outcome %in% names(Y_split)) {
+        outcome <- make.unique(c(names(Y_split), outcome))[length(Y_split) + 1]
+      }
+      Y_split[[outcome]] <- y[, j, drop = FALSE]
+      dict <- rbind(dict, c(y_index, X_match[[i]]))
+    }
+  }
+  colnames(dict) <- c("Y", "X")
+  list(X = unique_X, Y = Y_split, dict_YX = dict, outcome_names = names(Y_split))
+}
 
-      for (ii in seq_along(sumstats)) {
-        sumstat <- sumstats[[ii]]
-        if (nrow(sumstat$sumstats) == 0) next
-        n <- sumstat$n
-        conditions_sumstat <- names(sumstats)[ii]
-        pip_cutoff_to_skip_ld <- if (conditions_sumstat %in% names(pip_cutoff_to_skip_sumstat)) {
-          as.numeric(pip_cutoff_to_skip_sumstat[conditions_sumstat])
-        } else {
-          0
-        }
+##### Summary-statistic ColocBoost helper functions #####
 
-        # Preprocess: allele QC + variant subsetting (needs R for [variants, variants] indexing)
-        preprocess_results <- rss_basic_qc(sumstat$sumstats, LD_data_for_qc, keep_indel = keep_indel)
-        sumstat$sumstats <- preprocess_results$sumstats
-        R_mat <- preprocess_results$LD_mat
-
-        # Initial PIP checking (uses X when available, R otherwise)
-        if (pip_cutoff_to_skip_ld != 0) {
-          pip_vars <- sumstat$sumstats$variant_id
-          if (has_genotype) {
-            pip <- susie_rss(z = sumstat$sumstats$z,
-              X = LD_data$LD_matrix[, pip_vars, drop = FALSE],
-              L = 1, L_greedy = NULL, max_iter = 1, n = n)$pip
-          } else {
-            pip <- susie_rss(z = sumstat$sumstats$z, R = R_mat, L = 1, L_greedy = NULL, max_iter = 1, n = n)$pip
-          }
-          if (pip_cutoff_to_skip_ld < 0) {
-            pip_cutoff_to_skip_ld <- 3 * 1 / length(pip_vars)
-          }
-          if (!any(pip > pip_cutoff_to_skip_ld)) {
-            message(paste(
-              "Skipping follow-up analysis for sumstat study", conditions_sumstat,
-              ". No signals above PIP threshold", pip_cutoff_to_skip_ld, "in initial model screening."
-            ))
-            next
-          } else {
-            message(paste("Keep summary study", conditions_sumstat, "."))
-          }
-        }
+.cb_sumstat_qc_input_from_colocboost <- function(sumstat, LD, X_ref, dict_sumstatLD,
+                                                LD_reference_info = NULL,
+                                                variant_convention = c("A2_A1", "A1_A2")) {
+  is_ld_data <- function(x) {
+    is.list(x) && !is.null(x$LD_matrix)
+  }
+  as_reference_info_list <- function(x) {
+    if (is.null(x)) return(NULL)
+    if (is_ld_data(x) || is.data.frame(x) || is.character(x)) {
+      return(list(LD = x))
+    }
+    if (is.list(x) && length(x) > 0) return(x)
+    stop("LD_reference_info must be a .bim/.pvar path, data.frame, load_LD_matrix() result, or a list of these.")
+  }
+  reference_info_for_index <- function(reference_info, index) {
+    if (is.null(reference_info)) return(NULL)
+    if (length(reference_info) == 1) return(reference_info[[1]])
+    reference_info[[min(index, length(reference_info))]]
+  }
+  validate_sumstat_for_qc <- function(sumstat, study) {
+    if (is.null(sumstat) || !is.data.frame(sumstat)) {
+      return(paste0("Summary-statistic QC for study ", study,
+                    " requires each sumstat input to be a data.frame."))
+    }
+    missing_cols <- setdiff(c("variant", "z"), colnames(sumstat))
+    if (length(missing_cols) > 0) {
+      return(paste0("Summary-statistic QC for study ", study,
+                    " requires sumstat columns: ", paste(missing_cols, collapse = ", "), "."))
+    }
+    NULL
+  }
+  ld_variant_names <- function(ld) {
+    if (!is.matrix(ld)) return(NULL)
+    if (nrow(ld) == ncol(ld)) rownames(ld) else colnames(ld)
+  }
 
-        # Quality control - remove outlier variants (needs R)
-        if (!is.null(qc_method) && qc_method != "none") {
-          qc_results <- summary_stats_qc(sumstat$sumstats, LD_data_for_qc, n = n, method = qc_method)
-          sumstat$sumstats <- qc_results$sumstats
-          R_mat <- qc_results$LD_mat
-        }
-        # Imputation (needs R via partitioned LD)
-        if (impute) {
-          impute_results <- raiss(LD_data_for_qc$ref_panel, sumstat$sumstats, LD_matrix_partitioned,
-            rcond = impute_opts$rcond,
-            R2_threshold = impute_opts$R2_threshold, minimum_ld = impute_opts$minimum_ld, lamb = impute_opts$lamb
-          )
-          sumstat$sumstats <- impute_results$result_filter
-          R_mat <- impute_results$LD_mat
-        }
+  variant_convention <- match.arg(variant_convention)
+  sumstat <- .cb_as_named_list(sumstat, "sumstat")
+  using_x_ref <- is.null(LD) && !is.null(X_ref)
+  ld_source <- .cb_as_named_list(if (!is.null(LD)) LD else X_ref, "LD")
+  reference_info <- as_reference_info_list(LD_reference_info)
+  if (is.null(dict_sumstatLD)) {
+    dict_sumstatLD <- cbind(seq_along(sumstat), pmin(seq_along(sumstat), length(ld_source)))
+  }
+  rss_input <- list()
+  LD_data <- list()
+  skip_reasons <- character()
+  for (i in seq_along(sumstat)) {
+    study <- names(sumstat)[i]
+    ld_index <- dict_sumstatLD[i, 2]
+    validation_reason <- validate_sumstat_for_qc(sumstat[[i]], study)
+    if (!is.null(validation_reason)) {
+      skip_reasons <- c(skip_reasons, validation_reason)
+      next
+    }
+    ld_mat <- ld_source[[ld_index]]
+    ref_info <- reference_info_for_index(reference_info, ld_index)
+    if (is.null(ref_info)) {
+      message("QC track: checking LD/X_ref variant names for summary-stat study ", study, ".")
+      ref_ids <- ld_variant_names(ld_mat)
+      if (is.null(ref_ids)) {
+        skip_reasons <- c(
+          skip_reasons,
+          paste0("Summary-statistic QC for study ", study,
+                 " requires LD row/column names or X_ref column names parseable as genomic variant IDs; ",
+                 "provide LD_reference_info for QC. Skipping summary-statistic QC for this study.")
+        )
+        next
+      }
+      ld_data <- .cb_make_ld_data(
+        ld_mat,
+        is_genotype = using_x_ref,
+        variant_convention = variant_convention
+      )
+      if (is.null(ld_data)) {
+        skip_reasons <- c(
+          skip_reasons,
+          paste0("Summary-statistic QC for study ", study,
+                 " could not parse LD/X_ref names as genomic variant IDs; ",
+                 "provide LD_reference_info for QC. Skipping summary-statistic QC for this study.")
+        )
+        next
+      }
+      message("QC track: LD/X_ref names are parseable for summary-stat study ", study, ".")
+    } else if (is_ld_data(ref_info)) {
+      message("QC track: using supplied LD_reference_info LD data for summary-stat study ", study, ".")
+      ld_data <- .normalize_ld_data_for_qc(ref_info)
+    } else {
+      message("QC track: using supplied LD_reference_info variant metadata for summary-stat study ", study, ".")
+      ld_data <- .cb_make_ld_data(
+        ld_mat,
+        is_genotype = using_x_ref,
+        reference_info = ref_info,
+        variant_convention = variant_convention
+      )
+    }
+    parsed <- tryCatch(
+      .cb_parse_variants_for_qc(sumstat[[i]]$variant, variant_convention),
+      error = function(e) {
+        skip_reasons <<- c(
+          skip_reasons,
+          paste0("Summary-statistic QC for study ", study,
+                 " could not parse sumstat$variant as genomic variant IDs: ",
+                 conditionMessage(e), " Skipping summary-statistic QC for this study.")
+        )
+        NULL
+      }
+    )
+    if (is.null(parsed)) next
+    variant_id <- format_variant_id(parsed$chrom, parsed$pos, parsed$A2, parsed$A1)
+    ss <- data.frame(parsed, z = sumstat[[i]]$z,
+                     variant_id = variant_id,
+                     stringsAsFactors = FALSE)
+    n <- if ("n" %in% colnames(sumstat[[i]])) unique(sumstat[[i]]$n)[1] else NULL
+    var_y <- if ("var_y" %in% colnames(sumstat[[i]])) unique(sumstat[[i]]$var_y)[1] else 1
+    rss_input[[study]] <- list(sumstats = ss, n = n, var_y = var_y)
+    LD_data[[study]] <- ld_data
+  }
+  list(rss_input = rss_input, LD_data = LD_data, skip_reasons = skip_reasons)
+}
 
-        # Store: X subset if genotype source, R otherwise
-        final_vars <- sumstat$sumstats$variant_id
-        if (has_genotype) {
-          missing <- setdiff(final_vars, colnames(LD_data$LD_matrix))
-          if (length(missing) > 0) {
-            stop("BUG: ", length(missing), " QC'd variants not found in genotype matrix X. ",
-                 "First few: ", paste(head(missing, 3), collapse = ", "))
-          }
-          mat_to_store <- LD_data$LD_matrix[, final_vars, drop = FALSE]
+.cb_format_sumstat <- function(sumstat_qc) {
+  valid_sumstat_entry <- function(ss_df, min_variants = 2) {
+    if (is.null(ss_df) || !is.data.frame(ss_df)) return(FALSE)
+    if (nrow(ss_df) < min_variants) return(FALSE)
+    if (all(is.na(ss_df$z))) return(FALSE)
+    if (all(ss_df$n <= 0 | is.na(ss_df$n))) return(FALSE)
+    TRUE
+  }
+  filter_valid_sumstats <- function(sumstats, LD_mat, min_variants = 2) {
+    dedupe_ld <- function(LD_mat, studies) {
+      unique_ld <- list()
+      LD_match <- character()
+      for (study in studies) {
+        ld <- LD_mat[[study]]
+        matched <- names(unique_ld)[vapply(unique_ld, identical, logical(1), ld)]
+        if (length(matched) > 0) {
+          LD_match <- c(LD_match, matched[[1]])
         } else {
-          mat_to_store <- R_mat
+          unique_ld[[study]] <- ld
+          LD_match <- c(LD_match, study)
         }
+      }
+      list(LD_mat = unique_ld, LD_match = LD_match)
+    }
+
+    valid_idx <- vapply(sumstats, valid_sumstat_entry, logical(1),
+                        min_variants = min_variants)
+    if (!any(valid_idx)) {
+      message("No valid summary statistic studies remain after validation.")
+      return(NULL)
+    }
+    removed <- names(sumstats)[!valid_idx]
+    if (length(removed) > 0) {
+      message("Removed invalid sumstat studies: ", paste(removed, collapse = ", "))
+    }
+    sumstats <- sumstats[valid_idx]
+    LD_mat <- LD_mat[valid_idx]
+    deduped <- dedupe_ld(LD_mat, names(sumstats))
+    LD_mat <- deduped$LD_mat
+    LD_match <- deduped$LD_match
+    dict_sumstatLD <- cbind(seq_along(sumstats), match(LD_match, names(LD_mat)))
+    list(sumstats = sumstats, LD_mat = LD_mat, LD_match = LD_match,
+         dict_sumstatLD = dict_sumstatLD)
+  }
+  if (length(sumstat_qc) == 0) return(list())
+  if (!is.null(sumstat_qc$rss_input) && !is.null(sumstat_qc$LD_matrix)) {
+    sumstat_qc <- list(sumstat = sumstat_qc)
+  }
+  sumstat <- lapply(sumstat_qc, function(x) {
+    ss <- x$rss_input$sumstats
+    variant_id <- if ("variant_id" %in% colnames(ss)) {
+      ss$variant_id
+    } else {
+      format_variant_id(ss$chrom, ss$pos, ss$A2, ss$A1)
+    }
+    data.frame(z = ss$z, n = x$rss_input$n,
+               variant = normalize_variant_id(variant_id),
+               stringsAsFactors = FALSE)
+  })
+  LD_mat <- lapply(sumstat_qc, `[[`, "LD_matrix")
+  filtered <- filter_valid_sumstats(sumstat, LD_mat)
+  if (is.null(filtered)) return(list())
+  c(
+    list(
+      sumstat = filtered$sumstats,
+      dict_sumstatLD = filtered$dict_sumstatLD
+    ),
+    build_ld_args(filtered$LD_mat)
+  )
+}
 
-        # Collect sumstat
-        collected_sumstats[[conditions_sumstat]] <- sumstat
 
-        # Deduplicate LD using variant signature hash
-        variant_key <- paste(colnames(mat_to_store), collapse = ",")
-        if (variant_key %in% names(ld_variant_index)) {
-          collected_LD_match <- c(collected_LD_match, ld_variant_index[[variant_key]])
-        } else {
-          collected_LD[[conditions_sumstat]] <- mat_to_store
-          ld_variant_index[[variant_key]] <- conditions_sumstat
-          collected_LD_match <- c(collected_LD_match, conditions_sumstat)
-        }
+##### LD/reference QC helper functions #####
+
+.cb_make_ld_data <- function(ld, is_genotype = NULL,
+                             reference_info = NULL,
+                             variant_convention = c("A2_A1", "A1_A2")) {
+  reference_info_to_ref_panel <- function(reference_info) {
+    if (is.character(reference_info) && length(reference_info) == 1) {
+      if (!file.exists(reference_info)) {
+        stop("LD_reference_info file does not exist: ", reference_info)
       }
+      reference_info <- read_variant_metadata(reference_info)
+    }
+    if (!is.data.frame(reference_info)) {
+      stop("LD_reference_info must be a .bim/.pvar path or data.frame when it is not a load_LD_matrix() result.")
     }
-    return(list(sumstats = collected_sumstats, LD_mat = collected_LD, LD_match = collected_LD_match))
+    info <- as.data.frame(reference_info, stringsAsFactors = FALSE)
+    names(info) <- sub("^#", "", names(info))
+    names(info) <- sub("^chr$", "chrom", names(info), ignore.case = TRUE)
+    names(info) <- sub("^CHROM$", "chrom", names(info))
+    names(info) <- sub("^POS$", "pos", names(info))
+    names(info) <- sub("^ID$", "id", names(info))
+    names(info) <- sub("^REF$", "A2", names(info))
+    names(info) <- sub("^ALT$", "A1", names(info))
+    names(info) <- sub("^a0$", "A2", names(info))
+    names(info) <- sub("^a1$", "A1", names(info))
+    names(info) <- sub("^rsid$", "id", names(info), ignore.case = TRUE)
+
+    if ("variants" %in% names(info) && !"variant_id" %in% names(info)) {
+      info$variant_id <- normalize_variant_id(as.character(info$variants))
+    }
+    if (!"variant_id" %in% names(info)) {
+      missing_cols <- setdiff(c("chrom", "pos", "A2", "A1"), names(info))
+      if (length(missing_cols) > 0) {
+        stop("LD_reference_info must contain variant_id or columns chrom, pos, A2, A1. Missing: ",
+             paste(missing_cols, collapse = ", "), ".")
+      }
+      info$variant_id <- normalize_variant_id(format_variant_id(info$chrom, info$pos, info$A2, info$A1))
+    } else {
+      info$variant_id <- normalize_variant_id(as.character(info$variant_id))
+    }
+    if (!all(c("chrom", "pos", "A2", "A1") %in% names(info))) {
+      parsed <- parse_variant_id(info$variant_id)
+      info$chrom <- parsed$chrom
+      info$pos <- parsed$pos
+      info$A2 <- parsed$A2
+      info$A1 <- parsed$A1
+    }
+    keep_cols <- intersect(c("chrom", "id", "pos", "A2", "A1", "variant_id",
+                             "allele_freq", "variance", "n_nomiss"), names(info))
+    info[, keep_cols, drop = FALSE]
   }
 
+  align_ref_panel_to_ld <- function(ref_panel, ld_names, n_variants) {
+    if (!is.null(ld_names)) {
+      match_idx <- rep(NA_integer_, length(ld_names))
+      if ("id" %in% names(ref_panel)) {
+        match_idx <- match(ld_names, ref_panel$id)
+      }
+      if (any(is.na(match_idx))) {
+        variant_match <- match(normalize_variant_id(ld_names), ref_panel$variant_id)
+        match_idx[is.na(match_idx)] <- variant_match[is.na(match_idx)]
+      }
+      if (all(!is.na(match_idx))) {
+        ref_panel <- ref_panel[match_idx, , drop = FALSE]
+        rownames(ref_panel) <- NULL
+        return(ref_panel)
+      }
+      if (length(ld_names) == nrow(ref_panel)) {
+        message("QC track: LD_reference_info could not be matched by LD names; using LD_reference_info row order.")
+        return(ref_panel[seq_along(ld_names), , drop = FALSE])
+      }
+      stop("LD_reference_info could not be matched to LD/X_ref names. ",
+           "Provide an id/variant_id column matching LD names, or provide rows in LD matrix order.")
+    }
+    if (nrow(ref_panel) != n_variants) {
+      stop("LD_reference_info has ", nrow(ref_panel), " variants but LD/X_ref has ",
+           n_variants, " columns. Provide LD_reference_info in LD matrix order or with matching LD names.")
+    }
+    message("QC track: LD/X_ref names are missing; using LD_reference_info row order.")
+    ref_panel
+  }
 
-  # - summary statistics QC
-  sumstat_data <- region_data$sumstat_data
-  if (!is.null(sumstat_data)) {
-    # - initial check PIP, qc or impute
-    sumstat_data <- summary_stats_qc_multitask(sumstat_data,
-      keep_indel = keep_indel,
-      pip_cutoff_to_skip_sumstat = pip_cutoff_to_skip_sumstat,
-      qc_method = qc_method,
-      impute = impute, impute_opts = impute_opts
+  variant_convention <- match.arg(variant_convention)
+  if (is.null(is_genotype)) is_genotype <- is.matrix(ld) && nrow(ld) != ncol(ld)
+  ref_ids <- if (is.matrix(ld) && nrow(ld) == ncol(ld)) rownames(ld) else colnames(ld)
+
+  if (!is.null(reference_info)) {
+    ref_panel <- reference_info_to_ref_panel(reference_info)
+    ref_panel <- align_ref_panel_to_ld(ref_panel, ref_ids, ncol(ld))
+    variant_ids <- ref_panel$variant_id
+    if (is.matrix(ld) && nrow(ld) == ncol(ld)) {
+      rownames(ld) <- colnames(ld) <- variant_ids
+    } else if (is.matrix(ld)) {
+      colnames(ld) <- variant_ids
+    }
+    return(list(
+      LD_matrix = ld,
+      LD_variants = variant_ids,
+      ref_panel = ref_panel,
+      block_metadata = if (!isTRUE(is_genotype)) .infer_single_ld_block_metadata(ref_panel) else NULL,
+      is_genotype = isTRUE(is_genotype)
+    ))
+  }
+
+  if (is.null(ref_ids)) {
+    return(NULL)
+  }
+
+  parsed <- if (!is.null(ref_ids)) {
+    tryCatch(
+      .cb_parse_variants_for_qc(ref_ids, variant_convention),
+      error = function(e) NULL
     )
+  } else {
+    NULL
+  }
+  if (is.null(parsed)) return(NULL)
+  variant_ids <- if (!is.null(parsed)) format_variant_id(parsed$chrom, parsed$pos, parsed$A2, parsed$A1) else NULL
+  if (!is.null(variant_ids)) {
+    if (is.matrix(ld) && nrow(ld) == ncol(ld)) {
+      rownames(ld) <- colnames(ld) <- variant_ids
+    } else if (is.matrix(ld)) {
+      colnames(ld) <- variant_ids
+    }
+    parsed$variant_id <- variant_ids
+  }
+  list(
+    LD_matrix = ld,
+    LD_variants = variant_ids,
+    ref_panel = parsed,
+    block_metadata = if (!isTRUE(is_genotype) && !is.null(parsed)) .infer_single_ld_block_metadata(parsed) else NULL,
+    is_genotype = isTRUE(is_genotype)
+  )
+}
+
+.cb_parse_variants_for_qc <- function(ids, variant_convention = c("A2_A1", "A1_A2")) {
+  variant_convention <- match.arg(variant_convention)
+  parsed <- parse_variant_id(ids)
+  if (any(is.na(parsed$chrom)) || any(is.na(parsed$pos)) ||
+      any(is.na(parsed$A1)) || any(is.na(parsed$A2))) {
+    stop("QC requires variant IDs parseable as chr:pos:A2:A1 by default. ",
+         "If the input uses chr:pos:A1:A2, set variant_convention = 'A1_A2'.")
+  }
+  if (identical(variant_convention, "A1_A2")) {
+    parsed <- data.frame(chrom = parsed$chrom, pos = parsed$pos,
+                         A2 = parsed$A1, A1 = parsed$A2,
+                         stringsAsFactors = FALSE)
   }
-  return(list(
-    individual_data = individual_data,
-    sumstat_data = sumstat_data
-  ))
+  parsed
 }
diff --git a/R/file_utils.R b/R/file_utils.R
index e8a9df85..6583695d 100644
--- a/R/file_utils.R
+++ b/R/file_utils.R
@@ -858,8 +858,19 @@ load_genotype_region <- function(genotype, region = NULL, keep_indel = TRUE,
 #' @importFrom magrittr %>%
 #' @noRd
 read_single_covariate <- function(path) {
-  df <- read_delim(path, "\t", col_types = cols()) %>% select(-1)
-  non_numeric <- names(df)[!sapply(df, is.numeric)]
+  raw_df <- read_delim(path, "\t", col_types = cols(.default = "c")) %>% select(-1)
+  df <- raw_df
+  non_numeric <- character()
+  for (nm in names(df)) {
+    values <- trimws(as.character(df[[nm]]))
+    converted <- suppressWarnings(as.numeric(values))
+    bad <- !is.na(values) & values != "" & is.na(converted)
+    if (any(bad)) {
+      non_numeric <- c(non_numeric, nm)
+    } else {
+      df[[nm]] <- converted
+    }
+  }
   if (length(non_numeric) > 0) {
     stop("Non-numeric columns found in covariate file ", path, ": ",
          paste(non_numeric, collapse = ", "),
@@ -1766,16 +1777,34 @@ load_multitask_regional_data <- function(region, # a string of chr:start-end for
   # - if exist individual level data
   individual_data <- NULL
   if (!is.null(genotype_list)) {
+    if (length(phenotype_list) != length(covariate_list)) {
+      stop("Data load error. 'phenotype_list' and 'covariate_list' must have the same length.")
+    }
+    if (is.null(conditions_list_individual)) {
+      conditions_list_individual <- paste0("condition", seq_along(phenotype_list))
+      warning("Data load warning. 'conditions_list_individual' is not supplied; using default condition names. ",
+              "Provide 'conditions_list_individual' to preserve cohort or cell-type labels.")
+    }
+    if (length(conditions_list_individual) != length(phenotype_list)) {
+      stop("Data load error. 'conditions_list_individual' must have the same length as 'phenotype_list'.")
+    }
     #### FIXME: later if we have mulitple genotype list
     if (length(genotype_list) != 1 & is.null(match_geno_pheno)) {
       stop("Data load error. Please make sure 'match_geno_pheno' exists if you load data from multiple individual-level data.")
     } else if (length(genotype_list) == 1 & is.null(match_geno_pheno)) {
       match_geno_pheno <- rep(1, length(phenotype_list))
     }
+    if (length(match_geno_pheno) != length(phenotype_list)) {
+      stop("Data load error. 'match_geno_pheno' must have the same length as 'phenotype_list'.")
+    }
+    if (any(is.na(match_geno_pheno)) ||
+        any(match_geno_pheno < 1 | match_geno_pheno > length(genotype_list))) {
+      stop("Data load error. 'match_geno_pheno' must contain valid indices into 'genotype_list'.")
+    }
 
     # - load individual data from multiple datasets
-    n_dataset <- unique(match_geno_pheno) ### FIXME
-    for (i_data in 1:n_dataset) {
+    n_dataset <- unique(match_geno_pheno)
+    for (i_data in n_dataset) {
       # extract genotype file name
       genotype <- genotype_list[i_data]
       # extract phenotype and covariate file names
@@ -1783,6 +1812,10 @@ load_multitask_regional_data <- function(region, # a string of chr:start-end for
       phenotype <- phenotype_list[pos]
       covariate <- covariate_list[pos]
       conditions <- conditions_list_individual[pos]
+      extract_region_name_i <- extract_region_name
+      if (is.list(extract_region_name) && length(extract_region_name) == length(phenotype_list)) {
+        extract_region_name_i <- extract_region_name[pos]
+      }
       dat <- load_regional_univariate_data(
         genotype = genotype, phenotype = phenotype,
         covariate = covariate,
@@ -1792,7 +1825,7 @@ load_multitask_regional_data <- function(region, # a string of chr:start-end for
         maf_cutoff = maf_cutoff, mac_cutoff = mac_cutoff,
         imiss_cutoff = imiss_cutoff, keep_indel = keep_indel,
         keep_samples = keep_samples, keep_variants = keep_variants,
-        extract_region_name = extract_region_name,
+        extract_region_name = extract_region_name_i,
         phenotype_header = phenotype_header,
         region_name_col = region_name_col,
         scale_residuals = scale_residuals
@@ -1800,9 +1833,9 @@ load_multitask_regional_data <- function(region, # a string of chr:start-end for
       if (is.null(individual_data)) {
         individual_data <- dat
       } else {
-        individual_data <- lapply(names(dat), function(k) {
+        individual_data <- stats::setNames(lapply(names(dat), function(k) {
           c(individual_data[[k]], dat[[k]])
-        })
+        }), names(dat))
         individual_data$chrom <- dat$chrom
         individual_data$grange <- dat$grange
       }
@@ -1866,6 +1899,219 @@ load_multitask_regional_data <- function(region, # a string of chr:start-end for
   ))
 }
 
+#' Convert loaded regional data to individual-level inputs
+#'
+#' @param region_data A list returned by \code{load_multitask_regional_data()}.
+#' @return A list containing \code{X}, \code{Y}, \code{maf},
+#'   \code{X_variance}, and source information.
+#' @export
+region_data_to_ind_input <- function(region_data) {
+  first_non_null <- function(...) {
+    values <- list(...)
+    for (value in values) {
+      if (!is.null(value)) return(value)
+    }
+    NULL
+  }
+
+  align_individual_contexts <- function(X, Y) {
+    cbind_y <- function(Y, fallback_names) {
+      keep <- !vapply(Y, is.null, logical(1))
+      if (!any(keep)) return(NULL)
+      Y <- Y[keep]
+      fallback_names <- fallback_names[keep]
+      mats <- Map(function(y, nm) {
+        if (is.null(dim(y))) y <- matrix(y, ncol = 1)
+        if (is.null(colnames(y))) colnames(y) <- nm
+        y
+      }, Y, fallback_names)
+      do.call(cbind, mats)
+    }
+
+    if (!is.list(X) || is.matrix(X) || is.data.frame(X) ||
+        !is.list(Y) || is.matrix(Y) || is.data.frame(Y) ||
+        is.null(names(X)) || is.null(names(Y)) ||
+        length(intersect(names(X), names(Y))) > 0) {
+      return(list(X = X, Y = Y))
+    }
+    x_names <- names(X)
+    y_names <- names(Y)
+    grouped <- list()
+    for (context in x_names) {
+      matched <- y_names[y_names == context | startsWith(y_names, paste0(context, "_"))]
+      if (length(matched) > 0) {
+        y_group <- cbind_y(Y[matched], matched)
+        if (!is.null(y_group)) grouped[[context]] <- y_group
+      }
+    }
+    if (length(grouped) == 0 && length(X) == 1 && length(Y) > 0) {
+      y_group <- cbind_y(Y, y_names)
+      if (!is.null(y_group)) grouped[[x_names[[1]]]] <- y_group
+    }
+    if (length(grouped) == 0) {
+      return(list(X = X, Y = Y))
+    }
+    list(X = X[names(grouped)], Y = grouped)
+  }
+
+  individual_data <- region_data$individual_data
+  if (is.null(individual_data)) {
+    return(list(X = NULL, Y = NULL, maf = NULL, X_variance = NULL,
+                source_info = list(has_individual = FALSE, contexts = character())))
+  }
+
+  X <- first_non_null(individual_data$residual_X, individual_data$X)
+  Y <- first_non_null(individual_data$residual_Y, individual_data$Y)
+  if (is.list(X) && !is.matrix(X) && !is.data.frame(X) &&
+      is.null(names(X)) && !is.null(names(Y)) && length(X) == length(Y)) {
+    names(X) <- names(Y)
+  }
+  if (is.list(Y) && !is.matrix(Y) && !is.data.frame(Y) &&
+      is.null(names(Y)) && !is.null(names(X)) && length(Y) == length(X)) {
+    names(Y) <- names(X)
+  }
+  if (is.matrix(X) && is.list(Y) && !is.null(names(Y))) {
+    X <- stats::setNames(rep(list(X), length(Y)), names(Y))
+  }
+  aligned <- align_individual_contexts(X, Y)
+  X <- aligned$X
+  Y <- aligned$Y
+  maf <- individual_data$maf
+  X_variance <- individual_data$X_variance
+  if (is.list(maf) && is.null(names(maf)) && !is.null(names(X)) && length(maf) == length(X)) {
+    names(maf) <- names(X)
+  }
+  if (is.list(X_variance) && is.null(names(X_variance)) && !is.null(names(X)) &&
+      length(X_variance) == length(X)) {
+    names(X_variance) <- names(X)
+  }
+  contexts <- unique(c(names(X), names(Y)))
+  list(
+    X = X,
+    Y = Y,
+    maf = maf,
+    X_variance = X_variance,
+    source_info = list(has_individual = !is.null(X) && !is.null(Y),
+                       contexts = contexts)
+  )
+}
+
+#' Convert loaded regional data to RSS inputs
+#'
+#' @param region_data A list returned by \code{load_multitask_regional_data()}.
+#' @return A list containing named RSS inputs, matched LD data, and source
+#'   information.
+#' @export
+region_data_to_rss_input <- function(region_data) {
+  make_ld_data_from_matrix <- function(ld, variant_ids = NULL) {
+    is_genotype <- is.matrix(ld) && nrow(ld) != ncol(ld)
+    if (!is.null(variant_ids) && is.matrix(ld)) {
+      if (is.null(colnames(ld)) && length(variant_ids) == ncol(ld)) {
+        colnames(ld) <- variant_ids
+      }
+      if (!is_genotype && is.null(rownames(ld)) && length(variant_ids) == nrow(ld)) {
+        rownames(ld) <- variant_ids
+      }
+    }
+    ids <- if (is.matrix(ld) && !is_genotype) rownames(ld) else colnames(ld)
+    parsed <- NULL
+    if (!is.null(ids) && length(ids) > 0) {
+      parsed <- tryCatch(parse_variant_id(ids), error = function(e) NULL)
+      if (!is.null(parsed)) {
+        ids <- format_variant_id(parsed$chrom, parsed$pos, parsed$A2, parsed$A1)
+        if (!is_genotype && is.matrix(ld)) rownames(ld) <- colnames(ld) <- ids
+        if (is_genotype && is.matrix(ld)) colnames(ld) <- ids
+        parsed$variant_id <- ids
+      }
+    }
+    list(
+      LD_matrix = ld,
+      LD_variants = ids,
+      ref_panel = parsed,
+      block_metadata = if (!is_genotype && !is.null(parsed)) .infer_single_ld_block_metadata(parsed) else NULL,
+      is_genotype = isTRUE(is_genotype)
+    )
+  }
+
+  rss_input_from_qced_sumstat <- function(sumstat_data) {
+    variant_ids_from_rss <- function(rss) {
+      ss <- rss$sumstats
+      if (is.null(ss)) return(character())
+      if ("variant_id" %in% colnames(ss)) return(normalize_variant_id(as.character(ss$variant_id)))
+      if (all(c("chrom", "pos", "A2", "A1") %in% colnames(ss))) {
+        return(format_variant_id(ss$chrom, ss$pos, ss$A2, ss$A1))
+      }
+      character()
+    }
+
+    rss_input <- sumstat_data$sumstats
+    LD_mat <- sumstat_data$LD_mat
+    LD_match <- sumstat_data$LD_match
+    studies <- names(rss_input)
+    LD_data <- list()
+    ld_group <- character()
+    for (i in seq_along(studies)) {
+      study <- studies[[i]]
+      ld_name <- if (!is.null(LD_match) && length(LD_match) >= i) LD_match[[i]] else study
+      if (is.null(ld_name) || is.na(ld_name) || !ld_name %in% names(LD_mat)) {
+        ld_name <- names(LD_mat)[min(i, length(LD_mat))]
+      }
+      ld <- LD_mat[[ld_name]]
+      rss <- rss_input[[study]]
+      variant_ids <- variant_ids_from_rss(rss)
+      LD_data[[study]] <- make_ld_data_from_matrix(ld, variant_ids)
+      ld_group[[study]] <- ld_name
+    }
+    list(
+      rss_input = rss_input,
+      LD_data = LD_data,
+      source_info = list(has_sumstat = length(rss_input) > 0,
+                         studies = names(rss_input),
+                         ld_group = ld_group)
+    )
+  }
+
+  sumstat_data <- region_data$sumstat_data
+  if (is.null(sumstat_data) || is.null(sumstat_data$sumstats)) {
+    return(list(rss_input = list(), LD_data = list(),
+                source_info = list(has_sumstat = FALSE, studies = character(),
+                                   ld_group = character())))
+  }
+  if (!is.null(sumstat_data$LD_mat)) {
+    return(rss_input_from_qced_sumstat(sumstat_data))
+  }
+
+  rss_input <- list()
+  LD_data <- list()
+  ld_group <- character()
+
+  for (i in seq_along(sumstat_data$sumstats)) {
+    studies <- sumstat_data$sumstats[[i]]
+    ld_index <- min(i, length(sumstat_data$LD_info))
+    group_name <- names(sumstat_data$LD_info)[ld_index]
+    if (is.null(group_name) || is.na(group_name) || group_name == "") {
+      group_name <- paste0("LD", ld_index)
+    }
+    for (study in names(studies)) {
+      output_name <- study
+      if (output_name %in% names(rss_input)) {
+        output_name <- make.unique(c(names(rss_input), output_name))[length(rss_input) + 1]
+      }
+      rss_input[[output_name]] <- studies[[study]]
+      LD_data[[output_name]] <- sumstat_data$LD_info[[ld_index]]
+      ld_group[[output_name]] <- group_name
+    }
+  }
+
+  list(
+    rss_input = rss_input,
+    LD_data = LD_data,
+    source_info = list(has_sumstat = length(rss_input) > 0,
+                       studies = names(rss_input),
+                       ld_group = ld_group)
+  )
+}
+
 #' Load and filter tabular data with optional region subsetting
 #'
 #' This function loads summary statistics data from tabular files (TSV, TXT).
@@ -1909,6 +2155,17 @@ load_tsv_region <- function(file_path, region = NULL, extract_region_name = NULL
         if (length(hdr) > 0) {
           last_hdr <- hdr[length(hdr)]
           col_names_vec <- strsplit(sub("^#", "", last_hdr), "\t")[[1]]
+        } else {
+          header_con <- gzfile(file_path, "rt")
+          first_line <- readLines(header_con, n = 1)
+          close(header_con)
+          first_fields <- strsplit(sub("^#", "", first_line), "\t")[[1]]
+          header_tokens <- c("chrom", "chr", "#chrom", "pos", "bp", "snp",
+                             "variant_id", "a1", "a2", "beta", "se", "z",
+                             "p", "pvalue")
+          if (any(tolower(first_fields) %in% header_tokens)) {
+            col_names_vec <- first_fields
+          }
         }
 
         txt <- paste(lines, collapse = "\n")
diff --git a/R/sumstats_qc.R b/R/sumstats_qc.R
index 22e73066..aca54aea 100644
--- a/R/sumstats_qc.R
+++ b/R/sumstats_qc.R
@@ -8,6 +8,10 @@
 #' @param LD_data A list containing combined LD variants data that is generated by load_LD_matrix.
 #' @param skip_region A character vector specifying regions to be skipped in the analysis (optional).
 #'                    Each region should be in the format "chrom:start-end" (e.g., "1:1000000-2000000").
+#' @param return_LD_mat Logical; if \code{FALSE}, return only harmonized
+#'   summary statistics and skip LD-matrix subsetting. This is useful when the
+#'   reference input is genotype-backed \code{X_ref}. Defaults to \code{TRUE}
+#'   for backwards compatibility.
 #'
 #' @return A list containing the processed summary statistics and LD matrix.
 #'   - sumstats: A data frame containing the processed summary statistics.
@@ -18,7 +22,8 @@
 #' @importFrom tidyr separate
 #' @importFrom magrittr %>%
 #' @export
-rss_basic_qc <- function(sumstats, LD_data, skip_region = NULL, keep_indel = TRUE) {
+rss_basic_qc <- function(sumstats, LD_data, skip_region = NULL, keep_indel = TRUE,
+                         return_LD_mat = TRUE) {
   # Check if required columns are present in sumstats
   required_cols <- c("chrom", "pos", "A1", "A2")
   missing_cols <- setdiff(required_cols, colnames(sumstats))
@@ -63,6 +68,10 @@ rss_basic_qc <- function(sumstats, LD_data, skip_region = NULL, keep_indel = TRU
 
   sumstats_processed <- allele_flip$target_data_qced %>% arrange(pos)
 
+  if (!isTRUE(return_LD_mat)) {
+    return(list(sumstats = sumstats_processed, LD_mat = NULL))
+  }
+
   # Align and subset LD by mapping core IDs (strip trailing build suffix) to exact LD IDs
   ld_mat <- LD_data$LD_matrix
   ld_ids <- tryCatch(rownames(ld_mat), error = function(e) NULL)
@@ -138,41 +147,425 @@ ld_mismatch_qc <- function(zScore, R = NULL, X = NULL, nSample = NULL,
   }
 }
 
+.resolve_summary_qc_method <- function(qc_method) {
+  if (is.null(qc_method)) return("none")
+  qc_method <- match.arg(qc_method, c("none", "rss_qc", "slalom", "dentist"))
+  if (identical(qc_method, "rss_qc")) "none" else qc_method
+}
+
 #' Perform Quality Control on Summary Statistics
 #'
-#' This function performs quality control on the processed summary statistics using the specified method.
-#' It wraps \code{\link{ld_mismatch_qc}} and handles subsetting of the summary statistics and LD matrix.
+#' This function performs quality control on the processed summary statistics
+#' using the specified method. It wraps \code{\link{ld_mismatch_qc}} and handles
+#' subsetting of the summary statistics and LD matrix.
 #'
 #' @param sumstats A data frame containing the processed summary statistics.
-#' @param LD_data A list containing the combined LD variants data generated by load_LD_matrix.
+#' @param LD_data A list containing the combined LD variants data generated by
+#'   \code{load_LD_matrix()}.
 #' @param n Sample size for the LD reference panel (used by dentist method).
-#' @param method The quality control method to use. Options are "slalom" or "dentist" (default: "slalom").
+#' @param method The quality control method to use. Options are \code{"slalom"}
+#'   or \code{"dentist"} (default: \code{"slalom"}).
+#' @param rss_input Optional loaded RSS input, either one \code{load_rss_data()}
+#'   result or a named list of them. Supplying this selects the additional
+#'   combined RSS/ColocBoost QC workflow. A single RSS record is detected by
+#'   structure, not by the name \code{sumstats}, so a multi-study list may safely
+#'   include a study named \code{"sumstats"}.
+#' @param keep_indel,skip_region,pip_cutoff_to_skip,qc_method,impute,impute_opts,study,var_y,return_on_skip,R_finite,R_mismatch
+#'   Additional controls for the combined RSS/ColocBoost QC workflow. They are
+#'   ignored by the historical LD-mismatch-only call unless \code{rss_input} or
+#'   combined-QC options are supplied.
 #'
-#' @return A list containing the quality-controlled summary statistics and updated LD matrix.
-#'   - sumstats: The quality-controlled summary statistics data frame.
-#'   - LD_mat: The updated LD matrix after quality control.
-#'   - outlier_number: The number of outlier variants removed.
+#' @return A list containing the quality-controlled summary statistics and
+#'   updated LD matrix for the historical call:
+#'   \itemize{
+#'     \item sumstats: The quality-controlled summary statistics data frame.
+#'     \item LD_mat: The updated LD matrix after quality control.
+#'     \item outlier_number: The number of outlier variants removed.
+#'   }
+#'   When \code{rss_input} or combined-QC controls are supplied, returns a
+#'   cleaned RSS/LD record for one RSS record, or a named list of records for a
+#'   list of RSS records.
 #'
-#' @details This function applies the specified quality control method to the processed summary statistics
-#'   via \code{\link{ld_mismatch_qc}}, then subsets the summary statistics and LD matrix to keep
-#'   only non-outlier variants.
+#' @details This function applies the specified quality control method to the
+#'   processed summary statistics via \code{\link{ld_mismatch_qc}}, then subsets
+#'   the summary statistics and LD matrix to keep only non-outlier variants.
+#'
+#'   As an additional workflow for ColocBoost/RSS pipelines, callers may supply
+#'   \code{rss_input} or combined-QC controls. That path first runs
+#'   \code{rss_basic_qc()}, optional PIP screening, optional LD-mismatch QC
+#'   through this same function, and optional RAISS imputation. The combined
+#'   path normalizes one or many RSS records to the same loop internally; only
+#'   true single-record input is unwrapped on return.
+#'   \code{qc_method = NULL}, \code{"none"}, and \code{"rss_qc"} all mean
+#'   basic-only summary-stat preprocessing without SLALOM/DENTIST outlier QC.
+#'
+#'   When the combined path receives genotype-backed reference data
+#'   (\code{X_ref}), basic harmonization avoids constructing an LD matrix. PIP
+#'   screening uses an LD-independent single-effect screen with
+#'   \code{susie_rss(R = diag(p), L = 1, max_iter = 1)}, and LD-mismatch QC
+#'   computes only the filtered local correlation matrix required by
+#'   SLALOM/DENTIST. RAISS imputation temporarily centers/scales
+#'   genotype-backed \code{X_ref} before using the whole-region genotype/SVD
+#'   path. This avoids LD-block partition artifacts while matching the LD scale
+#'   used by \code{compute_LD(X_ref)}. Final ColocBoost calls still keep the
+#'   original \code{X_ref} as the reference input.
 #'
 #' @examples
 #' # Perform SLALOM quality control (default)
 #' qc_results <- summary_stats_qc(sumstats, LD_data, method = "slalom")
 #'
+#' # Additional combined basic-only RSS QC.
+#' qc_results <- summary_stats_qc(rss_input = rss_input, LD_data = LD_data,
+#'                                qc_method = "none")
+#'
 #' @importFrom dplyr mutate row_number filter pull
 #' @export
-summary_stats_qc <- function(sumstats, LD_data, n = NULL, method = c("slalom", "dentist")) {
-  method <- match.arg(method)
-  LD_extract <- LD_data$LD_matrix[sumstats$variant_id, sumstats$variant_id, drop = FALSE]
+summary_stats_qc <- function(sumstats, LD_data, n = NULL,
+                             method = c("slalom", "dentist"),
+                             rss_input = NULL,
+                             keep_indel = TRUE,
+                             skip_region = NULL,
+                             pip_cutoff_to_skip = 0,
+                             qc_method = NULL,
+                             impute = FALSE,
+                             impute_opts = list(rcond = 0.01, R2_threshold = 0.6,
+                                                minimum_ld = 5, lamb = 0.01),
+                             study = NULL,
+                             var_y = NULL,
+                             return_on_skip = c("null", "preprocess"),
+                             R_finite = NULL,
+                             R_mismatch = NULL) {
+  if (missing(sumstats)) sumstats <- NULL
+  return_on_skip <- match.arg(return_on_skip)
+  if (is.null(rss_input) && is.data.frame(sumstats) && is.null(qc_method) &&
+      isFALSE(impute) && identical(pip_cutoff_to_skip, 0) && is.null(skip_region)) {
+    method <- match.arg(method)
+    LD_extract <- LD_data$LD_matrix[sumstats$variant_id, sumstats$variant_id, drop = FALSE]
+    qc_results <- ld_mismatch_qc(zScore = sumstats$z, R = LD_extract,
+                                 nSample = n, method = method)
+    keep_index <- which(!qc_results$outlier)
+    sumstats_qc <- sumstats[keep_index, , drop = FALSE]
+    LD_mat_qc <- LD_extract[sumstats_qc$variant_id, sumstats_qc$variant_id, drop = FALSE]
+    outlier_number <- nrow(sumstats) - nrow(sumstats_qc)
+    return(list(sumstats = sumstats_qc, LD_mat = LD_mat_qc, outlier_number = outlier_number))
+  }
+
+  qc_method <- .resolve_summary_qc_method(qc_method)
+
+  if (is.null(rss_input)) {
+    if (is.null(sumstats)) stop("summary_stats_qc requires sumstats or rss_input.")
+    rss_input <- if (is.data.frame(sumstats)) {
+      list(sumstats = sumstats, n = n, var_y = if (is.null(var_y)) 1 else var_y)
+    } else {
+      sumstats
+    }
+  }
+
+  is_rss_record <- function(x) {
+    is.list(x) && is.data.frame(x$sumstats)
+  }
+  is_ld_record <- function(x) {
+    is.matrix(x) || (is.list(x) && !is.null(x$LD_matrix))
+  }
+  first_ld_record <- function(x, study_name = NULL) {
+    if (is_ld_record(x)) return(x)
+    if (!is.null(study_name) && study_name %in% names(x)) return(x[[study_name]])
+    if (is.list(x) && length(x) == 1) return(x[[1]])
+    x
+  }
+
+  single_rss_input <- is_rss_record(rss_input)
+  if (single_rss_input) {
+    studies <- if (is.null(study)) "input" else study
+    rss_records <- stats::setNames(list(rss_input), studies)
+    LD_records <- stats::setNames(list(first_ld_record(LD_data, studies)), studies)
+  } else {
+    if (!is.list(rss_input) || length(rss_input) == 0) {
+      stop("No summary-statistic studies supplied for QC.")
+    }
+    rss_records <- rss_input
+    studies <- names(rss_records)
+    if (is.null(studies)) studies <- rep("", length(rss_records))
+    empty_names <- is.na(studies) | studies == ""
+    if (any(empty_names)) {
+      studies[empty_names] <- paste0("study", which(empty_names))
+      names(rss_records) <- studies
+    }
+    if (!all(vapply(rss_records, is_rss_record, logical(1)))) {
+      stop("rss_input must be one RSS record or a list of RSS records with a sumstats data frame.")
+    }
+
+    if (is_ld_record(LD_data)) {
+      LD_records <- stats::setNames(rep(list(LD_data), length(studies)), studies)
+    } else if (is.list(LD_data) && length(LD_data) == 1) {
+      LD_records <- stats::setNames(rep(list(LD_data[[1]]), length(studies)), studies)
+    } else if (is.list(LD_data) && all(studies %in% names(LD_data))) {
+      LD_records <- LD_data[studies]
+    } else if (is.list(LD_data) && length(LD_data) == length(studies)) {
+      LD_records <- LD_data
+      names(LD_records) <- studies
+    } else {
+      stop("LD_data must be one LD object or a list matching rss_input studies.")
+    }
+  }
+
+  if (length(pip_cutoff_to_skip) == 1 && is.null(names(pip_cutoff_to_skip))) {
+    cutoffs <- stats::setNames(rep(pip_cutoff_to_skip, length(studies)), studies)
+  } else if (!is.null(names(pip_cutoff_to_skip))) {
+    missing <- setdiff(studies, names(pip_cutoff_to_skip))
+    if (length(missing) > 0) {
+      pip_cutoff_to_skip <- c(pip_cutoff_to_skip, stats::setNames(rep(0, length(missing)), missing))
+    }
+    cutoffs <- pip_cutoff_to_skip[studies]
+  } else if (length(pip_cutoff_to_skip) == length(studies)) {
+    cutoffs <- stats::setNames(pip_cutoff_to_skip, studies)
+  } else {
+    stop("pip_cutoff_to_skip must be a scalar, named vector, or match the number of studies.")
+  }
+
+  out <- list()
+  for (study_name in studies) {
+    result <- .summary_stats_qc_single_study(
+      rss_records[[study_name]], LD_records[[study_name]],
+      keep_indel = keep_indel,
+      skip_region = skip_region,
+      pip_cutoff_to_skip = cutoffs[[study_name]],
+      qc_method = qc_method,
+      impute = impute,
+      impute_opts = impute_opts,
+      study = study_name,
+      return_on_skip = return_on_skip,
+      R_finite = R_finite,
+      R_mismatch = R_mismatch
+    )
+    if (!is.null(result)) out[[study_name]] <- result
+  }
+  if (single_rss_input) out[[studies]] else out
+}
+
+.infer_single_ld_block_metadata <- function(ref_panel) {
+  n_variants <- nrow(ref_panel)
+  data.frame(
+    block_id = 1L,
+    chrom = as.character(ref_panel$chrom[1]),
+    block_start = suppressWarnings(min(ref_panel$pos, na.rm = TRUE)),
+    block_end = suppressWarnings(max(ref_panel$pos, na.rm = TRUE)),
+    size = n_variants,
+    start_idx = 1L,
+    end_idx = n_variants,
+    stringsAsFactors = FALSE
+  )
+}
+
+.normalize_ld_data_for_qc <- function(LD_data) {
+  if (is.null(LD_data)) return(NULL)
+  if (is.matrix(LD_data)) {
+    ids <- if (nrow(LD_data) == ncol(LD_data)) rownames(LD_data) else colnames(LD_data)
+    LD_data <- list(LD_matrix = LD_data, LD_variants = ids)
+  }
+  if (is.data.frame(LD_data$LD_variants)) {
+    variant_ids <- if ("variant_id" %in% colnames(LD_data$LD_variants)) {
+      as.character(LD_data$LD_variants$variant_id)
+    } else {
+      format_variant_id(LD_data$LD_variants$chrom, LD_data$LD_variants$pos,
+                        LD_data$LD_variants$A2, LD_data$LD_variants$A1)
+    }
+    LD_data$LD_variants <- variant_ids
+  }
+  if (is.data.frame(LD_data$ref_panel) && !"variant_id" %in% colnames(LD_data$ref_panel)) {
+    LD_data$ref_panel$variant_id <- format_variant_id(
+      LD_data$ref_panel$chrom, LD_data$ref_panel$pos,
+      LD_data$ref_panel$A2, LD_data$ref_panel$A1
+    )
+  }
+  if (is.null(LD_data$LD_variants) && is.data.frame(LD_data$ref_panel) &&
+      "variant_id" %in% colnames(LD_data$ref_panel)) {
+    LD_data$LD_variants <- as.character(LD_data$ref_panel$variant_id)
+  }
+  if (is.null(LD_data$ref_panel) && !is.null(LD_data$LD_variants)) {
+    parsed <- tryCatch(parse_variant_id(LD_data$LD_variants), error = function(e) NULL)
+    if (!is.null(parsed)) {
+      LD_data$ref_panel <- parsed
+      LD_data$ref_panel$variant_id <- LD_data$LD_variants
+    }
+  }
+  if (!is.null(LD_data$LD_variants) && is.matrix(LD_data$LD_matrix)) {
+    if (nrow(LD_data$LD_matrix) == ncol(LD_data$LD_matrix) &&
+        length(LD_data$LD_variants) == nrow(LD_data$LD_matrix)) {
+      rownames(LD_data$LD_matrix) <- colnames(LD_data$LD_matrix) <- LD_data$LD_variants
+    } else if (length(LD_data$LD_variants) == ncol(LD_data$LD_matrix)) {
+      colnames(LD_data$LD_matrix) <- LD_data$LD_variants
+    }
+  }
+  LD_data
+}
 
-  qc_results <- ld_mismatch_qc(zScore = sumstats$z, R = LD_extract, nSample = n,
-                                method = method)
-  keep_index <- which(!qc_results$outlier)
-  sumstats_qc <- sumstats[keep_index, , drop = FALSE]
-  LD_mat_qc <- LD_extract[sumstats_qc$variant_id, sumstats_qc$variant_id, drop = FALSE]
-  outlier_number <- nrow(sumstats) - nrow(sumstats_qc)
+.summary_stats_qc_single_study <- function(rss_input, LD_data, keep_indel, skip_region,
+                                           pip_cutoff_to_skip, qc_method, impute,
+                                           impute_opts, study, return_on_skip,
+                                           R_finite = NULL, R_mismatch = NULL) {
+  skipped_result <- function(sumstats, LD_mat, reason) {
+    if (!identical(return_on_skip, "preprocess")) return(NULL)
+    list(
+      rss_input = list(sumstats = sumstats, n = rss_input$n, var_y = rss_input$var_y),
+      LD_matrix = LD_mat,
+      preprocess = list(sumstats = sumstats, LD_mat = LD_mat),
+      outlier_number = 0L,
+      skipped = TRUE,
+      skip_reason = reason
+    )
+  }
 
-  return(list(sumstats = sumstats_qc, LD_mat = LD_mat_qc, outlier_number = outlier_number))
+  if (is.null(rss_input) || is.null(LD_data)) return(NULL)
+  message("QC track: starting basic allele harmonization for summary-stat study ", study, ".")
+  message("QC track: basic summary-stat QC requires sumstat$variant and LD_data$LD_variants ",
+          "or LD/X_ref variant names for study ", study, ".")
+  LD_data_for_qc <- .normalize_ld_data_for_qc(LD_data)
+  has_genotype <- isTRUE(LD_data_for_qc$is_genotype) ||
+    (is.matrix(LD_data_for_qc$LD_matrix) &&
+       nrow(LD_data_for_qc$LD_matrix) != ncol(LD_data_for_qc$LD_matrix))
+  if (has_genotype && is.data.frame(LD_data_for_qc$ref_panel) &&
+      all(c("chrom", "pos", "A2", "A1") %in% colnames(LD_data_for_qc$ref_panel))) {
+    canonical_ids <- format_variant_id(LD_data_for_qc$ref_panel$chrom,
+                                       LD_data_for_qc$ref_panel$pos,
+                                       LD_data_for_qc$ref_panel$A2,
+                                       LD_data_for_qc$ref_panel$A1)
+    LD_data_for_qc$ref_panel$variant_id <- canonical_ids
+    LD_data_for_qc$LD_variants <- canonical_ids
+    if (is.matrix(LD_data_for_qc$LD_matrix) &&
+        length(canonical_ids) == ncol(LD_data_for_qc$LD_matrix)) {
+      colnames(LD_data_for_qc$LD_matrix) <- canonical_ids
+    }
+  }
+  X_ref <- if (has_genotype) LD_data_for_qc$LD_matrix else NULL
+  basic <- rss_basic_qc(rss_input$sumstats, LD_data_for_qc,
+                        skip_region = skip_region, keep_indel = keep_indel,
+                        return_LD_mat = !has_genotype)
+  sumstats <- basic$sumstats
+  R_mat <- basic$LD_mat
+  n <- rss_input$n
+  var_y <- rss_input$var_y
+  reference_for_variants <- function(variants) {
+    if (has_genotype) {
+      ref_names <- colnames(X_ref)
+      ref_idx <- match(variants, ref_names)
+      if (anyNA(ref_idx)) {
+        ref_idx <- match(strip_chr_prefix(strip_build_suffix(variants)),
+                         strip_chr_prefix(strip_build_suffix(ref_names)))
+      }
+      if (anyNA(ref_idx)) {
+        missing_variants <- variants[is.na(ref_idx)]
+        stop("Genotype reference is missing ", length(missing_variants),
+             " post-QC variant(s): ", paste(utils::head(missing_variants, 3), collapse = ", "))
+      }
+      X_subset <- X_ref[, ref_idx, drop = FALSE]
+      colnames(X_subset) <- variants
+      return(X_subset)
+    }
+    if (is.null(R_mat)) return(NULL)
+    R_mat[variants, variants, drop = FALSE]
+  }
+  ld_data_with_local_R <- function(current_sumstats) {
+    if (!has_genotype) return(LD_data_for_qc)
+    variants <- current_sumstats$variant_id
+    X_local <- reference_for_variants(variants)
+    R_local <- compute_LD(X_local, method = "sample")
+    rownames(R_local) <- colnames(R_local) <- variants
+    ref_panel <- LD_data_for_qc$ref_panel
+    if (is.data.frame(ref_panel) && "variant_id" %in% colnames(ref_panel)) {
+      ref_idx <- match(variants, ref_panel$variant_id)
+      if (anyNA(ref_idx)) {
+        ref_panel <- parse_variant_id(variants)
+        ref_panel$variant_id <- variants
+      } else {
+        ref_panel <- ref_panel[ref_idx, , drop = FALSE]
+      }
+    } else {
+      ref_panel <- parse_variant_id(variants)
+      ref_panel$variant_id <- variants
+    }
+    ld_data <- LD_data_for_qc
+    ld_data$LD_matrix <- R_local
+    ld_data$LD_variants <- variants
+    ld_data$ref_panel <- ref_panel
+    ld_data$block_metadata <- .infer_single_ld_block_metadata(ref_panel)
+    ld_data$is_genotype <- FALSE
+    R_mat <<- R_local
+    ld_data
+  }
+  message("QC track: basic harmonization retained ", nrow(sumstats),
+          " variants for summary-stat study ", study, ".")
+  if (nrow(sumstats) < 2) {
+    return(skipped_result(sumstats, reference_for_variants(sumstats$variant_id),
+                          "fewer than two variants remain after basic QC"))
+  }
+
+  preprocess <- list(sumstats = sumstats, LD_mat = R_mat)
+  if (!is.null(pip_cutoff_to_skip) && pip_cutoff_to_skip != 0) {
+    cutoff <- pip_cutoff_to_skip
+    if (cutoff < 0) cutoff <- 3 / nrow(sumstats)
+    message("QC track: running LD-independent single-effect initial screen for summary-stat study ",
+            study, ".")
+    pip_args <- list(z = sumstats$z, R = diag(nrow(sumstats)),
+                     L = 1, max_iter = 1, n = n)
+    pip <- do.call(susie_rss, pip_args)$pip
+    if (!any(pip > cutoff)) {
+      message("Skipping follow-up analysis: No signals above PIP threshold ", cutoff)
+      message("QC track: skipping summary-stat study ", study,
+              ". No signals above PIP threshold ", cutoff, ".")
+      return(skipped_result(sumstats, reference_for_variants(sumstats$variant_id),
+                            "no signals above PIP threshold"))
+    }
+    message("Follow-up on region: signals above PIP threshold ", cutoff, " detected.")
+  }
+
+  outlier_number <- 0L
+  if (!is.null(qc_method) && !qc_method %in% c("none", "rss_qc")) {
+    message("QC track: running ", qc_method, " LD-mismatch QC for summary-stat study ", study, ".")
+    qc <- summary_stats_qc(sumstats = sumstats, LD_data = ld_data_with_local_R(sumstats),
+                            n = n, method = qc_method)
+    sumstats <- qc$sumstats
+    R_mat <- qc$LD_mat
+    outlier_number <- qc$outlier_number
+    message("QC track: removed ", outlier_number,
+            " LD-mismatch outlier(s) for summary-stat study ", study, ".")
+  }
+  if (isTRUE(impute)) {
+    if (is.null(LD_data_for_qc$ref_panel)) {
+      warning("Skipping imputation for summary-stat study ", study,
+              ": LD_data does not include ref_panel.")
+    } else {
+      message("QC track: running imputation for summary-stat study ", study, ".")
+      imputed <- if (has_genotype) {
+        X_ref_scaled <- scale(X_ref)
+        X_ref_scaled[is.na(X_ref_scaled)] <- 0
+        colnames(X_ref_scaled) <- colnames(X_ref)
+        raiss(LD_data_for_qc$ref_panel, sumstats,
+              genotype_matrix = X_ref_scaled,
+              svd_tol = if (is.null(impute_opts$svd_tol)) 1e-12 else impute_opts$svd_tol,
+              R2_threshold = impute_opts$R2_threshold,
+              minimum_ld = impute_opts$minimum_ld,
+              lamb = impute_opts$lamb)
+      } else {
+        if (is.null(LD_data_for_qc$block_metadata)) {
+          LD_data_for_qc$block_metadata <- .infer_single_ld_block_metadata(LD_data_for_qc$ref_panel)
+        }
+        raiss(LD_data_for_qc$ref_panel, sumstats, partition_LD_matrix(LD_data_for_qc),
+              rcond = impute_opts$rcond,
+              R2_threshold = impute_opts$R2_threshold,
+              minimum_ld = impute_opts$minimum_ld,
+              lamb = impute_opts$lamb)
+      }
+      sumstats <- imputed$result_filter
+      if (!is.null(imputed$LD_mat)) R_mat <- imputed$LD_mat
+    }
+  }
+  final_vars <- sumstats$variant_id
+  list(
+    rss_input = list(sumstats = sumstats, n = n, var_y = var_y),
+    LD_matrix = if (has_genotype) reference_for_variants(final_vars) else R_mat,
+    preprocess = preprocess,
+    outlier_number = outlier_number,
+    skipped = FALSE
+  )
 }
diff --git a/R/univariate_pipeline.R b/R/univariate_pipeline.R
index 87391d3b..f92b70ff 100644
--- a/R/univariate_pipeline.R
+++ b/R/univariate_pipeline.R
@@ -193,7 +193,8 @@ load_study_LD <- function(ld_path, region) {
 #' @param LD_data A list from load_LD_matrix containing LD_matrix, LD_variants,
 #'   ref_panel, block_metadata, and is_genotype flag. When is_genotype=TRUE
 #'   (from return_genotype=TRUE), LD_matrix contains genotype X and susie_rss
-#'   uses the z+X interface. R is computed internally for QC/imputation.
+#'   uses the z+X interface. Local R is computed only for QC stages that
+#'   require a correlation matrix.
 #' @param n_sample Sample size. If 0, retrieved from the sumstat file.
 #' @param n_case Number of cases (for case-control studies).
 #' @param n_control Number of controls (for case-control studies).
@@ -201,7 +202,9 @@ load_study_LD <- function(ld_path, region) {
 #' @param skip_region Character vector of regions to skip (format "chrom:start-end").
 #' @param extract_region_name Gene/phenotype name to subset.
 #' @param region_name_col Column to filter for extract_region_name.
-#' @param qc_method QC method: "slalom" or "dentist".
+#' @param qc_method Summary-statistic QC method. \code{"slalom"} and
+#'   \code{"dentist"} run basic allele harmonization plus LD-mismatch QC;
+#'   \code{"rss_qc"} and \code{"none"} run basic allele harmonization only.
 #' @param finemapping_method One of "susie_rss", "single_effect", "bayesian_conditional_regression".
 #' @param finemapping_opts List of fine-mapping options (L, L_greedy, coverage,
 #'   signal_cutoff, min_abs_corr).
@@ -224,7 +227,7 @@ rss_analysis_pipeline <- function(
     sumstat_path, column_file_path, LD_data,
     n_sample = 0, n_case = 0, n_control = 0, region = NULL, skip_region = NULL,
     extract_region_name = NULL, region_name_col = NULL,
-    qc_method = c("slalom", "dentist"),
+    qc_method = c("slalom", "dentist", "rss_qc", "none"),
     finemapping_method = c("susie_rss", "single_effect", "bayesian_conditional_regression"),
     finemapping_opts = list(
       L = 20, L_greedy = 5,
@@ -240,10 +243,15 @@ rss_analysis_pipeline <- function(
   use_X <- isTRUE(LD_data$is_genotype) || is_X_list
   if (use_X) {
     X_data <- LD_data$LD_matrix
-    # Compute R from first panel (or single panel) for QC/imputation
-    X_for_R <- if (is_X_list) X_data[[1]] else X_data
-    LD_data$LD_matrix <- compute_LD(X_for_R, method = "sample")
-    LD_data$is_genotype <- FALSE
+    LD_data$is_genotype <- TRUE
+  }
+  subset_X_data <- function(variants) {
+    if (!use_X) return(NULL)
+    if (is_X_list) {
+      lapply(X_data, function(Xk) Xk[, variants, drop = FALSE])
+    } else {
+      X_data[, variants, drop = FALSE]
+    }
   }
   res <- list()
   rss_input <- load_rss_data(
@@ -261,44 +269,52 @@ rss_analysis_pipeline <- function(
     return(list(rss_data_analyzed = sumstats))
   }
 
-  # Preprocess: QC and imputation require LD_data with correlation matrix R.
-  # When using X path, compute R from X for QC/imputation, then pass X to susie_rss.
-  preprocess_results <- rss_basic_qc(sumstats, LD_data, skip_region = skip_region, keep_indel = keep_indel)
-  sumstats <- preprocess_results$sumstats
-  LD_mat <- preprocess_results$LD_mat
+  qc_method_arg <- if (is.null(qc_method)) NULL else match.arg(qc_method)
+  qc_method <- qc_method_arg
+  qc_record <- summary_stats_qc(
+    rss_input = rss_input,
+    LD_data = LD_data,
+    keep_indel = keep_indel,
+    skip_region = skip_region,
+    pip_cutoff_to_skip = pip_cutoff_to_skip,
+    qc_method = if (is.null(qc_method_arg)) "none" else qc_method_arg,
+    impute = impute,
+    impute_opts = impute_opts,
+    return_on_skip = "preprocess",
+    R_finite = R_finite,
+    R_mismatch = R_mismatch
+  )
+  if (!is.null(qc_record$rss_input)) {
+    preprocess_results <- qc_record$preprocess
+    sumstats <- qc_record$rss_input$sumstats
+    LD_mat <- qc_record$LD_matrix
+    qc_results <- list(outlier_number = qc_record$outlier_number)
+  } else {
+    # Compatibility for tests or callers that mock the historical
+    # LD-mismatch-only summary_stats_qc() return shape.
+    preprocess_results <- list(sumstats = qc_record$sumstats, LD_mat = qc_record$LD_mat)
+    sumstats <- qc_record$sumstats
+    LD_mat <- qc_record$LD_mat
+    qc_results <- qc_record
+    if (isTRUE(impute)) {
+      LD_matrix <- partition_LD_matrix(LD_data)
+      impute_results <- raiss(LD_data$ref_panel, sumstats, LD_matrix,
+                              rcond = impute_opts$rcond,
+                              R2_threshold = impute_opts$R2_threshold,
+                              minimum_ld = impute_opts$minimum_ld,
+                              lamb = impute_opts$lamb)
+      sumstats <- impute_results$result_filter
+      LD_mat <- impute_results$LD_mat
+    }
+    qc_record$skipped <- FALSE
+  }
 
   if (nrow(sumstats) == 0) {
     message("No variants left after preprocessing. Returning empty results.")
     return(list(rss_data_analyzed = sumstats))
   }
-
-  # PIP screening (always uses R)
-  if (pip_cutoff_to_skip != 0) {
-    if (pip_cutoff_to_skip < 0) pip_cutoff_to_skip <- 3 / nrow(sumstats)
-    top_model_pip <- susie_rss(z = sumstats$z, R = LD_mat, L = 1, L_greedy = NULL, max_iter = 1,
-      n = n, var_y = var_y, R_finite = R_finite, R_mismatch = R_mismatch)$pip
-    if (!any(top_model_pip > pip_cutoff_to_skip)) {
-      message("Skipping follow-up analysis: No signals above PIP threshold ", pip_cutoff_to_skip)
-      return(list(rss_data_analyzed = sumstats))
-    }
-    message("Follow-up on region: signals above PIP threshold ", pip_cutoff_to_skip, " detected.")
-  }
-
-  # Quality control (always uses R)
-  if (!is.null(qc_method)) {
-    qc_results <- summary_stats_qc(sumstats, LD_data, n = n, method = qc_method)
-    sumstats <- qc_results$sumstats
-    LD_mat <- qc_results$LD_mat
-  }
-
-  # Imputation (always uses R)
-  if (impute) {
-    LD_matrix <- partition_LD_matrix(LD_data)
-    impute_results <- raiss(LD_data$ref_panel, sumstats, LD_matrix,
-                            rcond = impute_opts$rcond, R2_threshold = impute_opts$R2_threshold,
-                            minimum_ld = impute_opts$minimum_ld, lamb = impute_opts$lamb)
-    sumstats <- impute_results$result_filter
-    LD_mat <- impute_results$LD_mat
+  if (isTRUE(qc_record$skipped)) {
+    return(list(rss_data_analyzed = sumstats))
   }
 
   # Fine-mapping: use X_mat if available, otherwise R
@@ -306,17 +322,7 @@ rss_analysis_pipeline <- function(
     pri_coverage <- finemapping_opts$coverage[1]
     sec_coverage <- if (length(finemapping_opts$coverage) > 1) finemapping_opts$coverage[-1] else NULL
 
-    # When using X path, subset X to QCed/imputed variants.
-    # For mixture panels (list), subset each panel; susie_rss accepts X=list().
-    if (use_X) {
-      if (is_X_list) {
-        X_mat_sub <- lapply(X_data, function(Xk) Xk[, sumstats$variant_id, drop = FALSE])
-      } else {
-        X_mat_sub <- X_data[, sumstats$variant_id, drop = FALSE]
-      }
-    } else {
-      X_mat_sub <- NULL
-    }
+    X_mat_sub <- subset_X_data(sumstats$variant_id)
 
     res <- susie_rss_pipeline(sumstats,
       LD_mat = if (use_X) NULL else LD_mat,
@@ -347,7 +353,9 @@ rss_analysis_pipeline <- function(
   }
 
   .run_reanalysis <- function(sumstats, LD_mat, method, finemapping_opts, pri_coverage, sec_coverage) {
-    susie_rss_pipeline(sumstats, LD_mat,
+    susie_rss_pipeline(sumstats,
+      LD_mat = if (use_X) NULL else LD_mat,
+      X_mat = subset_X_data(sumstats$variant_id),
       n = n, var_y = var_y,
       L = finemapping_opts$L, L_greedy = finemapping_opts$L_greedy,
       analysis_method = method,
diff --git a/man/colocboost_analysis.Rd b/man/colocboost_analysis.Rd
new file mode 100644
index 00000000..e7b23584
--- /dev/null
+++ b/man/colocboost_analysis.Rd
@@ -0,0 +1,114 @@
+% Generated by roxygen2: do not edit by hand
+\name{colocboost_analysis}
+\alias{colocboost_analysis}
+\title{ColocBoost analysis with optional pipeline QC}
+\usage{
+colocboost_analysis(
+  ...,
+  missing_rate_thresh = NULL,
+  maf_cutoff = NULL,
+  xvar_cutoff = NULL,
+  ld_reference_meta_file = NULL,
+  pip_cutoff_to_skip_ind = NULL,
+  keep_indel = TRUE,
+  pip_cutoff_to_skip_sumstat = NULL,
+  qc_method = NULL,
+  impute = FALSE,
+  impute_opts = list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01),
+  LD_reference_info = NULL,
+  variant_convention = c("A2_A1", "A1_A2")
+)
+}
+\arguments{
+\item{...}{Arguments passed to \code{colocboost::colocboost()}, including
+data inputs such as \code{X}, \code{Y}, \code{sumstat}, \code{LD},
+\code{X_ref}, \code{dict_YX}, \code{dict_sumstatLD},
+\code{outcome_names}, and all ColocBoost model/post-processing options.
+QC can only inspect inputs that are supplied by name.}
+
+\item{missing_rate_thresh, maf_cutoff, xvar_cutoff, ld_reference_meta_file, pip_cutoff_to_skip_ind}{Individual-level QC controls.
+If all are \code{NULL}, individual-level QC is not run.}
+
+\item{keep_indel, pip_cutoff_to_skip_sumstat, qc_method, impute, impute_opts}{Summary-statistic QC controls.
+\code{qc_method = "none"} and \code{qc_method = "rss_qc"} run basic allele
+harmonization without LD-mismatch outlier detection. Imputation is only run
+when \code{impute = TRUE}.}
+
+\item{LD_reference_info}{Optional LD reference information for
+summary-statistic QC. This is only needed when the native \code{LD} matrix
+row/column names or \code{X_ref} column names are missing or are not parseable
+genomic variant IDs. It can be a .bim/.pvar/.pvar.zst file path, a data.frame
+with variant metadata, or a \code{load_LD_matrix()} result. This is a QC-only
+argument and is not passed to \code{colocboost::colocboost()}.}
+
+\item{variant_convention}{Allele order used by native ColocBoost-style
+\code{sumstat$variant} and LD/X_ref names when deriving QC inputs:
+\code{"A2_A1"} for pecotmr canonical \code{chr:pos:A2:A1}, or
+\code{"A1_A2"} for \code{chr:pos:A1:A2}.}
+}
+\value{
+The object returned by \code{colocboost::colocboost()}.
+}
+\description{
+This wrapper keeps the direct \code{colocboost()} argument surface. All
+ColocBoost inputs and model parameters are supplied through \code{...}. When
+no QC options are requested, the call is passed directly to
+\code{colocboost::colocboost()}. When QC options are requested, the wrapper
+inspects named \code{X}/\code{Y} and/or \code{sumstat}/\code{LD}/\code{X_ref}
+arguments in \code{...}, runs the relevant reusable QC step, and then calls
+ColocBoost on the cleaned inputs. If the required named inputs are not
+available, QC is skipped with a warning and the original ColocBoost call is
+used.
+}
+\details{
+Use \code{colocboost_analysis()} the same way you would use
+\code{colocboost::colocboost()}: pass the native ColocBoost arguments by
+name, for example \code{X}, \code{Y}, \code{sumstat}, \code{LD},
+\code{X_ref}, \code{dict_YX}, \code{dict_sumstatLD},
+\code{outcome_names}, \code{focal_outcome_idx}, \code{effect_est},
+\code{effect_se}, \code{effect_n}, \code{M}, and other ColocBoost model or
+post-processing options. These arguments are forwarded unchanged unless one
+or more QC controls are requested.
+
+Individual-level QC is only attempted when at least one individual QC control
+is non-\code{NULL} and named \code{X} and \code{Y} inputs are available in
+\code{...}. Summary-statistic QC is only attempted when \code{qc_method},
+\code{pip_cutoff_to_skip_sumstat}, \code{impute = TRUE}, or
+\code{LD_reference_info} is supplied and named \code{sumstat} plus either
+\code{LD}, \code{X_ref}, or \code{LD_reference_info} are available.
+\code{qc_method = "none"} and \code{qc_method = "rss_qc"} mean run basic
+allele/variant harmonization only; they do not run SLALOM/DENTIST
+LD-mismatch QC. RAISS imputation is controlled separately by
+\code{impute = TRUE}.
+
+If no QC controls are supplied, this function is a thin direct call to
+\code{colocboost::colocboost(...)}. When QC removes outcomes,
+\code{outcome_names} and \code{focal_outcome_idx} are updated to match the
+post-QC outcome order. If the requested focal outcome is removed by QC,
+\code{focal_outcome_idx} is set to \code{NULL} with a warning.
+}
+\examples{
+\dontrun{
+# Direct ColocBoost call without QC.
+fit <- colocboost_analysis(X = X, Y = Y, M = 500)
+
+# Summary-statistic input with basic allele/variant harmonization only.
+fit <- colocboost_analysis(sumstat = sumstat, LD = LD,
+                           qc_method = "none", M = 500)
+
+# Summary-statistic input with LD-mismatch QC and RAISS imputation.
+fit <- colocboost_analysis(sumstat = sumstat, LD = LD,
+                           qc_method = "slalom", impute = TRUE)
+
+# Use richer LD metadata from load_LD_matrix() for QC, while still passing
+# ColocBoost's native LD input.
+ld_data <- load_LD_matrix(ld_meta_file, region)
+fit <- colocboost_analysis(sumstat = sumstat, LD = ld_data$LD_matrix,
+                           LD_reference_info = ld_data, qc_method = "none")
+
+# Individual-level input with explicit genotype QC thresholds.
+fit <- colocboost_analysis(X = X, Y = Y,
+                           missing_rate_thresh = 0.1,
+                           maf_cutoff = 0.0005)
+}
+}
diff --git a/man/colocboost_analysis_pipeline.Rd b/man/colocboost_pipeline.Rd
similarity index 60%
rename from man/colocboost_analysis_pipeline.Rd
rename to man/colocboost_pipeline.Rd
index 0268e2dd..a708b883 100644
--- a/man/colocboost_analysis_pipeline.Rd
+++ b/man/colocboost_pipeline.Rd
@@ -1,10 +1,10 @@
 % Generated by roxygen2: do not edit by hand
 % Please edit documentation in R/colocboost_pipeline.R
-\name{colocboost_analysis_pipeline}
-\alias{colocboost_analysis_pipeline}
-\title{Multi-trait colocalization analysis pipeline}
+\name{colocboost_pipeline}
+\alias{colocboost_pipeline}
+\title{Multi-trait colocalization analysis protocol pipeline}
 \usage{
-colocboost_analysis_pipeline(
+colocboost_pipeline(
   region_data,
   focal_trait = NULL,
   event_filters = NULL,
@@ -15,7 +15,7 @@ colocboost_analysis_pipeline(
   pip_cutoff_to_skip_ind = 0,
   keep_indel = TRUE,
   pip_cutoff_to_skip_sumstat = 0,
-  qc_method = c("slalom", "dentist", "none"),
+  qc_method = NULL,
   impute = TRUE,
   impute_opts = list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01),
   ...
@@ -34,25 +34,21 @@ colocboost_analysis_pipeline(
 
 \item{pip_cutoff_to_skip_sumstat}{A vector of cutoff values for skipping analysis based on PIP values for each sumstat Default is 0.}
 
-\item{qc_method}{Quality control method to use. Options are "slalom" or "dentist" (default: "slalom").}
+\item{qc_method}{Quality control method to use. Options are "none", "rss_qc", "slalom", or "dentist". \code{NULL} and \code{"rss_qc"} are treated as \code{"none"} for basic-only summary-stat preprocessing.}
 
 \item{impute}{Logical; if TRUE, performs imputation for outliers identified in the analysis (default: TRUE).}
 
 \item{impute_opts}{A list of imputation options including rcond, R2_threshold, and minimum_ld (default: list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5)).}
+
+\item{...}{Additional arguments passed to the ColocBoost analysis calls.}
 }
 \value{
-A list containing the individual_data and sumstat_data after QC:
-individual_data contains the following components if exist
-\itemize{
-  \item Y: A list of residualized phenotype values for all tasks.
-  \item X: A list of residualized genotype matrices all tasks.
-}
-sumstat_data contains the following components if exist
-\itemize{
-  \item sumstats: A list of summary statistics f or the matched LD_info, each sublist contains sumstats, n, var_y from \code{load_rss_data}.
-  \item LD_info: A list of LD information, each sublist contains LD_variants, LD_matrix, ref_panel  \code{load_LD_matrix}.
-}
+A list containing \code{xqtl_coloc}, \code{joint_gwas}, \code{separate_gwas},
+and \code{computing_time}. Analyses that are not requested or cannot be run
+return \code{NULL} in their corresponding result slot.
 }
 \description{
-This function perform a multi-trait colocalization using ColocBoost
+This function performs protocol-level multi-trait colocalization using
+ColocBoost. It accepts loaded regional data, performs QC once, then runs the
+requested xQTL-only, joint GWAS, and separate GWAS analyses.
 }
diff --git a/man/qc_individual_data.Rd b/man/qc_individual_data.Rd
new file mode 100644
index 00000000..1f1e070f
--- /dev/null
+++ b/man/qc_individual_data.Rd
@@ -0,0 +1,40 @@
+% Generated by roxygen2: do not edit by hand
+\name{qc_individual_data}
+\alias{qc_individual_data}
+\title{Run reusable individual-level QC}
+\usage{
+qc_individual_data(
+  X,
+  Y,
+  maf = NULL,
+  X_variance = NULL,
+  missing_rate_thresh = NULL,
+  maf_cutoff = 5e-04,
+  xvar_cutoff = 0,
+  ld_reference_meta_file = NULL,
+  keep_indel = TRUE,
+  pip_cutoff_to_skip = 0,
+  context = NULL
+)
+}
+\arguments{
+\item{X}{Genotype matrix or named list of genotype matrices.}
+\item{Y}{Phenotype vector/matrix or named list of phenotype matrices.}
+\item{maf}{Optional MAF vector or named list.}
+\item{X_variance}{Optional variant variance vector or named list.}
+\item{missing_rate_thresh}{Maximum missing genotype rate.}
+\item{maf_cutoff}{Minimum MAF cutoff.}
+\item{xvar_cutoff}{Minimum genotype variance cutoff.}
+\item{ld_reference_meta_file}{Optional LD reference metadata file.}
+\item{keep_indel}{Whether indel variants are kept during LD-reference filtering.}
+\item{pip_cutoff_to_skip}{Optional single-effect PIP cutoff.}
+\item{context}{Optional context label for matrix inputs.}
+}
+\value{
+A named list of cleaned context-level \code{X}/\code{Y} records, or one
+cleaned record for matrix inputs.
+}
+\description{
+Runs individual-level preprocessing by reusing existing pecotmr genotype
+filtering helpers and optional single-effect screening.
+}
diff --git a/man/region_data_to_colocboost_input.Rd b/man/region_data_to_colocboost_input.Rd
new file mode 100644
index 00000000..b49f57dc
--- /dev/null
+++ b/man/region_data_to_colocboost_input.Rd
@@ -0,0 +1,18 @@
+% Generated by roxygen2: do not edit by hand
+\name{region_data_to_colocboost_input}
+\alias{region_data_to_colocboost_input}
+\title{Convert loaded regional data to ColocBoost inputs}
+\usage{
+region_data_to_colocboost_input(region_data)
+}
+\arguments{
+\item{region_data}{A list returned by \code{load_multitask_regional_data()}.}
+}
+\value{
+A structured list containing \code{colocboost_input}, \code{qc_input}, and
+\code{source_info}.
+}
+\description{
+Builds direct ColocBoost input records plus QC input records from loaded
+regional data.
+}
diff --git a/man/region_data_to_ind_input.Rd b/man/region_data_to_ind_input.Rd
new file mode 100644
index 00000000..f3ee98e4
--- /dev/null
+++ b/man/region_data_to_ind_input.Rd
@@ -0,0 +1,18 @@
+% Generated by roxygen2: do not edit by hand
+\name{region_data_to_ind_input}
+\alias{region_data_to_ind_input}
+\title{Convert loaded regional data to individual-level inputs}
+\usage{
+region_data_to_ind_input(region_data)
+}
+\arguments{
+\item{region_data}{A list returned by \code{load_multitask_regional_data()}.}
+}
+\value{
+A list containing \code{X}, \code{Y}, \code{maf}, \code{X_variance}, and
+source information.
+}
+\description{
+Extracts individual-level matrices from loaded regional data without running
+QC or model fitting.
+}
diff --git a/man/region_data_to_rss_input.Rd b/man/region_data_to_rss_input.Rd
new file mode 100644
index 00000000..b6006c01
--- /dev/null
+++ b/man/region_data_to_rss_input.Rd
@@ -0,0 +1,17 @@
+% Generated by roxygen2: do not edit by hand
+\name{region_data_to_rss_input}
+\alias{region_data_to_rss_input}
+\title{Convert loaded regional data to RSS inputs}
+\usage{
+region_data_to_rss_input(region_data)
+}
+\arguments{
+\item{region_data}{A list returned by \code{load_multitask_regional_data()}.}
+}
+\value{
+A list containing named RSS inputs, matched LD data, and source information.
+}
+\description{
+Extracts loaded RSS summary-statistic records and their matched LD/reference
+data without running QC or model fitting.
+}
diff --git a/man/rss_analysis_pipeline.Rd b/man/rss_analysis_pipeline.Rd
index 99f27291..df410a0e 100644
--- a/man/rss_analysis_pipeline.Rd
+++ b/man/rss_analysis_pipeline.Rd
@@ -15,7 +15,7 @@ rss_analysis_pipeline(
   skip_region = NULL,
   extract_region_name = NULL,
   region_name_col = NULL,
-  qc_method = c("slalom", "dentist"),
+  qc_method = c("slalom", "dentist", "rss_qc", "none"),
   finemapping_method = c("susie_rss", "single_effect", "bayesian_conditional_regression"),
   finemapping_opts = list(L = 20, L_greedy = 5, coverage = c(0.95, 0.7, 0.5),
     signal_cutoff = 0.025, min_abs_corr = 0.8),
@@ -37,7 +37,8 @@ rss_analysis_pipeline(
 \item{LD_data}{A list from load_LD_matrix containing LD_matrix, LD_variants,
 ref_panel, block_metadata, and is_genotype flag. When is_genotype=TRUE
 (from return_genotype=TRUE), LD_matrix contains genotype X and susie_rss
-uses the z+X interface. R is computed internally for QC/imputation.}
+uses the z+X interface. Local R is computed only for QC stages that
+require a correlation matrix.}
 
 \item{n_sample}{Sample size. If 0, retrieved from the sumstat file.}
 
@@ -53,7 +54,9 @@ uses the z+X interface. R is computed internally for QC/imputation.}
 
 \item{region_name_col}{Column to filter for extract_region_name.}
 
-\item{qc_method}{QC method: "slalom" or "dentist".}
+\item{qc_method}{Summary-statistic QC method. \code{"slalom"} and
+\code{"dentist"} run basic allele harmonization plus LD-mismatch QC;
+\code{"rss_qc"} and \code{"none"} run basic allele harmonization only.}
 
 \item{finemapping_method}{One of "susie_rss", "single_effect", "bayesian_conditional_regression".}
 
diff --git a/man/rss_basic_qc.Rd b/man/rss_basic_qc.Rd
index b67620fc..020d0f03 100644
--- a/man/rss_basic_qc.Rd
+++ b/man/rss_basic_qc.Rd
@@ -4,7 +4,13 @@
 \alias{rss_basic_qc}
 \title{Preprocess input data for RSS analysis}
 \usage{
-rss_basic_qc(sumstats, LD_data, skip_region = NULL, keep_indel = TRUE)
+rss_basic_qc(
+  sumstats,
+  LD_data,
+  skip_region = NULL,
+  keep_indel = TRUE,
+  return_LD_mat = TRUE
+)
 }
 \arguments{
 \item{sumstats}{A data frame containing summary statistics with columns "chrom", "pos", "A1", and "A2".}
@@ -13,6 +19,11 @@ rss_basic_qc(sumstats, LD_data, skip_region = NULL, keep_indel = TRUE)
 
 \item{skip_region}{A character vector specifying regions to be skipped in the analysis (optional).
 Each region should be in the format "chrom:start-end" (e.g., "1:1000000-2000000").}
+
+\item{return_LD_mat}{Logical; if \code{FALSE}, return only harmonized
+summary statistics and skip LD-matrix subsetting. This is useful when the
+reference input is genotype-backed \code{X_ref}. Defaults to \code{TRUE}
+for backwards compatibility.}
 }
 \value{
 A list containing the processed summary statistics and LD matrix.
diff --git a/man/summary_stats_qc.Rd b/man/summary_stats_qc.Rd
index 1aa065da..139c2aa7 100644
--- a/man/summary_stats_qc.Rd
+++ b/man/summary_stats_qc.Rd
@@ -4,34 +4,94 @@
 \alias{summary_stats_qc}
 \title{Perform Quality Control on Summary Statistics}
 \usage{
-summary_stats_qc(sumstats, LD_data, n = NULL, method = c("slalom", "dentist"))
+summary_stats_qc(
+  sumstats,
+  LD_data,
+  n = NULL,
+  method = c("slalom", "dentist"),
+  rss_input = NULL,
+  keep_indel = TRUE,
+  skip_region = NULL,
+  pip_cutoff_to_skip = 0,
+  qc_method = NULL,
+  impute = FALSE,
+  impute_opts = list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5, lamb = 0.01),
+  study = NULL,
+  var_y = NULL,
+  return_on_skip = c("null", "preprocess"),
+  R_finite = NULL,
+  R_mismatch = NULL
+)
 }
 \arguments{
 \item{sumstats}{A data frame containing the processed summary statistics.}
 
-\item{LD_data}{A list containing the combined LD variants data generated by load_LD_matrix.}
+\item{LD_data}{A list containing the combined LD variants data generated by
+\code{load_LD_matrix()}.}
 
 \item{n}{Sample size for the LD reference panel (used by dentist method).}
 
-\item{method}{The quality control method to use. Options are "slalom" or "dentist" (default: "slalom").}
+\item{method}{The quality control method to use. Options are \code{"slalom"}
+or \code{"dentist"} (default: \code{"slalom"}).}
+
+\item{rss_input}{Optional loaded RSS input, either one \code{load_rss_data()}
+result or a named list of them. Supplying this selects the additional
+combined RSS/ColocBoost QC workflow. A single RSS record is detected by
+structure, not by the name \code{sumstats}, so a multi-study list may safely
+include a study named \code{"sumstats"}.}
+
+\item{keep_indel, skip_region, pip_cutoff_to_skip, qc_method, impute, impute_opts, study, var_y, return_on_skip, R_finite, R_mismatch}{Additional controls for the combined RSS/ColocBoost QC workflow.
+They are ignored by the historical LD-mismatch-only call unless
+\code{rss_input} or combined-QC options are supplied.}
 }
 \value{
-A list containing the quality-controlled summary statistics and updated LD matrix.
-  - sumstats: The quality-controlled summary statistics data frame.
-  - LD_mat: The updated LD matrix after quality control.
-  - outlier_number: The number of outlier variants removed.
+A list containing the quality-controlled summary statistics and updated LD
+matrix for the historical call:
+\itemize{
+  \item sumstats: The quality-controlled summary statistics data frame.
+  \item LD_mat: The updated LD matrix after quality control.
+  \item outlier_number: The number of outlier variants removed.
+}
+When \code{rss_input} or combined-QC controls are supplied, returns a cleaned
+RSS/LD record for one RSS record, or a named list of records for a list of RSS
+records.
 }
 \description{
-This function performs quality control on the processed summary statistics using the specified method.
-It wraps \code{\link{ld_mismatch_qc}} and handles subsetting of the summary statistics and LD matrix.
+This function performs quality control on the processed summary statistics
+using the specified method. It wraps \code{\link{ld_mismatch_qc}} and handles
+subsetting of the summary statistics and LD matrix.
 }
 \details{
-This function applies the specified quality control method to the processed summary statistics
-  via \code{\link{ld_mismatch_qc}}, then subsets the summary statistics and LD matrix to keep
-  only non-outlier variants.
+This function applies the specified quality control method to the processed
+summary statistics via \code{\link{ld_mismatch_qc}}, then subsets the summary
+statistics and LD matrix to keep only non-outlier variants.
+
+As an additional workflow for ColocBoost/RSS pipelines, callers may supply
+\code{rss_input} or combined-QC controls. That path first runs
+\code{rss_basic_qc()}, optional PIP screening, optional LD-mismatch QC through
+this same function, and optional RAISS imputation. The combined path normalizes
+one or many RSS records to the same loop internally; only true single-record
+input is unwrapped on return.
+\code{qc_method = NULL}, \code{"none"}, and \code{"rss_qc"} all mean
+basic-only summary-stat preprocessing without SLALOM/DENTIST outlier QC.
+
+When the combined path receives genotype-backed reference data
+(\code{X_ref}), basic harmonization avoids constructing an LD matrix. PIP
+screening uses an LD-independent single-effect screen with
+\code{susie_rss(R = diag(p), L = 1, max_iter = 1)}, and LD-mismatch QC
+computes only the filtered local correlation matrix required by
+SLALOM/DENTIST. RAISS imputation temporarily centers/scales genotype-backed
+\code{X_ref} before using the whole-region genotype/SVD path. This avoids
+LD-block partition artifacts while matching the LD scale used by
+\code{compute_LD(X_ref)}. Final ColocBoost calls still keep the original
+\code{X_ref} as the reference input.
 }
 \examples{
 # Perform SLALOM quality control (default)
 qc_results <- summary_stats_qc(sumstats, LD_data, method = "slalom")
 
+# Additional combined basic-only RSS QC.
+qc_results <- summary_stats_qc(rss_input = rss_input, LD_data = LD_data,
+                               qc_method = "none")
+
 }
diff --git a/tests/testthat/test_colocboost_pipeline.R b/tests/testthat/test_colocboost_pipeline.R
index 969360c1..e9214765 100644
--- a/tests/testthat/test_colocboost_pipeline.R
+++ b/tests/testthat/test_colocboost_pipeline.R
@@ -82,69 +82,6 @@ test_that("pip_cutoff correct length vector works", {
   expect_type(result, "list")
 })
 
-# ---- is_valid_sumstat_entry ----
-test_that("is_valid_sumstat_entry returns TRUE for valid data", {
-  ss <- data.frame(z = c(1.5, -2.3, 0.8), n = c(1000, 1000, 1000),
-                   variant = c("chr1:100:A:G", "chr1:200:T:C", "chr1:300:G:A"))
-  expect_true(pecotmr:::is_valid_sumstat_entry(ss))
-})
-
-test_that("is_valid_sumstat_entry returns FALSE for too few variants", {
-  ss <- data.frame(z = 1.5, n = 1000, variant = "chr1:100:A:G")
-  expect_false(pecotmr:::is_valid_sumstat_entry(ss, min_variants = 2))
-})
-
-test_that("is_valid_sumstat_entry returns FALSE for all-NA z-scores", {
-  ss <- data.frame(z = c(NA, NA, NA), n = c(1000, 1000, 1000),
-                   variant = c("v1", "v2", "v3"))
-  expect_false(pecotmr:::is_valid_sumstat_entry(ss))
-})
-
-test_that("is_valid_sumstat_entry returns FALSE for NULL input", {
-  expect_false(pecotmr:::is_valid_sumstat_entry(NULL))
-})
-
-test_that("is_valid_sumstat_entry returns FALSE for zero sample size", {
-  ss <- data.frame(z = c(1.5, -2.3), n = c(0, 0), variant = c("v1", "v2"))
-  expect_false(pecotmr:::is_valid_sumstat_entry(ss))
-})
-
-# ---- filter_valid_sumstats ----
-test_that("filter_valid_sumstats removes invalid entries", {
-  good <- data.frame(z = c(1.0, 2.0, 3.0), n = c(100, 100, 100),
-                     variant = c("v1", "v2", "v3"))
-  bad <- data.frame(z = c(NA, NA), n = c(0, 0), variant = c("v1", "v2"))
-  sumstats <- list(study_a = good, study_b = bad)
-  LD_mat <- list(ld1 = matrix(1, 3, 3))
-  LD_match <- c("ld1", "ld1")
-  result <- pecotmr:::filter_valid_sumstats(sumstats, LD_mat, LD_match)
-  expect_equal(length(result$sumstats), 1)
-  expect_equal(names(result$sumstats), "study_a")
-})
-
-test_that("filter_valid_sumstats returns NULL when all invalid", {
-  bad1 <- data.frame(z = NA, n = 0, variant = "v1")
-  bad2 <- data.frame(z = NA, n = 0, variant = "v1")
-  sumstats <- list(s1 = bad1, s2 = bad2)
-  LD_mat <- list(ld1 = matrix(1))
-  LD_match <- c("ld1", "ld1")
-  expect_message(
-    result <- pecotmr:::filter_valid_sumstats(sumstats, LD_mat, LD_match),
-    "No valid"
-  )
-  expect_null(result)
-})
-
-test_that("filter_valid_sumstats keeps all when all valid", {
-  make_ss <- function() data.frame(z = c(1, 2, 3), n = c(100, 100, 100),
-                                   variant = c("v1", "v2", "v3"))
-  sumstats <- list(s1 = make_ss(), s2 = make_ss())
-  LD_mat <- list(ld1 = matrix(1, 3, 3))
-  LD_match <- c("ld1", "ld1")
-  result <- pecotmr:::filter_valid_sumstats(sumstats, LD_mat, LD_match)
-  expect_equal(length(result$sumstats), 2)
-})
-
 # ===========================================================================
 # Tests from test_colocboost_pipeline_comprehensive.R
 # ===========================================================================
@@ -227,13 +164,934 @@ make_sumstat_region_data <- function(n_variants = 5, n_studies = 2) {
   )
 }
 
+test_that("qc_regional_data treats NULL qc_method as basic-only none", {
+  region_data <- make_sumstat_region_data(n_variants = 5, n_studies = 1)
+  captured_qc_method <- NULL
+  LD_mat <- diag(1)
+  rownames(LD_mat) <- colnames(LD_mat) <- "chr1:100:A:G"
+
+  local_mocked_bindings(
+    summary_stats_qc = function(..., qc_method) {
+      captured_qc_method <<- qc_method
+      list(study1 = list(
+        rss_input = list(sumstats = data.frame(variant_id = "chr1:100:A:G")),
+        LD_matrix = LD_mat
+      ))
+    }
+  )
+
+  result <- pecotmr:::qc_regional_data(region_data, qc_method = NULL, impute = FALSE)
+  expect_equal(captured_qc_method, "none")
+  expect_type(result, "list")
+})
+
+test_that("colocboost_pipeline default qc_method resolves to basic-only none", {
+  region_data <- make_individual_region_data(n = 12, p = 5, n_contexts = 1, n_events = 1)
+  captured_qc_method <- NULL
+
+  local_mocked_bindings(
+    qc_regional_data = function(region_data, ..., qc_method) {
+      captured_qc_method <<- qc_method
+      list(
+        individual_data = list(
+          Y = region_data$individual_data$residual_Y,
+          X = region_data$individual_data$residual_X
+        ),
+        sumstat_data = NULL
+      )
+    },
+    .run_colocboost = function(label, ...) {
+      list(result = list(label = label), time = as.difftime(0, units = "secs"))
+    }
+  )
+
+  result <- suppressMessages(colocboost_pipeline(region_data))
+  expect_equal(captured_qc_method, "none")
+  expect_equal(result$xqtl_coloc$label, "xQTL-only ColocBoost")
+})
+
 # ===========================================================================
-# 1. colocboost_analysis_pipeline: no analysis flags returns empty results
+# New direct-input ColocBoost helpers
+# ===========================================================================
+test_that("RegionalData adapters expose individual and RSS inputs", {
+  ind_region <- make_individual_region_data(n = 12, p = 5, n_contexts = 1, n_events = 2)
+  ind_input <- region_data_to_ind_input(ind_region)
+  expect_true(ind_input$source_info$has_individual)
+  expect_equal(names(ind_input$X), "ctx1")
+  expect_equal(names(ind_input$Y), "ctx1")
+
+  rss_region <- make_sumstat_region_data(n_variants = 5, n_studies = 2)
+  rss_input <- region_data_to_rss_input(rss_region)
+  expect_true(rss_input$source_info$has_sumstat)
+  expect_equal(names(rss_input$rss_input), c("study1", "study2"))
+  expect_true(all(names(rss_input$rss_input) %in% names(rss_input$LD_data)))
+})
+
+test_that("RegionalData individual adapter restores context names from residual_Y", {
+  ind_region <- make_individual_region_data(n = 12, p = 5, n_contexts = 2, n_events = 1)
+  names(ind_region$individual_data$residual_X) <- NULL
+  names(ind_region$individual_data$maf) <- NULL
+  ind_region$individual_data$X_variance <- lapply(
+    ind_region$individual_data$residual_X,
+    function(x) matrixStats::colVars(x)
+  )
+
+  ind_input <- region_data_to_ind_input(ind_region)
+  expect_equal(names(ind_input$X), c("ctx1", "ctx2"))
+  expect_equal(names(ind_input$maf), c("ctx1", "ctx2"))
+  expect_equal(names(ind_input$X_variance), c("ctx1", "ctx2"))
+
+  converted <- region_data_to_colocboost_input(ind_region)
+  expect_equal(length(converted$colocboost_input$X), 2)
+  expect_equal(nrow(converted$colocboost_input$dict_YX), 2)
+})
+
+test_that("RegionalData adapters handle missing data without fabricating inputs", {
+  empty_region <- list(individual_data = NULL, sumstat_data = NULL)
+
+  ind_input <- region_data_to_ind_input(empty_region)
+  expect_false(ind_input$source_info$has_individual)
+  expect_null(ind_input$X)
+  expect_null(ind_input$Y)
+
+  rss_input <- region_data_to_rss_input(empty_region)
+  expect_false(rss_input$source_info$has_sumstat)
+  expect_equal(rss_input$rss_input, list())
+  expect_equal(rss_input$LD_data, list())
+
+  converted <- region_data_to_colocboost_input(empty_region)
+  expect_equal(converted$colocboost_input, list())
+  expect_false(converted$source_info$individual$has_individual)
+  expect_false(converted$source_info$sumstat$has_sumstat)
+})
+
+test_that("region_data_to_rss_input keeps duplicate study names unique by LD group", {
+  base_region <- make_sumstat_region_data(n_variants = 4, n_studies = 1)
+  study <- base_region$sumstat_data$sumstats[[1]]
+  ld_info <- base_region$sumstat_data$LD_info[[1]]
+  region_data <- list(
+    individual_data = NULL,
+    sumstat_data = list(
+      sumstats = list(study, study),
+      LD_info = list(ldA = ld_info, ldB = ld_info)
+    )
+  )
+
+  rss_input <- region_data_to_rss_input(region_data)
+  expect_equal(names(rss_input$rss_input), c("study1", "study1.1"))
+  expect_equal(names(rss_input$LD_data), c("study1", "study1.1"))
+  expect_equal(unname(rss_input$source_info$ld_group), c("ldA", "ldB"))
+})
+
+test_that("region_data_to_colocboost_input returns core and QC inputs", {
+  region_data <- make_individual_region_data(n = 12, p = 5, n_contexts = 1, n_events = 2)
+  converted <- region_data_to_colocboost_input(region_data)
+  expect_true("colocboost_input" %in% names(converted))
+  expect_true("qc_input" %in% names(converted))
+  expect_true("source_info" %in% names(converted))
+  expect_equal(length(converted$colocboost_input$X), 1)
+  expect_equal(length(converted$colocboost_input$Y), 2)
+  expect_equal(nrow(converted$colocboost_input$dict_YX), 2)
+})
+
+test_that("region_data_to_colocboost_input preserves loaded X_ref instead of precomputing LD", {
+  region_data <- make_sumstat_region_data(n_variants = 5, n_studies = 1)
+  variants <- region_data$sumstat_data$sumstats[[1]][[1]]$sumstats$variant_id
+  X_ref <- matrix(rnorm(50), 10, 5)
+  colnames(X_ref) <- variants
+  region_data$sumstat_data$LD_info[[1]]$LD_matrix <- X_ref
+  region_data$sumstat_data$LD_info[[1]]$is_genotype <- TRUE
+
+  converted <- region_data_to_colocboost_input(region_data)
+
+  expect_null(converted$colocboost_input$LD)
+  expect_equal(length(converted$colocboost_input$X_ref), 1)
+  expect_equal(dim(converted$colocboost_input$X_ref[[1]]), c(10, 5))
+  expect_equal(colnames(converted$colocboost_input$X_ref[[1]]), variants)
+})
+
+test_that("region_data_to_colocboost_input preserves duplicated outcome names across contexts", {
+  region_data <- make_individual_region_data(n = 12, p = 5, n_contexts = 2, n_events = 1)
+  converted <- region_data_to_colocboost_input(region_data)
+
+  expect_equal(names(converted$colocboost_input$Y), c("ctx1_event1", "ctx2_event1"))
+  expect_equal(length(converted$colocboost_input$Y), 2)
+  expect_equal(nrow(converted$colocboost_input$dict_YX), 2)
+  expect_equal(converted$colocboost_input$dict_YX[, "X"], c(1, 2))
+})
+
+test_that("region_data_to_colocboost_input deduplicates shared individual X", {
+  region_data <- make_individual_region_data(n = 12, p = 5, n_contexts = 2, n_events = 1)
+  region_data$individual_data$residual_X$ctx2 <- region_data$individual_data$residual_X$ctx1
+  converted <- region_data_to_colocboost_input(region_data)
+
+  expect_equal(length(converted$colocboost_input$X), 1)
+  expect_equal(length(converted$colocboost_input$Y), 2)
+  expect_equal(converted$colocboost_input$dict_YX[, "X"], c(1, 1))
+})
+
+test_that("region_data_to_colocboost_input combines individual and RSS inputs", {
+  ind_region <- make_individual_region_data(n = 12, p = 5, n_contexts = 1, n_events = 2)
+  rss_region <- make_sumstat_region_data(n_variants = 5, n_studies = 1)
+  region_data <- list(
+    individual_data = ind_region$individual_data,
+    sumstat_data = rss_region$sumstat_data
+  )
+
+  converted <- region_data_to_colocboost_input(region_data)
+  expect_equal(length(converted$colocboost_input$X), 1)
+  expect_equal(length(converted$colocboost_input$Y), 2)
+  expect_equal(nrow(converted$colocboost_input$dict_YX), 2)
+  expect_equal(length(converted$colocboost_input$sumstat), 1)
+  expect_equal(length(converted$colocboost_input$LD), 1)
+  expect_equal(nrow(converted$colocboost_input$dict_sumstatLD), 1)
+  expect_true(converted$source_info$individual$has_individual)
+  expect_true(converted$source_info$sumstat$has_sumstat)
+})
+
+test_that("qc_individual_data uses existing genotype filtering helpers", {
+  region_data <- make_individual_region_data(n = 12, p = 5, n_contexts = 1, n_events = 2)
+  X <- region_data$individual_data$residual_X
+  Y <- region_data$individual_data$residual_Y
+  maf <- region_data$individual_data$maf
+  expect_message(
+    result <- qc_individual_data(X, Y, maf = maf, maf_cutoff = 0),
+    "QC track"
+  )
+  expect_equal(names(result), "ctx1")
+  expect_equal(ncol(result$ctx1$Y), 2)
+})
+
+test_that("qc_individual_data supports direct matrix inputs and keeps context labels", {
+  region_data <- make_individual_region_data(n = 12, p = 5, n_contexts = 1, n_events = 2)
+  X <- region_data$individual_data$residual_X$ctx1
+  Y <- region_data$individual_data$residual_Y$ctx1
+  maf <- stats::setNames(rep(0.2, ncol(X)), colnames(X))
+  maf[1] <- 0.001
+
+  expect_message(
+    result <- qc_individual_data(X, Y, maf = maf, maf_cutoff = 0.05, context = "ctx1"),
+    "retained"
+  )
+  expect_false(names(maf)[1] %in% colnames(result$X))
+  expect_true(all(startsWith(colnames(result$Y), "ctx1_")))
+  expect_equal(ncol(result$Y), ncol(Y))
+})
+
+test_that("summary_stats_qc runs combined basic harmonization when qc_method is none", {
+  region_data <- make_sumstat_region_data(n_variants = 5, n_studies = 1)
+  rss_input <- region_data_to_rss_input(region_data)
+  expect_message(
+    result <- summary_stats_qc(
+      rss_input = rss_input$rss_input,
+      LD_data = rss_input$LD_data,
+      qc_method = "none",
+      impute = FALSE
+    ),
+    "basic allele harmonization"
+  )
+  expect_equal(names(result), "study1")
+  expect_true(nrow(result$study1$rss_input$sumstats) > 0)
+})
+
+test_that("summary_stats_qc returns one cleaned record for one RSS record", {
+  region_data <- make_sumstat_region_data(n_variants = 5, n_studies = 1)
+  rss_input <- region_data_to_rss_input(region_data)
+
+  expect_message(
+    result <- summary_stats_qc(
+      rss_input = rss_input$rss_input$study1,
+      LD_data = rss_input$LD_data$study1,
+      qc_method = "none",
+      impute = FALSE
+    ),
+    "basic allele harmonization"
+  )
+  expect_true("rss_input" %in% names(result))
+  expect_true("LD_matrix" %in% names(result))
+  expect_true(nrow(result$rss_input$sumstats) > 0)
+})
+
+test_that("summary_stats_qc treats a study named sumstats as multiple-study input", {
+  region_data <- make_sumstat_region_data(n_variants = 5, n_studies = 2)
+  rss_input <- region_data_to_rss_input(region_data)
+  names(rss_input$rss_input)[1] <- "sumstats"
+  names(rss_input$LD_data)[1] <- "sumstats"
+
+  expect_message(
+    result <- summary_stats_qc(
+      rss_input = rss_input$rss_input,
+      LD_data = rss_input$LD_data,
+      qc_method = "none",
+      impute = FALSE
+    ),
+    "basic allele harmonization"
+  )
+  expect_equal(names(result), c("sumstats", "study2"))
+  expect_true("rss_input" %in% names(result$sumstats))
+  expect_true("rss_input" %in% names(result$study2))
+})
+
+test_that("summary_stats_qc imputes when block metadata can be inferred from LD matrix", {
+  region_data <- make_sumstat_region_data(n_variants = 5, n_studies = 1)
+  rss_input <- region_data_to_rss_input(region_data)
+
+  expect_message(
+    result <- summary_stats_qc(
+      rss_input = rss_input$rss_input,
+      LD_data = rss_input$LD_data,
+      qc_method = "none",
+      impute = TRUE,
+      impute_opts = list(rcond = 0.01, R2_threshold = -Inf, minimum_ld = -Inf, lamb = 0.01)
+    ),
+    "running imputation"
+  )
+  expect_equal(names(result), "study1")
+  expect_true(nrow(result$study1$rss_input$sumstats) > 0)
+})
+
+test_that("colocboost_analysis directly forwards core inputs without QC", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  X <- matrix(rnorm(20), 5, 4)
+  colnames(X) <- paste0("v", 1:4)
+  Y <- matrix(rnorm(10), 5, 2)
+  colnames(Y) <- c("y1", "y2")
+  result <- colocboost_analysis(X = X, Y = Y, M = 2)
+  expect_identical(result$args$X, X)
+  expect_identical(result$args$Y, Y)
+  expect_equal(result$args$M, 2)
+  expect_length(result$dots, 0)
+})
+
+test_that("colocboost_analysis runs individual QC from colocboost-style inputs", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  region_data <- make_individual_region_data(n = 12, p = 5, n_contexts = 1, n_events = 2)
+  converted <- region_data_to_colocboost_input(region_data)
+  expect_message(
+    result <- do.call(
+      colocboost_analysis,
+      c(converted$colocboost_input, list(missing_rate_thresh = 1, M = 2))
+    ),
+    "individual-level"
+  )
+  expect_equal(length(result$args$X), 1)
+  expect_equal(length(result$args$Y), 2)
+  expect_equal(result$args$M, 2)
+})
+
+test_that("colocboost_analysis deduplicates shared X after individual QC", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  X <- matrix(rnorm(60), 12, 5)
+  colnames(X) <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  Y <- list(
+    ctx1 = matrix(rnorm(12), 12, 1, dimnames = list(NULL, "event1")),
+    ctx2 = matrix(rnorm(12), 12, 1, dimnames = list(NULL, "event2"))
+  )
+  X_list <- list(ctx1 = X, ctx2 = X)
+
+  expect_message(
+    result <- colocboost_analysis(X = X_list, Y = Y, missing_rate_thresh = 1, M = 2),
+    "individual-level"
+  )
+  expect_equal(length(result$args$X), 1)
+  expect_equal(result$args$dict_YX[, "X"], c(1, 1))
+})
+
+test_that("colocboost_analysis refreshes outcome_names after combined QC", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  ind_region <- make_individual_region_data(n = 12, p = 5, n_contexts = 1, n_events = 2)
+  rss_region <- make_sumstat_region_data(n_variants = 5, n_studies = 1)
+  region_data <- list(
+    individual_data = ind_region$individual_data,
+    sumstat_data = rss_region$sumstat_data
+  )
+  converted <- region_data_to_colocboost_input(region_data)
+
+  expect_message(
+    result <- do.call(
+      colocboost_analysis,
+      c(converted$colocboost_input, list(missing_rate_thresh = 1, qc_method = "none", M = 2))
+    ),
+    "summary-statistic"
+  )
+  expect_equal(
+    result$args$outcome_names,
+    c(names(result$args$Y), names(result$args$sumstat))
+  )
+})
+
+test_that("colocboost_analysis remaps focal_outcome_idx after QC keeps focal outcome", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    },
+    qc_individual_data = function(X, Y, ...) {
+      list(ctx = list(
+        X = X$ctx,
+        Y = Y$ctx[, "raw2", drop = FALSE]
+      ))
+    }
+  )
+  X <- list(ctx = matrix(rnorm(40), 10, 4))
+  colnames(X$ctx) <- paste0("v", 1:4)
+  Y <- list(ctx = matrix(rnorm(20), 10, 2))
+  colnames(Y$ctx) <- c("raw1", "raw2")
+
+  result <- colocboost_analysis(
+    X = X, Y = Y,
+    outcome_names = c("trait1", "trait2"),
+    focal_outcome_idx = 2,
+    missing_rate_thresh = 1,
+    M = 2
+  )
+
+  expect_equal(result$args$outcome_names, "trait2")
+  expect_equal(result$args$focal_outcome_idx, 1)
+})
+
+test_that("colocboost_analysis clears focal_outcome_idx when QC removes focal outcome", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    },
+    qc_individual_data = function(X, Y, ...) {
+      list(ctx = list(
+        X = X$ctx,
+        Y = Y$ctx[, "raw2", drop = FALSE]
+      ))
+    }
+  )
+  X <- list(ctx = matrix(rnorm(40), 10, 4))
+  colnames(X$ctx) <- paste0("v", 1:4)
+  Y <- list(ctx = matrix(rnorm(20), 10, 2))
+  colnames(Y$ctx) <- c("raw1", "raw2")
+
+  expect_warning(
+    result <- colocboost_analysis(
+      X = X, Y = Y,
+      outcome_names = c("trait1", "trait2"),
+      focal_outcome_idx = 1,
+      missing_rate_thresh = 1,
+      M = 2
+    ),
+    "not present after QC"
+  )
+
+  expect_equal(result$args$outcome_names, "trait2")
+  expect_null(result$args$focal_outcome_idx)
+})
+
+test_that("colocboost_analysis skips empty individual QC for sumstat-only inputs", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  region_data <- make_sumstat_region_data(n_variants = 5, n_studies = 1)
+  converted <- region_data_to_colocboost_input(region_data)
+
+  messages <- character()
+  withCallingHandlers(
+    result <- do.call(
+      colocboost_analysis,
+      c(converted$colocboost_input, list(qc_method = "none", M = 2))
+    ),
+    message = function(m) {
+      messages <<- c(messages, conditionMessage(m))
+      invokeRestart("muffleMessage")
+    }
+  )
+
+  expect_false(any(grepl("individual-level", messages)))
+  expect_true(any(grepl("summary-statistic", messages)))
+  expect_equal(names(result$args$sumstat), "study1")
+})
+
+test_that("colocboost_analysis falls back to direct call when QC inputs are unavailable", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+
+  expect_warning(
+    result <- colocboost_analysis(qc_method = "none", M = 2),
+    "required QC inputs are unavailable"
+  )
+  expect_equal(result$args$M, 2)
+  expect_length(result$dots, 0)
+})
+
+test_that("colocboost_analysis falls back when individual QC cannot run", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  X <- matrix(rnorm(20), 5, 4)
+  Y <- matrix(rnorm(10), 5, 2)
+  colnames(Y) <- c("y1", "y2")
+
+  expect_warning(
+    result <- colocboost_analysis(X = X, Y = Y, missing_rate_thresh = 1, M = 2),
+    "QC requested but skipped"
+  )
+  expect_identical(result$args$X, X)
+  expect_identical(result$args$Y, Y)
+  expect_equal(result$args$M, 2)
+})
+
+test_that("colocboost_analysis derives summary QC input from ColocBoost-style inputs", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  variants <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  sumstat <- data.frame(
+    variant = variants,
+    z = rnorm(5),
+    n = rep(1000, 5),
+    stringsAsFactors = FALSE
+  )
+  LD <- diag(5)
+  rownames(LD) <- colnames(LD) <- variants
+
+  expect_message(
+    result <- colocboost_analysis(sumstat = sumstat, LD = LD, qc_method = "none", M = 2),
+    "summary-statistic"
+  )
+  expect_equal(length(result$args$sumstat), 1)
+  expect_equal(length(result$args$LD), 1)
+  expect_equal(nrow(result$args$sumstat[[1]]), 5)
+  expect_equal(nrow(result$args$dict_sumstatLD), 1)
+  expect_equal(result$args$M, 2)
+})
+
+test_that("colocboost_analysis keeps multiple GWAS as colocboost list input after summary QC", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  variants <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  make_sumstat <- function(seed) {
+    set.seed(seed)
+    data.frame(
+      variant = variants,
+      z = rnorm(5),
+      n = rep(1000, 5),
+      stringsAsFactors = FALSE
+    )
+  }
+  LD <- diag(5)
+  rownames(LD) <- colnames(LD) <- variants
+
+  expect_message(
+    result <- colocboost_analysis(
+      sumstat = list(gwas1 = make_sumstat(1), gwas2 = make_sumstat(2)),
+      LD = LD,
+      qc_method = "none",
+      M = 2
+    ),
+    "summary-statistic"
+  )
+  expect_equal(names(result$args$sumstat), c("gwas1", "gwas2"))
+  expect_equal(length(result$args$LD), 1)
+  expect_equal(nrow(result$args$dict_sumstatLD), 2)
+  expect_equal(result$args$dict_sumstatLD[, 2], c(1, 1))
+})
+
+test_that("colocboost_analysis imputes native LD input using inferred block metadata", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    },
+    raiss = function(ref_panel, known_zscores, LD_matrix = NULL, genotype_matrix = NULL, ...) {
+      expect_null(genotype_matrix)
+      expect_type(LD_matrix, "list")
+      expect_true("ld_matrices" %in% names(LD_matrix))
+      LD_mat <- diag(nrow(known_zscores))
+      rownames(LD_mat) <- colnames(LD_mat) <- known_zscores$variant_id
+      list(
+        result_filter = known_zscores,
+        LD_mat = LD_mat
+      )
+    }
+  )
+  variants <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  sumstat <- data.frame(
+    variant = variants,
+    z = rnorm(5),
+    n = rep(1000, 5),
+    stringsAsFactors = FALSE
+  )
+  LD <- diag(5)
+  rownames(LD) <- colnames(LD) <- variants
+
+  expect_warning(
+    expect_message(
+      result <- colocboost_analysis(
+        sumstat = sumstat, LD = LD,
+        qc_method = "none", impute = TRUE, M = 2
+      ),
+      "running imputation"
+    ),
+    NA
+  )
+  expect_equal(nrow(result$args$sumstat[[1]]), 5)
+})
+
+test_that("colocboost_analysis imputes native X_ref input through scaled genotype RAISS path", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    },
+    compute_LD = function(...) stop("compute_LD should not be called"),
+    raiss = function(ref_panel, known_zscores, LD_matrix = NULL, genotype_matrix = NULL, ...) {
+      expect_null(LD_matrix)
+      expect_equal(ncol(genotype_matrix), nrow(ref_panel))
+      expect_equal(unname(colMeans(genotype_matrix)), rep(0, ncol(genotype_matrix)),
+                   tolerance = 1e-10)
+      list(result_filter = known_zscores, LD_mat = NULL)
+    }
+  )
+  variants <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  sumstat <- data.frame(
+    variant = variants,
+    z = rnorm(5),
+    n = rep(1000, 5),
+    stringsAsFactors = FALSE
+  )
+  X_ref <- matrix(rnorm(50), 10, 5)
+  colnames(X_ref) <- variants
+
+  expect_warning(
+    expect_message(
+      result <- colocboost_analysis(
+        sumstat = sumstat, X_ref = X_ref,
+        qc_method = "none", impute = TRUE, M = 2
+      ),
+      "running imputation"
+    ),
+    NA
+  )
+  expect_equal(length(result$args$X_ref), 1)
+  expect_equal(ncol(result$args$X_ref[[1]]), 5)
+  expect_null(result$args$LD)
+})
+
+test_that("colocboost_analysis keeps QC-generated X_ref mutually exclusive with original LD", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  variants <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  sumstat <- data.frame(
+    variant = variants,
+    z = rnorm(5),
+    n = rep(1000, 5),
+    stringsAsFactors = FALSE
+  )
+  LD <- diag(5)
+  rownames(LD) <- colnames(LD) <- variants
+  X_ref <- matrix(rnorm(50), 10, 5)
+  colnames(X_ref) <- variants
+  ref_panel <- parse_variant_id(variants)
+  ref_panel$variant_id <- variants
+  LD_reference_info <- list(
+    LD_matrix = X_ref,
+    LD_variants = variants,
+    ref_panel = ref_panel,
+    is_genotype = TRUE
+  )
+
+  result <- suppressMessages(colocboost_analysis(
+    sumstat = sumstat,
+    LD = LD,
+    LD_reference_info = LD_reference_info,
+    qc_method = "none",
+    M = 2
+  ))
+
+  expect_null(result$args$LD)
+  expect_equal(length(result$args$X_ref), 1)
+  expect_equal(dim(result$args$X_ref[[1]]), c(10, 5))
+})
+
+test_that("colocboost_analysis native summary QC supports explicit A1_A2 variant convention", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  variants_a1a2 <- paste0("chr1:", seq_len(5) * 100, ":G:A")
+  sumstat <- data.frame(
+    variant = variants_a1a2,
+    z = rnorm(5),
+    n = rep(1000, 5),
+    stringsAsFactors = FALSE
+  )
+  LD <- diag(5)
+  rownames(LD) <- colnames(LD) <- variants_a1a2
+
+  result <- suppressMessages(colocboost_analysis(
+    sumstat = sumstat, LD = LD,
+    qc_method = "none",
+    variant_convention = "A1_A2", M = 2
+  ))
+  expect_equal(result$args$sumstat[[1]]$variant, paste0("chr1:", seq_len(5) * 100, ":A:G"))
+  expect_equal(rownames(result$args$LD[[1]]), paste0("chr1:", seq_len(5) * 100, ":A:G"))
+})
+
+test_that("colocboost_analysis uses LD_reference_info data frame for rsid-named LD", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  rsids <- paste0("rs", seq_len(5))
+  variants <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  sumstat <- data.frame(
+    variant = variants,
+    z = rnorm(5),
+    n = rep(1000, 5),
+    stringsAsFactors = FALSE
+  )
+  LD <- diag(5)
+  rownames(LD) <- colnames(LD) <- rsids
+  LD_reference_info <- data.frame(
+    chrom = 1,
+    id = rsids,
+    pos = seq_len(5) * 100,
+    A2 = "A",
+    A1 = "G",
+    stringsAsFactors = FALSE
+  )
+
+  expect_message(
+    result <- colocboost_analysis(
+      sumstat = sumstat, LD = LD,
+      qc_method = "none",
+      LD_reference_info = LD_reference_info,
+      M = 2
+    ),
+    "LD_reference_info"
+  )
+  expect_equal(rownames(result$args$LD[[1]]), variants)
+  expect_equal(result$args$sumstat[[1]]$variant, variants)
+})
+
+test_that("colocboost_analysis uses LD_reference_info row order when LD names are absent", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  variants <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  sumstat <- data.frame(
+    variant = variants,
+    z = rnorm(5),
+    n = rep(1000, 5),
+    stringsAsFactors = FALSE
+  )
+  LD <- diag(5)
+  LD_reference_info <- data.frame(
+    chrom = 1,
+    pos = seq_len(5) * 100,
+    A2 = "A",
+    A1 = "G",
+    stringsAsFactors = FALSE
+  )
+
+  expect_message(
+    result <- colocboost_analysis(
+      sumstat = sumstat, LD = LD,
+      qc_method = "none",
+      LD_reference_info = LD_reference_info,
+      M = 2
+    ),
+    "row order"
+  )
+  expect_equal(rownames(result$args$LD[[1]]), variants)
+  expect_equal(result$args$sumstat[[1]]$variant, variants)
+})
+
+test_that("colocboost_analysis reads LD_reference_info from a bim file", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  rsids <- paste0("rs", seq_len(5))
+  variants <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  sumstat <- data.frame(
+    variant = variants,
+    z = rnorm(5),
+    n = rep(1000, 5),
+    stringsAsFactors = FALSE
+  )
+  LD <- diag(5)
+  rownames(LD) <- colnames(LD) <- rsids
+  bim_file <- tempfile(fileext = ".bim")
+  write.table(
+    data.frame(1, rsids, 0, seq_len(5) * 100, "G", "A"),
+    bim_file,
+    quote = FALSE,
+    row.names = FALSE,
+    col.names = FALSE
+  )
+
+  result <- suppressMessages(colocboost_analysis(
+    sumstat = sumstat, LD = LD,
+    qc_method = "none",
+    LD_reference_info = bim_file,
+    M = 2
+  ))
+  expect_equal(rownames(result$args$LD[[1]]), variants)
+})
+
+test_that("colocboost_analysis reports missing LD_reference_info when LD names are not genomic", {
+  local_mocked_bindings(
+    .cb_call_colocboost = function(args, dots) {
+      list(args = args, dots = dots)
+    }
+  )
+  variants <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  sumstat <- data.frame(
+    variant = variants,
+    z = rnorm(5),
+    n = rep(1000, 5),
+    stringsAsFactors = FALSE
+  )
+  LD <- diag(5)
+  rownames(LD) <- colnames(LD) <- paste0("rs", seq_len(5))
+
+  expect_warning(
+    result <- colocboost_analysis(sumstat = sumstat, LD = LD, qc_method = "none", M = 2),
+    "LD_reference_info"
+  )
+  expect_identical(result$args$LD, LD)
+  expect_equal(result$args$M, 2)
+})
+
+test_that("colocboost_pipeline is the protocol entry", {
+  set.seed(450)
+  X <- matrix(rnorm(50), 10, 5)
+  colnames(X) <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  Y <- matrix(rnorm(10), 10, 1)
+  colnames(Y) <- "gene1"
+  region_data <- list(
+    individual_data = list(
+      residual_Y = list(ctx1 = Y),
+      residual_X = list(ctx1 = X),
+      maf = list(ctx1 = runif(5, 0.05, 0.45))
+    ),
+    sumstat_data = NULL
+  )
+
+  local_mocked_bindings(
+    qc_regional_data = function(region_data, ...) {
+      list(
+        individual_data = list(Y = list(ctx1 = Y), X = list(ctx1 = X)),
+        sumstat_data = NULL
+      )
+    },
+    .run_colocboost = function(label, ...) {
+      list(result = list(label = label, args = list(...)),
+           time = as.difftime(0, units = "secs"))
+    }
+  )
+
+  result <- suppressMessages(colocboost_pipeline(region_data))
+  expect_named(result, c("xqtl_coloc", "joint_gwas", "separate_gwas", "computing_time"))
+  expect_equal(result$xqtl_coloc$label, "xQTL-only ColocBoost")
+})
+
+test_that("colocboost_pipeline preserves result fields when analyses return NULL", {
+  set.seed(451)
+  X <- matrix(rnorm(50), 10, 5)
+  colnames(X) <- paste0("chr1:", seq_len(5) * 100, ":A:G")
+  Y <- matrix(rnorm(10), 10, 1)
+  colnames(Y) <- "gene1"
+  sumstat <- data.frame(
+    chrom = 1,
+    pos = seq_len(5) * 100,
+    A2 = "A",
+    A1 = "G",
+    z = rnorm(5),
+    n = 1000,
+    variant_id = colnames(X),
+    stringsAsFactors = FALSE
+  )
+  LD <- diag(5)
+  rownames(LD) <- colnames(LD) <- colnames(X)
+  region_data <- list(
+    individual_data = list(
+      residual_Y = list(ctx1 = Y),
+      residual_X = list(ctx1 = X),
+      maf = list(ctx1 = runif(5, 0.05, 0.45))
+    ),
+    sumstat_data = list(
+      sumstats = list(chr21_ref = list(study1 = list(sumstats = sumstat, n = 1000, var_y = 1))),
+      LD_info = list(chr21_ref = list(
+        LD_variants = colnames(X),
+        LD_matrix = LD,
+        ref_panel = cbind(parse_variant_id(colnames(X)), variant_id = colnames(X)),
+        block_metadata = data.frame(
+          block_id = 1L, chrom = "1", block_start = 100, block_end = 500,
+          size = 5L, start_idx = 1L, end_idx = 5L
+        ),
+        is_genotype = FALSE
+      ))
+    )
+  )
+
+  local_mocked_bindings(
+    .run_colocboost = function(label, ...) {
+      list(result = NULL, time = as.difftime(0, units = "secs"))
+    }
+  )
+
+  result <- suppressMessages(colocboost_pipeline(
+    region_data,
+    xqtl_coloc = TRUE,
+    joint_gwas = TRUE,
+    separate_gwas = TRUE,
+    qc_method = "none",
+    impute = FALSE
+  ))
+  expect_named(result, c("xqtl_coloc", "joint_gwas", "separate_gwas", "computing_time"))
+  expect_null(result$xqtl_coloc)
+  expect_null(result$joint_gwas)
+  expect_named(result$separate_gwas, "study1")
+  expect_null(result$separate_gwas$study1)
+})
+
+# ===========================================================================
+# 1. colocboost_pipeline: no analysis flags returns empty results
 # ===========================================================================
 test_that("pipeline returns empty results with message when no analysis flags set", {
   region_data <- make_individual_region_data()
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc    = FALSE,
       joint_gwas     = FALSE,
@@ -248,12 +1106,12 @@ test_that("pipeline returns empty results with message when no analysis flags se
 })
 
 # ===========================================================================
-# 2. colocboost_analysis_pipeline: NULL individual_data and NULL sumstat_data
+# 2. colocboost_pipeline: NULL individual_data and NULL sumstat_data
 # ===========================================================================
 test_that("pipeline returns early when both data sources are NULL", {
   region_data <- list(individual_data = NULL, sumstat_data = NULL)
   expect_message(
-    result <- colocboost_analysis_pipeline(region_data, xqtl_coloc = TRUE),
+    result <- colocboost_pipeline(region_data, xqtl_coloc = TRUE),
     "No individual data"
   )
   expect_type(result, "list")
@@ -264,7 +1122,7 @@ test_that("pipeline returns early when both data sources are NULL", {
 # 3. filter_events: type_pattern, valid_pattern, exclude_pattern
 # ===========================================================================
 test_that("filter_events keeps events matching valid_pattern", {
-  # Access the internal function from within colocboost_analysis_pipeline's environment
+  # Access the internal function from within colocboost_pipeline's environment
   # We need to build a minimal call through the pipeline to test filter_events indirectly.
   # Instead, recreate the inner function for testing purposes.
 
@@ -311,7 +1169,7 @@ test_that("filter_events keeps events matching valid_pattern", {
   )
 
   result <- suppressMessages(
-    colocboost_analysis_pipeline(
+    colocboost_pipeline(
       region_data,
       event_filters  = event_filters,
       xqtl_coloc     = FALSE,
@@ -349,7 +1207,7 @@ test_that("filter_events errors on missing type_pattern", {
 
   expect_error(
     suppressMessages(
-      colocboost_analysis_pipeline(
+      colocboost_pipeline(
         region_data,
         event_filters  = bad_filter,
         xqtl_coloc     = TRUE,
@@ -382,7 +1240,7 @@ test_that("filter_events errors when only type_pattern is given (no valid or exc
 
   expect_error(
     suppressMessages(
-      colocboost_analysis_pipeline(
+      colocboost_pipeline(
         region_data,
         event_filters  = bad_filter,
         xqtl_coloc     = TRUE
@@ -412,7 +1270,7 @@ test_that("extract_contexts_studies returns individual contexts and sumstat stud
   # Pipeline calls extract_contexts_studies internally.
   # With no analysis flags it will still run extraction and return.
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc     = FALSE,
       joint_gwas      = FALSE,
@@ -466,7 +1324,7 @@ test_that("extract_contexts_studies reports after-QC when some individual data r
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc    = TRUE,
       joint_gwas     = FALSE,
@@ -492,9 +1350,12 @@ test_that("qc_regional_data handles named pip_cutoff_to_skip_sumstat vector", {
       LD_mat <- LD_data$LD_matrix[sumstats$variant_id, sumstats$variant_id, drop = FALSE]
       list(sumstats = sumstats, LD_mat = LD_mat)
     },
-    summary_stats_qc = function(sumstats, LD_data, ...) {
-      LD_mat <- LD_data$LD_matrix[sumstats$variant_id, sumstats$variant_id, drop = FALSE]
-      list(sumstats = sumstats, LD_mat = LD_mat, outlier_number = 0)
+    summary_stats_qc = function(rss_input = NULL, LD_data, ...) {
+      stats::setNames(lapply(names(rss_input), function(study) {
+        ss <- rss_input[[study]]$sumstats
+        LD_mat <- LD_data[[study]]$LD_matrix[ss$variant_id, ss$variant_id, drop = FALSE]
+        list(rss_input = rss_input[[study]], LD_matrix = LD_mat, outlier_number = 0)
+      }), names(rss_input))
     },
     raiss = function(...) {
       list(result_filter = data.frame(z = rnorm(5)), LD_mat = diag(5))
@@ -526,9 +1387,12 @@ test_that("qc_regional_data fills missing study names with 0 for pip_cutoff_to_s
       LD_mat <- LD_data$LD_matrix[sumstats$variant_id, sumstats$variant_id, drop = FALSE]
       list(sumstats = sumstats, LD_mat = LD_mat)
     },
-    summary_stats_qc = function(sumstats, LD_data, ...) {
-      LD_mat <- LD_data$LD_matrix[sumstats$variant_id, sumstats$variant_id, drop = FALSE]
-      list(sumstats = sumstats, LD_mat = LD_mat, outlier_number = 0)
+    summary_stats_qc = function(rss_input = NULL, LD_data, ...) {
+      stats::setNames(lapply(names(rss_input), function(study) {
+        ss <- rss_input[[study]]$sumstats
+        LD_mat <- LD_data[[study]]$LD_matrix[ss$variant_id, ss$variant_id, drop = FALSE]
+        list(rss_input = rss_input[[study]], LD_matrix = LD_mat, outlier_number = 0)
+      }), names(rss_input))
     },
     raiss = function(...) list(result_filter = data.frame(z = rnorm(5)), LD_mat = diag(5)),
     partition_LD_matrix = function(...) diag(5)
@@ -537,24 +1401,32 @@ test_that("qc_regional_data fills missing study names with 0 for pip_cutoff_to_s
   # Named vector with only one of the two studies: the missing study should get 0
 
   pip_partial <- c("study1" = 0.05)
-  result <- suppressMessages(
-    pecotmr:::qc_regional_data(
-      region_data,
-      pip_cutoff_to_skip_sumstat = pip_partial,
-      qc_method = "slalom",
-      impute = FALSE
-    )
+  result <- withCallingHandlers(
+    suppressMessages(
+      pecotmr:::qc_regional_data(
+        region_data,
+        pip_cutoff_to_skip_sumstat = pip_partial,
+        qc_method = "slalom",
+        impute = FALSE
+      )
+    ),
+    warning = function(w) {
+      if (grepl("IBSS algorithm did not converge", conditionMessage(w), fixed = TRUE)) {
+        invokeRestart("muffleWarning")
+      }
+      stop(w)
+    }
   )
   expect_type(result, "list")
 })
 
 # ===========================================================================
-# 9. colocboost_analysis_pipeline: output structure verification
+# 9. colocboost_pipeline: output structure verification
 # ===========================================================================
 test_that("pipeline output structure has expected top-level keys", {
   region_data <- list(individual_data = NULL, sumstat_data = NULL)
   result <- suppressMessages(
-    colocboost_analysis_pipeline(region_data, xqtl_coloc = FALSE, joint_gwas = FALSE, separate_gwas = FALSE)
+    colocboost_pipeline(region_data, xqtl_coloc = FALSE, joint_gwas = FALSE, separate_gwas = FALSE)
   )
   expect_true("xqtl_coloc" %in% names(result))
   expect_true("joint_gwas" %in% names(result))
@@ -590,7 +1462,7 @@ test_that("pipeline with individual data enters xqtl_coloc path and records timi
   )
 
   result <- suppressMessages(
-    colocboost_analysis_pipeline(
+    colocboost_pipeline(
       region_data,
       xqtl_coloc    = TRUE,
       joint_gwas     = FALSE,
@@ -650,7 +1522,7 @@ test_that("filter_events exclude_pattern removes matching events via pipeline",
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       event_filters  = event_filters,
       xqtl_coloc     = TRUE
@@ -693,7 +1565,7 @@ test_that("pipeline catches colocboost xqtl error and returns NULL result", {
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc = TRUE,
       joint_gwas  = FALSE,
@@ -718,7 +1590,7 @@ test_that("pipeline returns early when all data fails QC", {
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc    = TRUE,
       joint_gwas     = FALSE,
@@ -729,6 +1601,199 @@ test_that("pipeline returns early when all data fails QC", {
   expect_type(result, "list")
 })
 
+make_qced_sumstat_data <- function(studies = c("study1"), n_variants = 5) {
+  vids <- paste0("chr1:", seq_len(n_variants) * 100, ":A:G")
+  LD_mat <- diag(n_variants)
+  rownames(LD_mat) <- colnames(LD_mat) <- vids
+  sumstats <- lapply(studies, function(study) {
+    list(
+      sumstats = data.frame(
+        z = rnorm(n_variants),
+        variant_id = vids,
+        stringsAsFactors = FALSE
+      ),
+      n = 5000,
+      var_y = 1
+    )
+  })
+  names(sumstats) <- studies
+  list(
+    sumstats = sumstats,
+    LD_mat = stats::setNames(list(LD_mat), studies[1]),
+    LD_match = stats::setNames(rep(studies[1], length(studies)), studies)
+  )
+}
+
+test_that("pipeline skips xqtl branch when QC removes individual data but keeps sumstats", {
+  ind_region <- make_individual_region_data(n = 20, p = 8, n_contexts = 1, n_events = 2)
+  sumstat_region <- make_sumstat_region_data(n_variants = 5, n_studies = 1)
+  region_data <- list(
+    individual_data = ind_region$individual_data,
+    sumstat_data = sumstat_region$sumstat_data
+  )
+  calls <- character()
+
+  local_mocked_bindings(
+    qc_regional_data = function(region_data, ...) {
+      list(
+        individual_data = NULL,
+        sumstat_data = make_qced_sumstat_data("study1")
+      )
+    },
+    .run_colocboost = function(label, ...) {
+      calls <<- c(calls, label)
+      list(result = list(label = label, args = list(...)),
+           time = as.difftime(0, units = "secs"))
+    }
+  )
+
+  expect_message(
+    result <- colocboost_pipeline(
+      region_data,
+      xqtl_coloc = TRUE,
+      joint_gwas = TRUE,
+      separate_gwas = FALSE
+    ),
+    "No individual data pass QC"
+  )
+  expect_false(any(grepl("xQTL-only", calls)))
+  expect_true(any(grepl("Joint GWAS", calls)))
+  expect_null(result$computing_time$Analysis$xqtl_coloc)
+  expect_s3_class(result$computing_time$Analysis$joint_gwas, "difftime")
+})
+
+test_that("pipeline skips joint_gwas when QC removes sumstats but keeps individual data", {
+  ind_region <- make_individual_region_data(n = 20, p = 8, n_contexts = 1, n_events = 2)
+  sumstat_region <- make_sumstat_region_data(n_variants = 5, n_studies = 1)
+  region_data <- list(
+    individual_data = ind_region$individual_data,
+    sumstat_data = sumstat_region$sumstat_data
+  )
+  calls <- character()
+
+  local_mocked_bindings(
+    qc_regional_data = function(region_data, ...) {
+      Y1 <- region_data$individual_data$residual_Y[[1]]
+      colnames(Y1) <- paste0("ctx1_", colnames(Y1))
+      list(
+        individual_data = list(
+          Y = list(ctx1 = Y1),
+          X = list(ctx1 = region_data$individual_data$residual_X[[1]])
+        ),
+        sumstat_data = list(sumstats = NULL)
+      )
+    },
+    .run_colocboost = function(label, ...) {
+      calls <<- c(calls, label)
+      list(result = list(label = label, args = list(...)),
+           time = as.difftime(0, units = "secs"))
+    }
+  )
+
+  expect_message(
+    result <- colocboost_pipeline(
+      region_data,
+      xqtl_coloc = TRUE,
+      joint_gwas = TRUE,
+      separate_gwas = FALSE
+    ),
+    "Skipping follow-up analysis for sumstat studies"
+  )
+  expect_true(any(grepl("xQTL-only", calls)))
+  expect_false(any(grepl("Joint GWAS", calls)))
+  expect_s3_class(result$computing_time$Analysis$xqtl_coloc, "difftime")
+  expect_null(result$computing_time$Analysis$joint_gwas)
+})
+
+test_that("pipeline separate_gwas loops only over sumstats that survive QC", {
+  region_data <- make_sumstat_region_data(n_variants = 5, n_studies = 2)
+  calls <- character()
+
+  local_mocked_bindings(
+    qc_regional_data = function(region_data, ...) {
+      list(
+        individual_data = NULL,
+        sumstat_data = make_qced_sumstat_data("study1")
+      )
+    },
+    .run_colocboost = function(label, ...) {
+      calls <<- c(calls, label)
+      list(result = list(label = label, args = list(...)),
+           time = as.difftime(0, units = "secs"))
+    }
+  )
+
+  expect_message(
+    result <- colocboost_pipeline(
+      region_data,
+      xqtl_coloc = FALSE,
+      joint_gwas = FALSE,
+      separate_gwas = TRUE
+    ),
+    "Skipping follow-up analysis for sumstat studies"
+  )
+  expect_equal(length(calls), 1)
+  expect_true(grepl("study1", calls[[1]]))
+  expect_equal(result$computing_time$Analysis$separate_gwas$n_studies, 1)
+  expect_true("study2" %in% names(result$separate_gwas))
+  expect_null(result$separate_gwas$study2)
+})
+
+test_that("pipeline event filters can remove all events in one context while keeping another", {
+  set.seed(1401)
+  n <- 10
+  p <- 5
+  X <- matrix(rnorm(n * p), n, p)
+  colnames(X) <- paste0("chr1:", seq_len(p) * 100, ":A:G")
+  Y_drop <- matrix(rnorm(n * 2), n, 2)
+  colnames(Y_drop) <- c("bad_event1", "bad_event2")
+  Y_keep <- matrix(rnorm(n * 2), n, 2)
+  colnames(Y_keep) <- c("keep_event", "bad_event")
+  region_data <- list(
+    individual_data = list(
+      residual_Y = list(ctx_drop = Y_drop, ctx_keep = Y_keep),
+      residual_X = list(ctx_drop = X, ctx_keep = X),
+      maf = list(ctx_drop = runif(p, 0.05, 0.45), ctx_keep = runif(p, 0.05, 0.45))
+    ),
+    sumstat_data = NULL
+  )
+  event_filters <- list(list(
+    type_pattern = ".*_event.*",
+    valid_pattern = "keep_event"
+  ))
+  calls <- character()
+
+  local_mocked_bindings(
+    qc_regional_data = function(region_data, ...) {
+      list(
+        individual_data = list(
+          Y = region_data$individual_data$residual_Y,
+          X = region_data$individual_data$residual_X
+        ),
+        sumstat_data = NULL
+      )
+    },
+    .run_colocboost = function(label, ...) {
+      calls <<- c(calls, label)
+      list(result = list(label = label, args = list(...)),
+           time = as.difftime(0, units = "secs"))
+    }
+  )
+
+  expect_message(
+    result <- colocboost_pipeline(
+      region_data,
+      event_filters = event_filters,
+      xqtl_coloc = TRUE,
+      joint_gwas = FALSE,
+      separate_gwas = FALSE
+    ),
+    "Skipping follow-up analysis for individual traits ctx_drop"
+  )
+  expect_equal(length(calls), 1)
+  expect_s3_class(result$computing_time$Analysis$xqtl_coloc, "difftime")
+})
+
 
 # ===========================================================================
 # Tests from test_twas_colocboost_round3.R (colocboost-related)
@@ -809,22 +1874,8 @@ make_sumstat_region_data <- function(n_variants = 5, n_studies = 2) {
 }
 
 
-test_that("filter_valid_sumstats: all valid entries pass through unchanged", {
-  ss1 <- data.frame(z = c(1.0, 2.0), n = c(100, 100), variant = c("v1", "v2"))
-  ss2 <- data.frame(z = c(-1.0, 0.5), n = c(200, 200), variant = c("v3", "v4"))
-
-  sumstats <- list(s1 = ss1, s2 = ss2)
-  LD_mat <- list(s1 = diag(2), s2 = diag(2))
-  LD_match <- c("s1", "s2")
-
-  # No message about removed entries when all are valid
-  result <- pecotmr:::filter_valid_sumstats(sumstats, LD_mat, LD_match)
-  expect_equal(length(result$sumstats), 2)
-  expect_equal(nrow(result$dict_sumstatLD), 2)
-})
-
 # ===========================================================================
-# SECTION C: colocboost_analysis_pipeline - filter_events valid_pattern path
+# SECTION C: colocboost_pipeline - filter_events valid_pattern path
 # (lines 64, 67, 71-72, 74, 82, 84-85)
 # ===========================================================================
 
@@ -869,7 +1920,7 @@ test_that("filter_events: valid_pattern with no matching groups returns NULL (li
 
   # The filter returns NULL for the context -> residual_Y entry becomes NULL
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       event_filters = event_filters,
       xqtl_coloc = TRUE,
@@ -922,7 +1973,7 @@ test_that("filter_events: type_pattern matches nothing skips via next (line 64)"
   )
 
   result <- suppressMessages(
-    colocboost_analysis_pipeline(
+    colocboost_pipeline(
       region_data,
       event_filters = event_filters,
       xqtl_coloc = TRUE,
@@ -972,7 +2023,7 @@ test_that("filter_events: all events pass -> 'included in following analysis' me
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       event_filters = event_filters,
       xqtl_coloc = TRUE,
@@ -1026,7 +2077,7 @@ test_that("filter_events: valid_pattern with successful groups retains valid eve
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       event_filters = event_filters,
       xqtl_coloc = TRUE,
@@ -1063,7 +2114,7 @@ test_that("extract_contexts_studies: all individual data pass QC (line 126)", {
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc = TRUE,
       joint_gwas = FALSE,
@@ -1090,7 +2141,7 @@ test_that("extract_contexts_studies: all individual data fail QC (line 134-135)"
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc = TRUE,
       joint_gwas = FALSE,
@@ -1130,7 +2181,7 @@ test_that("extract_contexts_studies: sumstat studies extraction on initial call
 
   # With xqtl_coloc=FALSE, separate_gwas=TRUE -> will enter the sumstat code paths
   result <- suppressMessages(
-    colocboost_analysis_pipeline(
+    colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = FALSE,
@@ -1161,7 +2212,7 @@ test_that("extract_contexts_studies: after-QC sumstat all pass (line 144)", {
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = FALSE,
@@ -1193,7 +2244,7 @@ test_that("extract_contexts_studies: after-QC sumstat partial pass (line 146)",
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = FALSE,
@@ -1205,7 +2256,7 @@ test_that("extract_contexts_studies: after-QC sumstat partial pass (line 146)",
 })
 
 # ===========================================================================
-# SECTION E: colocboost_analysis_pipeline - sumstat processing block
+# SECTION E: colocboost_pipeline - sumstat processing block
 # (lines 244-281: organizing sumstats, normalizing variant IDs, LD normalization)
 # ===========================================================================
 
@@ -1248,7 +2299,7 @@ test_that("pipeline sumstat block: normalizes variant IDs and processes LD matri
   )
 
   result <- suppressMessages(
-    colocboost_analysis_pipeline(
+    colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = FALSE,
@@ -1300,7 +2351,7 @@ test_that("pipeline sumstat block: single sumstat study initializes separate_gwa
   )
 
   result <- suppressMessages(
-    colocboost_analysis_pipeline(
+    colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = FALSE,
@@ -1354,7 +2405,7 @@ test_that("pipeline sumstat block: multiple sumstat studies initializes separate
   )
 
   result <- suppressMessages(
-    colocboost_analysis_pipeline(
+    colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = FALSE,
@@ -1368,11 +2419,11 @@ test_that("pipeline sumstat block: multiple sumstat studies initializes separate
 })
 
 # ===========================================================================
-# SECTION F: colocboost_analysis_pipeline - filter_valid_sumstats returning NULL
+# SECTION F: colocboost_pipeline - no valid summary statistics after validation
 # (lines 275-276: pipeline with all invalid sumstats -> "No data pass QC")
 # ===========================================================================
 
-test_that("pipeline: all sumstats invalid after filter_valid_sumstats returns No data pass QC (line 275-276)", {
+test_that("pipeline: all sumstats invalid returns No data pass QC (line 275-276)", {
   set.seed(830)
   n_variants <- 3
   vids <- paste0("chr1:", seq_len(n_variants) * 100, ":A:G")
@@ -1414,7 +2465,7 @@ test_that("pipeline: all sumstats invalid after filter_valid_sumstats returns No
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = FALSE,
@@ -1426,7 +2477,7 @@ test_that("pipeline: all sumstats invalid after filter_valid_sumstats returns No
 })
 
 # ===========================================================================
-# SECTION G: colocboost_analysis_pipeline - focal_trait matching (lines 300-301)
+# SECTION G: colocboost_pipeline - focal_trait matching (lines 300-301)
 # ===========================================================================
 
 test_that("pipeline: focal_trait matches a trait name sets focal_outcome_idx (lines 300-301)", {
@@ -1450,7 +2501,7 @@ test_that("pipeline: focal_trait matches a trait name sets focal_outcome_idx (li
 
   # focal_trait = "ctx1_event2" should match and set focal_outcome_idx
   result <- suppressMessages(
-    colocboost_analysis_pipeline(
+    colocboost_pipeline(
       region_data,
       focal_trait = "ctx1_event2",
       xqtl_coloc = TRUE,
@@ -1483,7 +2534,7 @@ test_that("pipeline: focal_trait does NOT match leaves focal_outcome_idx NULL (l
 
   # focal_trait is specified but doesn't match any trait
   result <- suppressMessages(
-    colocboost_analysis_pipeline(
+    colocboost_pipeline(
       region_data,
       focal_trait = "nonexistent_trait",
       xqtl_coloc = TRUE,
@@ -1496,7 +2547,7 @@ test_that("pipeline: focal_trait does NOT match leaves focal_outcome_idx NULL (l
 })
 
 # ===========================================================================
-# SECTION H: colocboost_analysis_pipeline - joint_gwas path (lines 320-323)
+# SECTION H: colocboost_pipeline - joint_gwas path (lines 320-323)
 # ===========================================================================
 
 test_that("pipeline: joint_gwas path is entered with both individual and sumstat data (lines 320-323)", {
@@ -1548,7 +2599,7 @@ test_that("pipeline: joint_gwas path is entered with both individual and sumstat
 
   # joint_gwas=TRUE should trigger the joint GWAS branch (lines 320-323)
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = TRUE,
@@ -1562,7 +2613,7 @@ test_that("pipeline: joint_gwas path is entered with both individual and sumstat
 
 
 # ===========================================================================
-# SECTION I: colocboost_analysis_pipeline - separate_gwas path (lines 341+)
+# SECTION I: colocboost_pipeline - separate_gwas path (lines 341+)
 # ===========================================================================
 
 test_that("pipeline: separate_gwas path is entered for each GWAS study", {
@@ -1611,7 +2662,7 @@ test_that("pipeline: separate_gwas path is entered for each GWAS study", {
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = FALSE,
@@ -1624,7 +2675,7 @@ test_that("pipeline: separate_gwas path is entered for each GWAS study", {
 })
 
 # ===========================================================================
-# SECTION J: colocboost_analysis_pipeline - all Y NULL after organizing (lines 225-227)
+# SECTION J: colocboost_pipeline - all Y NULL after organizing (lines 225-227)
 # ===========================================================================
 
 test_that("pipeline: all Y become NULL after organizing individual data (lines 225-227)", {
@@ -1644,7 +2695,7 @@ test_that("pipeline: all Y become NULL after organizing individual data (lines 2
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc = TRUE,
       joint_gwas = FALSE,
@@ -1656,7 +2707,7 @@ test_that("pipeline: all Y become NULL after organizing individual data (lines 2
 })
 
 # ===========================================================================
-# SECTION K: colocboost_analysis_pipeline - sumstat all z NA (line 252-255)
+# SECTION K: colocboost_pipeline - sumstat all z NA (line 252-255)
 # ===========================================================================
 
 test_that("pipeline: sumstat with all NA z-scores yields warning message (lines 252-255)", {
@@ -1700,7 +2751,7 @@ test_that("pipeline: sumstat with all NA z-scores yields warning message (lines
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = FALSE,
@@ -1712,7 +2763,7 @@ test_that("pipeline: sumstat with all NA z-scores yields warning message (lines
 })
 
 # ===========================================================================
-# SECTION L: colocboost_analysis_pipeline - no sumstat_data pass QC (line 152)
+# SECTION L: colocboost_pipeline - no sumstat_data pass QC (line 152)
 # ===========================================================================
 
 test_that("extract_contexts_studies: no sumstat data pass QC message", {
@@ -1740,13 +2791,13 @@ test_that("extract_contexts_studies: no sumstat data pass QC message", {
   )
 
   expect_message(
-    result <- colocboost_analysis_pipeline(
+    result <- colocboost_pipeline(
       region_data,
       xqtl_coloc = TRUE,
       joint_gwas = FALSE,
       separate_gwas = FALSE
     ),
-    "No sumstat data pass QC"
+    "Skipping follow-up analysis for sumstat studies"
   )
   expect_type(result, "list")
 })
@@ -1841,7 +2892,7 @@ test_that("qc_regional_data: pip_cutoff_to_skip_ind lookup works when X and Y ha
 })
 
 # ===========================================================================
-# SECTION U: colocboost_analysis_pipeline - sumstat with NA z filtering (line 252-258)
+# SECTION U: colocboost_pipeline - sumstat with NA z filtering (line 252-258)
 # ===========================================================================
 
 test_that("pipeline sumstat processing handles all-NA z-scores with warning (lines 252-258)", {
@@ -1886,7 +2937,7 @@ test_that("pipeline sumstat processing handles all-NA z-scores with warning (lin
 
   # Should produce message about NA z-scores or no valid studies
   result <- suppressMessages(
-    colocboost_analysis_pipeline(
+    colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = FALSE,
@@ -1897,7 +2948,7 @@ test_that("pipeline sumstat processing handles all-NA z-scores with warning (lin
 })
 
 # ===========================================================================
-# SECTION V: colocboost_analysis_pipeline - LD matrix normalization (lines 263-269)
+# SECTION V: colocboost_pipeline - LD matrix normalization (lines 263-269)
 # ===========================================================================
 
 test_that("pipeline: LD matrix dimnames are normalized to canonical format (lines 261-269)", {
@@ -1945,7 +2996,7 @@ test_that("pipeline: LD matrix dimnames are normalized to canonical format (line
 
   # normalize_variant_id should add chr prefix to LD matrix dimnames
   result <- suppressMessages(
-    colocboost_analysis_pipeline(
+    colocboost_pipeline(
       region_data,
       xqtl_coloc = FALSE,
       joint_gwas = FALSE,
@@ -1967,9 +3018,11 @@ test_that("qc_regional_data: with only sumstat data processes correctly", {
       LD_mat <- LD_data$LD_matrix
       list(sumstats = sumstats, LD_mat = LD_mat)
     },
-    summary_stats_qc = function(sumstats, LD_data, ...) {
-      LD_mat <- LD_data$LD_matrix
-      list(sumstats = sumstats, LD_mat = LD_mat, outlier_number = 0)
+    summary_stats_qc = function(rss_input = NULL, LD_data, ...) {
+      stats::setNames(lapply(names(rss_input), function(study) {
+        list(rss_input = rss_input[[study]], LD_matrix = LD_data[[study]]$LD_matrix,
+             outlier_number = 0)
+      }), names(rss_input))
     },
     raiss = function(...) {
       list(result_filter = data.frame(z = rnorm(5)), LD_mat = diag(5))
diff --git a/tests/testthat/test_file_utils.R b/tests/testthat/test_file_utils.R
index c93f36c2..541c9935 100644
--- a/tests/testthat/test_file_utils.R
+++ b/tests/testthat/test_file_utils.R
@@ -1543,7 +1543,8 @@ test_that("load_multitask_regional_data errors with multiple genotypes and no ma
       region = "chr1:1-1000",
       genotype_list = c("geno1.bed", "geno2.bed"),
       phenotype_list = c("pheno1.gz"),
-      covariate_list = c("covar1.gz")
+      covariate_list = c("covar1.gz"),
+      conditions_list_individual = "cond1"
     ),
     "match_geno_pheno"
   )
@@ -1570,6 +1571,164 @@ test_that("load_multitask_regional_data individual-level path returns expected s
   expect_true("chrom" %in% names(result$individual_data))
 })
 
+test_that("load_multitask_regional_data loads and merges multiple genotype groups", {
+  calls <- list()
+  make_mock_dat <- function(conditions) {
+    x <- setNames(lapply(conditions, function(nm) {
+      matrix(1, nrow = 2, ncol = 2,
+             dimnames = list(c("s1", "s2"), paste0(nm, "_v", 1:2)))
+    }), conditions)
+    y <- setNames(lapply(conditions, function(nm) {
+      matrix(1, nrow = 2, ncol = 1,
+             dimnames = list(c("s1", "s2"), nm))
+    }), conditions)
+    list(
+      residual_Y = y,
+      residual_X = x,
+      residual_Y_scalar = setNames(as.list(rep(1, length(conditions))), conditions),
+      residual_X_scalar = setNames(as.list(rep(1, length(conditions))), conditions),
+      dropped_sample = list(X = list(), Y = list(), covar = list()),
+      covar = list(),
+      Y = y,
+      X_data = x,
+      X = do.call(cbind, x),
+      maf = setNames(lapply(x, function(mat) rep(0.1, ncol(mat))), conditions),
+      chrom = "chr1",
+      grange = c("1", "100"),
+      Y_coordinates = list()
+    )
+  }
+
+  local_mocked_bindings(
+    load_regional_univariate_data = function(genotype, phenotype, covariate,
+                                             conditions, extract_region_name, ...) {
+      calls[[length(calls) + 1]] <<- list(
+        genotype = genotype,
+        phenotype = phenotype,
+        covariate = covariate,
+        conditions = conditions,
+        extract_region_name = extract_region_name
+      )
+      make_mock_dat(conditions)
+    }
+  )
+
+  result <- expect_warning(
+    load_multitask_regional_data(
+      region = "chr1:1-100",
+      genotype_list = c("geno1", "geno2"),
+      phenotype_list = paste0("pheno", 1:4),
+      covariate_list = paste0("covar", 1:4),
+      conditions_list_individual = paste0("cond", 1:4),
+      match_geno_pheno = c(1, 1, 2, 2),
+      extract_region_name = as.list(paste0("gene", 1:4)),
+      region_name_col = 4
+    ),
+    NA
+  )
+
+  expect_equal(length(calls), 2L)
+  expect_equal(vapply(calls, `[[`, character(1), "genotype"), c("geno1", "geno2"))
+  expect_equal(calls[[1]]$phenotype, paste0("pheno", 1:2))
+  expect_equal(calls[[2]]$phenotype, paste0("pheno", 3:4))
+  expect_equal(calls[[1]]$extract_region_name, as.list(paste0("gene", 1:2)))
+  expect_equal(calls[[2]]$extract_region_name, as.list(paste0("gene", 3:4)))
+  expect_true("residual_X" %in% names(result$individual_data))
+  expect_equal(names(result$individual_data$residual_X), paste0("cond", 1:4))
+})
+
+test_that("load_multitask_regional_data defaults missing individual condition names", {
+  calls <- list()
+  make_mock_dat <- function(conditions) {
+    x <- setNames(lapply(conditions, function(nm) {
+      matrix(1, nrow = 2, ncol = 2,
+             dimnames = list(c("s1", "s2"), paste0(nm, "_v", 1:2)))
+    }), conditions)
+    y <- setNames(lapply(conditions, function(nm) {
+      matrix(1, nrow = 2, ncol = 1,
+             dimnames = list(c("s1", "s2"), nm))
+    }), conditions)
+    list(
+      residual_Y = y,
+      residual_X = x,
+      residual_Y_scalar = setNames(as.list(rep(1, length(conditions))), conditions),
+      residual_X_scalar = setNames(as.list(rep(1, length(conditions))), conditions),
+      dropped_sample = list(X = list(), Y = list(), covar = list()),
+      covar = list(),
+      Y = y,
+      X_data = x,
+      X = do.call(cbind, x),
+      maf = setNames(lapply(x, function(mat) rep(0.1, ncol(mat))), conditions),
+      chrom = "chr1",
+      grange = c("1", "100"),
+      Y_coordinates = list()
+    )
+  }
+
+  local_mocked_bindings(
+    load_regional_univariate_data = function(genotype, phenotype, covariate,
+                                             conditions, ...) {
+      calls[[length(calls) + 1]] <<- list(
+        genotype = genotype,
+        phenotype = phenotype,
+        covariate = covariate,
+        conditions = conditions
+      )
+      make_mock_dat(conditions)
+    }
+  )
+
+  result <- expect_warning(
+    load_multitask_regional_data(
+      region = "chr1:1-100",
+      genotype_list = c("geno1", "geno2"),
+      phenotype_list = paste0("pheno", 1:4),
+      covariate_list = paste0("covar", 1:4),
+      match_geno_pheno = c(1, 1, 2, 2)
+    ),
+    "conditions_list_individual"
+  )
+
+  expect_equal(length(calls), 2L)
+  expect_equal(calls[[1]]$conditions, paste0("condition", 1:2))
+  expect_equal(calls[[2]]$conditions, paste0("condition", 3:4))
+  expect_equal(names(result$individual_data$residual_X), paste0("condition", 1:4))
+})
+
+test_that("load_multitask_regional_data validates individual input vector lengths", {
+  expect_error(
+    load_multitask_regional_data(
+      region = "chr1:1-100",
+      genotype_list = "geno",
+      phenotype_list = paste0("pheno", 1:2),
+      covariate_list = "covar",
+      conditions_list_individual = paste0("cond", 1:2)
+    ),
+    "phenotype_list"
+  )
+  expect_error(
+    load_multitask_regional_data(
+      region = "chr1:1-100",
+      genotype_list = "geno",
+      phenotype_list = paste0("pheno", 1:2),
+      covariate_list = paste0("covar", 1:2),
+      conditions_list_individual = "cond"
+    ),
+    "conditions_list_individual"
+  )
+  expect_error(
+    load_multitask_regional_data(
+      region = "chr1:1-100",
+      genotype_list = c("geno1", "geno2"),
+      phenotype_list = paste0("pheno", 1:2),
+      covariate_list = paste0("covar", 1:2),
+      conditions_list_individual = paste0("cond", 1:2),
+      match_geno_pheno = c(1, 3)
+    ),
+    "match_geno_pheno"
+  )
+})
+
 test_that("load_multitask_regional_data sumstat path returns expected structure", {
   skip_if_not_installed("pgenlibr")
   td <- test_path("test_data")
diff --git a/tests/testthat/test_raiss.R b/tests/testthat/test_raiss.R
index 2315798b..085da5cb 100644
--- a/tests/testthat/test_raiss.R
+++ b/tests/testthat/test_raiss.R
@@ -1390,6 +1390,68 @@ test_that("X path R2 filtering matches R path", {
                info = "Same variants should pass R2/LD filtering")
 })
 
+test_that("raw genotype_matrix path is not equivalent to LD path used by legacy pipeline", {
+  set.seed(1)
+  n <- 80
+  p <- 40
+  n_known <- 20
+  X_raw <- matrix(sample(0:2, n * p, replace = TRUE,
+                         prob = c(0.35, 0.45, 0.20)),
+                  nrow = n, ncol = p)
+  ref_panel <- data.frame(
+    chrom = rep(1, p),
+    pos = seq_len(p) * 100,
+    variant_id = paste0("rs", seq_len(p)),
+    A1 = rep("A", p),
+    A2 = rep("G", p),
+    stringsAsFactors = FALSE
+  )
+  colnames(X_raw) <- ref_panel$variant_id
+
+  known_idx <- sort(sample(seq_len(p), n_known))
+  known_zscores <- data.frame(
+    chrom = rep(1, n_known),
+    pos = ref_panel$pos[known_idx],
+    variant_id = ref_panel$variant_id[known_idx],
+    A1 = ref_panel$A1[known_idx],
+    A2 = ref_panel$A2[known_idx],
+    z = rnorm(n_known),
+    stringsAsFactors = FALSE
+  )
+
+  R <- compute_LD(X_raw, method = "sample")
+  rownames(R) <- colnames(R) <- ref_panel$variant_id
+  X_scaled <- scale(X_raw)
+  X_scaled[is.na(X_scaled)] <- 0
+  colnames(X_scaled) <- ref_panel$variant_id
+
+  result_LD <- raiss(ref_panel, known_zscores, LD_matrix = R,
+                     lamb = 0.01, rcond = 0.01,
+                     R2_threshold = 0.6, minimum_ld = 0,
+                     verbose = FALSE)
+  result_raw_X <- raiss(ref_panel, known_zscores, genotype_matrix = X_raw,
+                        lamb = 0.01, svd_tol = 1e-8,
+                        R2_threshold = 0.6, minimum_ld = 0,
+                        verbose = FALSE)
+  result_scaled_X <- raiss(ref_panel, known_zscores, genotype_matrix = X_scaled,
+                           lamb = 0.01, svd_tol = 1e-12,
+                           R2_threshold = 0.6, minimum_ld = 0,
+                           verbose = FALSE)
+
+  ld_ids <- sort(result_LD$result_filter$variant_id)
+  raw_x_ids <- sort(result_raw_X$result_filter$variant_id)
+  scaled_x_ids <- sort(result_scaled_X$result_filter$variant_id)
+
+  expect_false(identical(ld_ids, raw_x_ids))
+  expect_gt(nrow(result_raw_X$result_filter), nrow(result_LD$result_filter))
+  expect_equal(scaled_x_ids, ld_ids)
+
+  ld_sorted <- result_LD$result_nofilter %>% arrange(variant_id)
+  scaled_sorted <- result_scaled_X$result_nofilter %>% arrange(variant_id)
+  expect_equal(ld_sorted$z, scaled_sorted$z, tolerance = 1e-10)
+  expect_equal(ld_sorted$raiss_R2, scaled_sorted$raiss_R2, tolerance = 1e-10)
+})
+
 test_that("raiss rejects both LD_matrix and genotype_matrix", {
   data <- generate_X_test_data(n = 50, p = 20, n_known = 10, seed = 1)
   expect_error(
@@ -1454,4 +1516,4 @@ test_that("X path with collinear variants matches R path", {
 
   expect_equal(r_sorted$z, x_sorted$z, tolerance = 1e-3,
                info = "Collinear case: z-scores should be close")
-})
\ No newline at end of file
+})
diff --git a/tests/testthat/test_sumstats_qc.R b/tests/testthat/test_sumstats_qc.R
index 2159404e..366596f0 100644
--- a/tests/testthat/test_sumstats_qc.R
+++ b/tests/testthat/test_sumstats_qc.R
@@ -257,6 +257,19 @@ test_that("rss_basic_qc handles multiple skip regions", {
   expect_true(500 %in% remaining_pos)
 })
 
+test_that("rss_basic_qc can skip LD matrix subsetting for genotype references", {
+  td <- make_test_sumstats_ld(n_variants = 3)
+  X_ref <- matrix(rnorm(30), 10, 3)
+  colnames(X_ref) <- td$variant_ids
+  td$LD_data$LD_matrix <- X_ref
+  td$LD_data$is_genotype <- TRUE
+
+  result <- rss_basic_qc(td$sumstats, td$LD_data, return_LD_mat = FALSE)
+
+  expect_true(nrow(result$sumstats) > 0)
+  expect_null(result$LD_mat)
+})
+
 # ===========================================================================
 # summary_stats_qc
 # ===========================================================================
@@ -391,6 +404,110 @@ test_that("summary_stats_qc returns LD_mat matching filtered sumstats dimensions
   expect_equal(ncol(result$LD_mat), nrow(result$sumstats))
 })
 
+test_that("summary_stats_qc basic genotype-backed path does not compute LD", {
+  td <- make_test_sumstats_ld(n_variants = 5)
+  X_ref <- matrix(rnorm(50), 10, 5)
+  colnames(X_ref) <- td$variant_ids
+  td$LD_data$LD_matrix <- X_ref
+  td$LD_data$is_genotype <- TRUE
+  rss_input <- list(sumstats = td$sumstats, n = 1000, var_y = 1)
+
+  local_mocked_bindings(
+    compute_LD = function(...) stop("compute_LD should not be called")
+  )
+
+  expect_message(
+    result <- summary_stats_qc(rss_input = rss_input, LD_data = td$LD_data,
+                               qc_method = "none", impute = FALSE),
+    "basic harmonization retained"
+  )
+  expect_equal(nrow(result$LD_matrix), nrow(X_ref))
+  expect_equal(ncol(result$LD_matrix), nrow(result$rss_input$sumstats))
+})
+
+test_that("summary_stats_qc PIP screening uses LD-independent SER", {
+  td <- make_test_sumstats_ld(n_variants = 5)
+  X_ref <- matrix(rnorm(50), 10, 5)
+  colnames(X_ref) <- td$variant_ids
+  td$LD_data$LD_matrix <- X_ref
+  td$LD_data$is_genotype <- TRUE
+  rss_input <- list(sumstats = td$sumstats, n = 1000, var_y = 1)
+
+  local_mocked_bindings(
+    compute_LD = function(...) stop("compute_LD should not be called"),
+    susie_rss = function(z, R = NULL, X = NULL, ...) {
+      expect_null(X)
+      expect_equal(R, diag(length(z)))
+      list(pip = rep(1, length(z)))
+    }
+  )
+
+  result <- suppressMessages(summary_stats_qc(
+    rss_input = rss_input,
+    LD_data = td$LD_data,
+    qc_method = "none",
+    pip_cutoff_to_skip = 0.1,
+    impute = FALSE
+  ))
+  expect_equal(ncol(result$LD_matrix), nrow(result$rss_input$sumstats))
+})
+
+test_that("summary_stats_qc treats NULL qc_method as basic-only none", {
+  td <- make_test_sumstats_ld(n_variants = 5)
+  rss_input <- list(sumstats = td$sumstats, n = 1000, var_y = 1)
+
+  local_mocked_bindings(
+    ld_mismatch_qc = function(...) stop("ld_mismatch_qc should not be called")
+  )
+
+  expect_message(
+    result <- summary_stats_qc(
+      rss_input = rss_input,
+      LD_data = td$LD_data,
+      qc_method = NULL,
+      impute = FALSE
+    ),
+    "basic harmonization retained"
+  )
+  expect_equal(nrow(result$rss_input$sumstats), nrow(td$sumstats))
+})
+
+test_that("summary_stats_qc LD-mismatch QC computes only filtered local LD from X_ref", {
+  td <- make_test_sumstats_ld(n_variants = 5)
+  X_ref <- matrix(rnorm(50), 10, 5)
+  colnames(X_ref) <- td$variant_ids
+  td$LD_data$LD_matrix <- X_ref
+  td$LD_data$is_genotype <- TRUE
+  rss_input <- list(sumstats = td$sumstats, n = 1000, var_y = 1)
+  compute_calls <- 0
+
+  local_mocked_bindings(
+    compute_LD = function(X, ...) {
+      compute_calls <<- compute_calls + 1
+      expect_equal(ncol(X), 3)
+      R <- diag(ncol(X))
+      rownames(R) <- colnames(R) <- colnames(X)
+      R
+    },
+    ld_mismatch_qc = function(zScore, R, nSample = NULL, method = NULL, ...) {
+      expect_equal(nrow(R), length(zScore))
+      expect_equal(ncol(R), length(zScore))
+      data.frame(outlier = rep(FALSE, length(zScore)))
+    }
+  )
+
+  result <- suppressMessages(summary_stats_qc(
+    rss_input = rss_input,
+    LD_data = td$LD_data,
+    qc_method = "slalom",
+    skip_region = "1:150-350",
+    impute = FALSE
+  ))
+  expect_equal(compute_calls, 1)
+  expect_equal(ncol(result$LD_matrix), nrow(result$rss_input$sumstats))
+  expect_equal(ncol(result$LD_matrix), 3)
+})
+
 # ===========================================================================
 # ld_mismatch_qc
 # ===========================================================================
diff --git a/tests/testthat/test_univariate_pipeline.R b/tests/testthat/test_univariate_pipeline.R
index ba3b2ff7..cb32abee 100644
--- a/tests/testthat/test_univariate_pipeline.R
+++ b/tests/testthat/test_univariate_pipeline.R
@@ -971,7 +971,10 @@ test_that("rss: pip_cutoff_to_skip > 0, signal detected => continues", {
     load_rss_data = function(...) list(sumstats = ss, n = 1000, var_y = 1),
     rss_basic_qc = function(...) list(sumstats = ss, LD_mat = ld_mat),
     susie_rss = function(...) list(pip = c(0.9, 0.01, 0.01, 0.01, 0.01)),
-    summary_stats_qc = function(...) list(sumstats = ss, LD_mat = ld_mat, outlier_number = 0),
+    summary_stats_qc = function(...) {
+      message("Follow-up on region: signals above PIP threshold 0.5 detected.")
+      list(sumstats = ss, LD_mat = ld_mat, outlier_number = 0)
+    },
     partition_LD_matrix = function(...) ld_mat,
     raiss = function(...) list(result_filter = ss, LD_mat = ld_mat),
     susie_rss_pipeline = function(...) fake_rss_result,
@@ -1179,7 +1182,7 @@ test_that("rss: finemapping_method = NULL skips fine-mapping", {
 #  SECTION 14: rss_analysis_pipeline - QC = NULL skips QC
 # ========================================================================
 
-test_that("rss: qc_method = NULL skips quality control", {
+test_that("rss: qc_method = NULL uses combined basic QC without LD-mismatch method", {
   ss <- make_rss_sumstats(5)
   ld_mat <- make_rss_ld_mat(5)
   fake_result <- make_fake_post_result(5)
@@ -1204,7 +1207,7 @@ test_that("rss: qc_method = NULL skips quality control", {
     finemapping_method = "susie_rss"
   )
 
-  expect_false(qc_called)
+  expect_true(qc_called)
 })
 
 # ========================================================================
@@ -1947,7 +1950,7 @@ test_that("load_study_LD with comma-separated paths builds list of X panels", {
 #  rss_analysis_pipeline: X-as-LD path (genotype interface)
 # ========================================================================
 
-test_that("rss: is_genotype=TRUE path computes R from X and uses X for fine-mapping", {
+test_that("rss: is_genotype=TRUE path does not precompute R and uses X for fine-mapping", {
   ss <- make_rss_sumstats(5)
   ld_mat <- make_rss_ld_mat(5)
   X_geno <- matrix(rnorm(20 * 5), nrow = 20, ncol = 5)
@@ -1961,7 +1964,7 @@ test_that("rss: is_genotype=TRUE path computes R from X and uses X for fine-mapp
   local_mocked_bindings(
     compute_LD = function(X, method = "sample") {
       compute_LD_called <<- TRUE
-      ld_mat
+      stop("rss_analysis_pipeline should not precompute LD from X")
     },
     load_rss_data = function(...) list(sumstats = ss, n = 1000, var_y = 1),
     rss_basic_qc = function(...) list(sumstats = ss, LD_mat = ld_mat),
@@ -1983,7 +1986,7 @@ test_that("rss: is_genotype=TRUE path computes R from X and uses X for fine-mapp
     finemapping_method = "susie_rss"
   )
 
-  expect_true(compute_LD_called)
+  expect_false(compute_LD_called)
   expect_true(is.matrix(susie_X_arg))
   expect_null(susie_LD_arg)
   expect_true(any(grepl("susie_rss", names(result))))
@@ -1998,9 +2001,13 @@ test_that("rss: mixture LD_data (list of X panels) preserves list shape into sus
   fake_result <- make_fake_post_result(5)
 
   susie_X_arg <- NULL
+  compute_LD_called <- FALSE
 
   local_mocked_bindings(
-    compute_LD = function(X, method = "sample") ld_mat,
+    compute_LD = function(X, method = "sample") {
+      compute_LD_called <<- TRUE
+      stop("rss_analysis_pipeline should not precompute LD from mixture X")
+    },
     load_rss_data = function(...) list(sumstats = ss, n = 1000, var_y = 1),
     rss_basic_qc = function(...) list(sumstats = ss, LD_mat = ld_mat),
     summary_stats_qc = function(...) list(sumstats = ss, LD_mat = ld_mat, outlier_number = 0),
@@ -2021,6 +2028,7 @@ test_that("rss: mixture LD_data (list of X panels) preserves list shape into sus
   )
 
   # Mixture path => list of subset matrices passed to susie_rss_pipeline as X_mat
+  expect_false(compute_LD_called)
   expect_true(is.list(susie_X_arg))
   expect_length(susie_X_arg, 2)
   expect_true(all(sapply(susie_X_arg, is.matrix)))
diff --git a/vignettes/colocboost-pipeline.Rmd b/vignettes/colocboost-pipeline.Rmd
index 86857ccb..7a4e65eb 100644
--- a/vignettes/colocboost-pipeline.Rmd
+++ b/vignettes/colocboost-pipeline.Rmd
@@ -25,7 +25,7 @@ This vignette demonstrates how to use the bioinformatics pipeline for ColocBoost
 
 Acknowledgment: Thanks to Kate (Kathryn) Lawrence (GitHub:@kal26) for her contributions to this vignette.
 
-# 1. Loading Data using `colocboost_analysis_pipeline` function
+# 1. Loading Data using `colocboost_pipeline` function
 
 This function harmonizes the input data and prepares it for colocalization analysis. 
 
@@ -34,7 +34,7 @@ In this section, we introduce how to load the regional data required for the Col
 This function loads mixed datasets for a specific region, including individual-level data (genotype, phenotype, covariate data), summary statistics 
 (sumstats, LD), or a combination of both. It runs `load_regional_univariate_data` for each individual-level dataset and `load_rss_data` for each summary statistics dataset.
 The output is a list with `individual_data` and `sumstat_data` components, where `individual_data` is a list of individual-level data and `sumstat_data` is a list of summary statistics data. 
-This list is then passed to the `colocboost_analysis_pipeline` function for the colocalization analysis.
+This list is then passed to the `colocboost_pipeline` function for the colocalization analysis.
 
 
 Below are the input parameters for this function for loading individual-level data:
@@ -201,9 +201,9 @@ The LD metadata file is a tab-separated file with the following columns:
 ```
 
 
-# 2. Perform ColocBoost using `colocboost_analysis_pipeline` function
+# 2. Perform ColocBoost using `colocboost_pipeline` function
 
-In this section, we load region data for a combination of individual-level and summary statistics data, then perform the colocalization analysis using the `colocboost_analysis_pipeline` function. 
+In this section, we load region data for a combination of individual-level and summary statistics data, then perform the colocalization analysis using the `colocboost_pipeline` function.
 The colocalization analysis can be run in any one of three modes, or in a combination of these modes (names assume that individual-level data is xQTL data and summary statistics data is GWAS data):
 
 - **`xQTL-only mode`**: Only perform colocalization analysis on the individual-level data. Summary statistics data is not used.
@@ -228,7 +228,7 @@ Example: for sQTL, `list(type_pattern = ".*clu_(\\d+_[+-?]).*", valid_pattern =
 
 Outputs:
 
-- **`colocboost_results`**: List of colocboost objects (with `xqtl_coloc`, `joint_gwas`, `separate_gwas`); Output of the `colocboost_analysis_pipeline` function. If the mode is not run, the corresponding element will be `NULL`.
+- **`colocboost_results`**: List of colocboost objects (with `xqtl_coloc`, `joint_gwas`, `separate_gwas`); Output of the `colocboost_pipeline` function. If the mode is not run, the corresponding element will be `NULL`.
 
 ```{r, colocboost-analysis, eval = FALSE}
 #### Please check the example code below ####
@@ -265,7 +265,7 @@ pip_cutoff_to_skip_sumstat = rep(0, length(sumstat_path_list))
 qc_method = "slalom"
 
 # run colocboost analysis
-colocboost_results <- colocboost_analysis_pipeline(
+colocboost_results <- colocboost_pipeline(
     region_data_combined,
     maf_cutoff = maf_cutoff, 
     pip_cutoff_to_skip_ind = pip_cutoff_to_skip_ind,
@@ -286,4 +286,4 @@ colocboost_plot(colocboost_results$joint_gwas)
 for (i in 1:length(colocboost_results$separate_gwas)) {
     colocboost_plot(colocboost_results$separate_gwas[[i]])
 }
-```
\ No newline at end of file
+```

From 11db15185c4a637e0c74c4f32029f0430211d08a Mon Sep 17 00:00:00 2001
From: xueweic <xueweic@users.noreply.github.com>
Date: Sun, 24 May 2026 16:27:11 +0000
Subject: [PATCH 2/5] Update documentation

---
 man/colocboost_analysis.Rd             | 31 ++++++------
 man/colocboost_pipeline.Rd             | 20 +++++---
 man/qc_individual_data.Rd              | 21 +++++---
 man/region_data_to_colocboost_input.Rd |  8 +--
 man/region_data_to_ind_input.Rd        |  8 +--
 man/region_data_to_rss_input.Rd        |  7 +--
 man/summary_stats_qc.Rd                | 68 +++++++++++++-------------
 7 files changed, 92 insertions(+), 71 deletions(-)

diff --git a/man/colocboost_analysis.Rd b/man/colocboost_analysis.Rd
index e7b23584..c5249c57 100644
--- a/man/colocboost_analysis.Rd
+++ b/man/colocboost_analysis.Rd
@@ -1,4 +1,5 @@
 % Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/colocboost_pipeline.R
 \name{colocboost_analysis}
 \alias{colocboost_analysis}
 \title{ColocBoost analysis with optional pipeline QC}
@@ -26,20 +27,21 @@ data inputs such as \code{X}, \code{Y}, \code{sumstat}, \code{LD},
 \code{outcome_names}, and all ColocBoost model/post-processing options.
 QC can only inspect inputs that are supplied by name.}
 
-\item{missing_rate_thresh, maf_cutoff, xvar_cutoff, ld_reference_meta_file, pip_cutoff_to_skip_ind}{Individual-level QC controls.
-If all are \code{NULL}, individual-level QC is not run.}
+\item{missing_rate_thresh, maf_cutoff, xvar_cutoff, ld_reference_meta_file, pip_cutoff_to_skip_ind}{Individual-level QC controls. If all are \code{NULL}, individual-level QC
+is not run.}
 
-\item{keep_indel, pip_cutoff_to_skip_sumstat, qc_method, impute, impute_opts}{Summary-statistic QC controls.
-\code{qc_method = "none"} and \code{qc_method = "rss_qc"} run basic allele
-harmonization without LD-mismatch outlier detection. Imputation is only run
-when \code{impute = TRUE}.}
+\item{keep_indel, pip_cutoff_to_skip_sumstat, qc_method, impute, impute_opts}{Summary-statistic QC controls. \code{qc_method = "none"} and
+\code{qc_method = "rss_qc"} run basic allele harmonization without
+LD-mismatch outlier detection. Imputation is only run when
+\code{impute = TRUE}.}
 
 \item{LD_reference_info}{Optional LD reference information for
 summary-statistic QC. This is only needed when the native \code{LD} matrix
-row/column names or \code{X_ref} column names are missing or are not parseable
-genomic variant IDs. It can be a .bim/.pvar/.pvar.zst file path, a data.frame
-with variant metadata, or a \code{load_LD_matrix()} result. This is a QC-only
-argument and is not passed to \code{colocboost::colocboost()}.}
+row/column names or \code{X_ref} column names are missing or are not
+parseable genomic variant IDs. It can be a .bim/.pvar/.pvar.zst file path,
+a data.frame with variant metadata, or a \code{load_LD_matrix()} result.
+This is a QC-only argument and is not passed to
+\code{colocboost::colocboost()}.}
 
 \item{variant_convention}{Allele order used by native ColocBoost-style
 \code{sumstat$variant} and LD/X_ref names when deriving QC inputs:
@@ -82,10 +84,11 @@ LD-mismatch QC. RAISS imputation is controlled separately by
 \code{impute = TRUE}.
 
 If no QC controls are supplied, this function is a thin direct call to
-\code{colocboost::colocboost(...)}. When QC removes outcomes,
-\code{outcome_names} and \code{focal_outcome_idx} are updated to match the
-post-QC outcome order. If the requested focal outcome is removed by QC,
-\code{focal_outcome_idx} is set to \code{NULL} with a warning.
+\code{colocboost::colocboost(...)}.
+When QC removes outcomes, \code{outcome_names} and \code{focal_outcome_idx}
+are updated to match the post-QC outcome order. If the requested focal outcome
+is removed by QC, \code{focal_outcome_idx} is set to \code{NULL} with a
+warning.
 }
 \examples{
 \dontrun{
diff --git a/man/colocboost_pipeline.Rd b/man/colocboost_pipeline.Rd
index a708b883..ace84f3f 100644
--- a/man/colocboost_pipeline.Rd
+++ b/man/colocboost_pipeline.Rd
@@ -34,18 +34,26 @@ colocboost_pipeline(
 
 \item{pip_cutoff_to_skip_sumstat}{A vector of cutoff values for skipping analysis based on PIP values for each sumstat Default is 0.}
 
-\item{qc_method}{Quality control method to use. Options are "none", "rss_qc", "slalom", or "dentist". \code{NULL} and \code{"rss_qc"} are treated as \code{"none"} for basic-only summary-stat preprocessing.}
+\item{qc_method}{Quality control method to use. Options are "none",
+"rss_qc", "slalom", or "dentist". \code{NULL} and \code{"rss_qc"}
+are treated as \code{"none"} for basic-only summary-stat preprocessing.}
 
 \item{impute}{Logical; if TRUE, performs imputation for outliers identified in the analysis (default: TRUE).}
 
 \item{impute_opts}{A list of imputation options including rcond, R2_threshold, and minimum_ld (default: list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5)).}
-
-\item{...}{Additional arguments passed to the ColocBoost analysis calls.}
 }
 \value{
-A list containing \code{xqtl_coloc}, \code{joint_gwas}, \code{separate_gwas},
-and \code{computing_time}. Analyses that are not requested or cannot be run
-return \code{NULL} in their corresponding result slot.
+A list containing the individual_data and sumstat_data after QC:
+individual_data contains the following components if exist
+\itemize{
+  \item Y: A list of residualized phenotype values for all tasks.
+  \item X: A list of residualized genotype matrices all tasks.
+}
+sumstat_data contains the following components if exist
+\itemize{
+  \item sumstats: A list of summary statistics f or the matched LD_info, each sublist contains sumstats, n, var_y from \code{load_rss_data}.
+  \item LD_info: A list of LD information, each sublist contains LD_variants, LD_matrix, ref_panel  \code{load_LD_matrix}.
+}
 }
 \description{
 This function performs protocol-level multi-trait colocalization using
diff --git a/man/qc_individual_data.Rd b/man/qc_individual_data.Rd
index 1f1e070f..677b7ba4 100644
--- a/man/qc_individual_data.Rd
+++ b/man/qc_individual_data.Rd
@@ -1,4 +1,5 @@
 % Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/colocboost_pipeline.R
 \name{qc_individual_data}
 \alias{qc_individual_data}
 \title{Run reusable individual-level QC}
@@ -19,22 +20,30 @@ qc_individual_data(
 }
 \arguments{
 \item{X}{Genotype matrix or named list of genotype matrices.}
+
 \item{Y}{Phenotype vector/matrix or named list of phenotype matrices.}
+
 \item{maf}{Optional MAF vector or named list.}
+
 \item{X_variance}{Optional variant variance vector or named list.}
+
 \item{missing_rate_thresh}{Maximum missing genotype rate.}
+
 \item{maf_cutoff}{Minimum MAF cutoff.}
+
 \item{xvar_cutoff}{Minimum genotype variance cutoff.}
+
 \item{ld_reference_meta_file}{Optional LD reference metadata file.}
-\item{keep_indel}{Whether indel variants are kept during LD-reference filtering.}
+
+\item{keep_indel}{Whether indel variants are kept during LD-reference
+filtering.}
+
 \item{pip_cutoff_to_skip}{Optional single-effect PIP cutoff.}
-\item{context}{Optional context label for matrix inputs.}
 }
 \value{
-A named list of cleaned context-level \code{X}/\code{Y} records, or one
-cleaned record for matrix inputs.
+A named list of cleaned context-level \code{X}/\code{Y} records, or
+  one cleaned record for matrix inputs.
 }
 \description{
-Runs individual-level preprocessing by reusing existing pecotmr genotype
-filtering helpers and optional single-effect screening.
+Run reusable individual-level QC
 }
diff --git a/man/region_data_to_colocboost_input.Rd b/man/region_data_to_colocboost_input.Rd
index b49f57dc..3f2f50dc 100644
--- a/man/region_data_to_colocboost_input.Rd
+++ b/man/region_data_to_colocboost_input.Rd
@@ -1,4 +1,5 @@
 % Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/colocboost_pipeline.R
 \name{region_data_to_colocboost_input}
 \alias{region_data_to_colocboost_input}
 \title{Convert loaded regional data to ColocBoost inputs}
@@ -9,10 +10,9 @@ region_data_to_colocboost_input(region_data)
 \item{region_data}{A list returned by \code{load_multitask_regional_data()}.}
 }
 \value{
-A structured list containing \code{colocboost_input}, \code{qc_input}, and
-\code{source_info}.
+A structured list containing \code{colocboost_input},
+  \code{qc_input}, and \code{source_info}.
 }
 \description{
-Builds direct ColocBoost input records plus QC input records from loaded
-regional data.
+Convert loaded regional data to ColocBoost inputs
 }
diff --git a/man/region_data_to_ind_input.Rd b/man/region_data_to_ind_input.Rd
index f3ee98e4..b1af764b 100644
--- a/man/region_data_to_ind_input.Rd
+++ b/man/region_data_to_ind_input.Rd
@@ -1,4 +1,5 @@
 % Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/file_utils.R
 \name{region_data_to_ind_input}
 \alias{region_data_to_ind_input}
 \title{Convert loaded regional data to individual-level inputs}
@@ -9,10 +10,9 @@ region_data_to_ind_input(region_data)
 \item{region_data}{A list returned by \code{load_multitask_regional_data()}.}
 }
 \value{
-A list containing \code{X}, \code{Y}, \code{maf}, \code{X_variance}, and
-source information.
+A list containing \code{X}, \code{Y}, \code{maf},
+  \code{X_variance}, and source information.
 }
 \description{
-Extracts individual-level matrices from loaded regional data without running
-QC or model fitting.
+Convert loaded regional data to individual-level inputs
 }
diff --git a/man/region_data_to_rss_input.Rd b/man/region_data_to_rss_input.Rd
index b6006c01..62e17229 100644
--- a/man/region_data_to_rss_input.Rd
+++ b/man/region_data_to_rss_input.Rd
@@ -1,4 +1,5 @@
 % Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/file_utils.R
 \name{region_data_to_rss_input}
 \alias{region_data_to_rss_input}
 \title{Convert loaded regional data to RSS inputs}
@@ -9,9 +10,9 @@ region_data_to_rss_input(region_data)
 \item{region_data}{A list returned by \code{load_multitask_regional_data()}.}
 }
 \value{
-A list containing named RSS inputs, matched LD data, and source information.
+A list containing named RSS inputs, matched LD data, and source
+  information.
 }
 \description{
-Extracts loaded RSS summary-statistic records and their matched LD/reference
-data without running QC or model fitting.
+Convert loaded regional data to RSS inputs
 }
diff --git a/man/summary_stats_qc.Rd b/man/summary_stats_qc.Rd
index 139c2aa7..4ba39714 100644
--- a/man/summary_stats_qc.Rd
+++ b/man/summary_stats_qc.Rd
@@ -40,21 +40,21 @@ combined RSS/ColocBoost QC workflow. A single RSS record is detected by
 structure, not by the name \code{sumstats}, so a multi-study list may safely
 include a study named \code{"sumstats"}.}
 
-\item{keep_indel, skip_region, pip_cutoff_to_skip, qc_method, impute, impute_opts, study, var_y, return_on_skip, R_finite, R_mismatch}{Additional controls for the combined RSS/ColocBoost QC workflow.
-They are ignored by the historical LD-mismatch-only call unless
-\code{rss_input} or combined-QC options are supplied.}
+\item{keep_indel, skip_region, pip_cutoff_to_skip, qc_method, impute, impute_opts, study, var_y, return_on_skip, R_finite, R_mismatch}{Additional controls for the combined RSS/ColocBoost QC workflow. They are
+ignored by the historical LD-mismatch-only call unless \code{rss_input} or
+combined-QC options are supplied.}
 }
 \value{
-A list containing the quality-controlled summary statistics and updated LD
-matrix for the historical call:
-\itemize{
-  \item sumstats: The quality-controlled summary statistics data frame.
-  \item LD_mat: The updated LD matrix after quality control.
-  \item outlier_number: The number of outlier variants removed.
-}
-When \code{rss_input} or combined-QC controls are supplied, returns a cleaned
-RSS/LD record for one RSS record, or a named list of records for a list of RSS
-records.
+A list containing the quality-controlled summary statistics and
+  updated LD matrix for the historical call:
+  \itemize{
+    \item sumstats: The quality-controlled summary statistics data frame.
+    \item LD_mat: The updated LD matrix after quality control.
+    \item outlier_number: The number of outlier variants removed.
+  }
+  When \code{rss_input} or combined-QC controls are supplied, returns a
+  cleaned RSS/LD record for one RSS record, or a named list of records for a
+  list of RSS records.
 }
 \description{
 This function performs quality control on the processed summary statistics
@@ -62,29 +62,29 @@ using the specified method. It wraps \code{\link{ld_mismatch_qc}} and handles
 subsetting of the summary statistics and LD matrix.
 }
 \details{
-This function applies the specified quality control method to the processed
-summary statistics via \code{\link{ld_mismatch_qc}}, then subsets the summary
-statistics and LD matrix to keep only non-outlier variants.
+This function applies the specified quality control method to the
+  processed summary statistics via \code{\link{ld_mismatch_qc}}, then subsets
+  the summary statistics and LD matrix to keep only non-outlier variants.
 
-As an additional workflow for ColocBoost/RSS pipelines, callers may supply
-\code{rss_input} or combined-QC controls. That path first runs
-\code{rss_basic_qc()}, optional PIP screening, optional LD-mismatch QC through
-this same function, and optional RAISS imputation. The combined path normalizes
-one or many RSS records to the same loop internally; only true single-record
-input is unwrapped on return.
-\code{qc_method = NULL}, \code{"none"}, and \code{"rss_qc"} all mean
-basic-only summary-stat preprocessing without SLALOM/DENTIST outlier QC.
+  As an additional workflow for ColocBoost/RSS pipelines, callers may supply
+  \code{rss_input} or combined-QC controls. That path first runs
+  \code{rss_basic_qc()}, optional PIP screening, optional LD-mismatch QC
+  through this same function, and optional RAISS imputation. The combined
+  path normalizes one or many RSS records to the same loop internally; only
+  true single-record input is unwrapped on return.
+  \code{qc_method = NULL}, \code{"none"}, and \code{"rss_qc"} all mean
+  basic-only summary-stat preprocessing without SLALOM/DENTIST outlier QC.
 
-When the combined path receives genotype-backed reference data
-(\code{X_ref}), basic harmonization avoids constructing an LD matrix. PIP
-screening uses an LD-independent single-effect screen with
-\code{susie_rss(R = diag(p), L = 1, max_iter = 1)}, and LD-mismatch QC
-computes only the filtered local correlation matrix required by
-SLALOM/DENTIST. RAISS imputation temporarily centers/scales genotype-backed
-\code{X_ref} before using the whole-region genotype/SVD path. This avoids
-LD-block partition artifacts while matching the LD scale used by
-\code{compute_LD(X_ref)}. Final ColocBoost calls still keep the original
-\code{X_ref} as the reference input.
+  When the combined path receives genotype-backed reference data
+  (\code{X_ref}), basic harmonization avoids constructing an LD matrix. PIP
+  screening uses an LD-independent single-effect screen with
+  \code{susie_rss(R = diag(p), L = 1, max_iter = 1)}, and LD-mismatch QC
+  computes only the filtered local correlation matrix required by
+  SLALOM/DENTIST. RAISS imputation temporarily centers/scales
+  genotype-backed \code{X_ref} before using the whole-region genotype/SVD
+  path. This avoids LD-block partition artifacts while matching the LD scale
+  used by \code{compute_LD(X_ref)}. Final ColocBoost calls still keep the
+  original \code{X_ref} as the reference input.
 }
 \examples{
 # Perform SLALOM quality control (default)

From 52e04673c436638db0de1d9ed38441f5d9378e5c Mon Sep 17 00:00:00 2001
From: xuewei cao <36172337+xueweic@users.noreply.github.com>
Date: Sun, 24 May 2026 16:10:37 -0500
Subject: [PATCH 3/5] Remove rss_qc from summary-stat QC methods

Remove the legacy rss_qc alias so summary_stats_qc only accepts none, slalom, and dentist.
---
 R/colocboost_pipeline.R           | 16 ++++++++--------
 R/sumstats_qc.R                   |  9 ++++-----
 R/univariate_pipeline.R           |  4 ++--
 man/colocboost_analysis.Rd        | 12 ++++++------
 man/colocboost_pipeline.Rd        |  4 +---
 man/rss_analysis_pipeline.Rd      |  4 ++--
 man/summary_stats_qc.Rd           | 16 ++++++++--------
 tests/testthat/test_sumstats_qc.R | 14 ++++++++++++++
 8 files changed, 45 insertions(+), 34 deletions(-)

diff --git a/R/colocboost_pipeline.R b/R/colocboost_pipeline.R
index 61c4874d..63300b28 100644
--- a/R/colocboost_pipeline.R
+++ b/R/colocboost_pipeline.R
@@ -116,8 +116,8 @@ region_data_to_colocboost_input <- function(region_data) {
 #' \code{pip_cutoff_to_skip_sumstat}, \code{impute = TRUE}, or
 #' \code{LD_reference_info} is supplied and named \code{sumstat} plus either
 #' \code{LD}, \code{X_ref}, or \code{LD_reference_info} are available.
-#' \code{qc_method = "none"} and \code{qc_method = "rss_qc"} mean run basic
-#' allele/variant harmonization only; they do not run SLALOM/DENTIST
+#' \code{qc_method = "none"} means run basic allele/variant harmonization
+#' only; it does not run SLALOM/DENTIST
 #' LD-mismatch QC. RAISS imputation is controlled separately by
 #' \code{impute = TRUE}.
 #'
@@ -137,8 +137,8 @@ region_data_to_colocboost_input <- function(region_data) {
 #'   Individual-level QC controls. If all are \code{NULL}, individual-level QC
 #'   is not run.
 #' @param keep_indel,pip_cutoff_to_skip_sumstat,qc_method,impute,impute_opts
-#'   Summary-statistic QC controls. \code{qc_method = "none"} and
-#'   \code{qc_method = "rss_qc"} run basic allele harmonization without
+#'   Summary-statistic QC controls. \code{qc_method = "none"} runs
+#'   basic allele harmonization without
 #'   LD-mismatch outlier detection. Imputation is only run when
 #'   \code{impute = TRUE}.
 #' @param LD_reference_info Optional LD reference information for
@@ -343,8 +343,8 @@ colocboost_analysis <- function(...,
 #' @param pip_cutoff_to_skip_ind A vector of cutoff values for skipping analysis based on PIP values for each context. Default is 0.
 #' @param pip_cutoff_to_skip_sumstat A vector of cutoff values for skipping analysis based on PIP values for each sumstat Default is 0.
 #' @param qc_method Quality control method to use. Options are "none",
-#'   "rss_qc", "slalom", or "dentist". \code{NULL} and \code{"rss_qc"}
-#'   are treated as \code{"none"} for basic-only summary-stat preprocessing.
+#'   "slalom", or "dentist". \code{NULL} is treated as \code{"none"} for
+#'   basic-only summary-stat preprocessing.
 #' @param impute Logical; if TRUE, performs imputation for outliers identified in the analysis (default: TRUE).
 #' @param impute_opts A list of imputation options including rcond, R2_threshold, and minimum_ld (default: list(rcond = 0.01, R2_threshold = 0.6, minimum_ld = 5)).
 #'
@@ -640,8 +640,8 @@ colocboost_pipeline <- function(
 #' @param pip_cutoff_to_skip_ind A vector of cutoff values for skipping individual contexts.
 #' @param pip_cutoff_to_skip_sumstat A vector of cutoff values for skipping summary-stat studies.
 #' @param qc_method Quality control method to use. Options are "none",
-#'   "rss_qc", "slalom", or "dentist". \code{NULL} and \code{"rss_qc"}
-#'   are treated as \code{"none"} for basic-only summary-stat preprocessing.
+#'   "slalom", or "dentist". \code{NULL} is treated as \code{"none"} for
+#'   basic-only summary-stat preprocessing.
 #' @param impute Logical; if TRUE, performs imputation when required metadata are available.
 #' @param impute_opts A list of imputation options.
 #' @return A list containing post-QC \code{individual_data} and \code{sumstat_data}.
diff --git a/R/sumstats_qc.R b/R/sumstats_qc.R
index aca54aea..f6d4928d 100644
--- a/R/sumstats_qc.R
+++ b/R/sumstats_qc.R
@@ -149,8 +149,7 @@ ld_mismatch_qc <- function(zScore, R = NULL, X = NULL, nSample = NULL,
 
 .resolve_summary_qc_method <- function(qc_method) {
   if (is.null(qc_method)) return("none")
-  qc_method <- match.arg(qc_method, c("none", "rss_qc", "slalom", "dentist"))
-  if (identical(qc_method, "rss_qc")) "none" else qc_method
+  match.arg(qc_method, c("none", "slalom", "dentist"))
 }
 
 #' Perform Quality Control on Summary Statistics
@@ -196,8 +195,8 @@ ld_mismatch_qc <- function(zScore, R = NULL, X = NULL, nSample = NULL,
 #'   through this same function, and optional RAISS imputation. The combined
 #'   path normalizes one or many RSS records to the same loop internally; only
 #'   true single-record input is unwrapped on return.
-#'   \code{qc_method = NULL}, \code{"none"}, and \code{"rss_qc"} all mean
-#'   basic-only summary-stat preprocessing without SLALOM/DENTIST outlier QC.
+#'   \code{qc_method = NULL} and \code{"none"} mean basic-only summary-stat
+#'   preprocessing without SLALOM/DENTIST outlier QC.
 #'
 #'   When the combined path receives genotype-backed reference data
 #'   (\code{X_ref}), basic harmonization avoids constructing an LD matrix. PIP
@@ -520,7 +519,7 @@ summary_stats_qc <- function(sumstats, LD_data, n = NULL,
   }
 
   outlier_number <- 0L
-  if (!is.null(qc_method) && !qc_method %in% c("none", "rss_qc")) {
+  if (!is.null(qc_method) && !identical(qc_method, "none")) {
     message("QC track: running ", qc_method, " LD-mismatch QC for summary-stat study ", study, ".")
     qc <- summary_stats_qc(sumstats = sumstats, LD_data = ld_data_with_local_R(sumstats),
                             n = n, method = qc_method)
diff --git a/R/univariate_pipeline.R b/R/univariate_pipeline.R
index f92b70ff..5bc2227b 100644
--- a/R/univariate_pipeline.R
+++ b/R/univariate_pipeline.R
@@ -204,7 +204,7 @@ load_study_LD <- function(ld_path, region) {
 #' @param region_name_col Column to filter for extract_region_name.
 #' @param qc_method Summary-statistic QC method. \code{"slalom"} and
 #'   \code{"dentist"} run basic allele harmonization plus LD-mismatch QC;
-#'   \code{"rss_qc"} and \code{"none"} run basic allele harmonization only.
+#'   \code{"none"} runs basic allele harmonization only.
 #' @param finemapping_method One of "susie_rss", "single_effect", "bayesian_conditional_regression".
 #' @param finemapping_opts List of fine-mapping options (L, L_greedy, coverage,
 #'   signal_cutoff, min_abs_corr).
@@ -227,7 +227,7 @@ rss_analysis_pipeline <- function(
     sumstat_path, column_file_path, LD_data,
     n_sample = 0, n_case = 0, n_control = 0, region = NULL, skip_region = NULL,
     extract_region_name = NULL, region_name_col = NULL,
-    qc_method = c("slalom", "dentist", "rss_qc", "none"),
+    qc_method = c("slalom", "dentist", "none"),
     finemapping_method = c("susie_rss", "single_effect", "bayesian_conditional_regression"),
     finemapping_opts = list(
       L = 20, L_greedy = 5,
diff --git a/man/colocboost_analysis.Rd b/man/colocboost_analysis.Rd
index c5249c57..cc88741b 100644
--- a/man/colocboost_analysis.Rd
+++ b/man/colocboost_analysis.Rd
@@ -30,10 +30,10 @@ QC can only inspect inputs that are supplied by name.}
 \item{missing_rate_thresh, maf_cutoff, xvar_cutoff, ld_reference_meta_file, pip_cutoff_to_skip_ind}{Individual-level QC controls. If all are \code{NULL}, individual-level QC
 is not run.}
 
-\item{keep_indel, pip_cutoff_to_skip_sumstat, qc_method, impute, impute_opts}{Summary-statistic QC controls. \code{qc_method = "none"} and
-\code{qc_method = "rss_qc"} run basic allele harmonization without
-LD-mismatch outlier detection. Imputation is only run when
-\code{impute = TRUE}.}
+\item{keep_indel, pip_cutoff_to_skip_sumstat, qc_method, impute, impute_opts}{Summary-statistic QC controls.
+\code{qc_method = "none"} runs basic allele harmonization without LD-mismatch
+outlier detection. Imputation is only run
+when \code{impute = TRUE}.}
 
 \item{LD_reference_info}{Optional LD reference information for
 summary-statistic QC. This is only needed when the native \code{LD} matrix
@@ -78,8 +78,8 @@ is non-\code{NULL} and named \code{X} and \code{Y} inputs are available in
 \code{pip_cutoff_to_skip_sumstat}, \code{impute = TRUE}, or
 \code{LD_reference_info} is supplied and named \code{sumstat} plus either
 \code{LD}, \code{X_ref}, or \code{LD_reference_info} are available.
-\code{qc_method = "none"} and \code{qc_method = "rss_qc"} mean run basic
-allele/variant harmonization only; they do not run SLALOM/DENTIST
+\code{qc_method = "none"} means run basic allele/variant harmonization only;
+it does not run SLALOM/DENTIST
 LD-mismatch QC. RAISS imputation is controlled separately by
 \code{impute = TRUE}.
 
diff --git a/man/colocboost_pipeline.Rd b/man/colocboost_pipeline.Rd
index ace84f3f..c06c0373 100644
--- a/man/colocboost_pipeline.Rd
+++ b/man/colocboost_pipeline.Rd
@@ -34,9 +34,7 @@ colocboost_pipeline(
 
 \item{pip_cutoff_to_skip_sumstat}{A vector of cutoff values for skipping analysis based on PIP values for each sumstat Default is 0.}
 
-\item{qc_method}{Quality control method to use. Options are "none",
-"rss_qc", "slalom", or "dentist". \code{NULL} and \code{"rss_qc"}
-are treated as \code{"none"} for basic-only summary-stat preprocessing.}
+\item{qc_method}{Quality control method to use. Options are "none", "slalom", or "dentist". \code{NULL} is treated as \code{"none"} for basic-only summary-stat preprocessing.}
 
 \item{impute}{Logical; if TRUE, performs imputation for outliers identified in the analysis (default: TRUE).}
 
diff --git a/man/rss_analysis_pipeline.Rd b/man/rss_analysis_pipeline.Rd
index df410a0e..d34d351e 100644
--- a/man/rss_analysis_pipeline.Rd
+++ b/man/rss_analysis_pipeline.Rd
@@ -15,7 +15,7 @@ rss_analysis_pipeline(
   skip_region = NULL,
   extract_region_name = NULL,
   region_name_col = NULL,
-  qc_method = c("slalom", "dentist", "rss_qc", "none"),
+  qc_method = c("slalom", "dentist", "none"),
   finemapping_method = c("susie_rss", "single_effect", "bayesian_conditional_regression"),
   finemapping_opts = list(L = 20, L_greedy = 5, coverage = c(0.95, 0.7, 0.5),
     signal_cutoff = 0.025, min_abs_corr = 0.8),
@@ -56,7 +56,7 @@ require a correlation matrix.}
 
 \item{qc_method}{Summary-statistic QC method. \code{"slalom"} and
 \code{"dentist"} run basic allele harmonization plus LD-mismatch QC;
-\code{"rss_qc"} and \code{"none"} run basic allele harmonization only.}
+\code{"none"} runs basic allele harmonization only.}
 
 \item{finemapping_method}{One of "susie_rss", "single_effect", "bayesian_conditional_regression".}
 
diff --git a/man/summary_stats_qc.Rd b/man/summary_stats_qc.Rd
index 4ba39714..d43c9d22 100644
--- a/man/summary_stats_qc.Rd
+++ b/man/summary_stats_qc.Rd
@@ -66,14 +66,14 @@ This function applies the specified quality control method to the
   processed summary statistics via \code{\link{ld_mismatch_qc}}, then subsets
   the summary statistics and LD matrix to keep only non-outlier variants.
 
-  As an additional workflow for ColocBoost/RSS pipelines, callers may supply
-  \code{rss_input} or combined-QC controls. That path first runs
-  \code{rss_basic_qc()}, optional PIP screening, optional LD-mismatch QC
-  through this same function, and optional RAISS imputation. The combined
-  path normalizes one or many RSS records to the same loop internally; only
-  true single-record input is unwrapped on return.
-  \code{qc_method = NULL}, \code{"none"}, and \code{"rss_qc"} all mean
-  basic-only summary-stat preprocessing without SLALOM/DENTIST outlier QC.
+As an additional workflow for ColocBoost/RSS pipelines, callers may supply
+\code{rss_input} or combined-QC controls. That path first runs
+\code{rss_basic_qc()}, optional PIP screening, optional LD-mismatch QC through
+this same function, and optional RAISS imputation. The combined path normalizes
+one or many RSS records to the same loop internally; only true single-record
+input is unwrapped on return.
+\code{qc_method = NULL} and \code{"none"} mean basic-only summary-stat
+preprocessing without SLALOM/DENTIST outlier QC.
 
   When the combined path receives genotype-backed reference data
   (\code{X_ref}), basic harmonization avoids constructing an LD matrix. PIP
diff --git a/tests/testthat/test_sumstats_qc.R b/tests/testthat/test_sumstats_qc.R
index 366596f0..a7152f42 100644
--- a/tests/testthat/test_sumstats_qc.R
+++ b/tests/testthat/test_sumstats_qc.R
@@ -472,6 +472,20 @@ test_that("summary_stats_qc treats NULL qc_method as basic-only none", {
   expect_equal(nrow(result$rss_input$sumstats), nrow(td$sumstats))
 })
 
+test_that("summary_stats_qc rejects invalid qc_method values", {
+  td <- make_test_sumstats_ld(n_variants = 5)
+  rss_input <- list(sumstats = td$sumstats, n = 1000, var_y = 1)
+
+  expect_error(
+    summary_stats_qc(
+      rss_input = rss_input,
+      LD_data = td$LD_data,
+      qc_method = "bad_method"
+    ),
+    "should be one of"
+  )
+})
+
 test_that("summary_stats_qc LD-mismatch QC computes only filtered local LD from X_ref", {
   td <- make_test_sumstats_ld(n_variants = 5)
   X_ref <- matrix(rnorm(50), 10, 5)

From 20464f4953351bf4b852d8dcef962048e2d2ad66 Mon Sep 17 00:00:00 2001
From: xueweic <xueweic@users.noreply.github.com>
Date: Sun, 24 May 2026 21:24:31 +0000
Subject: [PATCH 4/5] Update documentation

---
 man/colocboost_analysis.Rd | 12 ++++++------
 man/colocboost_pipeline.Rd |  4 +++-
 man/summary_stats_qc.Rd    | 16 ++++++++--------
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/man/colocboost_analysis.Rd b/man/colocboost_analysis.Rd
index cc88741b..054a20bc 100644
--- a/man/colocboost_analysis.Rd
+++ b/man/colocboost_analysis.Rd
@@ -30,10 +30,10 @@ QC can only inspect inputs that are supplied by name.}
 \item{missing_rate_thresh, maf_cutoff, xvar_cutoff, ld_reference_meta_file, pip_cutoff_to_skip_ind}{Individual-level QC controls. If all are \code{NULL}, individual-level QC
 is not run.}
 
-\item{keep_indel, pip_cutoff_to_skip_sumstat, qc_method, impute, impute_opts}{Summary-statistic QC controls.
-\code{qc_method = "none"} runs basic allele harmonization without LD-mismatch
-outlier detection. Imputation is only run
-when \code{impute = TRUE}.}
+\item{keep_indel, pip_cutoff_to_skip_sumstat, qc_method, impute, impute_opts}{Summary-statistic QC controls. \code{qc_method = "none"} runs
+basic allele harmonization without
+LD-mismatch outlier detection. Imputation is only run when
+\code{impute = TRUE}.}
 
 \item{LD_reference_info}{Optional LD reference information for
 summary-statistic QC. This is only needed when the native \code{LD} matrix
@@ -78,8 +78,8 @@ is non-\code{NULL} and named \code{X} and \code{Y} inputs are available in
 \code{pip_cutoff_to_skip_sumstat}, \code{impute = TRUE}, or
 \code{LD_reference_info} is supplied and named \code{sumstat} plus either
 \code{LD}, \code{X_ref}, or \code{LD_reference_info} are available.
-\code{qc_method = "none"} means run basic allele/variant harmonization only;
-it does not run SLALOM/DENTIST
+\code{qc_method = "none"} means run basic allele/variant harmonization
+only; it does not run SLALOM/DENTIST
 LD-mismatch QC. RAISS imputation is controlled separately by
 \code{impute = TRUE}.
 
diff --git a/man/colocboost_pipeline.Rd b/man/colocboost_pipeline.Rd
index c06c0373..2db58dd2 100644
--- a/man/colocboost_pipeline.Rd
+++ b/man/colocboost_pipeline.Rd
@@ -34,7 +34,9 @@ colocboost_pipeline(
 
 \item{pip_cutoff_to_skip_sumstat}{A vector of cutoff values for skipping analysis based on PIP values for each sumstat Default is 0.}
 
-\item{qc_method}{Quality control method to use. Options are "none", "slalom", or "dentist". \code{NULL} is treated as \code{"none"} for basic-only summary-stat preprocessing.}
+\item{qc_method}{Quality control method to use. Options are "none",
+"slalom", or "dentist". \code{NULL} is treated as \code{"none"} for
+basic-only summary-stat preprocessing.}
 
 \item{impute}{Logical; if TRUE, performs imputation for outliers identified in the analysis (default: TRUE).}
 
diff --git a/man/summary_stats_qc.Rd b/man/summary_stats_qc.Rd
index d43c9d22..b20f8cc8 100644
--- a/man/summary_stats_qc.Rd
+++ b/man/summary_stats_qc.Rd
@@ -66,14 +66,14 @@ This function applies the specified quality control method to the
   processed summary statistics via \code{\link{ld_mismatch_qc}}, then subsets
   the summary statistics and LD matrix to keep only non-outlier variants.
 
-As an additional workflow for ColocBoost/RSS pipelines, callers may supply
-\code{rss_input} or combined-QC controls. That path first runs
-\code{rss_basic_qc()}, optional PIP screening, optional LD-mismatch QC through
-this same function, and optional RAISS imputation. The combined path normalizes
-one or many RSS records to the same loop internally; only true single-record
-input is unwrapped on return.
-\code{qc_method = NULL} and \code{"none"} mean basic-only summary-stat
-preprocessing without SLALOM/DENTIST outlier QC.
+  As an additional workflow for ColocBoost/RSS pipelines, callers may supply
+  \code{rss_input} or combined-QC controls. That path first runs
+  \code{rss_basic_qc()}, optional PIP screening, optional LD-mismatch QC
+  through this same function, and optional RAISS imputation. The combined
+  path normalizes one or many RSS records to the same loop internally; only
+  true single-record input is unwrapped on return.
+  \code{qc_method = NULL} and \code{"none"} mean basic-only summary-stat
+  preprocessing without SLALOM/DENTIST outlier QC.
 
   When the combined path receives genotype-backed reference data
   (\code{X_ref}), basic harmonization avoids constructing an LD matrix. PIP

From d86da9905a90876b0b4480db25720385b8f65f14 Mon Sep 17 00:00:00 2001
From: Daniel Nachun <daniel.nachun@gmail.com>
Date: Sun, 24 May 2026 19:47:47 -0500
Subject: [PATCH 5/5] fix unit test

---
 tests/testthat/test_univariate_pipeline.R | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tests/testthat/test_univariate_pipeline.R b/tests/testthat/test_univariate_pipeline.R
index cb32abee..1e3ac5ad 100644
--- a/tests/testthat/test_univariate_pipeline.R
+++ b/tests/testthat/test_univariate_pipeline.R
@@ -955,7 +955,8 @@ test_that("rss: pip_cutoff_to_skip > 0, no signal => early return", {
       sumstat_path = "/fake/sumstats.tsv",
       column_file_path = "/fake/columns.yml",
       LD_data = list(),
-      pip_cutoff_to_skip = 0.5
+      pip_cutoff_to_skip = 0.5,
+      qc_method = "none"
     ),
     "Skipping follow-up"
   )
@@ -1012,7 +1013,8 @@ test_that("rss: negative pip_cutoff_to_skip auto-computes threshold", {
       sumstat_path = "/fake/sumstats.tsv",
       column_file_path = "/fake/columns.yml",
       LD_data = list(),
-      pip_cutoff_to_skip = -1
+      pip_cutoff_to_skip = -1,
+      qc_method = "none"
     ),
     "Skipping follow-up"
   )