Evaluate alignment paradigms that solve the global problem differently (pair-HMM, WFA) #49

Description

@cjfields

Background

A brief lit review confirms that Needleman-Wunsch / Gotoh global DP is the dominant pairwise-alignment paradigm, and that the research frontier is mostly acceleration of the same recurrence (WFA, KSW2, Parasail/striped SW, Edlib bit-parallel, GASAL2/GPU) rather than new alignment models.

The alignment paradigm in dada2-rs is deliberately fixed to global / ends-free NW because the error-model contract requires a positional alignment: al2subs compresses it into a Sub (per-position map + substitution list with both nucleotides and qualities), which the p-value model in pval.rs consumes to classify each difference as biological divergence vs base-calling error. Distance-only or local-only methods break this pipeline.
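To make the positional contract concrete, here is a minimal sketch of what "compressing an alignment into a per-position map plus a substitution list" could look like. `SubSketch` and `alignment_to_subs` are hypothetical names invented for illustration; the real `al2subs` / `Sub` in dada2-rs may differ in layout and detail. The point is only that this step needs the gapped alignment itself, which a distance-only method cannot supply.

```rust
/// Hypothetical, simplified stand-in for the `Sub` record described above.
#[derive(Debug, PartialEq)]
struct SubSketch {
    /// For each reference position: the aligned query position,
    /// or `None` if the query has a gap there.
    map: Vec<Option<usize>>,
    /// (reference position, reference base, query base, query quality).
    subs: Vec<(usize, u8, u8, u8)>,
}

/// Walk two gapped alignment rows (b'-' marks a gap) and record, per
/// reference position, where the query landed and which bases differ.
/// `qry_qual` holds one quality value per ungapped query base.
fn alignment_to_subs(ref_aln: &[u8], qry_aln: &[u8], qry_qual: &[u8]) -> SubSketch {
    assert_eq!(ref_aln.len(), qry_aln.len(), "alignment rows must have equal length");
    let mut map = Vec::new();
    let mut subs = Vec::new();
    let mut qpos = 0usize; // index into the ungapped query

    for (&r, &q) in ref_aln.iter().zip(qry_aln) {
        match (r, q) {
            // Gap in both rows carries no information; skip it.
            (b'-', b'-') => {}
            // Insertion in the query: consumes a query base, no reference position.
            (b'-', _) => qpos += 1,
            // Deletion in the query: the reference position maps to nothing.
            (_, b'-') => map.push(None),
            // Aligned pair: record the mapping, and the substitution if bases differ.
            _ => {
                let rpos = map.len();
                if r != q {
                    subs.push((rpos, r, q, qry_qual[qpos]));
                }
                map.push(Some(qpos));
                qpos += 1;
            }
        }
    }
    SubSketch { map, subs }
}
```

For the rows `ACG-TA` / `AGGCT-`, this sketch records one substitution (C to G at reference position 1, with that base's quality) and maps the last reference position to `None`.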

Rule of thumb for what's worth evaluating: a paradigm should address the global alignment problem from a genuinely different perspective — not change the problem.

Candidates that clear that bar

  1. Pair-HMMs — fuse alignment and the error model into a single likelihood layer. DADA2 currently bolts pval.rs onto an NW alignment as two separate steps; a pair-HMM models error-vs-signal natively. Richer, but higher cost/complexity — assess whether it's justified for amplicon data.

  2. WFA (wavefront alignment): changes the complexity regime to O(n·s), where n is the sequence length and s is the optimal alignment score (s equals the edit distance only under unit costs; under gap-affine penalties it is the total penalty). It exploits the same "sequences are very similar" assumption DADA2 already relies on (k-mer pre-screen + banded DP). Strongest candidate by the "different perspective, same problem" criterion. Caveat: we need a variant that retains the positional traceback, not score/edit-distance only, to feed al2subs.
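As a rough illustration of why the wavefront formulation scales with the score rather than with the full DP matrix, the sketch below computes a global unit-cost edit distance by advancing one wavefront per score value. It is a simplified assumption-laden example, not the WFA reference implementation: it uses unit costs instead of gap-affine penalties, handles neither ends-free gaps nor banding, and returns the score only. Retaining the positional traceback that al2subs needs would require keeping the wavefront history (or a bidirectional variant), which is omitted here. The function name `wavefront_edit_distance` is hypothetical.

```rust
/// Wavefront-style global edit distance (unit costs) between `a` and `b`.
/// Diagonal k = h - v, where v indexes `a` and h indexes `b`. Each wavefront
/// stores, per diagonal, the furthest h reachable with the current score.
fn wavefront_edit_distance(a: &[u8], b: &[u8]) -> usize {
    const NONE: isize = -1; // marks a diagonal not reached yet
    let (n, m) = (a.len() as isize, b.len() as isize);
    let k_end = m - n; // diagonal that contains the end cell (n, m)

    // Slide along diagonal k from offset h for as long as characters match.
    let extend = |k: isize, mut h: isize| -> isize {
        let mut v = h - k;
        while v < n && h < m && a[v as usize] == b[h as usize] {
            v += 1;
            h += 1;
        }
        h
    };

    let mut score: isize = 0;
    // Wavefront for `score` covers diagonals -score..=score at index k + score.
    let mut prev = vec![extend(0, 0)];
    loop {
        // Done once the end diagonal has reached the end of `b`.
        if k_end.abs() <= score && prev[(k_end + score) as usize] >= m {
            return score as usize;
        }
        score += 1;
        // Read the previous wavefront (diagonals -(score-1)..=(score-1)).
        let lookup = |k: isize| -> isize {
            if k.abs() < score { prev[(k + score - 1) as usize] } else { NONE }
        };
        let mut next = vec![NONE; (2 * score + 1) as usize];
        for k in -score..=score {
            if k < -n || k > m {
                continue; // diagonal lies entirely outside the matrix
            }
            let (ins, mis, del) = (lookup(k - 1), lookup(k), lookup(k + 1));
            let mut h = del; // consuming a base of `a` keeps h unchanged
            if ins != NONE { h = h.max(ins + 1); } // consuming a base of `b`
            if mis != NONE { h = h.max(mis + 1); } // mismatch consumes both
            if h == NONE {
                continue;
            }
            // Clamp to the last in-bounds cell of diagonal k, then extend.
            let h = h.min(m.min(n + k));
            next[(k + score) as usize] = extend(k, h);
        }
        prev = next;
    }
}
```

Each score step touches at most 2s+1 diagonals plus the matching runs it extends, so near-identical reads finish after a handful of wavefronts regardless of how large the full n×m matrix would be.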

Scope

  • Literature/feasibility evaluation, not a commitment to implement.
  • For any candidate: confirm it can produce the positional substitution record pval.rs needs, and benchmark vs the current anti-diagonal auto-vectorized banded NW (align_vectorized_with_buf).
  • Out of scope: switching away from a global paradigm (local/Smith-Waterman would wrongly trim divergent-but-real read ends).
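For the benchmarking bullet, a candidate and the baseline could be timed through a common closure interface, roughly as sketched below. This is an assumption-heavy outline: `time_aligner` is a hypothetical helper, the closure signature `(&[u8], &[u8]) -> i32` is a placeholder, and the real signature of `align_vectorized_with_buf` is not shown in this issue, so it would need a thin adapter. A proper comparison would also want warm-up runs and a statistics-aware harness rather than a single wall-clock measurement.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Run `align` over every (query, reference) pair `reps` times.
/// Returns the elapsed wall-clock time and a checksum of the returned scores,
/// so two aligners can be compared for both speed and agreement.
fn time_aligner<F>(pairs: &[(Vec<u8>, Vec<u8>)], reps: usize, mut align: F) -> (Duration, i64)
where
    F: FnMut(&[u8], &[u8]) -> i32,
{
    let mut checksum = 0i64;
    let start = Instant::now();
    for _ in 0..reps {
        for (q, r) in pairs {
            // black_box keeps the optimizer from eliding the call.
            let score = align(black_box(q.as_slice()), black_box(r.as_slice()));
            checksum += i64::from(black_box(score));
        }
    }
    (start.elapsed(), checksum)
}
```

Comparing the checksums of the baseline and a candidate on the same read pairs gives a cheap first check that both produce the same scores before any timing numbers are trusted.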
