
guanhomer/MethylVerse_MPACT

 
 


MPACT Custom Pipeline

This repository is a fork of the original MethylVerse/MPACT implementation with an added standalone execution script for batch processing of sequencing-derived methylation data.

Overview

The custom pipeline extends the original MPACT workflow with:

  • Batch processing of multiple samples from directory structures
  • Support for HVW (Human Variation Workflow) bed-format inputs
  • Dual prediction modes (raw and regressed)
  • Extended output reporting (top-3 class probabilities)
  • Custom tumor decomposition implementation
  • Centralized logging across runs

Added Script

Location:

scripts/run_mpact.py

This script acts as a high-level pipeline wrapper around the core MPACT_process_raw functionality.

Key Features

1. Input Handling

  • Accepts HVW .bed files
  • Converts them to a temporary MPACT-compatible format
  • Automatically skips empty or invalid inputs
  • Recursively scans directories for input files
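The scanning and skipping steps above can be sketched as follows. This is a minimal illustration, not the code in scripts/run_mpact.py: the function name find_bed_inputs is hypothetical, and "invalid" is reduced here to "not a regular file or zero bytes", which may be narrower than what the script actually checks.

```python
from pathlib import Path
from typing import Iterator


def find_bed_inputs(root: Path) -> Iterator[Path]:
    """Yield non-empty .bed files found anywhere under root, in sorted order."""
    for path in sorted(root.rglob("*.bed")):
        # Skip directories that happen to end in .bed, and zero-byte files.
        if path.is_file() and path.stat().st_size > 0:
            yield path
```

Sorting the results keeps the processing order stable between runs, which makes logs easier to compare.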

2. Batch Execution

  • Iterates over all matching input files
  • Maintains relative directory structure in output
  • Skips completed samples unless REPLACE=True
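One way to mirror the input layout and skip finished samples is sketched below. The helper names are hypothetical, and treating the presence of MPACT_classifications.tsv as the "completed" marker is an assumption; the script may use a different criterion.

```python
from pathlib import Path

REPLACE = False  # same meaning as the flag in the Configuration section


def output_dir_for(input_file: Path, input_root: Path, output_root: Path) -> Path:
    """Mirror the input's location relative to input_root under output_root."""
    relative = input_file.relative_to(input_root)
    # One output directory per sample, named after the input file stem.
    return output_root / relative.parent / input_file.stem


def should_process(out_dir: Path, replace: bool = REPLACE) -> bool:
    """Process unless a previous run already left a classification table."""
    marker = out_dir / "MPACT_classifications.tsv"  # assumed completion marker
    return replace or not marker.exists()
```

With these helpers, an input at `<input_root>/run1/sampleA.bed` would map to `<output_root>/run1/sampleA/`.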

3. Dual Classification Output

For each sample:

  • Regressed predictions (after contamination removal)
  • Raw predictions (no regression)

The top three classes and their probabilities are reported for both modes.
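Selecting the top three classes from a probability table can be sketched like this. The function name and the class labels are placeholders for illustration only; the values are not real MPACT output, and the script's own data structures may differ.

```python
from typing import Dict, List, Tuple


def top_classes(probabilities: Dict[str, float], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable classes, highest probability first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Illustrative values only, not real MPACT output.
example = {"class_A": 0.72, "class_B": 0.15, "class_C": 0.08, "class_D": 0.05}
print(top_classes(example))
```

Applying the same function to the regressed and the raw probability tables gives directly comparable rankings for the two modes.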

4. Tumor Decomposition (Custom)

  • Returns the tumor purity
  • Returns the full component decomposition matrix

5. Output Files

Per sample directory:

MPACT_regressed_probabilities.tsv
MPACT_raw_probabilities.tsv
MPACT_decomposition.tsv
MPACT_classifications.tsv
MPACT_cnvs.pdf (optional)

6. Logging

  • One log file per run, shared by all samples
  • Timestamped entries
  • Captures stdout and stderr

Example:

script/run_mpact_YYYYMMDD_HHMMSS.log
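A single timestamped log that captures both stdout and stderr could be built along these lines. This is a sketch using only the standard library; the class and function names are hypothetical, and the actual script may rely on the logging module or a different mechanism.

```python
import sys
from contextlib import redirect_stderr, redirect_stdout
from datetime import datetime
from pathlib import Path


class TimestampedTee:
    """Echo text to a stream and copy each complete line to a log with a timestamp."""

    def __init__(self, log_file, stream):
        self.log_file = log_file
        self.stream = stream
        self._buffer = ""

    def _log_line(self, line):
        stamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        self.log_file.write(f"[{stamp}] {line}\n")

    def write(self, text):
        self.stream.write(text)
        self._buffer += text
        # Only complete lines are stamped; a partial line waits in the buffer.
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            self._log_line(line)
        return len(text)

    def flush(self):
        self.stream.flush()
        self.log_file.flush()

    def finish(self):
        # Write any trailing text that never received a newline.
        if self._buffer:
            self._log_line(self._buffer)
            self._buffer = ""


def run_logged(log_dir: Path, func) -> Path:
    """Run func() with stdout and stderr copied into one timestamped log file."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"run_mpact_{datetime.now():%Y%m%d_%H%M%S}.log"
    with open(log_path, "w", encoding="utf-8") as log_file:
        out = TimestampedTee(log_file, sys.stdout)
        err = TimestampedTee(log_file, sys.stderr)
        with redirect_stdout(out), redirect_stderr(err):
            func()
        out.finish()
        err.finish()
    return log_path
```

Note that redirecting at this level only captures output written through Python's sys.stdout and sys.stderr; output from subprocesses or C extensions writing directly to the file descriptors would need separate handling.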

Configuration

Parameters are defined at the top of the script:

REPLACE = False
KEEP_TMP = False
IMPUTE = True
CALL_CNVS = False
VERBOSE = False

PROBABILITY_THRESHOLD = 0.7
MAX_CONTAMINATION_FRACTION = 0.5

Paths:

DRIVE_DIR = Path("/mnt/x")
HVW_DIR = DRIVE_DIR / "Human variation workflow"
WORK_DIR = DRIVE_DIR / "Nanopore_classification/methylverse_MPACT"

These should be adapted to the local environment.
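Before a long batch run it can help to confirm that the configured directories exist. The check below is a sketch, not part of the script; missing_directories is a hypothetical helper, and the path values are simply copied from the block above.

```python
from pathlib import Path
from typing import Dict, List


def missing_directories(paths: Dict[str, Path]) -> List[str]:
    """Return the names of configured paths that are not existing directories."""
    return [name for name, path in paths.items() if not path.is_dir()]


# Values copied from the Paths block above; replace them for your machine.
DRIVE_DIR = Path("/mnt/x")
configured = {
    "HVW_DIR": DRIVE_DIR / "Human variation workflow",
    "WORK_DIR": DRIVE_DIR / "Nanopore_classification/methylverse_MPACT",
}

for name in missing_directories(configured):
    print(f"{name} does not exist: {configured[name]}")
```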

Installation (Ubuntu on Windows 11 / WSL)

1. Create a Conda Environment

conda create -n methylverse python=3.10 -y
conda activate methylverse

2. Install Core Scientific and Build Dependencies

conda install -c conda-forge -c bioconda \
    pysam cython numpy scipy \
    setuptools wheel compilers \
    make autoconf automake libtool pkg-config \
    zlib bzip2 xz libcurl openssl \
    -y

3. Upgrade pip and Install Python Build Backends

python -m pip install -U pip
python -m pip install "setuptools<81" wheel poetry-core

4. Install Required Python Dependencies

python -m pip install ailist genome_info

5. Install ngsfragments

python -m pip install --no-build-isolation ngsfragments

6. Install PyTorch

python -m pip install torch

7. Install MethylVerse

python -m pip install MethylVerse

8. Verify the Installation

python -c "import ngsfragments, MethylVerse; print('Installation successful')"

python -m MethylVerse MPACT -h
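If the one-line import check fails, a slightly more detailed probe can show which package is missing. The sketch below uses importlib from the standard library; the list of import names is assumed from the packages installed in the steps above, and missing_modules is a hypothetical helper.

```python
from importlib.util import find_spec
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    # Top-level import names assumed from the packages installed above.
    required = ["pysam", "numpy", "scipy", "torch", "ngsfragments", "MethylVerse"]
    missing = missing_modules(required)
    print("Missing:", ", ".join(missing) if missing else "none")
```

This only checks that each module can be found, not that it imports cleanly, so the `python -c "import ..."` command above remains the stronger test.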

Usage

Activate the environment:

conda activate methylverse

Run the pipeline:

python scripts/run_mpact.py

Notes

  • The script contains hard-coded paths and is intended for local execution.
  • It is not integrated into the package API.
  • Refactoring would be required for general-purpose distribution.

About

A suite of tools for methylation analysis
