ViroScan-nf

ViroScan-nf is a Nextflow pipeline designed to separate host and viral reads from sequencing data, identify viral mutations, and compute viral alignment and coverage metrics.

The pipeline combines host read filtering, viral variant calling, and summary metric generation in a fully reproducible workflow.

Foreword

ViroScan-nf is an automated pipeline that:

  • Filters out host reads by aligning sequencing reads against a host reference genome using Bowtie2
  • Retains unmapped reads and uses them as candidate viral reads
  • Aligns viral reads to a viral reference genome using breseq
  • Identifies viral mutations
  • Computes viral alignment and coverage metrics directly from breseq outputs
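
The "retain unmapped reads" step can be illustrated on SAM records, where bit 0x4 of the FLAG field marks a read as unmapped. The sketch below is only an illustration of that rule: the function name and the example records are hypothetical, and the pipeline itself may rely on Bowtie2's own unaligned-read output rather than filtering an alignment file afterwards.

```python
UNMAPPED_FLAG = 0x4  # SAM FLAG bit: segment unmapped


def unmapped_read_names(sam_lines):
    """Return names of reads whose SAM FLAG has the unmapped bit set."""
    names = []
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # second SAM column is the bitwise FLAG
        if flag & UNMAPPED_FLAG:
            names.append(fields[0])
    return names


# Hypothetical records: read_a maps to the host, read_b does not.
example_sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "read_a\t0\thost_chr1\t100\t42\t5M\t*\t0\t0\tACGTA\tIIIII",
    "read_b\t4\t*\t0\t0\t*\t*\t0\t0\tTTGCA\tIIIII",
]
candidate_viral = unmapped_read_names(example_sam)
print(candidate_viral)
```

Reads kept by this rule are the ones passed on as candidate viral reads; reads that map to the host are set aside.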

Pipeline overview:

flowchart TB
%% --- Styles ---
classDef input fill:#FFE4B5,stroke:#CC6600,stroke-width:2px,color:#663300,font-weight:bold;
classDef preprocessing fill:#B6ECE2,stroke:#065647,stroke-width:2px,color:#065647,font-weight:bold;
classDef host fill:#FFD580,stroke:#CC6600,stroke-width:2px,color:#663300,font-weight:bold;
classDef viral fill:#FFB6C1,stroke:#CC3366,stroke-width:2px,color:#660033,font-weight:bold;
classDef report fill:#D3D3D3,stroke:#555,stroke-width:2px,color:#000,font-weight:bold;
classDef decision fill:#FFFACD,stroke:#FFA500,stroke-width:2px,stroke-dasharray: 5 5,color:#663300,font-weight:bold;

%% --- Inputs subgraph ---
subgraph Inputs
    reads_ch["Input reads"]:::input
    viral_genome_ch["Viral genome"]:::input
    subgraph Host_Input["Host Input (choose one)"]
        host_genome_ch["Host genome"]:::input
        host_genome_index_ch["Host genome index"]:::input
    end
end

%% --- Preprocessing (optional Fastp) ---
reads_ch --> cond_fastp
subgraph Preprocessing
    cond_fastp{"Trim reads ?"}:::decision
    FP["TRIMMING (Fastp)"]:::preprocessing
    trimmed_reads_ch["Trimmed reads"]:::preprocessing
    cond_fastp -->|Yes| FP
    FP --> trimmed_reads_ch
    cond_fastp -->|No| trimmed_reads_ch
end

%% --- Host filtering ---
host_genome_ch --> cond_index
host_genome_index_ch --> cond_index
subgraph Host_Filtering
    cond_index{"User provides host index ?"}:::decision
    BT2_INDEX["INDEXING (Bowtie2)"]:::host
    BT2_ALIGN["MAPPING (Bowtie2)"]:::host
    BT2_ALIGN_mapped["Mapped reads"]:::host
    BT2_ALIGN_unmapped["Unmapped reads"]:::host
    
    cond_index -->|Yes| BT2_ALIGN
    cond_index -->|No| BT2_INDEX
    BT2_INDEX --> BT2_ALIGN
    BT2_ALIGN --> BT2_ALIGN_mapped
    BT2_ALIGN --> BT2_ALIGN_unmapped
end

trimmed_reads_ch --> BT2_ALIGN

%% --- Viral analysis ---
viral_genome_ch --> BRESEQ
subgraph Viral_Analysis
    BRESEQ["VARIANT CALLING (Breseq)"]:::viral
    VM["VIRAL METRICS CALCULATION"]:::viral
    BRESEQ --> VM
end

BT2_ALIGN_unmapped --> BRESEQ

%% --- QC and reporting ---
subgraph Reporting
    fastp_reports_ch["Trimming reports"]:::report
    reports_ch["Mapping reports"]:::report
    MULTIQC["AGGREGATE (MultiQC)"]:::report
    FP --> fastp_reports_ch
    BT2_ALIGN --> reports_ch
    fastp_reports_ch --> MULTIQC
    reports_ch --> MULTIQC
end

%% --- Final outputs subgraph ---
subgraph Outputs
    multiqc_report_ch["Single HTML report"]:::report
    CV["Comprehensive Viral Report"]:::report
end

MULTIQC --> multiqc_report_ch
VM --> CV

Installation

Requirements

ViroScan-nf

# clone the workflow repository
git clone https://github.com/srh-bzd/ViroScan-nf.git

# cd into the repository
cd ViroScan-nf

Nextflow

You can install Nextflow either via conda (recommended) or manually.

  • Using conda

    conda create -n nextflow
    conda activate nextflow
    conda install -c bioconda -c conda-forge nextflow
  • Manual installation

    # Make sure Java 11 or later is installed on your computer:
    java -version
    
    # Install Nextflow (this creates a file named nextflow in the current directory):
    curl -s https://get.nextflow.io | bash
    
    # Move the Nextflow binary to a directory on your user's PATH:
    mv nextflow ~/bin/
    # OR, for a system-wide installation:
    # sudo mv nextflow /usr/local/bin

Container platform

You must use Docker or Singularity.

Usage

Display available options:

nextflow run main.nf --help

Before running the workflow, make sure that the Python script used for generating viral metrics is executable:

chmod +x bin/write_viral_table.py

Run the pipeline using Docker:

nextflow run main.nf \
    -profile docker,local \
    --reads 'data/*R{1,2}.fq.gz' \
    --host_genome host.fasta \
    --viral_genome virus.gbk \
    --outdir results

Run the pipeline on the test dataset:

nextflow run main.nf \
    -profile docker,local,test

Available profiles:

  • docker
  • singularity
  • local
  • ifb

Parameters

Mandatory parameters

| Parameter | Description |
|---|---|
| --reads | Input reads |
| --host_genome | Host reference genome (FASTA) |
| --viral_genome | Viral genome (FASTA or GenBank) |
| --outdir | Output directory |

Optional parameters

| Parameter | Default | Description |
|---|---|---|
| --paired_end | true | Whether the reads are paired-end (true) or single-end (false) |
| --host_genome_index | null | Prefix of an existing Bowtie2 index (skips indexing) |
| --run_fastp | true | Enable read trimming |
| --fastp_options | "" | Additional fastp options |
| --bowtie2_options | "" | Additional Bowtie2 options |
| --breseq_options | "" | Additional breseq options |
| --table_threshold | 5 | Minimum percentage of reads aligned to the viral genome required to include the sample in the viral metrics table |
| --help | false | Display help message |
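
To make the effect of --table_threshold concrete, here is a minimal sketch of how such a filter could behave. It assumes the percentage is computed as aligned reads over total reads and that the comparison is inclusive (>=). The function name and the sample counts are hypothetical and are not taken from bin/write_viral_table.py.

```python
def passes_table_threshold(num_reads, num_reads_aligned, threshold=5.0):
    """Return True if the aligned-read percentage reaches the threshold.

    Assumption: the comparison is inclusive (>=); the real script may differ.
    """
    if num_reads == 0:
        return False  # no reads, nothing to report
    percent_aligned = 100.0 * num_reads_aligned / num_reads
    return percent_aligned >= threshold


# Hypothetical samples: (total reads, reads aligned to the viral genome)
samples = {"sample_high": (8984, 8740), "sample_low": (10000, 120)}
kept = [
    name
    for name, (total, aligned) in samples.items()
    if passes_table_threshold(total, aligned)
]
print(kept)
```

With the default threshold of 5, a sample with 1.2% of reads aligned would be left out of the table, while one with about 97% would be kept.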

Outputs

The main results are written to the directory specified by --outdir.

results/
├── 01.cleaned_reads
│   ├── log
│   │   └── sample_fastp.html
│   └── sample_R*.fastq.gz
├── 02.indexed_ref
│   ├── host.*.bt2
│   └── host.rev.*.bt2
├── 03.aligned_reads
│   ├── host
│   │   ├── log
│   │   │   └── sample_bowtie2.log
│   │   ├── sample.bam
│   │   ├── sample_matched.fq.gz
│   │   └── sample_matched_R*.fq.gz
│   └── viral
│       ├── sample.bam
│       └── sample.bam.bai
├── 04.unmapped_reads
│   ├── host
│   │   ├── sample_unmatched.fq.gz
│   │   └── sample_unmatched_R*.fq.gz
│   └── viral
│       ├── sample_R*.unmatched.fastq
│       └── sample.unmatched.fastq
├── 05.called_variants
│   ├── sample
│   │   └── output
│   │       ├── calibration
│   │       ├── evidence
│   │       ├── index.html
│   │       ├── log.txt
│   │       ├── marginal.html
│   │       ├── output.done
│   │       ├── output.gd
│   │       ├── output.vcf
│   │       ├── summary.html
│   │       └── summary.json
│   └── viral_alignment_metrics.txt
└── multiqc_report.html
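
For downstream scripting, the per-sample breseq outputs can be located from the layout above. The following is a small sketch, assuming the directory names match the tree exactly and that --outdir was set to results; the helper name is hypothetical.

```python
from pathlib import Path


def find_breseq_vcfs(outdir):
    """Map each sample name to its breseq VCF under 05.called_variants."""
    variants_dir = Path(outdir) / "05.called_variants"
    vcfs = {}
    for vcf in sorted(variants_dir.glob("*/output/output.vcf")):
        sample = vcf.parent.parent.name  # <sample>/output/output.vcf
        vcfs[sample] = vcf
    return vcfs


if __name__ == "__main__":
    # Prints nothing if the results directory does not exist yet.
    for sample, vcf in find_breseq_vcfs("results").items():
        print(sample, vcf)
```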

Viral metrics table

Generated from breseq summary.json. Example:

| Sample_ID | Viral_genome | Num_reads | Num_reads_aligned | Percent_reads_aligned | Avg_coverage | Percent_coverage | Num_bases_mapped | Num_genes | Num_features | Coverage_variance |
|---|---|---|---|---|---|---|---|---|---|---|
| sample | OR669303 | 8984 | 8740 | 97.3 | 165 | 100 | 1309237 | 8 | 10 | 660.3331 |

| Column | Description |
|---|---|
| Sample_ID | Name of the sample being analyzed |
| Viral_genome | Viral reference genome ID used for alignment |
| Num_reads | Total number of input sequencing reads |
| Num_reads_aligned | Number of reads that aligned to the viral genome |
| Percent_reads_aligned | Percentage of reads aligned to the virus |
| Avg_coverage | Average sequencing coverage across the viral genome |
| Percent_coverage | Approximate percentage of the genome covered by reads |
| Num_bases_mapped | Total number of bases mapped to the viral genome |
| Num_genes | Number of viral genes detected |
| Num_features | Number of genomic features detected |
| Coverage_variance | Variability of coverage along the viral genome |
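
The table can be loaded for further analysis with a few lines of Python. This sketch assumes the file is tab-delimited with the header row shown in the example; the delimiter is not stated here, so adjust it if your file differs. The function name is hypothetical. As a sanity check, it recomputes Percent_reads_aligned from the two read counts in the example row.

```python
import csv
import io

# Example row copied from the table above; the tab-delimited layout is an assumption.
EXAMPLE_TABLE = (
    "Sample_ID\tViral_genome\tNum_reads\tNum_reads_aligned\t"
    "Percent_reads_aligned\tAvg_coverage\tPercent_coverage\t"
    "Num_bases_mapped\tNum_genes\tNum_features\tCoverage_variance\n"
    "sample\tOR669303\t8984\t8740\t97.3\t165\t100\t1309237\t8\t10\t660.3331\n"
)


def read_viral_metrics(handle):
    """Parse the metrics table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


rows = read_viral_metrics(io.StringIO(EXAMPLE_TABLE))
for row in rows:
    recomputed = 100.0 * int(row["Num_reads_aligned"]) / int(row["Num_reads"])
    # Reported value next to the value recomputed from the read counts.
    print(row["Sample_ID"], row["Percent_reads_aligned"], round(recomputed, 1))
```

To read a real run, pass an open file handle instead of the in-memory example, for instance the viral_alignment_metrics.txt file listed under 05.called_variants in the output tree, if that is where your table was written.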

Uninstall

No installation step is needed beyond cloning the repository. To uninstall, delete the repository directory.

Contributing

Contributions are welcome. See the contributing guidelines.

Report bugs and issues

Please open an issue on GitHub: https://github.com/srh-bzd/ViroScan-nf/issues

Acknowledgement

Jacques Dainat (@Juke34)
Based on the BiTeN template: https://github.com/Juke34/BiTeN
