EntericVirusUB edited this page Apr 12, 2024 · 3 revisions

INTRODUCTION

Here we present the Wastewater Enterovirus Typing Tool (WEVTYTO), a pipeline for analysing samples that contain a mixture of enteroviruses.

This pipeline was designed with wastewater samples and Oxford Nanopore data in mind. It relies on the VSEARCH tool for read filtering, clustering and searching. Briefly, 20 nucleotides are trimmed from both ends of each read, and the resulting filtered reads are clustered at 95% identity. A consensus sequence is generated for each cluster and searched against a custom reference database. Finally, an Excel file is generated for each sample, listing the Enterovirus types found and their proportions.
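The steps above can be sketched as three VSEARCH invocations. This is only an illustration under stated assumptions, not the commands the provided scripts actually run: the function name, the output file names, the length bounds and the search identity threshold are all hypothetical, and only the 20-nucleotide trim and the 95% clustering identity come from the description above. The function builds the command lines without executing them, so they can be inspected first.

```python
from pathlib import Path


def build_vsearch_commands(sample_fastq, reference_fasta, threads=4,
                           min_len=300, max_len=450, search_id=0.8):
    """Return three VSEARCH command lines as argument lists (nothing is run).

    min_len, max_len and search_id are placeholder values chosen for this
    sketch; the real scripts may use different settings.
    """
    stem = Path(sample_fastq).name.split(".")[0]
    trimmed = f"{stem}_trimmed.fasta"      # hypothetical output name
    consensus = f"{stem}_consensus.fasta"  # hypothetical output name
    hits = f"{stem}_hits.tsv"              # hypothetical output name

    # 1) Trim 20 nucleotides from both ends of every read.
    trim = ["vsearch", "--fastx_filter", str(sample_fastq),
            "--fastq_stripleft", "20", "--fastq_stripright", "20",
            "--fastaout", trimmed]

    # 2) Cluster the trimmed reads at 95% identity and write one
    #    consensus sequence per cluster.
    cluster = ["vsearch", "--cluster_fast", trimmed, "--id", "0.95",
               "--minseqlength", str(min_len),
               "--maxseqlength", str(max_len),
               "--consout", consensus, "--threads", str(threads)]

    # 3) Search each consensus sequence against the reference database.
    search = ["vsearch", "--usearch_global", consensus,
              "--db", str(reference_fasta), "--id", str(search_id),
              "--blast6out", hits, "--threads", str(threads)]

    return [trim, cluster, search]


if __name__ == "__main__":
    for cmd in build_vsearch_commands("sample1.fastq.gz", "references.fasta"):
        print(" ".join(cmd))
```

Each list could be passed to `subprocess.run(cmd, check=True)` once the parameters have been checked against the script being used.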

Nix and Shaw

Two scripts are provided, ev_typing_nix.py and ev_typing_shaw.py.

The nix script must be used when the Nix et al., 2006 protocol is followed. This protocol targets all Enterovirus types by amplifying a ~348-393 bp region of the VP1 gene.

The shaw script must be used when the Shaw et al., 2020 protocol is followed. This protocol targets the Enterovirus C cluster by amplifying a ~1089 bp region of the Enterovirus genome.

GUIDE

The ev_typing_environment.yml file lists all the dependencies required to run the pipeline and can be used to create a Conda environment.

conda env create -f ev_typing_environment.yml

Or, if using Mamba

mamba env create -f ev_typing_environment.yml

Once the environment has been created, activate it.

conda activate ev_typing

Then, go to the folder containing all your fastq.gz files.

cd \path\to\yoursample\directory

Copy the corresponding script (ev_typing_nix.py or ev_typing_shaw.py) and the reference FASTA file into the folder that contains the fastq.gz files to be analysed.
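Before executing the script, it can help to confirm that the folder holds everything it needs. The following is a minimal sketch of such a check; the function name is hypothetical, and the reference file name is passed in as a parameter because it is not stated here.

```python
from pathlib import Path


def check_run_folder(folder, script_name, reference_name):
    """Return (sample files found, required files that are missing).

    script_name is ev_typing_nix.py or ev_typing_shaw.py;
    reference_name is whatever the reference FASTA file is called.
    """
    folder = Path(folder)
    # Every fastq.gz file in the folder is treated as one sample.
    samples = sorted(p.name for p in folder.glob("*.fastq.gz"))
    # The script and the reference must sit next to the samples.
    missing = [name for name in (script_name, reference_name)
               if not (folder / name).is_file()]
    return samples, missing


if __name__ == "__main__":
    # "reference.fasta" is a placeholder name used only for this example.
    samples, missing = check_run_folder(".", "ev_typing_nix.py", "reference.fasta")
    print(f"{len(samples)} fastq.gz file(s) found")
    if missing:
        print("Missing from this folder:", ", ".join(missing))
```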

Execute the script.

python ev_typing_nix.py

Or

python ev_typing_shaw.py

The script will run and generate an Excel results file for each sample.

The scripts are set to use only 4 CPU cores. To increase the speed of the analysis, change the number of cores according to your CPU capabilities.
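As a rough guide for picking a value, the number of cores can be read from the machine itself. The helper below is a sketch with a hypothetical name; how the scripts expose the core count (for example as a variable or as VSEARCH's `--threads` argument) should be checked in the script source before editing it.

```python
import os


def choose_threads(requested=4, reserve=1):
    """Cap the requested core count at what the machine offers.

    reserve leaves that many cores free for the rest of the system;
    both defaults are illustrative choices, not values from the scripts.
    """
    available = os.cpu_count() or 1  # cpu_count() may return None
    return max(1, min(requested, available - reserve))


if __name__ == "__main__":
    print("Suggested number of cores:", choose_threads(requested=16))
```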

Other viruses

The scripts can be adapted to analyse other viruses. In most cases it should be enough to replace the reference FASTA file with one containing the references of interest and to adjust the --minseqlength and --maxseqlength parameters to the expected amplicon length. The BLAST identity threshold may also need to be changed.
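To see what the two length parameters do, recall that VSEARCH discards sequences shorter than --minseqlength or longer than --maxseqlength. The sketch below reproduces that behaviour in plain Python on toy data; the function name, the sequences and the bounds are made up for illustration and are not recommended settings for any virus.

```python
def keep_by_length(sequences, min_len, max_len):
    """Keep only sequences whose length lies within [min_len, max_len]."""
    return {name: seq for name, seq in sequences.items()
            if min_len <= len(seq) <= max_len}


if __name__ == "__main__":
    # Toy reads of 5, 12 and 30 nucleotides.
    reads = {
        "short_read": "ACGTA",
        "amplicon_read": "ACGTACGTACGT",
        "long_read": "ACGTACGTAC" * 3,
    }
    # Hypothetical bounds bracketing a 12-nucleotide "amplicon".
    kept = keep_by_length(reads, min_len=10, max_len=15)
    print(sorted(kept))
```

When moving to a new target, the bounds would be chosen to bracket the expected amplicon length after trimming, so that off-target and truncated reads are dropped before clustering.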

If you change the reference FASTA file, keep the same structure as the one provided here so that the final Excel files are properly formatted.
