Home
Here we present the Wastewater Enterovirus Typing Tool (WEVTYTO), a pipeline for analysing samples that contain a mixture of enteroviruses.
This pipeline was designed with wastewater samples and Oxford Nanopore data in mind. It uses VSEARCH for read filtering, clustering and searching. Briefly, 20 nucleotides are trimmed from both ends of each read and the filtered reads are clustered at 95% identity. A consensus sequence is generated for each cluster and searched against a custom reference database. A final Excel file is generated for each sample, listing the Enterovirus types found and their proportions.
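The steps above can be sketched as three VSEARCH invocations. The snippet below only builds the command lines and does not run them. It is a minimal illustration, not the code in ev_typing_nix.py or ev_typing_shaw.py: the function name build_vsearch_commands, the intermediate file names and the search identity passed by the caller are all assumptions made for this example. The flags themselves are standard VSEARCH options.

```python
from typing import List


def build_vsearch_commands(
    fastq: str,
    reference: str,
    search_id: float,
    trim: int = 20,
    cluster_id: float = 0.95,
    threads: int = 4,
) -> List[List[str]]:
    """Return three VSEARCH command lines for one sample (nothing is executed)."""
    # Hypothetical intermediate file names, chosen for this sketch only.
    trimmed = "trimmed.fasta"
    consensus = "consensus.fasta"
    hits = "hits.tsv"

    # Step 1: strip `trim` nucleotides from both ends of every read.
    trim_cmd = [
        "vsearch", "--fastx_filter", fastq,
        "--fastq_stripleft", str(trim),
        "--fastq_stripright", str(trim),
        "--fastaout", trimmed,
    ]
    # Step 2: cluster the trimmed reads and write one consensus per cluster.
    cluster_cmd = [
        "vsearch", "--cluster_fast", trimmed,
        "--id", str(cluster_id),
        "--strand", "both",
        "--consout", consensus,
        "--threads", str(threads),
    ]
    # Step 3: search each consensus against the reference database.
    search_cmd = [
        "vsearch", "--usearch_global", consensus,
        "--db", reference,
        "--id", str(search_id),
        "--strand", "both",
        "--blast6out", hits,
        "--threads", str(threads),
    ]
    return [trim_cmd, cluster_cmd, search_cmd]
```

Each returned list could be passed to subprocess.run(cmd, check=True) in order. Nanopore quality scores can exceed the default upper limit VSEARCH accepts, so a real run may also need the --fastq_qmax option in the first step.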
Two scripts are provided: ev_typing_nix.py and ev_typing_shaw.py.
The nix script must be used when the Nix et al., 2006 protocol is followed. This protocol targets all Enterovirus types by amplifying a ~348-393 bp region of the VP1 gene.
The shaw script must be used when the Shaw et al., 2020 protocol is followed. This protocol targets the Enterovirus C cluster by amplifying a ~1089 bp region of the Enterovirus genome.
A Conda environment containing all the dependencies required to run the pipeline can be created from the ev_typing_environment.yml file:
conda env create -f ev_typing_environment.yml
Or, if using Mamba:
mamba env create -f ev_typing_environment.yml
Once the environment has been created, activate it.
conda activate ev_typing
Then, go to the folder containing all your fastq.gz files.
cd /path/to/yoursample/directory
Copy the script that matches your protocol (ev_typing_nix.py or ev_typing_shaw.py) and the reference FASTA file into the same folder as the fastq.gz files to be analysed.
Execute the script.
python ev_typing_nix.py
Or
python ev_typing_shaw.py
The script will run and generate an Excel results file for each sample.
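Before running the script, it can help to confirm that the working folder contains everything described in the steps above. The following is a minimal sketch of such a check, not part of the pipeline itself. The function name check_sample_folder is hypothetical, and the reference FASTA file name must be supplied by the caller because it is not stated here.

```python
from pathlib import Path
from typing import List


def check_sample_folder(folder: str, script_name: str, reference_name: str) -> List[str]:
    """Return a list of problems found in `folder`; an empty list means it looks ready."""
    root = Path(folder)
    problems = []
    # The scripts expect the fastq.gz files to sit directly in the folder.
    if not any(root.glob("*.fastq.gz")):
        problems.append("no *.fastq.gz files found")
    # The chosen script and the reference FASTA must be copied alongside them.
    for name in (script_name, reference_name):
        if not (root / name).is_file():
            problems.append(f"missing file: {name}")
    return problems
```

For example, check_sample_folder(".", "ev_typing_nix.py", "reference.fasta") returns an empty list when the current folder is ready, where reference.fasta stands in for the actual reference file name.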
The scripts are set to use only 4 CPU cores. To speed up the analysis, increase this number according to the cores available on your machine.
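One way to choose a sensible value is to query the machine and leave a core free for the rest of the system. The helper below is a sketch under that assumption. The name pick_threads and the reserve parameter are hypothetical, and where the core count is set inside the scripts has to be checked in the script itself.

```python
import os


def pick_threads(requested: int = 4, reserve: int = 1) -> int:
    """Clamp the requested core count to what the machine offers, keeping `reserve` cores free."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(requested, available - reserve))
```

Calling pick_threads(16) on an 8-core machine yields 7, which could then replace the default of 4 in the script.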
The scripts can be adapted to analyse other viruses.
Replacing the reference FASTA file with one containing the references of interest and adjusting the --minseqlength and --maxseqlength parameters to the expected amplicon size should be enough. The BLAST identity (id) threshold may also need to be changed.
If you change the reference FASTA file, keep the same structure as the one provided here so that the final Excel files are formatted correctly.
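As an illustration of adapting the length parameters, the sketch below turns an expected amplicon size range into --minseqlength and --maxseqlength arguments. The function name length_window_flags and the margin parameter are hypothetical. The values currently used in the scripts are not shown here, and whether they account for the 20 nucleotides trimmed from each end should be checked in the script before changing them.

```python
from typing import List


def length_window_flags(min_amplicon: int, max_amplicon: int, margin: int = 0) -> List[str]:
    """Build --minseqlength/--maxseqlength arguments for an expected amplicon size range."""
    if min_amplicon > max_amplicon:
        raise ValueError("min_amplicon must not exceed max_amplicon")
    # Widen the window by `margin` on both sides, never going below 1 nucleotide.
    low = max(1, min_amplicon - margin)
    high = max_amplicon + margin
    return ["--minseqlength", str(low), "--maxseqlength", str(high)]
```

For the ~348-393 bp VP1 region, length_window_flags(348, 393, margin=10) gives a window of 338 to 403 nucleotides. The margin of 10 is an arbitrary example value.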