Add feature/species specific params #42
0.1.0 Release
Minor Release: 0.2.0
Minor Release: 0.3.0
Release: 0.3.1
Patch Release: 0.3.2
Patch Release: 0.3.3
| // Ecoli with wrong species in samplesheet | ||
| assert path("$outputDir/staramr/B2_results/B2_settings.staramr.txt").exists() | ||
| def ecoli_settings = new File("$outputDir/staramr/B2_results/B2_settings.staramr.txt") | ||
| def ecoli_cmd = ecoli_settings.readLines().get(0) | ||
| assert ecoli_cmd == "command_line = /usr/local/bin/staramr search --pointfinder-organism escherichia_coli --minimum-contig-length 300 --genome-size-lower-bound 4000000 --genome-size-upper-bound 6000000 --minimum-N50-value 10000 --minimum-contig-length 300 --unacceptable-number-contigs 1000 --pid-threshold 98 --percent-length-overlap-plasmidfinder 60 --percent-length-overlap-resfinder 60 --percent-length-overlap-pointfinder 95 --nprocs 1 -o B2_results B2.fasta" | ||
| assert ecoli_cmd == "command_line = /usr/local/bin/staramr search --pointfinder-organism escherichia_coli --minimum-contig-length 300 --minimum-N50-value 10000 --minimum-contig-length 300 --unacceptable-number-contigs 1000 --pid-threshold 98 --percent-length-overlap-plasmidfinder 60 --genome-size-lower-bound 4000000 --genome-size-upper-bound 6700000 --percent-length-overlap-resfinder 52 --percent-length-overlap-pointfinder 95 --nprocs 1 -o B2_results B2.fasta" |
Can you explain what this is testing and how it works?
The sample sheet is:
sample,sample_name,contigs,species
GCA_000008105,A 1#,https://github.com/phac-nml/staramrnf/raw/dev/tests/genomes/salmonella/GCA_000008105.1_ASM810v1_genomic.fna.gz,Salmonella
GCA_000947975,B2,https://github.com/phac-nml/staramrnf/raw/dev/tests/genomes/ecoli/GCA_000947975.1_ASM94797v1_genomic.fna.gz,Escherichia coli
GCF_000196035,B2,https://github.com/phac-nml/staramrnf/raw/dev/tests/genomes/listeria/GCF_000196035.1_ASM19603v1_genomic.fna,Listeria monocytogenes
GCF_000196035_B,,https://github.com/phac-nml/staramrnf/raw/dev/tests/genomes/listeria/GCF_000196035.1_ASM19603v1_genomic.fna,Listeria monocytogenes
There are two B2 sample_name entries: one is E. coli (GCA_000947975) and the other is Listeria (GCF_000196035).
Is the test checking the Listeria one or the E. coli one? I don't really understand which one the B2_results is supposed to be corresponding to. If it corresponds to the E. coli one, why is E. coli the wrong species in the sample sheet? It looks like there's Escherichia coli in the sample sheet for GCA_000947975,B2. Is it not actually E. coli?
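To make the ambiguity concrete, here is a minimal sketch that groups the sample sheet rows by `sample_name` and reports the names used more than once. It is illustrative only: the `contigs` column is omitted for brevity, the helper names are hypothetical, and it says nothing about how the pipeline itself resolves the clash.

```python
import csv
import io
from collections import defaultdict

# Columns sample, sample_name and species from the sample sheet quoted above
# (the contigs URLs are left out for brevity).
SAMPLESHEET = """\
sample,sample_name,species
GCA_000008105,A 1#,Salmonella
GCA_000947975,B2,Escherichia coli
GCF_000196035,B2,Listeria monocytogenes
GCF_000196035_B,,Listeria monocytogenes
"""


def group_by_sample_name(text):
    """Map each sample_name to the (sample, species) rows that use it."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row["sample_name"]].append((row["sample"], row["species"]))
    return dict(groups)


def duplicated_names(text):
    """Return only the non-empty sample_name values shared by several rows."""
    return {
        name: rows
        for name, rows in group_by_sample_name(text).items()
        if name and len(rows) > 1
    }


duplicates = duplicated_names(SAMPLESHEET)
for name, rows in duplicates.items():
    print(name, rows)
```

Only `B2` is reported, with one E. coli row and one Listeria row, which is why it is unclear from the file name alone which sample `B2_results` belongs to.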
The change I made is simply because the new feature changes the output, and so I adjusted it for the test.
As for the test itself, I believe the original test was somewhat poorly planned (this was my first pipeline). It was an all-in-one test that confirmed both the sample renaming and the resulting outputs in a single full run.
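For reference, the difference between the old and new expected command lines can be checked with a small script. This is a sketch, not part of the pipeline: `parse_flags` and `changed_flags` are hypothetical helpers, and the two strings are the staramr commands from the old and new assertions with the `command_line = ` prefix dropped.

```python
import shlex

OLD_CMD = (
    "/usr/local/bin/staramr search --pointfinder-organism escherichia_coli "
    "--minimum-contig-length 300 --genome-size-lower-bound 4000000 "
    "--genome-size-upper-bound 6000000 --minimum-N50-value 10000 "
    "--minimum-contig-length 300 --unacceptable-number-contigs 1000 "
    "--pid-threshold 98 --percent-length-overlap-plasmidfinder 60 "
    "--percent-length-overlap-resfinder 60 "
    "--percent-length-overlap-pointfinder 95 --nprocs 1 -o B2_results B2.fasta"
)
NEW_CMD = (
    "/usr/local/bin/staramr search --pointfinder-organism escherichia_coli "
    "--minimum-contig-length 300 --minimum-N50-value 10000 "
    "--minimum-contig-length 300 --unacceptable-number-contigs 1000 "
    "--pid-threshold 98 --percent-length-overlap-plasmidfinder 60 "
    "--genome-size-lower-bound 4000000 --genome-size-upper-bound 6700000 "
    "--percent-length-overlap-resfinder 52 "
    "--percent-length-overlap-pointfinder 95 --nprocs 1 -o B2_results B2.fasta"
)


def parse_flags(command_line):
    """Collect `--flag value` pairs; a repeated flag keeps its last value."""
    tokens = shlex.split(command_line)
    flags = {}
    for i, token in enumerate(tokens[:-1]):
        if token.startswith("--") and not tokens[i + 1].startswith("-"):
            flags[token] = tokens[i + 1]
    return flags


def changed_flags(old, new):
    """Return {flag: (old_value, new_value)} for flags whose values differ."""
    old_flags, new_flags = parse_flags(old), parse_flags(new)
    return {
        flag: (old_flags.get(flag), new_flags.get(flag))
        for flag in sorted(set(old_flags) | set(new_flags))
        if old_flags.get(flag) != new_flags.get(flag)
    }


diff = changed_flags(OLD_CMD, NEW_CMD)
for flag, (before, after) in diff.items():
    print(f"{flag}: {before} -> {after}")
```

Apart from the order of the flags, only `--genome-size-upper-bound` (6000000 to 6700000) and `--percent-length-overlap-resfinder` (60 to 52) change.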
Not sure why the last sample is not tested. Maybe it is a good time to fix things up.
| validationFailUnrecognisedParams = false | ||
| validationLenientMode = false | ||
| validationSchemaIgnoreParams = 'genomes,igenomes_base' | ||
| validationSchemaIgnoreParams = 'genomes,igenomes_base,genus_list,default_staramr,salmonella,escherichia,shigella,campylobacter' |
What are the warnings or errors when not ignored?
The following warning appears at the top of every run. Since these are more "settings" than "parameters", I decided to hide them. For example:
N E X T F L O W ~ version 24.10.6
Launching `main.nf` [amazing_engelbart] DSL2 - revision: 38dd4bef6a
WARN: The following invalid input values have been detected:
* --genus_list: [salmonella, campylobacter, escherichia, shigella]
* --default_staramr: [genome_size_lower_bound:4000000, genome_size_upper_bound:6000000, percent_length_overlap_resfinder:60, percent_length_overlap_pointfinder:95]
* --salmonella: [genome_size_lower_bound:4000000, genome_size_upper_bound:6700000, percent_length_overlap_resfinder:52, percent_length_overlap_pointfinder:95]
* --escherichia: [genome_size_lower_bound:4000000, genome_size_upper_bound:6700000, percent_length_overlap_resfinder:52, percent_length_overlap_pointfinder:95]
* --shigella: [genome_size_lower_bound:4000000, genome_size_upper_bound:6700000, percent_length_overlap_resfinder:52, percent_length_overlap_pointfinder:95]
* --campylobacter: [genome_size_lower_bound:1250000, genome_size_upper_bound:2500000, percent_length_overlap_resfinder:52, percent_length_overlap_pointfinder:58]
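As a rough illustration of how these settings maps could be used, the sketch below picks a settings block from the `species` value. The values are copied from the warning output above. The matching rule (first word of the species, compared case-insensitively against `genus_list`) and the function name `settings_for_species` are assumptions made for this example, not the pipeline's actual implementation.

```python
# Values copied from the warning output above.
GENUS_LIST = ["salmonella", "campylobacter", "escherichia", "shigella"]

DEFAULT_STARAMR = {
    "genome_size_lower_bound": 4000000,
    "genome_size_upper_bound": 6000000,
    "percent_length_overlap_resfinder": 60,
    "percent_length_overlap_pointfinder": 95,
}

# Salmonella, Escherichia and Shigella share the same values in the output above.
ENTERIC = {
    "genome_size_lower_bound": 4000000,
    "genome_size_upper_bound": 6700000,
    "percent_length_overlap_resfinder": 52,
    "percent_length_overlap_pointfinder": 95,
}

GENUS_SETTINGS = {
    "salmonella": ENTERIC,
    "escherichia": ENTERIC,
    "shigella": ENTERIC,
    "campylobacter": {
        "genome_size_lower_bound": 1250000,
        "genome_size_upper_bound": 2500000,
        "percent_length_overlap_resfinder": 52,
        "percent_length_overlap_pointfinder": 58,
    },
}


def settings_for_species(species):
    """Pick genus-specific settings, falling back to the defaults.

    Assumption: the genus is the first word of the species string.
    """
    words = (species or "").strip().lower().split()
    genus = words[0] if words else ""
    if genus in GENUS_LIST:
        return GENUS_SETTINGS[genus]
    return DEFAULT_STARAMR


for species in ["Escherichia coli", "Listeria monocytogenes", "Campylobacter jejuni"]:
    print(species, settings_for_species(species))
```

Under this assumed rule, "Escherichia coli" resolves to an upper bound of 6700000 and a ResFinder overlap of 52, which matches the updated expectation for `B2` in the test, while a species outside the list falls back to `default_staramr`.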
| ### Genus specific settings | ||
| They are used when the `species` column has any of the following genus selected: | ||
| ``` | ||
| genus_list = ['salmonella', 'campylobacter', 'escherichia', 'shigella'] | ||
| ``` |
### Genus-specific settings
Genus-specific settings are used when the `species` column of the sample sheet contains any of the following genera:
| - Upper bound for our genome size for quality metrics= 2,500,000 | ||
| - Percent length overlap of BLAST hit for ResFinder Database = 52 | ||
| - Percent length overlap ofBLAST hit for PointFinder Database = 58 | ||
| - Point Finder Database = Campylobacter |
| - Lower bound for our genome size for quality metrics= 1,250,000 | ||
| - Upper bound for our genome size for quality metrics= 2,500,000 | ||
| - Percent length overlap of BLAST hit for ResFinder Database = 52 | ||
| - Percent length overlap ofBLAST hit for PointFinder Database = 58 |
| - Lower bound for our genome size for quality metrics = 4,000,000 | ||
| - Upper bound for our genome size for quality metrics= 6,700,000 | ||
| - Percent length overlap of BLAST hit for ResFinder Database = 52 | ||
| - PointFinder Database = Salmonella or PointFinder Database = E.coli | ||
| #### Campylobacter | ||
| - Lower bound for our genome size for quality metrics= 1,250,000 | ||
| - Upper bound for our genome size for quality metrics= 2,500,000 |
Some `=` signs have a space before them, others do not. I recommend making this consistent.
Update starAMR parameters
Add parameters for starAMR module
Parameters used in starAMR now modifiable in the config for the module.
Add species specific starAMR parameters
Parameters that use the species classification to assign their values (can be turned off through --skip_species_classification). The defaults are:
--genome_size_lower_bound : 4000000
--genome_size_upper_bound : 6000000
--percent_length_overlap_resfinder : 60
--percent_length_overlap_pointfinder : 95
The species with specific settings are:
Salmonella, Shigella, or Escherichia coli
Campylobacter