Nextflow pipeline for processing paired-end Illumina reads, especially useful for analysing amplicons of library variants. Takes raw FASTQ files and outputs per-bin sequence frequency counts.
- fastp — quality control and adapter trimming
- FLASH2 — merge paired-end reads by overlap
- cutadapt — orient reads using inline barcodes and a 3′ anchor sequence
- cutadapt — demultiplex into per-bin FASTQ files using inline barcodes
- cutadapt (optional) — extract the library region between left and right flanking sequences; reads that are untrimmed, too short, or too long are written to separate files for inspection
- Python — count the frequency of each unique sequence per bin
- MultiQC — aggregate QC report across all steps
06_counts/— per-bin sequence counts for the extracted library region (TSV)07_merged_counts/— all bins merged into a single TSV06_whole_counts/— per-bin sequence counts for full demuxed reads (TSV)07_merged_whole_counts/— all bins merged into a single TSV00_reports/— MultiQC HTML report
Whole-read and library-region counting can each be toggled independently (see parameters below).
- Java
- Nextflow
- conda
Clone and run locally:
git clone https://github.com/ilyacarey/libseq.git
cd libseq
nextflow run main.nf -profile condaRun directly from GitHub (barcodes FASTA must be available locally):
nextflow run ilyacarey/libseq -profile condaUpdate the relevant parameters in nextflow.config before running, or pass them on the command line:
nextflow run main.nf -profile conda \
--read1 sample_R1.fastq.gz \
--read2 sample_R2.fastq.gz \
--sample my_sample \
--barcodes_fasta barcodes_anchored.fasta| Parameter | Description |
|---|---|
read1 / read2 |
Input paired-end FASTQ files |
sample |
Sample name used for output file naming |
barcodes_fasta |
FASTA of inline barcodes (must be 5′-anchored with ^) |
orient_3p_anchor |
3′ anchor sequence for read orientation (anchor to end with $) |
left_flank |
Sequence immediately upstream of the library region |
right_flank |
Sequence immediately downstream of the library region |
lib_min_len / lib_max_len |
Expected length range (bp) of the extracted library region |
count_library |
Output counts of extracted library region (default: true) |
count_whole_reads |
Output counts of full demuxed reads (default: true) |
cleanup |
Delete work/ directory on successful completion (default: true); set to false to enable -resume |
cpus |
Number of CPU threads (default: 8) |