A modular Snakemake pipeline for paired-end ATAC-seq from raw FASTQ to peaks, signal tracks, and QC reports.
This repository currently uses:
- `workflow/Snakefile`
- `config/config.yml`
- per-rule Conda environments in `workflow/envs/`
WARNING: This pipeline covers upstream data analysis only (QC → alignment → peak calling → QC per-sample counts). The per-sample count output (`featurecounts/{sample}.readCountInPeaks.txt`) is not ready for differential accessibility (DA) analysis with DESeq2 or edgeR. For downstream differential accessibility analysis you must:
- Generate a consensus peak set across all samples.
- Re-quantify reads against the consensus peaks to create a unified count matrix.
- Run differential accessibility analysis on that count matrix.
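
As a rough illustration of the first downstream step, the sketch below merges overlapping peaks from several samples into one consensus set. It is not part of this pipeline: the function name `consensus_peaks`, the `min_samples` parameter, and the strict-overlap rule (book-ended intervals are not merged) are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), BED-style half-open


def consensus_peaks(per_sample: Dict[str, List[Interval]],
                    min_samples: int = 1) -> List[Interval]:
    """Merge overlapping peaks across samples.

    Hypothetical helper: a merged region is kept only if peaks from at
    least `min_samples` distinct samples contributed to it.
    """
    # Tag every peak with its sample, then sort by chromosome and position.
    tagged = sorted(
        (chrom, start, end, sample)
        for sample, peaks in per_sample.items()
        for chrom, start, end in peaks
    )
    merged = []  # each entry: [chrom, start, end, set_of_samples]
    for chrom, start, end, sample in tagged:
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start < last[2]:
            # Overlaps the current region: extend it and record the sample.
            last[2] = max(last[2], end)
            last[3].add(sample)
        else:
            merged.append([chrom, start, end, {sample}])
    return [(c, s, e) for c, s, e, samples in merged if len(samples) >= min_samples]
```

The resulting intervals could then be written as SAF/BED and used to re-quantify every sample against the same regions, which is what yields a unified count matrix.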
- Pipeline Summary
- Workflow DAG
- Rule Quick Reference
- Requirements
- Installation
- Input Files
- Local Run
- Running on HPC with LSF
- Key Configuration
- BAM Filtering Criteria
- Main Outputs
- QC Visual Guide
- ENCODE QC Benchmarks
- Space-saving behavior
- Troubleshooting
- Acknowledgments
- References
- License
Main workflow (per sample):
- Merge lanes by `sample_id` (from samplesheet)
- Raw FastQC
- Trimming (`trim_galore` default, or `fastp`)
- Alignment (`bowtie2` default, or `bwa` = BWA-MEM2)
- Re-mark duplicates (Picard MarkDuplicates; optional by config)
- BAM filtering to produce `*.filtered.bam`
  - always applies SAM flag/MAPQ filtering
  - optionally applies blacklist exclusion
  - optionally excludes mitochondrial reads via `ref.keep_mito`
  - can preserve the source BAM before filtering
- BAM stats on filtered BAM (`samtools stats/flagstat/idxstats`)
- Signal tracks
  - Scaled bedGraph + bigWig from filtered BAM
  - ATAC-shifted BAM + shifted RPGC bigWig
- Peak calling (MACS3 with Tn5-shifted BED)
- Peak QC summary plots (`plot_macs_qc.r`)
- FRiP (two methods: bedtools intersect + featureCounts log)
- Peak annotation (HOMER + summary)
- deepTools matrix/profile/heatmap/fingerprint/bamPEFragmentSize
- NFR / fragment-length-class analysis — short-fragment vs mono-class bigWigs + TSS profile/heatmap
- ataqv JSON + mkarv HTML report + TSS enrichment / NFR metrics table for MultiQC
- ATACseqQC — PT score, NFR score, TSSE score + QC plots (narrow peaks only)
- MultiQC
- Cleanup temporary/intermediate FASTQ files
Pipeline flow (per sample):
samplesheet + reference prep
|
v
merge_raw_fastqs
|
+--> fastqc_raw
|
v
trimming (fastp | trim_galore)
|
+--> fastqc_trimmed (trim_galore mode)
|
v
alignment (bwa-mem2 | bowtie2)
|
v
sort_bam
|
v
mark_duplicates (optional)
|
v
bam_filter ---> filtered.bam / filtered.bam.bai
| |
| +--> align_stats (samtools stats/flagstat/idxstats)
| +--> bedtools_genomecov -> bedGraphToBigWig -> bigWig
| +--> shift_bam (alignmentSieve --ATACshift) -> shifted.bam + shifted.bigWig
| +--> macs3_callpeak_tn5 (narrow: filtered.bam -> bamtobed -> awk Tn5 shift -> MACS3 BED mode)
| | (broad: filtered.bam -> MACS3 BAMPE mode)
| |
| +--> frip_score (filtered.bam + peaks -> bedtools intersect + featureCounts log -> MultiQC TSVs)
| +--> annotate_peaks (optional)
| +--> deeptools (optional)
| | +--> computeMatrix / plotProfile / plotHeatmap (shifted.bigWig for narrow peaks; bigWig for broad peaks)
| | +--> plotFingerprint / bamPEFragmentSize (filtered.bam)
| +--> nfr_fragment_counts (optional; nfr.fragment_counts + narrow peaks)
| | +--> samtools view + awk → fragment_counts_mqc.tsv (NFR/mono/di/tri counts)
| +--> nfr (optional; nfr.enabled + shift_bam + narrow peaks)
| | +--> alignmentSieve → nfr.bigWig + mono.bigWig (fragment-length classes)
| | +--> computeMatrix / plotProfile / plotHeatmap (short-fragment vs mono-class)
| +--> ataqv (optional; filtered.bam)
| | +--> ataqv_mqc → ataqv_mqc.tsv (TSS enrichment, NFR ratio)
| +--> atacseqqc (optional; narrow peaks only)
| | +--> fragSizeDist → fragsize_dist.png
| | +--> PTscore → pt_score.png + atacseqqc_mqc.tsv
| | +--> NFRscore → nfr_score.png + atacseqqc_mqc.tsv
| | +--> TSSEscore → tsse.png + atacseqqc_mqc.tsv
|
v
multiqc
|
v
delete_tmp
All per-sample rules in execution order. Toggle columns: t = threads, MB = mem_mb.
| # | Step | Rule(s) | Module | Tool | t | MB | Input → Key Output |
|---|---|---|---|---|---|---|---|
| 1 | Merge lanes | `merge_raw_fastqs` | `common.smk` | shell (cat / symlink) | 1 | - | raw FASTQ lanes → raw_merged/*_merged_{1,2}.fastq.gz |
| 2 | FastQC raw | `fastqc_raw` | `qc.smk` | FastQC | 2 | 4 096 | merged FASTQ → fastqc_raw/*_raw_{1,2}_fastqc.{html,zip} |
| 3 | Trim | `trim_galore` or `fastp` | `trim.smk` | Trim Galore / fastp | 12 | 36 864 / 16 384 | merged FASTQ → trim/*_trimmed_{1,2}.fastq.gz |
| 4 | FastQC trimmed | `fastqc_trimmed` | `qc.smk` | FastQC | 2 | 4 096 | trimmed FASTQ → trim/*_trimmed_{1,2}_fastqc.{html,zip} |
| 5 | BWA-MEM2 index | `bwa_mem2_index` | `align.smk` | bwa-mem2 index | 12 | 65 536 | FASTA → .amb / .ann / .bwt.2bit.64 / .pac / .0123 |
| 6 | Bowtie2 index | `bowtie2_index` | `align.smk` | bowtie2-build | 12 | 16 384 | FASTA → *.{1,2,3,4,rev.1,rev.2}.bt2 |
| 7 | Align | `bwa_mem2_align` or `bowtie2_align` | `align.smk` | bwa-mem2 / bowtie2 + samtools view | 12 / 26 | 49 152 / 16 384 | trimmed FASTQ + index → bam/*.unsorted.bam |
| 8 | Sort BAM | `sort_bam` | `align.smk` | samtools sort + index | 8 | 36 864 | .unsorted.bam → bam/*.bam + *.bam.bai |
| 9 | Mark duplicates | `mark_duplicates` | `mark_duplicates.smk` | Picard MarkDuplicates | 2 | 49 152 | .bam → bam/*.markdup.sorted.bam + *.MarkDuplicates.metrics.txt |
| 10 | Samtools stats pre-filter | `samtools_stats_pre_filter` | `align_stats.smk` | samtools stats / flagstat / idxstats | 1 | 2 048 | pre-filter BAM → bam/*.pre_filter.bam.{stats,flagstat,idxstats} |
| 11 | BAM filter | `bam_filter` | `bam_filter.smk` | samtools view + bamtools filter + pysam bampe_rm_orphan | 8 | 49 152 | .markdup.sorted.bam + include_regions → bam/*.filtered.bam + *.bai |
| 12 | Samtools stats | `samtools_stats` | `align_stats.smk` | samtools stats / flagstat / idxstats | 1 | 2 048 | .filtered.bam → bam/*.filtered.bam.{stats,flagstat,idxstats} |
| 13 | Picard metrics | `picard_collect_multiple_metrics` | `align_stats.smk` | Picard CollectMultipleMetrics | 1 | 16 384 | .filtered.bam → alignment_summary / insert_size / base_dist / quality_* metrics |
| 14 | BigWig (unshifted) | `bedtools_genomecov` → `ucsc_bedgraphtobigwig` | `bam_to_bigwig.smk` | bedtools genomecov + bedGraphToBigWig | 2 / 2 | 40 960 / 6 144 | .filtered.bam → bigwig/*.bedGraph → bigwig/*.bigWig |
| 15 | Shift BAM | `shift_bam` | `shift_bam.smk` | deepTools alignmentSieve + bamCoverage (RPGC) | 26 | 98 304 | .filtered.bam → bam/*.shifted.bam + bigwig/*.shifted.bigWig |
| 16 | MACS3 peak calling | `macs3_callpeak_tn5` | `call_peaks.smk` | bedtools bamtobed + awk Tn5-shift + MACS3 (narrow: BED mode; broad: BAMPE mode) | 2 | 8 192 | .filtered.bam → peaks/*.tn5_shifted.bed + *_peaks.peak + *_peaks.xls |
| 17 | MACS3 peak QC | `macs3_peak_qc_plot` | `call_peaks.smk` | R (plot_macs_qc.r) | 2 | 8 192 | *_peaks.peak → peaks/*.macs_peakqc.summary.txt + *.plots.pdf |
| 18 | featureCounts | `featurecounts_in_peaks` | `frip_score.smk` | featureCounts / Subread (SAF, paired, unstranded) | 1 | 6 144 | .filtered.bam + peaks SAF → featurecounts/*.readCountInPeaks.txt |
| 19 | FRiP score | `frip_score` | `frip_score.smk` | bedtools intersect (+ featureCounts log parse) | 1 | 6 144 | .filtered.bam + peaks + flagstat → peaks/*.FRiP.txt + *_peaks.{FRiP,count}_mqc.tsv |
| 20 | HOMER annotation | `homer_annotate_peaks` | `annotate_peaks.smk` | HOMER annotatePeaks.pl + R (plot_homer_annotatepeaks.r) | 2 | 10 240 | peaks + FASTA + GTF → annotation/*_peaks.annotatePeaks.txt + *.summary.txt |
| 21 | deepTools | `deeptools_compute_matrix_gene_body` / `deeptools_compute_matrix_tss` / `deeptools_plot_profile_gene_body` / `deeptools_plot_profile_tss` / `deeptools_plot_heatmap_tss` / `deeptools_plot_fingerprint` / `deeptools_fragment_size_distribution` | `deeptools.smk` | deepTools computeMatrix / plotProfile / plotHeatmap / plotFingerprint / bamPEFragmentSize | 2–12 | 6 144–20 480 | .shifted.bigWig (narrow) or .bigWig (broad) + .filtered.bam → deeptools/ matrices, profiles, heatmaps, fingerprint, fragment-size plots |
| 22a | NFR fragment counts | `nfr_fragment_counts` | `nfr.smk` | samtools view + awk | 2 | 2 048–6 144 | .shifted.bam (or .filtered.bam) → nfr/*.fragment_counts_mqc.tsv (NFR/mono/di/tri fragment-length class counts) |
| 22b | NFR bigWigs + TSS profile/heatmap | `nfr_bigwig_nfr` / `nfr_bigwig_mono` / `nfr_compute_matrix` / `nfr_plot_profile` / `nfr_plot_heatmap` | `nfr.smk` | deepTools alignmentSieve + bamCoverage + computeMatrix / plotProfile / plotHeatmap | 2–12 | 6 144–20 480 | .shifted.bam → *.nfr.bigWig + *.mono.bigWig + NFR-vs-mono TSS profile/heatmap |
| 23 | ataqv | `ataqv` / `ataqv_mkarv` / `ataqv_mqc` | `ataqv.smk` | ataqv + mkarv + Python (extract_ataqv_score.py) | 1 | 6 144 / 1 024 / 256 | .filtered.bam + peaks + TSS + autosomes → ataqv/*.ataqv.json + *.mkarv_html/index.html + *.ataqv_mqc.tsv |
| 24 | ATACseqQC | `atacseqqc_mqc` | `atacseqqc.smk` | ATACseqQC R pkg (calc_pt_score.R) | 1 | 16 384 | .shifted.bam + BED → atacseqqc/*.atacseqqc_mqc.tsv + fragsize/pt_score/nfr_score/tsse .png |

Module toggle summary (config.yml):
| Module | Config key | Default | Requires |
|---|---|---|---|
| Trimming | `trimming.enabled` | `true` | - |
| Mark duplicates | `markduplicates.enabled` | `true` | - |
| BAM filter | (always on) | always | - |
| Peak calling | `call_peaks.enabled` | `true` | bam_filter |
| Peak QC plot | `call_peaks.macs3_peak_qc_plot` | `true` | call_peaks |
| Annotation | `annotate_peaks.enabled` | `true` | call_peaks |
| featureCounts + FRiP | (auto with call_peaks) | - | call_peaks |
| Shift BAM + shifted bigWig | `shift_bam.enabled` | `true` | narrow peaks |
| deepTools | `deeptools.enabled` | `true` | call_peaks |
| NFR fragment counts | `nfr.fragment_counts` | `true` | narrow peaks |
| NFR bigWigs + plots | `nfr.enabled` | `true` | shift_bam + narrow peaks |
| ataqv | `ataqv.enabled` | `true` | call_peaks |
| ATACseqQC | `atacseqqc.enabled` | `true` | call_peaks + shift_bam + narrow peaks |

- Linux
- Snakemake ≥ 8 (in a dedicated controller environment)
- Conda/Mamba
HPC users: skip this section and follow Running on HPC with LSF instead, which covers environment setup outside your home directory.
For local use, create a minimal controller environment:
mamba create -n atacseq_snakemake -c conda-forge -c bioconda snakemake
mamba activate atacseq_snakemake

Each rule uses its own isolated Conda environment defined in `workflow/envs/*.yml`.
Pass `--use-conda` on every Snakemake invocation so these per-rule envs are built and activated automatically.

git clone https://github.com/UKHD-NP/atacseq_snakemake_new.git
cd atacseq_snakemake_new

`config/config.yml` key: `samples_csv`
Required columns:
- `sample_id`
- `fq1`
- `fq2`
- `outdir`
Example:
sample_id,fq1,fq2,outdir
S1,/data/S1_L001_R1.fastq.gz,/data/S1_L001_R2.fastq.gz,results/S1
S1,/data/S1_L002_R1.fastq.gz,/data/S1_L002_R2.fastq.gz,results/S1
S2,/data/S2_R1.fastq.gz,/data/S2_R2.fastq.gz,results/S2

Notes:
- Repeated `sample_id` rows are treated as lanes and merged.
- Even with a single lane/sample, the pipeline still runs `merge_raw_fastqs` (symlink mode), so merged FASTQ filenames still use the `*_merged_*` suffix.
- All rows for the same `sample_id` must have the same `outdir`.
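
The samplesheet rules above can be checked before launching the pipeline. The following is a minimal sketch, not the pipeline's own parser: `group_lanes` is a hypothetical helper that verifies the four required columns, groups repeated `sample_id` rows as lanes, and rejects a sample whose rows disagree on `outdir`.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample_id", "fq1", "fq2", "outdir")


def group_lanes(csv_text):
    """Return {sample_id: {"outdir": str, "lanes": [(fq1, fq2), ...]}}.

    Hypothetical helper for illustration; raises ValueError on a missing
    column or on conflicting outdir values for one sample_id.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    samples = {}
    for row in reader:
        sid = row["sample_id"]
        entry = samples.setdefault(sid, {"outdir": row["outdir"], "lanes": []})
        if entry["outdir"] != row["outdir"]:
            raise ValueError(f"sample {sid} has conflicting outdir values")
        # Each row is one lane (one R1/R2 pair) of the sample.
        entry["lanes"].append((row["fq1"], row["fq2"]))
    return samples
```

Running it on the example sheet above would report two lanes for `S1` and one for `S2`.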
Set in `config/config.yml` -> `ref`.
- `assembly`: `hg19`, `hg38`, or `custom`
- For `custom`, provide:
  - `ref.fasta`
  - `ref.gtf`
  - `ref.blacklist`

Pipeline stages references into `references/{assembly}/` and derives:
- `.fai`, `.sizes` (chromsizes), `.autosomes.txt`, `.tss.bed`, `.include_regions.bed`
- `ref.bed` is generated from GTF by `prepare_genome` using the dedicated env `workflow/envs/gtf2bed.yml` (Perl + gzip/unzip) for portable HPC runs.
For cluster execution on HPC, see Running on HPC with LSF below. The commands here are for single-machine (local) execution only.
Step 1: Dry-run first (always). Resolves the full DAG and prints every rule that would run, without executing anything:
snakemake -s workflow/Snakefile --use-conda -n

Step 2: Optionally verify with the bundled test dataset. Runs the full pipeline end-to-end on small test data:
snakemake -s workflow/Snakefile \
    --configfile config/config_test.yml \
    --use-conda --conda-frontend mamba \
    --cores all

Step 3: Run with your real config.
# Normal run
snakemake -s workflow/Snakefile --use-conda --conda-frontend mamba --cores 24
# Rerun only failed/incomplete jobs after fixing an error
snakemake -s workflow/Snakefile --use-conda --conda-frontend mamba --cores 24 --rerun-incomplete

`config/config.yml` is loaded automatically by the Snakefile as the default configfile. Pass `--configfile path/to/other.yml` only when you want to override it (e.g. for a test config).
This setup uses IBM Spectrum LSF.
A ready-made LSF profile is provided at workflow/profiles/lsf/config.yaml.
| Node | Purpose | Allowed |
|---|---|---|
| `<worker-node>` | Dev, install, testing | ✅ Software install, small runs |
| `<submit-node>` | Job submission only | ✅ Run Snakemake (lightweight), ❌ Processing |
| Cluster nodes | Computation | Jobs submitted automatically via `bsub` |

Do this on `<worker-node>`, not on `<submit-node>`. Worker nodes allow software installation. Submission hosts do not.
ssh YOUR_USERNAME@<worker-node>

Configure conda channels.
Some HPC clusters ban the defaults (Anaconda) channel due to licensing restrictions.
You may need to explicitly restrict to conda-forge and bioconda:
cat > ~/.condarc << 'EOF'
channels:
- conda-forge
- bioconda
EOF

Load Mamba and initialise your shell.
This adds mamba/conda to your PATH permanently via ~/.bashrc:
module load Mamba # adjust module name to your site
mamba init bash
source ~/.bashrc # apply changes to the current shell without re-logging in

Create the Snakemake controller environment outside your home directory. Home quota on HPC systems is often limited, and Conda environments can easily exceed it, so install them on group storage:
# Set your working directory on group storage (adjust path as needed)
YOUR_WORKDIR="/path/to/group/storage/YOUR_USERNAME"
mkdir -p ${YOUR_WORKDIR}/conda_envs
# Create the controller environment with Snakemake + the LSF executor plugin
mamba create -p ${YOUR_WORKDIR}/conda_envs/snakemake \
-c conda-forge -c bioconda \
python=3.11 \
"numpy<1.25" \
snakemake=8.30.0 \
snakemake-executor-plugin-lsf \
-y
# Activate the new environment
mamba activate ${YOUR_WORKDIR}/conda_envs/snakemake

`snakemake-executor-plugin-lsf` translates Snakemake rule resources (`mem_mb`, `runtime`, `threads`) into `bsub` submission flags automatically, so no manual `bsub` scripting is needed.
cd ${YOUR_WORKDIR}
git clone https://github.com/UKHD-NP/atacseq_snakemake_new.git
cd atacseq_snakemake_new

Open `config/config.yml` and set at minimum:
- `samples_csv`: path to your samplesheet CSV
- `ref.assembly`: `hg19`, `hg38`, or `custom`
- Output directories (via the `outdir` column in the samplesheet)
- Enable/disable optional modules (`deeptools`, `ataqv`, `annotate_peaks`, etc.)
See the Key Configuration section below for all options and defaults.
conda-prefix tells Snakemake where to build and cache the per-rule conda environments (from workflow/envs/*.yml).
All rule environments combined take roughly 5–15 GB and must live outside your home directory.
Update the placeholder path to your actual working directory:
sed -i "s|/path/to/group/storage/conda_envs|${YOUR_WORKDIR}/conda_envs|g" \
workflow/profiles/lsf/config.yaml
# Confirm the replacement was applied correctly
grep "conda-prefix" workflow/profiles/lsf/config.yaml

Note: Add the following line to your `~/.bashrc` (once, then `source ~/.bashrc`). LSF enforces memory limits per-job, so this variable tells the LSF plugin to submit the full `mem_mb` value as a per-job request instead of dividing it per slot:
export SNAKEMAKE_LSF_MEMFMT=perjob
Resolves the full DAG and prints every rule that would run — without executing or submitting any jobs. Always do this before submitting to the cluster to catch config errors, missing inputs, or unexpected rule counts.
mamba activate ${YOUR_WORKDIR}/conda_envs/snakemake
cd ${YOUR_WORKDIR}/atacseq_snakemake_new
# Dry-run: prints all rules, checks all inputs, submits nothing
snakemake -s workflow/Snakefile --use-conda -n

Confirm that the printed rule count and sample names match expectations before proceeding to Step 6.
For local testing with the bundled test dataset, see the Local Run section.
Do this on `<submit-node>`, not on `<worker-node>`. Snakemake must run on a submission host to dispatch jobs via `bsub`.
Use screen so the Snakemake controller process survives SSH disconnects:
ssh YOUR_USERNAME@<submit-node>
# Create a named screen session — it keeps running after SSH disconnect
screen -S <session_name>
# Set your working directory (same value as used in Step 1)
YOUR_WORKDIR="/path/to/group/storage/YOUR_USERNAME"
# Activate the Snakemake controller environment
mamba activate ${YOUR_WORKDIR}/conda_envs/snakemake
# Move into the pipeline directory
cd ${YOUR_WORKDIR}/atacseq_snakemake_new
# Launch the pipeline — Snakemake submits each rule as a separate bsub job automatically.
# The config/config.yml is loaded automatically from the Snakefile; no --configfile needed.
# Concurrency is controlled by `jobs:` in workflow/profiles/lsf/config.yaml.
snakemake --profile workflow/profiles/lsf

To rerun only failed/incomplete jobs after fixing an error:
snakemake --profile workflow/profiles/lsf --rerun-incomplete

To rerun with the test dataset config:
snakemake --profile workflow/profiles/lsf --rerun-incomplete --configfile config/config_test.yml

Force rerun examples:
# Force one rule for all matching jobs (e.g. rerun all trim_galore jobs)
snakemake --profile workflow/profiles/lsf --forcerun trim_galore
# Force specific output files (target-level force)
snakemake --profile workflow/profiles/lsf --force \
test_data/results/SAMPLE_ID/trim/SAMPLE_ID_trimmed_1.fastq.gz \
test_data/results/SAMPLE_ID/trim/SAMPLE_ID_trimmed_2.fastq.gz
# Force all jobs in the DAG to rerun from scratch
snakemake --profile workflow/profiles/lsf --forceall

| screen command | Action |
|---|---|
| `screen -S <session_name>` | Start new named session |
| Ctrl+A, then D | Detach - session keeps running after SSH disconnect |
| `screen -ls` | List all active sessions |
| `screen -r <session_name>` | Re-attach to session |
| `screen -S <session_name> -X quit` | Kill the named session |

| bjobs command | Action |
|---|---|
| `bjobs -w` | List all running/pending jobs |
| `bjobs -w -r` | Running only |
| `bjobs -w -p` | Pending only |
| `bjobs -l JOB_ID` | Detailed info for one job |

Below is the default-style config block with practical explanation:
# Path to samplesheet CSV (columns: sample_id, fq1, fq2, outdir)
samples_csv: "samplesheet.csv"
ref:
  assembly: "custom" # hg19 / hg38 / custom
  fasta: "genome.fa"
  gtf: "genes.gtf"
  blacklist: "blacklist.v3.bed"
  bwa_index: "" # optional prebuilt BWA prefix; auto-generated if empty
  bowtie2_index: "" # optional prebuilt Bowtie2 prefix; auto-generated if empty
  mito_name: "chrM" # MUST match FASTA mito contig exactly (e.g. MT/chrM/M)
  keep_mito: false # false = exclude mitochondrial reads from include_regions
trimming:
  enabled: true
  delete_trimming: true # delete trimmed FASTQs after pipeline completes
  tool: "trim_galore" # fastp / trim_galore
  trim_galore_params: "--nextseq 25 --length 36"
  fastp_params: "--cut_tail --cut_tail_window_size 4 --cut_tail_mean_quality 20 --trim_poly_g --length_required 36"
align:
  tool: "bowtie2" # bowtie2 / bwa-mem2 (alias: bwa)
  bowtie2_params: "--very-sensitive --no-discordant -X 2000"
  bwa_params: "-I 0,2000"
bam_filter:
  params: "-F 0x004 -F 0x0008 -f 0x001 -F 0x0100 -F 0x0400 -q 30"
  apply_canonical_chromosomes: false # hg19/hg38: set true. Non-human or custom genomes: set false (see below)
  apply_blacklist: true # true = exclude blacklist regions; false = keep them
  keep_input_bam: false # true = preserve BAM before bam_filter; false = delete to save space
markduplicates:
  enabled: true
call_peaks:
  enabled: true
  peak_type: "narrow" # narrow / broad
  macs3_gsize: "2701495711" # effective genome size (preferred); if empty, auto-sum from chromsizes
  macs3_narrow_params: "--trackline --shift -75 --extsize 150 --keep-dup all --nomodel --call-summits -q 0.01"
  macs3_broad_params: "--trackline --keep-dup all --nomodel --broad --broad-cutoff 0.1"
  macs3_peak_qc_plot: true # run plot_macs_qc.r to generate peak QC summary/plots
  frip_overlap_fraction: 0.2
  frip_threshold: 20
annotate_peaks:
  enabled: true
deeptools:
  enabled: true
# NFR / fragment-length-class analysis (runs when call_peaks.peak_type=narrow).
# enabled: NFR/mono bigWigs + computeMatrix + plotProfile/plotHeatmap (slow; requires shift_bam).
# fragment_counts: NFR/mono/di/tri read counting only (fast; independent of enabled).
nfr:
  enabled: true
  fragment_counts: true
  nfr_max_fragment: 150 # fragments ≤ this bp → NFR bigWig / NFR count (default: 150)
  mono_min_fragment: 151 # fragments ≥ this bp → mono-class bigWig / count (default: 151)
  mono_max_fragment: 300 # fragments ≤ this bp → mono-class bigWig / count (default: 300)
  di_min_fragment: 301 # putative di-class count lower bound (default: 301)
  di_max_fragment: 500 # putative di-class count upper bound (default: 500)
  tri_min_fragment: 501 # putative tri-class count lower bound (default: 501)
  tri_max_fragment: 700 # putative tri-class count upper bound (default: 700)
ataqv:
  enabled: true
atacseqqc:
  enabled: true
multiqc:
  config: "workflow/scripts/multiqc_config.yml"

Explanation by block:
- `samples_csv`: input table for sample discovery and lane merging.
- `ref`: reference genome/annotation source. For `custom`, `fasta` and `gtf` are required. `blacklist` is only required when `bam_filter.apply_blacklist: true`. `bwa_index` / `bowtie2_index` can be left empty to auto-build.
- `ref.mito_name`: critical; must exactly match the mitochondrial contig name in your FASTA (often `MT`, `chrM`, or `M`). Used by `bam_filter` to build `include_regions` and by `ataqv`.
- `ref.keep_mito`: set `true` to retain mitochondrial reads in `include_regions`; `false` (default) excludes them.
- `ref.autosome_pattern` (optional): awk regex for the autosome list passed to `ataqv --autosomal-reference-file`. Default covers hg19/hg38: `^chr([1-9]|1[0-9]|2[0-2])$`. Override for other genomes, e.g. `^([0-9]+)$` for ENSEMBL naming or `^(I|II|III|IV|V)$` for ce11.
- `trimming`: choose one trimming engine and pass tool-specific options.
- `align`: choose aligner (`bowtie2` or `bwa-mem2`) and set aligner-specific CLI parameters. The alias `bwa` is also accepted and maps to `bwa-mem2`.
- `bam_filter.params`: SAMtools core filter flags; see BAM Filtering Criteria for the full breakdown.
- `bam_filter.apply_canonical_chromosomes`: controls whether reads are restricted to standard chromosomes before blacklist and mito filtering.
  - `true`: filter chromosomes using `canonical_chroms_pattern` (see below). Removes noise from unplaced contigs (`chrUn_*`, `*_random`, EBV, decoy sequences) that inflate peak-calling background.
  - `false`: keep all contigs present in the FASTA. Safe for any genome without configuration.
  - `ref.keep_mito` still controls whether the mitochondrial contig is included in the final region set regardless of this flag.
- `bam_filter.canonical_chroms_pattern` (optional): awk regex applied when `apply_canonical_chromosomes=true`. Default covers hg19/hg38: `^chr([1-9]|1[0-9]|2[0-2]|X|Y|M)$`. Override for other genomes:

  | Genome | Pattern |
  |---|---|
  | hg19 / hg38 (default) | `^chr([1-9]\|1[0-9]\|2[0-2]\|X\|Y\|M)$` |
  | Mouse mm10/mm39, rat rn7, zebrafish danRer11 | `^chr([0-9]+…` |
  | ENSEMBL naming (no `chr` prefix, e.g. `1`, `X`, `MT`) | `^([0-9]+…` |
  | C. elegans ce11 (I–V, X, MtDNA) | `^(I{1,3}…` |

- `bam_filter.apply_blacklist`: set `true` (default) to exclude blacklist intervals from `include_regions`; set `false` to keep blacklist regions while still respecting `ref.keep_mito`.
- `bam_filter.keep_input_bam`: set `true` to preserve the BAM entering `bam_filter` (`*.markdup.sorted.bam` when duplicate marking is enabled, otherwise `*.bam`).
- `markduplicates.enabled`: run Picard MarkDuplicates before filtering. When disabled, duplicates are not flagged and `-F 0x0400` in `bam_filter.params` has no effect.
- `trimming.delete_trimming`: when `true`, trimmed FASTQ files are deleted after the pipeline completes.
- `call_peaks.peak_type`: `narrow` uses filtered BAM → `bamtobed` → awk Tn5 shift (+4 forward / -5 reverse) → MACS3 BED mode; `broad` uses filtered BAM directly in MACS3 BAMPE mode.
- `call_peaks.macs3_peak_qc_plot`: when `true`, runs `plot_macs_qc.r` to produce `*.macs_peakqc.summary.txt` and `*.macs_peakqc.plots.pdf`.
- `call_peaks.frip_overlap_fraction`: minimum read-peak overlap fraction for FRiP (passed to both `bedtools intersect -f` and featureCounts `--fracOverlap`).
- `call_peaks.frip_threshold`: FRiP percentage threshold for the quality label in `*.FRiP.txt`; samples at or above this value are labelled `good`, those below are labelled `bad` (default: 20%).
- `annotate_peaks.enabled`: run HOMER `annotatePeaks` and summary plotting.
- `shift_bam.enabled` (default: `true`): Tn5-shift the filtered BAM with `alignmentSieve --ATACshift` and produce `shifted.bam` + `shifted.bigWig`. Set `false` to skip; this saves significant time (24 threads, up to 16h runtime) and disk. Disabling also skips NFR analysis and ATACseqQC, which both require `shifted.bam`.
- `deeptools.enabled`: run computeMatrix/plotProfile/plotHeatmap/plotFingerprint modules. Requires `call_peaks.enabled=true`. For narrow peaks, computeMatrix uses the Tn5-shifted bigWig (`shifted.bigWig`); for broad peaks, it uses the unshifted bigWig (`bigWig`).
- `nfr.enabled` (default: `true`): enable/disable NFR/mono bigWigs, computeMatrix, plotProfile, and plotHeatmap (the slow part). Requires `shift_bam.enabled=true` and `call_peaks.peak_type=narrow`. Set `false` to skip bigWig generation and TSS profiles while keeping fragment counts.
- `nfr.fragment_counts` (default: `true`): enable/disable NFR/mono/di/tri read counting (`fragment_counts_mqc.tsv`). Fast and independent: runs even when `nfr.enabled=false`. Uses `shifted.bam` when `shift_bam.enabled=true`, otherwise falls back to `filtered.bam`. Requires `call_peaks.peak_type=narrow`. Fragment size boundaries can be tuned in the same `nfr:` block (see config comments for defaults).
- `ataqv.enabled`: run ATAC-specific QC (`ataqv`) and render interactive HTML (`mkarv`), plus extract the short mononucleosomal ratio and the TSS enrichment (TSSE) score to `ataqv_mqc.tsv` for MultiQC. Requires `call_peaks.enabled=true`.
- `atacseqqc.enabled`: run the ATACseqQC R package to compute the promoter/transcript body (PT) score, per-TSS nucleosome-free region (NFR) score, and TSS enrichment (TSSE) score, and produce QC plots. Outputs `atacseqqc_mqc.tsv` for MultiQC. Requires `call_peaks.enabled=true`, `call_peaks.peak_type=narrow`, and `shift_bam.enabled=true`.
- `multiqc.config`: path to MultiQC config used by this pipeline.

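
To make the fragment-length classes behind `nfr.fragment_counts` concrete, here is a minimal sketch of how lengths map onto the default boundaries. It is an illustration only (the pipeline itself uses `samtools view` + awk); the names `CLASSES`, `classify_fragment`, and `count_classes` are hypothetical, and the lower bound of 1 bp for the NFR class is an assumption, since the config only defines `nfr_max_fragment`.

```python
# Inclusive boundaries in bp, taken from the nfr: defaults in config.yml.
# The NFR lower bound (1) is an assumption; the config only sets the maximum.
CLASSES = {
    "nfr": (1, 150),
    "mono": (151, 300),
    "di": (301, 500),
    "tri": (501, 700),
}


def classify_fragment(length, classes=CLASSES):
    """Return the class name for a fragment length, or 'other' if none matches."""
    size = abs(length)  # SAM TLEN is negative for the downstream mate
    for name, (low, high) in classes.items():
        if low <= size <= high:
            return name
    return "other"


def count_classes(lengths):
    """Count fragments per class for an iterable of fragment lengths."""
    counts = {name: 0 for name in CLASSES}
    counts["other"] = 0
    for length in lengths:
        counts[classify_fragment(length)] += 1
    return counts
```

Changing `mono_min_fragment` or the other boundaries in the config corresponds to editing the tuples in `CLASSES`.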
- `trimming.tool: trim_galore` with `--nextseq 25 --length 36`: chosen for two-color Illumina runs (poly-G prone) and to remove very short reads that are usually uninformative for peak calling.
- `align.tool: bowtie2` with `--very-sensitive --no-discordant -X 2000`: chosen to maximize paired-end sensitivity while constraining improbable pair structure for ATAC fragment lengths.
- `bam_filter.params: ... -q 30`: chosen as a relatively strict default to retain high-confidence alignments for downstream peak calling and signal tracks.
- `call_peaks` defaults (`--shift -75 --extsize 150 --nomodel`): for narrow peaks, reads are Tn5-shifted inline by awk (+4 on forward strand, -5 on reverse strand) before being passed to MACS3 BED mode. `--shift -75` then pulls each cut-site tag a further 75 bp upstream so that the 150 bp extension lands symmetrically around the insertion site. Positive shift would offset windows away from the cut site.

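
The coordinate arithmetic behind the narrow-peak defaults can be sketched as follows. This is an illustration, not the pipeline's awk code: `tn5_shift` and `macs3_window` are hypothetical names, and it assumes the awk step moves both BED coordinates by the same offset and that MACS3 applies `--shift` and `--extsize` relative to the 5' end of each tag in its read direction.

```python
def tn5_shift(start, end, strand):
    """Shift a BED interval by +4 bp on the '+' strand or -5 bp on the '-' strand."""
    offset = 4 if strand == "+" else -5
    return max(0, start + offset), max(0, end + offset)


def macs3_window(start, end, strand, shift=-75, extsize=150):
    """Interval piled up for one tag under --nomodel --shift/--extsize.

    The 5' end is `start` on '+' and `end` on '-'. A negative shift moves
    it against the read direction; the tag is then extended by `extsize`
    in the read direction.
    """
    if strand == "+":
        five_prime = start + shift
        return five_prime, five_prime + extsize
    five_prime = end - shift  # read direction points to lower coordinates
    return five_prime - extsize, five_prime


# A forward tag whose shifted 5' end is at 1004 gets a window centred on 1004.
s, e = tn5_shift(1000, 1050, "+")
window = macs3_window(s, e, "+")
```

With `shift = -extsize / 2`, the window is centred on the cut site for both strands, which is why the defaults pair `--shift -75` with `--extsize 150`.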
- `macs3_gsize` is passed to MACS3 as `--gsize`.
- Prefer effective genome size values from deepTools documentation: https://deeptools.readthedocs.io/en/latest/content/feature/effectiveGenomeSize.html
- If `macs3_gsize` is empty, this pipeline falls back to summing chromosome sizes from `ref.chromsizes`.
- Common values (from deepTools table):
  - hg19: `2864785220`
  - hg38: `2913022398`

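
The fallback described above amounts to summing the second column of the chromsizes file. A minimal sketch, assuming a tab-separated `name<TAB>size` layout; `resolve_gsize` and `genome_size_from_chromsizes` are hypothetical names, not functions from this repository:

```python
def genome_size_from_chromsizes(text):
    """Sum the size column of a tab-separated chromsizes table."""
    total = 0
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _name, size = line.split("\t")[:2]
        total += int(size)
    return total


def resolve_gsize(macs3_gsize, chromsizes_text):
    """Return the value to pass to --gsize: the configured one, else the summed sizes."""
    configured = str(macs3_gsize).strip()
    if configured:
        return configured
    return str(genome_size_from_chromsizes(chromsizes_text))
```

Note that the summed value counts every base in the FASTA, including N-masked regions, so it is generally larger than the effective genome size and is best treated as a fallback.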
After alignment and coordinate sorting, duplicates are marked with Picard (mark_duplicates, when enabled), then bam_filter creates *.filtered.bam.
Important: bam_filter.params is the SAMtools core filter only.
Additional filters are applied by BAMTools and Pysam (if available in the runtime env).
Why this default filtering strategy is used:
- It prioritizes specificity for ATAC peak detection.
- It reduces noisy alignments before MACS3, FRiP, and bigWig generation.
- It is a practical strict default (`-q 30`); for low-depth data you can relax to `-q 20`.

Practical examples:
- Keep mitochondrial filtering, but disable blacklist filtering: `ref.keep_mito: false` and `bam_filter.apply_blacklist: false`
- Preserve the BAM before filtering for debugging or alternate peak-calling runs: `bam_filter.keep_input_bam: true`

Current default:
-F 0x004 -F 0x0008 -f 0x001 -F 0x0100 -F 0x0400 -q 30
Meaning:
- `-F 0x004`: remove unmapped reads
- `-F 0x0008`: remove reads whose mate is unmapped
- `-f 0x001`: keep only reads flagged as paired
- `-F 0x0100`: remove secondary alignments
- `-F 0x0400`: remove reads marked as duplicates
- `-q 30`: keep reads with MAPQ >= 30 (remove lower-confidence multimappers/ambiguous mappings)

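
The flag logic can be expressed as plain bit arithmetic. The sketch below mirrors the default `bam_filter.params` only; `passes_core_filter` is a hypothetical name, and the region, bamtools, and orphan filters described elsewhere in this section are not modelled.

```python
# -F flags: a read is rejected if ANY of these bits is set.
EXCLUDE_FLAGS = 0x004 | 0x0008 | 0x0100 | 0x0400
# -f flag: a read is kept only if ALL of these bits are set.
REQUIRE_FLAGS = 0x001
# -q: minimum mapping quality.
MIN_MAPQ = 30


def passes_core_filter(flag, mapq):
    """Return True if a read with this SAM FLAG and MAPQ survives the core filter."""
    has_required = (flag & REQUIRE_FLAGS) == REQUIRE_FLAGS
    has_excluded = (flag & EXCLUDE_FLAGS) != 0
    return has_required and not has_excluded and mapq >= MIN_MAPQ
```

For example, FLAG 99 (paired, proper pair, mate reverse, first in pair) with MAPQ 42 passes, while the same read marked as a duplicate (FLAG 1123) does not.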
`bam_filter` also uses `-L ref.include_regions` with `samtools view`:
- this always constrains BAMs to the generated include regions
- canonical chromosomes are enforced there when `bam_filter.apply_canonical_chromosomes: true`
- blacklist intervals are excluded there when `bam_filter.apply_blacklist: true`
- mitochondrial contig is excluded there when `ref.keep_mito: false`
- if `bam_filter.apply_blacklist: false` and `ref.keep_mito: false`, `chrM`/`MT` is still filtered out as long as `ref.mito_name` matches the FASTA

From workflow/scripts/bamtools_filter_pe.json, extra constraints are:
- mismatches `NM <= 4`
- remove soft-clipped reads (`CIGAR` containing `S`). Note: this rejects any read with even a 1 bp soft-clip, which can be overly aggressive; see `resources/pipeline_comparison.md` Section 7 for a recommended fix
- keep insert size in `[-2000, 2000]`

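
As a quick way to reason about which reads these three constraints remove, the sketch below applies them to plain values (the `NM` tag, the CIGAR string, and the SAM `TLEN` field). It only restates the rules listed above; `passes_extra_filter` and its parameter names are hypothetical and do not reflect how BAMTools evaluates its JSON filter.

```python
def passes_extra_filter(nm, cigar, tlen, max_nm=4, max_insert=2000):
    """Return True if a read satisfies the three extra constraints.

    nm    : edit distance from the NM tag
    cigar : CIGAR string, e.g. "50M" or "3S47M"
    tlen  : signed template length (SAM TLEN)
    """
    few_mismatches = nm <= max_nm
    no_soft_clip = "S" not in cigar
    insert_ok = -max_insert <= tlen <= max_insert
    return few_mismatches and no_soft_clip and insert_ok
```

A read with CIGAR `1S49M` is rejected here even though only one base is clipped, which illustrates why the soft-clip rule can be considered aggressive.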
From workflow/scripts/bampe_rm_orphan.py (--only_fr_pairs mode):
- remove singleton/orphan reads
- keep only read pairs on the same chromosome
- keep only FR-oriented proper pairs
- remove pairs where one mate fails the pair criteria
Per sample under <outdir>:
- BAM
bam/{sample}.filtered.bambam/{sample}.filtered.bam.baibam/{sample}.shifted.bam(narrow peaks only)bam/{sample}.shifted.bam.bai(narrow peaks only)bam/{sample}.markdup.sorted.bam/bam/{sample}.markdup.sorted.bam.baiorbam/{sample}.bam/bam/{sample}.bam.baimay also be retained whenbam_filter.keep_input_bam: true
- BAM stats
bam/{sample}.pre_filter.bam.statsbam/{sample}.pre_filter.bam.flagstatbam/{sample}.pre_filter.bam.idxstatsbam/{sample}.filtered.bam.statsbam/{sample}.filtered.bam.flagstatbam/{sample}.filtered.bam.idxstatsbam/{sample}.markdup.sorted.MarkDuplicates.metrics.txtbam/{sample}.CollectMultipleMetrics.alignment_summary_metricsbam/{sample}.CollectMultipleMetrics.base_distribution_by_cycle.pdfbam/{sample}.CollectMultipleMetrics.base_distribution_by_cycle_metricsbam/{sample}.CollectMultipleMetrics.insert_size_histogram.pdfbam/{sample}.CollectMultipleMetrics.insert_size_metricsbam/{sample}.CollectMultipleMetrics.quality_by_cycle.pdfbam/{sample}.CollectMultipleMetrics.quality_by_cycle_metricsbam/{sample}.CollectMultipleMetrics.quality_distribution.pdfbam/{sample}.CollectMultipleMetrics.quality_distribution_metrics
- Signal tracks
bigwig/{sample}.bedGraphbigwig/{sample}.scale_factor.txtbigwig/{sample}.bigWigbigwig/{sample}.shifted.bigWig(narrow peaks only)
- Peaks / FRiP
peaks/{sample}.tn5_shifted.bed(narrow peaks only)peaks/{sample}_peaks.peakpeaks/{sample}_peaks.xlspeaks/{sample}.FRiP.txtpeaks/{sample}_peaks.FRiP_mqc.tsvpeaks/{sample}_peaks.count_mqc.tsvpeaks/{sample}.macs_peakqc.summary.txtpeaks/{sample}.macs_peakqc.plots.pdffeaturecounts/{sample}.readCountInPeaks.txtfeaturecounts/{sample}.readCountInPeaks.txt.summary
- Annotation
annotation/{sample}_peaks.annotatePeaks.txtannotation/{sample}.macs_annotatePeaks.summary.txt
- Read QC / trimming
fastqc_raw/{sample}_raw_1_fastqc.html+.zipfastqc_raw/{sample}_raw_2_fastqc.html+.ziptrim/{sample}_1.fastq.gz_trimming_report.txtandtrim/{sample}_2.fastq.gz_trimming_report.txt(Trim Galore mode)trim/{sample}_trimmed_1.fastq.gzandtrim/{sample}_trimmed_2.fastq.gz(unless deleted during cleanup)trim/{sample}_trimmed_1_fastqc.html+.zipandtrim/{sample}_trimmed_2_fastqc.html+.zip(Trim Galore mode)trim/{sample}.fastp.htmlandtrim/{sample}.fastp.json(fastp mode)
- deepTools
  - `deeptools/{sample}.gene_body.computeMatrix.gz`
  - `deeptools/{sample}.gene_body.computeMatrix.tab`
  - `deeptools/{sample}.gene_body.plotProfile.pdf` + `.tab`
  - `deeptools/{sample}.tss.computeMatrix.gz`
  - `deeptools/{sample}.tss.computeMatrix.tab`
  - `deeptools/{sample}.tss.plotProfile.pdf` + `.tab`
  - `deeptools/{sample}.tss.plotHeatmap.pdf` + `.tab`
  - `deeptools/{sample}.plotFingerprint.pdf`
  - `deeptools/{sample}.plotFingerprint.raw_counts.txt`
  - `deeptools/{sample}.plotFingerprint.qcmetrics.txt`
  - `deeptools/{sample}.fragment_size_distribution.pdf`
  - `deeptools/{sample}.fragment_size.raw_lengths.txt`
  - `deeptools/{sample}.fragment_size.qcmetrics.txt`
- NFR analysis
  - `nfr/{sample}.fragment_counts_mqc.tsv` (NFR / mono / di / tri fragment-length class counts + % for MultiQC)
  - `nfr/{sample}.nfr.bigWig` (fragments ≤150 bp)
  - `nfr/{sample}.mono.bigWig` (fragments 151–300 bp; putative mono-class)
  - `nfr/{sample}.nfr_vs_mono.computeMatrix.gz`
  - `nfr/{sample}.nfr_vs_mono.computeMatrix.tab`
  - `nfr/{sample}.nfr_vs_mono.plotProfile.pdf` + `.tab`
  - `nfr/{sample}.nfr_vs_mono.plotHeatmap.pdf` + `.tab`
- ataqv
  - `ataqv/{sample}.ataqv.json`
  - `ataqv/{sample}.mkarv_html/index.html`
  - `ataqv/{sample}.ataqv_mqc.tsv` (NFR ratio + TSSE score for MultiQC)
- ATACseqQC
  - `atacseqqc/{sample}.fragsize_dist.png`
  - `atacseqqc/{sample}.pt_score.png`
  - `atacseqqc/{sample}.nfr_score.png`
  - `atacseqqc/{sample}.tsse.png`
  - `atacseqqc/{sample}.atacseqqc_mqc.tsv` (PT/NFR/TSSE scores for MultiQC)
- MultiQC
  - `multiqc/{sample}.multiqc.html`
- Cleanup
  - `logs/{sample}.deletion.log`
How to read:
- Raw ATAC-seq data can show end-bias and poly-G artifacts (especially on two-color chemistry platforms).
- After trimming, base-content curves should be more stable and less biased at read ends.
How to read:
- Before trimming, adapter contamination can be high at read tails.
- After trimming, adapter signal should drop strongly (ideally near zero across most cycles).
Produced by: bamPEFragmentSize → deeptools/{sample}.fragment_size_distribution.pdf
How to read:
- Good ATAC libraries usually show a short-fragment peak plus putative mono-/di-nucleosome periodic peaks.
- A flat/noisy pattern without clear peaks often indicates lower signal quality.
How to read:
- Curves help assess enrichment and library complexity.
- Better enrichment typically separates signal from background more clearly.
Sources: ENCODE ATAC-seq Standards · ataqv · ATACseqQC
| Metric | Source | Tool / Output | Target | Acceptable |
|---|---|---|---|---|
| Alignment rate | ENCODE | Bowtie2 / BWA-MEM2 log | >95% | ≥80% |
| Duplication rate | general practice | `*.MarkDuplicates.metrics.txt` | <20% | <30% |
| FRiP score | ENCODE | `*_peaks.FRiP_mqc.tsv` | ≥0.3 | ≥0.2 |
| NFR ratio (short:mono) | ataqv | `ataqv/*.ataqv_mqc.tsv` | >2 | n/a |
| TSSE score | ataqv / ENCODE (hg38) | `ataqv/*.ataqv_mqc.tsv` | ≥7 | ≥5 |
| PT score (2^mean) | ATACseqQC / pipeline | `atacseqqc/*.atacseqqc_mqc.tsv` | ≥10 | ≥5 |
| NFR score (mean) | ATACseqQC | `atacseqqc/*.atacseqqc_mqc.tsv` | >0 | n/a |
| TSSE score | ATACseqQC / ENCODE | `atacseqqc/*.atacseqqc_mqc.tsv` | ≥7 | ≥5 |
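The Target and Acceptable columns of the benchmark table can be read as a three-way grade. The sketch below is illustrative only: `THRESHOLDS` and `classify` are hypothetical names that do not exist in the pipeline, only a subset of metrics is included, and boundary values are treated as passing, which simplifies the table's mix of strict and inclusive bounds.

```python
# Hypothetical helper, not part of the pipeline: grade a QC value against the
# "Target" and "Acceptable" columns of the benchmark table.

# metric -> (target, acceptable, higher_is_better); numbers copied from the table.
THRESHOLDS = {
    "alignment_rate_pct": (95.0, 80.0, True),
    "duplication_rate_pct": (20.0, 30.0, False),
    "frip": (0.3, 0.2, True),
    "tsse": (7.0, 5.0, True),
    "pt_score_linear": (10.0, 5.0, True),
}


def classify(metric, value):
    """Return 'target', 'acceptable' or 'below' for one metric value."""
    target, acceptable, higher_is_better = THRESHOLDS[metric]
    if not higher_is_better:
        # Flip signs so that "larger is better" holds for every metric.
        value, target, acceptable = -value, -target, -acceptable
    if value >= target:
        return "target"
    if value >= acceptable:
        return "acceptable"
    return "below"


if __name__ == "__main__":
    for name, val in [("frip", 0.25), ("tsse", 8.1), ("duplication_rate_pct", 35.0)]:
        print(name, val, classify(name, val))
```

Metrics without an Acceptable value in the table (NFR ratio, NFR score) are left out on purpose, since the source recommends comparing them across samples instead of grading them.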
Source: ENCODE ATAC-seq Standards
What it measures: The proportion of all mapped reads that fall within called peak regions.
FRiP = reads_overlapping_peaks / total_mapped_reads
Why it matters: A high FRiP means most of your sequencing reads captured genuine open chromatin sites, rather than noisy background. Low FRiP suggests poor enrichment, over-amplification, or degraded nuclei.
Thresholds (from ENCODE):
- `≥0.3`: strong enrichment; ENCODE recommended target
- `≥0.2`: ENCODE minimum passing standard
- `<0.2`: poor enrichment; inspect fragment size distribution and TSS enrichment
In this pipeline: two FRiP estimates are reported per sample — one from bedtools intersect and one from featureCounts. Both appear in MultiQC. The quality label (good / bad) in *.FRiP.txt uses call_peaks.frip_threshold (default 20%).
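As a worked example of the FRiP formula and the good / bad label, here is a minimal sketch. The function names are hypothetical, and treating a score exactly equal to the threshold as "good" is an assumption; the pipeline's own scripts may compare differently.

```python
# Illustrative only: FRiP from two read counts, plus a label that is assumed
# to mirror call_peaks.frip_threshold (default 20%, written here as 0.20).

def frip(reads_overlapping_peaks, total_mapped_reads):
    """FRiP = reads_overlapping_peaks / total_mapped_reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return reads_overlapping_peaks / total_mapped_reads


def frip_label(score, threshold=0.20):
    """Return 'good' or 'bad'; the inclusive comparison is an assumption."""
    return "good" if score >= threshold else "bad"


if __name__ == "__main__":
    score = frip(3_200_000, 10_000_000)
    print(f"FRiP = {score:.2f} ({frip_label(score)})")
```

With 3.2 million of 10 million mapped reads inside peaks, the score is 0.32, which clears both the 0.2 minimum and the 0.3 target.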
Source: ataqv (short_mononucleosomal_ratio field); no official ENCODE threshold — use comparatively across samples
What it measures: The ratio of sub-nucleosomal (TF-bound) fragments to mononucleosomal fragments, as computed by ataqv.
NFR ratio = count(fragments ≤ 100 bp) [hqaa_tf_count] / count(fragments 180–300 bp) [hqaa_mononucleosomal_count]
This is a ratio, not a percentage of all reads. A ratio of 5 means there are 5 short TF-bound fragments for every 1 mononucleosomal fragment.
Why it matters: A healthy ATAC-seq library should be dominated by sub-nucleosomal insertions (nucleosome-free regions). A low ratio means the library is enriched for mononucleosomal fragments, indicating the nuclei had poor chromatin accessibility or Tn5 over-digestion.
Thresholds: No official ENCODE cutoff exists for this metric. As a rough community benchmark:
- `>2`: generally considered adequate (more short NFR fragments than mononucleosomal)
- `<1`: likely poor library; mononucleosomal fragments dominate
Compare values across your own samples. A consistent drop within a batch is more informative than any single absolute threshold.
In this pipeline: extracted from ataqv JSON as short_mononucleosomal_ratio, reported in ataqv/{sample}.ataqv_mqc.tsv, shown in MultiQC.
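To make the ratio concrete, the sketch below applies the size classes stated in this section to a plain list of fragment lengths. It is not ataqv's implementation: ataqv counts only high-quality autosomal alignments, and the function name and default cutoffs here are assumptions taken from the formula as written above.

```python
# Illustrative only: short-to-mononucleosomal ratio from fragment lengths,
# using the size classes quoted in this section (<=100 bp and 180-300 bp).

def short_mono_ratio(fragment_lengths, short_max=100, mono_min=180, mono_max=300):
    """Return count(short) / count(mono), or None when no mono fragments exist."""
    short = sum(1 for n in fragment_lengths if n <= short_max)
    mono = sum(1 for n in fragment_lengths if mono_min <= n <= mono_max)
    if mono == 0:
        return None  # ratio is undefined without mononucleosomal fragments
    return short / mono


if __name__ == "__main__":
    # Toy library: 10 short, 2 mononucleosomal, 3 longer fragments.
    lengths = [60] * 10 + [200] * 2 + [400] * 3
    print(short_mono_ratio(lengths))
```

Fragments between the two classes (101–179 bp in this sketch) and above 300 bp count toward neither term, which is why the value is a ratio and not a percentage of all reads.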
Source: ataqv (Parker Lab); thresholds from ENCODE ATAC-seq Standards for GRCh38 + RefSeq TSS annotation
What it measures: Signal enrichment at Transcription Start Sites (TSS) relative to the flanking background. Calculated by ataqv.
TSS enrichment = (mean signal in TSS ±150 bp window) / (mean signal in flanking regions, 1400–2000 bp from TSS)
Why it matters: Tn5 transposase preferentially inserts at open chromatin concentrated at active promoters. A high TSS enrichment confirms the experiment captured nucleosome-free promoter regions. A flat score indicates high background, poor Tn5 enrichment, or degraded chromatin.
Thresholds (GRCh38 RefSeq TSS — scale down for non-human genomes):
- `≥7`: ENCODE target; excellent signal-to-noise
- `5–7`: acceptable; usable but noisier peak calls
- `<5`: poor; re-check nuclei isolation and Tn5 titration
These thresholds are annotation-dependent. For mouse (mm10) or custom genomes, expect lower absolute values. Compare across your own samples rather than using human thresholds as hard cutoffs.
In this pipeline: reported in ataqv/{sample}.ataqv_mqc.tsv, shown in MultiQC.
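The formula above can be sketched on an aggregate TSS profile. This follows the window definitions given in this section and is not ataqv's actual code; the `profile` input (a mapping from signed distance to the TSS, in bp, to aggregate signal) and the function name are hypothetical.

```python
# Illustrative only: TSS enrichment from an aggregate signal profile, using
# the +/-150 bp core and 1400-2000 bp flanks described in this section.
from statistics import mean


def tss_enrichment(profile):
    """profile: dict mapping signed distance from the TSS (bp) to signal."""
    core = [s for d, s in profile.items() if abs(d) <= 150]
    flank = [s for d, s in profile.items() if 1400 <= abs(d) <= 2000]
    if not core or not flank:
        raise ValueError("profile must cover both the core and the flanks")
    background = mean(flank)
    if background == 0:
        raise ValueError("flanking signal is zero; enrichment is undefined")
    return mean(core) / background


if __name__ == "__main__":
    # Toy profile sampled every 50 bp: signal 8 near the TSS, 1 elsewhere.
    toy = {d: (8.0 if abs(d) <= 150 else 1.0) for d in range(-2000, 2001, 50)}
    print(tss_enrichment(toy))
```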
Source: ATACseqQC R package (Ou et al., 2018, BMC Genomics); thresholds are pipeline-defined heuristics (ATACseqQC does not publish fixed cutoffs)
What it measures: Whether Tn5 insertions (5′ read ends from shifted BAM) are enriched at promoters relative to gene bodies. Calculated by the ATACseqQC R package.
For each transcript:
promoter_window = [TSS−2000, TSS+500] (strand-aware)
body_window = next 2500 bp downstream of promoter
PT score (log2) = log2(mean_5prime_density_in_promoter + ε)
− log2(mean_5prime_density_in_body + ε)
Final: mean and median PT score across all transcripts
The mean PT score is in log2 scale; the equivalent linear ratio is 2^PT_score_mean.
Why it matters: ATAC-seq signal is concentrated at promoters. A high PT score confirms most signal comes from promoter-proximal nucleosome-free regions. Low PT scores indicate high background, poor Tn5 enrichment, or a ChIP-like signal profile.
Thresholds (linear scale, 2^mean_PT) — pipeline-defined heuristic:
- `≥10`: strong enrichment; typical for high-quality libraries
- `≥5`: PASS (pipeline threshold in `calc_pt_score.R`)
- `<5`: FAIL flag written to log; ATACseqQC itself does not publish a fixed cutoff

The ≥5 cutoff is set in this pipeline's `calc_pt_score.R`, not by the ATACseqQC package or its vignette. Compare across your own samples.
In this pipeline: calculated on shifted BAM (shifted.bam), reported alongside NFR score in atacseqqc/{sample}.atacseqqc_mqc.tsv. A [INFO] QC: PASS or [WARNING] QC: FAIL line is written to the rule log. Scatter plot saved to atacseqqc/{sample}.pt_score.png. Only available for narrow peak mode.
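The arithmetic behind the PASS / FAIL call can be sketched as follows. This is a Python illustration of the formula above, not a port of `calc_pt_score.R`: the function name, the input lists of per-transcript densities, and the value of the pseudocount `eps` are all assumptions.

```python
# Illustrative only: per-transcript PT scores (log2), their mean and median,
# the linear equivalent 2^mean, and the pipeline's >=5 PASS heuristic.
import math
from statistics import mean, median


def pt_summary(promoter_density, body_density, eps=1e-6, pass_cutoff=5.0):
    """Inputs are per-transcript mean 5' densities; eps is a hypothetical pseudocount."""
    scores = [
        math.log2(p + eps) - math.log2(b + eps)
        for p, b in zip(promoter_density, body_density)
    ]
    mean_log2 = mean(scores)
    linear = 2 ** mean_log2
    return {
        "mean_log2": mean_log2,
        "median_log2": median(scores),
        "linear": linear,
        "qc": "PASS" if linear >= pass_cutoff else "FAIL",
    }


if __name__ == "__main__":
    # Two toy transcripts with 8-fold and 16-fold promoter enrichment.
    print(pt_summary([8.0, 32.0], [1.0, 2.0]))
```

In the toy example the log2 scores are about 3 and 4, so the mean is about 3.5 and the linear ratio is about 11.3, above both the ≥5 and ≥10 levels.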
Source: ATACseqQC R package (NFRscore(), Ou et al., 2018, BMC Genomics)
What it measures: Whether the 100 bp window centred on each TSS is more accessible than its flanking nucleosome positions. Computed per TSS and summarised as mean/median across all TSS.
For each TSS (400 bp window, strand-aware):
n1 = upstream 150 bp (nucleosome flank)
nf = middle 100 bp (nucleosome-free region)
n2 = downstream 150 bp (nucleosome flank)
NFR score = log2(nf) − log2((n1 + n2) / 2)
A positive score means the NFR window has more Tn5 insertion signal than the average nucleosome flank. Higher values indicate stronger nucleosome depletion at TSS.
Why it matters: Complements the PT score. PT score uses a broad 5 kb promoter vs gene-body window; NFR score zooms into the 400 bp TSS window and directly quantifies nucleosome eviction at the TSS itself.
Thresholds: ATACseqQC does not publish fixed cutoffs. A positive mean NFR score (> 0) indicates the expected TSS accessibility pattern; compare across samples within the same experiment.
In this pipeline: computed alongside PT score on shifted BAM; mean and median NFR score are reported as additional columns in atacseqqc/{sample}.atacseqqc_mqc.tsv. Scatter plot saved to atacseqqc/{sample}.nfr_score.png. Only available for narrow peak mode.
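A single-TSS calculation, written directly from the formula above, looks like this. It is a sketch and not ATACseqQC's `NFRscore()` code: the function name is hypothetical, and raising an error on zero counts is an assumption, since this section does not say how empty windows are handled.

```python
# Illustrative only: NFR score for one TSS from insertion counts in the
# upstream flank (n1), the central 100 bp (nf) and the downstream flank (n2).
import math


def nfr_score(n1, nf, n2):
    """NFR score = log2(nf) - log2((n1 + n2) / 2)."""
    flank_mean = (n1 + n2) / 2
    if nf <= 0 or flank_mean <= 0:
        raise ValueError("counts must be positive for the log2 ratio")
    return math.log2(nf) - math.log2(flank_mean)


if __name__ == "__main__":
    # 40 insertions in the central window against 10 per flank.
    print(nfr_score(10, 40, 10))
```

A four-fold excess in the central window gives a score of about 2, while equal signal in all three windows gives 0, the boundary used in the benchmark table.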
Source: ATACseqQC R package (TSSEscore(), Ou et al., 2018); definition from ENCODE data standards
What it measures: Aggregate read enrichment at TSS relative to flanking background — the same concept as ataqv's TSS enrichment score but computed independently by ATACseqQC.
For each TSS (±1000 bp window, 100 bp steps):
per-step score = depth at step / mean depth at 100 bp end flanks
TSSE = max(mean(per-step score across all TSS))
Why it matters: An independent TSS enrichment estimate that can be compared to ataqv's tss_enrichment_score. Discordance between the two may indicate annotation or BAM handling differences.
Thresholds (GRCh38 RefSeq, from ENCODE):
- `≥7`: ENCODE target
- `5–7`: acceptable
- `<5`: poor; same interpretation as ataqv TSS enrichment
In this pipeline: computed alongside PT score and NFR score on shifted BAM; reported as TSSE_score column in atacseqqc/{sample}.atacseqqc_mqc.tsv. Plot with ENCODE threshold lines saved to atacseqqc/{sample}.tsse.png. Only available for narrow peak mode.
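The aggregation described above (normalise each TSS by its end flanks, average per step, take the maximum) can be sketched as below. This is one reading of the formula, not ATACseqQC's `TSSEscore()` code: the function name and input layout are hypothetical, the end flanks are taken to be the first and last 100 bp steps, and skipping TSSs with zero flank depth is an assumption.

```python
# Illustrative only: TSSE as the maximum of the per-step mean of
# flank-normalised depth across all TSSs.

def tsse_score(depth_by_tss):
    """depth_by_tss: list of per-TSS lists of depth, one value per 100 bp step."""
    n_steps = len(depth_by_tss[0])
    totals = [0.0] * n_steps
    used = 0
    for depths in depth_by_tss:
        flank = (depths[0] + depths[-1]) / 2  # mean depth of the two end steps
        if flank == 0:
            continue  # assumption: TSSs without flank coverage are skipped
        for i, depth in enumerate(depths):
            totals[i] += depth / flank
        used += 1
    if used == 0:
        raise ValueError("no TSS with non-zero flank depth")
    return max(total / used for total in totals)


if __name__ == "__main__":
    # Two toy TSSs over 20 steps, enriched 6-fold and 8-fold at the centre.
    a = [1.0] * 20
    a[10] = 6.0
    b = [2.0] * 20
    b[10] = 16.0
    print(tsse_score([a, b]))
```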
Source: General bioinformatics practice (not a direct ENCODE threshold — ENCODE uses NRF/PBC1/PBC2 which require a separate counting step not implemented in this pipeline)
What it measures: The fraction of reads flagged as PCR/optical duplicates by Picard MarkDuplicates.
Duplication rate = duplicate reads / total mapped reads
Why it matters: High duplication indicates over-amplification or low-complexity library — most reads are copies of the same fragment rather than independent Tn5 insertions. This artificially inflates peak signal.
Thresholds (community practice):
- `<20%`: good library complexity
- `20–30%`: acceptable; consider using more input material next time
- `>30%`: poor complexity; reduce PCR cycles or increase input
In this pipeline: reported in bam/{sample}.markdup.sorted.MarkDuplicates.metrics.txt (PERCENT_DUPLICATION column), parsed automatically by MultiQC.
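For checking a single sample without opening MultiQC, the value can be read straight from the metrics file. The sketch below assumes the usual Picard layout (a `## METRICS CLASS` line, then a tab-separated header row, then one row per library) and reads only the first library row; the function name is hypothetical. Note that Picard writes `PERCENT_DUPLICATION` as a fraction between 0 and 1, so multiply by 100 before comparing with the thresholds above.

```python
# Illustrative only: pull PERCENT_DUPLICATION from the text of a Picard
# MarkDuplicates metrics file (first library row only).

def percent_duplication(metrics_text):
    """Return PERCENT_DUPLICATION as a fraction (0-1) from Picard metrics text."""
    lines = metrics_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("## METRICS CLASS"):
            header = lines[i + 1].split("\t")
            values = lines[i + 2].split("\t")
            return float(dict(zip(header, values))["PERCENT_DUPLICATION"])
    raise ValueError("no '## METRICS CLASS' section found")


if __name__ == "__main__":
    # Toy metrics text with a reduced set of columns.
    toy = "\n".join([
        "## METRICS CLASS\tpicard.sam.DuplicationMetrics",
        "\t".join(["LIBRARY", "READ_PAIRS_EXAMINED", "PERCENT_DUPLICATION"]),
        "\t".join(["lib1", "1000000", "0.25"]),
    ])
    print(f"{percent_duplication(toy) * 100:.1f}% duplicates")
```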
The pipeline removes some intermediate files to reduce storage use, for example:
- unsorted BAM after sort
- pre-filter BAM after filtering
- merged/trimmed FASTQ files in cleanup step after MultiQC
- `MissingInputException` in MultiQC:
  - check module toggles and corresponding outputs
  - run dry-run first
- Rule env issues:
  - ensure `--use-conda` is set
  - delete broken env under `.snakemake/conda/` and rerun
- Large runs:
  - increase `--cores`
  - tune per-rule params in config (aligner/deeptools/featureCounts)
- Job killed / out of memory on HPC:
  - check the LSF job log (`bpeek JOB_ID` or `bhist -l JOB_ID`) to confirm out-of-memory (OOM) as the cause
  - quick fix: add or increase `mem_mb` for the failing rule in `workflow/profiles/lsf/config.yaml` under `set-resources`; this overrides the rule default without touching the code
  - permanent fix: if the rule's default in `workflow/modules/<rule>.smk` under `resources:` is too low, increase `mem_mb` there so the default itself is correct for all runs
A huge thank you to Dr. Isabell Bludau, Dr. med. Abigail Suwala, Dr. Paul Kerbs, Quynh Nhu Nguyen and Temesvari-Nagy Levente from Heidelberg University Hospital and the German Cancer Research Center (DKFZ) for their support, feedback, and contributions to this pipeline.
Key resources and prior work this pipeline draws from:
- Niu Y. ATAC-seq data analysis: from FASTQ to peaks. Published March 20, 2019. https://yiweiniu.github.io/blog/2019/03/ATAC-seq-data-analysis-from-FASTQ-to-peaks/
- Patel H, Espinosa-Carrasco J, Langer B, Ewels P, et al. nf-core/atacseq [v2.1.2]. Zenodo; 2022. https://nf-co.re/atacseq/2.1.2/
- Yuan B. ATAC-seq Data Analysis. Presented at: BaRC Hot Topics; April 4, 2024; Whitehead Institute for Biomedical Research. http://barc.wi.mit.edu/education/hot_topics/ATACseq_2024/ATACseq2024_4slidesPerPage.pdf
- Ou J, Liu H, Yu J, et al. ATACseqQC: a Bioconductor package for post-alignment quality assessment of ATAC-seq data. BMC Genomics. 2018;19(1):169. Published 2018 Mar 1. doi:10.1186/s12864-018-4559-3
This repository is covered by the MIT License. The tools installed through `workflow/envs/` remain subject to their own licenses.

