Bonsai Test Data

Overview

This repository contains a curated test dataset for Bonsai.

It is intended to support:

Local development environments
Integration and end-to-end testing
Demo instances
PRP‑driven sample uploads
Database seeding during bootstrap

This repo is not part of any single microservice. It serves as a shared, versioned source of truth for reproducible Bonsai test environments.

Docker Init Container

This repository also provides a Docker image that Bonsai environments use to mount test data.

Build locally

docker build -t bonsai-test-data:local .

Usage in Docker Compose (Dev/E2E)

init-test-data:
  image: ghcr.io/clinicalgenomicslund/bonsai-test-data:v0.1.0
  volumes:
    - testdata:/mnt/testdata
  command: ["sh", "-c", "cp -r /dataset/* /mnt/testdata"]
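The copy command above relies on the `/dataset/*` glob, which skips top-level dotfiles. The following is a minimal sketch of an alternative copy step and a check that the target was populated. The function names `seed_volume` and `volume_is_populated` are hypothetical and not part of this repository; only the `/dataset` and `/mnt/testdata` paths come from the Compose snippet.

```shell
# Hypothetical helpers, not part of this repository.

# Copy everything under $1 into $2, including top-level dotfiles,
# which the glob in `cp -r /dataset/*` would skip.
seed_volume() {
  src=$1
  dst=$2
  mkdir -p "$dst" && cp -R "$src"/. "$dst"/
}

# Succeed only if directory $1 exists and contains at least one entry.
volume_is_populated() {
  [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}
```

Inside the init container this could be called as `seed_volume /dataset /mnt/testdata && volume_is_populated /mnt/testdata`, so that an empty volume produces a non-zero exit status.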

Reanalyze datasets

Updates to JASEN can require the test data to be reanalyzed. Here are the steps to redownload the data and recompute the results.

Note: See the JASEN docs for installation instructions and how to run it.

Download the datasets if needed.

./scripts/download_fastqs.sh -i bioprojects/PRJEB77209.illumina.tsv

Create a JASEN input file using the path to the downloaded fastq files.

./scripts/make_jasen_input.sh              \
    -i bioprojects/PRJEB77209.illumina.tsv \
    -f /path/to/fastq/                     \
    -o /output/dir/

Then run JASEN to produce the output files.

NOTE: You have to add an assay column when using the SMD convenience script start_nextflow_analysis to run JASEN.

nextflow run main.nf                                      \
        -profile staphylococcus_aureus,illumina,apptainer \
        -config nextflow.config                           \
        --csv /output/dir/PRJEB77209.illumina.csv
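The assay column mentioned in the note above could be appended to the input CSV before launching the run. This is only a sketch: the helper name `add_assay_column`, the column header `assay`, its position as the last column, and the value passed in are all assumptions, so check what start_nextflow_analysis actually expects.

```shell
# Hypothetical helper, not part of this repository or of JASEN.
# Append an "assay" column (assumed name) to a CSV that has a header row.
# Usage: add_assay_column <input.csv> <assay-value> > <output.csv>
add_assay_column() {
  awk -v assay="$2" 'BEGIN { FS = OFS = "," }
    NR == 1 { print $0, "assay"; next }   # header row gets the column name
    NF      { print $0, assay }           # data rows; blank lines are dropped
  ' "$1"
}
```

The function writes to standard output, so the original CSV is left untouched and the result can be redirected to a new file.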

Copy the files to the repo, either as a new pipeline version or by overwriting an existing result.

# if relevant change the version of jasen
jasen_version=1.2.0
resultPath=/fs1/results_dev/jasen/saureus
targetDir="/path/to/repo/results/v${jasen_version}/saureus"

# find all new result files (dry run: "echo" prints each cp command; remove it to copy)
mkdir -p "${targetDir}"
tail -n +2 PRJEB77209.illumina.csv | awk -F',' '{print $1}' | while read -r id; do
  cd "$resultPath"
  find . -name "${id}*" -exec echo cp -R --parents {} "$targetDir" \;
done
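After copying, it can be useful to confirm that every sample in the CSV has at least one file in the target directory. A minimal sketch, assuming the same CSV layout as the loop above (header row, sample ID in the first column); the function name `missing_samples` is hypothetical.

```shell
# Hypothetical helper, not part of this repository.
# Print every sample ID from the CSV (first column, header skipped)
# that has no matching file or directory anywhere under the target directory.
# Usage: missing_samples <samples.csv> <target-dir>
missing_samples() {
  tail -n +2 "$1" | awk -F',' 'NF { print $1 }' | while IFS= read -r id; do
    # -quit is not POSIX, so take the first match with head instead
    match=$(find "$2" -name "${id}*" | head -n 1)
    [ -n "$match" ] || printf '%s\n' "$id"
  done
}
```

For example, `missing_samples PRJEB77209.illumina.csv "$targetDir"` prints nothing when every sample ID was matched.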

Finally, subset large files to reduce the repo size and rebuild the BAM indexes.

cd "${targetDir}"

find . -name '*.bam' -print0 | while IFS= read -r -d '' line; do
  # Create downsampled BAM
  samtools view -b -s 0.01 "$line" -o "${line}.mini" || continue

  # Replace original only if the new file was created successfully
  if [ -s "${line}.mini" ]; then
    rm -- "$line" && mv -- "${line}.mini" "$line"
  else
    echo "Warning: downsampled file is empty or missing for: $line" >&2
    rm -f -- "${line}.mini"
  fi
  samtools index "$line"
done
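Once the loop has finished, a quick sanity check can list BAM files that ended up empty or without an index. This sketch only inspects the file system and does not validate BAM contents; the function name `bams_needing_attention` is hypothetical, and it assumes the default `<file>.bam.bai` index name written by `samtools index`.

```shell
# Hypothetical helper, not part of this repository.
# Print every BAM file under directory $1 that is empty or lacks the
# "<file>.bam.bai" index that `samtools index` writes by default.
bams_needing_attention() {
  find "$1" -name '*.bam' -exec sh -c '
    for bam in "$@"; do
      if [ ! -s "$bam" ] || [ ! -f "$bam.bai" ]; then
        printf "%s\n" "$bam"
      fi
    done
  ' sh {} +
}
```

Running `bams_needing_attention "$targetDir"` should print nothing when every BAM file was downsampled and indexed.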

find . -name '*.fasta' -exec gzip {} \;
