ScilifelabDataCentre/divbase

DivBase: Share and query genetic variant data at scale

Documentation License: MIT PyPI

DivBase is a service built and maintained by SciLifeLab Data Centre that enables life science researchers at Swedish institutions and their collaborators to manage, explore, and query genomic variants in VCF format alongside associated sample metadata. The service provides a secure platform for managing genomic variants and metadata files for non-human and non-sensitive data.

Note

DivBase is currently in pre-release for a limited number of users. We are actively seeking feedback to help shape the service. If you would like to be involved in testing or have suggestions, please reach out at dsn-eb@scilifelab.se or open a GitHub Issue.


Want to try out DivBase?

Key Features

[Figure: Overview of DivBase Features]

DivBase allows you to

  • Store all your variant data and metadata in one place - a single, centralised store for all VCF files and sample metadata from your research project.
  • Collaborate and share data with colleagues and collaborators, with full control over who has access to what.
  • Query across all your VCF files at once, or narrow down to a subset of your choosing.
  • Filter on variant data and sample metadata in the same query - no need to join results together manually.
  • Integrate the system into your pipelines and HPC jobs - use DivBase programmatically wherever your workflow needs it.
  • Keep all files under version control and backed up.
  • Checkpoint the state of your project's files to refer back to at a later date - making your research more easily reproducible.

Documentation

For guides, tutorials, and command references, visit our documentation website.

Install divbase-cli

To manage files, submit queries, and interact with DivBase, install our command line tool divbase-cli using uv or pipx:

uv tool install divbase-cli
# or with pipx
pipx install divbase-cli

For detailed instructions and alternative methods, see our Installation Guide.
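
After installing, it can help to confirm that the tool is reachable from your shell. The snippet below is a minimal sketch that only checks whether `divbase-cli` is on your `PATH`; the helper name `cli_available` is made up for this example and is not part of DivBase.

```shell
# Return success if the given command can be found on PATH.
# (cli_available is a hypothetical helper name used only in this sketch.)
cli_available() {
    command -v "$1" >/dev/null 2>&1
}

if cli_available divbase-cli; then
    echo "divbase-cli found at: $(command -v divbase-cli)"
else
    echo "divbase-cli not found; check that your uv or pipx tool directory is on PATH" >&2
fi
```

If the tool is not found right after installation, opening a new shell session is usually the first thing to try, since uv and pipx place tools in a directory that may not yet be on `PATH` in the current session.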

Quick Start Guide

Tip

For the full Quick Start Guide, see our documentation website.

  1. Create an Account: Sign up at the DivBase Web Interface.

  2. Configure: Add your project to your CLI config and set it as default:

    divbase-cli config add MY_PROJECT_NAME --default
  3. Login:

    divbase-cli auth login your.email@example.com
  4. Upload and sync your data:

    divbase-cli files upload data/my_metadata.tsv
    divbase-cli files upload data/my_samples/*.vcf.gz
    divbase-cli dimensions update
  5. Run a Query:

    # Filter samples based on metadata and subset a chromosomal region in one query
    divbase-cli query vcf \
       --tsv-filter "Area:Northern Portugal" \
       --command "view -r 21:15000000-25000000"

This submits a job to DivBase. Once the job completes, a new vcf.gz file containing the requested subset of data is available to download or stream into your downstream analysis.
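
For pipelines and HPC jobs, the upload and query steps above could be collected into one script. The following is a rough sketch, assuming you have already configured your project and logged in. It uses only the commands shown in this Quick Start; the `DRY_RUN` switch, the `run` helper, and the variable names are assumptions of this example, not features of divbase-cli.

```shell
#!/bin/sh
set -eu

# Assumption: DRY_RUN is a switch invented for this sketch, not a divbase-cli option.
# With DRY_RUN=1 (the default) each command is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Placeholder values copied from the Quick Start; replace with your own.
METADATA_FILE="data/my_metadata.tsv"
TSV_FILTER="Area:Northern Portugal"
REGION="21:15000000-25000000"

# Upload the metadata and VCF files, then sync.
run divbase-cli files upload "$METADATA_FILE"
run divbase-cli files upload data/my_samples/*.vcf.gz
run divbase-cli dimensions update

# Submit the query: filter samples on metadata and subset a region.
run divbase-cli query vcf \
    --tsv-filter "$TSV_FILTER" \
    --command "view -r $REGION"
```

Running the script as-is only prints the commands, which makes it easy to review them before submitting anything. Set `DRY_RUN=0` to execute them for real.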

Get Support


Developers/Contributing

We welcome contributions! See our Contributing and Developer Setup guides to get started. Feel free to reach out if you have questions or aren't sure where to start.
