pySec2Pri

Create and use mapping files that link secondary (retired or withdrawn) biological database identifiers and symbols to their primary (current) counterparts.

Mappings are written in SSSOM format by default. In each mapping, the subject is the secondary identifier and the object is the primary one.

Installation

uv pip install pysec2pri

Or install from source:

uv pip install git+https://github.com/jmillanacosta/pysec2pri.git

Quick Start

Generating mapping sets

To obtain the secondary to primary identifier SSSOM mapping set for ChEBI:

pysec2pri chebi

This downloads the latest ChEBI release and writes an SSSOM mapping file to the current directory.

To process a local file instead and choose the output path:

pysec2pri chebi ChEBI_complete_3star.sdf --output my_mappings.sssom.tsv

For more options and help on any command:

pysec2pri --help
pysec2pri chebi --help

The default output is in SSSOM (Simple Standard for Sharing Ontology Mappings) TSV format.
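
As a rough illustration of how such a file can be consumed, the sketch below reads SSSOM TSV text with the Python standard library and builds a secondary-to-primary lookup. The column names subject_id and object_id come from the SSSOM specification. The sample row, the predicate and justification values, and the helper name load_sec2pri are assumptions for illustration, not output captured from pysec2pri.

```python
import csv
import io

# Hypothetical SSSOM TSV content: an optional "#"-prefixed metadata header
# followed by a tab-separated table. All values are illustrative only.
SAMPLE_SSSOM = (
    "# mapping_set_id: https://example.org/mappings/demo\n"
    "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
    "HGNC:131\tIAO:0100001\tHGNC:145\tsemapv:BackgroundKnowledgeBasedMatching\n"
)


def load_sec2pri(handle):
    """Return {secondary_id: primary_id} from an open SSSOM TSV file."""
    # Skip the commented metadata block; keep the header row and data rows.
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    # Subjects are secondary identifiers, objects are primary identifiers.
    return {row["subject_id"]: row["object_id"] for row in reader}


sec2pri = load_sec2pri(io.StringIO(SAMPLE_SSSOM))
print(sec2pri)
```

A plain dictionary keeps only one primary per secondary. If a secondary identifier can map to several primaries, collecting the objects in a list per subject would be the safer choice.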

Updating IDs and symbols

A generated mapping set can be used to update IDs and symbols in Python:

from pysec2pri import generate_chebi_synonyms, resolve_symbols
cs = generate_chebi_synonyms()
resolve_symbols(["Glucose", "ATP", "Guanine"], cs)

Or from the command line, given a TSV file gene_ex.tsv:

gene	data
HGNC:131	3.5

Resolve the gene column to primary HGNC IDs. A new column named after the input column with a _primary suffix (here gene_primary) is added:

pysec2pri update-ids gene_ex.tsv hgnc --at gene -o gene_ex_primary.tsv
# gene        data    gene_primary
# HGNC:131    3.5     HGNC:145

The same pattern works for symbols with update-symbols, and multiple columns can be resolved by repeating --at:

pysec2pri update-ids data.tsv hgnc --at gene_id --at related_gene_id
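
Conceptually, the update-ids step above is a table lookup that appends one column per resolved input column. The sketch below mimics that behaviour with the Python standard library. It is not the package's implementation: the function name add_primary_columns, the in-memory mapping, the sample table, and the choice to keep unmapped values unchanged are all assumptions.

```python
import csv
import io

# Hypothetical secondary -> primary lookup, e.g. built from an SSSOM file.
SEC2PRI = {"HGNC:131": "HGNC:145"}

# Illustrative input table with two identifier columns.
INPUT_TSV = "gene_id\trelated_gene_id\tdata\nHGNC:131\tHGNC:145\t3.5\n"


def add_primary_columns(text, columns, mapping):
    """Append a '<column>_primary' column for each requested column."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    fieldnames = list(reader.fieldnames) + [f"{c}_primary" for c in columns]
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        for col in columns:
            # Assumption: a value with no mapping is treated as already primary.
            row[f"{col}_primary"] = mapping.get(row[col], row[col])
        writer.writerow(row)
    return out.getvalue()


result = add_primary_columns(INPUT_TSV, ["gene_id", "related_gene_id"], SEC2PRI)
print(result)
```

Writing the resolved values to new columns rather than overwriting the originals keeps the input data intact, which makes it easy to see which rows changed.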

To skip regenerating the mapping set, pass a pre-built mapping file:

pysec2pri hgnc ids  # outputs hgnc_{version}_sssom.tsv
pysec2pri update-ids gene_ex.tsv hgnc --at gene --mapping hgnc_{version}_sssom.tsv

Documentation

Full documentation: https://pysec2pri.readthedocs.io/

Supported Databases

Datasource	License	Citation
ChEBI	CC BY 4.0	Hastings J, Owen G, Dekker A, et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Research. 2016 Jan;44(D1):D1214-9. DOI: 10.1093/nar/gkv1031. PMID: 26467479; PMCID: PMC4702775.
HMDB	CC0	Wishart DS, Guo A, Oler E, Wang F, Anjum A, Peters H, Dizon R, Sayeeda Z, Tian S, Lee BL, Berjanskii M, Mah R, Yamamoto M, Jovel J, Torres-Calzada C, Hiebert-Giesbrecht M, Lui VW, Varshavi D, Varshavi D, Allen D, Arndt D, Khetarpal N, Sivakumaran A, Harford K, Sanford S, Yee K, Cao X, Budinski Z, Liigand J, Zhang L, Zheng J, Mandal R, Karu N, Dambrova M, Schiöth HB, Greiner R, Gautam V. HMDB 5.0: the Human Metabolome Database for 2022. Nucleic Acids Res. 2022 Jan 7;50(D1):D622-D631. doi: 10.1093/nar/gkab1062. PMID: 34986597; PMCID: PMC8728138.
HGNC	link	Seal RL, Braschi B, Gray K, Jones TEM, Tweedie S, Haim-Vilmovsky L, Bruford EA. Genenames.org: the HGNC resources in 2023. Nucleic Acids Res. 2023 Jan 6;51(D1):D1003-D1009. doi: 10.1093/nar/gkac888. PMID: 36243972; PMCID: PMC9825485.
NCBI	link	Sayers EW, Bolton EE, Brister JR, Canese K, Chan J, Comeau DC, Connor R, Funk K, Kelly C, Kim S, Madej T, Marchler-Bauer A, Lanczycki C, Lathrop S, Lu Z, Thibaud-Nissen F, Murphy T, Phan L, Skripchenko Y, Tse T, Wang J, Williams R, Trawick BW, Pruitt KD, Sherry ST. Database resources of the national center for biotechnology information. Nucleic Acids Res. 2022 Jan 7;50(D1):D20-D26. doi: 10.1093/nar/gkab1112. PMID: 34850941; PMCID: PMC8728269.
UniProt	CC BY 4.0	UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021 Jan 8;49(D1):D480-D489. doi: 10.1093/nar/gkaa1100. PMID: 33237286; PMCID: PMC7778908.
Wikidata		Vrandecic, D., Krotzsch, M. Wikidata: a free collaborative knowledgebase. Communications of the ACM. 2014. doi: 10.1145/2629489.

License

MIT License. See LICENSE for details.
