Releases: Thomas-Rauter/scholid

scholid 0.2.0

04 Jun 08:54

This release expands scholid from 7 to 20 identifier types while keeping the same six exported functions: scholid_types(), is_scholid(), normalize_scholid(), extract_scholid(), classify_scholid(), and detect_scholid_type().

New identifier types

Each new type supports structural validation, normalization from URLs and labels, and extraction from free text through the existing APIs.

  • ROR — Research Organization Registry IDs (checksum-validated)
  • RRID — Research Resource Identifiers
  • SWHID — Software Heritage persistent identifiers
  • OpenAlex — OpenAlex entity keys (W, A, S, …)
  • bibcode — SAO/NASA ADS bibliographic codes
  • ISNI — International Standard Name Identifier (compact form; hyphenated ORCID-shaped strings remain orcid)
  • ARK — Archival Resource Keys (ark:/NAAN/Name)
  • UniProt — UniProtKB accessions
  • refseq — NCBI RefSeq accessions (versioned)
  • sra — INSDC Sequence Read Archive accessions (SRR, SRX, SRP, …)
  • geo — NCBI GEO accessions (GSE, GSM, GPL, GDS)
  • bioproject — INSDC BioProject accessions (PRJNA, PRJEB, …)
  • assembly — INSDC genome assembly accessions (GCA_, GCF_, versioned)
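To make "structural validation, normalization from URLs and labels, and extraction from free text" concrete, here is a minimal Python sketch for a single type, ROR. It is not scholid's R implementation. It assumes the commonly described ROR layout (a leading 0, six Crockford base32 characters, and a two-digit checksum computed as 98 - (n * 100 mod 97)), and the function names `ror_checksum_ok`, `normalize_ror`, and `extract_ror` are hypothetical.

```python
import re
from typing import List, Optional

# Crockford base32 alphabet (no i, l, o, u), lowercase as in ROR IDs.
CROCKFORD = "0123456789abcdefghjkmnpqrstvwxyz"

# Assumed shape: leading 0, six base32 characters, two checksum digits.
ROR_PATTERN = re.compile(r"0[0-9a-hj-km-np-tv-z]{6}[0-9]{2}")


def ror_checksum_ok(ror_id: str) -> bool:
    """Check the shape, then compare the last two digits to the checksum."""
    if ROR_PATTERN.fullmatch(ror_id) is None:
        return False
    n = 0
    for ch in ror_id[1:7]:  # decode the six base32 characters
        n = n * 32 + CROCKFORD.index(ch)
    expected = 98 - (n * 100) % 97
    return int(ror_id[7:]) == expected


def normalize_ror(value: str) -> Optional[str]:
    """Strip a ror.org URL or 'ROR:' label; return the bare ID or None."""
    s = value.strip().lower()
    s = re.sub(r"^(https?://)?ror\.org/", "", s)
    s = re.sub(r"^ror:\s*", "", s)
    return s if ror_checksum_ok(s) else None


def extract_ror(text: str) -> List[str]:
    """Find ROR-shaped tokens in free text and keep those that validate."""
    bounded = r"(?<![0-9a-z])" + ROR_PATTERN.pattern + r"(?![0-9a-z])"
    return [c for c in re.findall(bounded, text.lower()) if ror_checksum_ok(c)]


print(normalize_ror("https://ror.org/03yrm5c26"))
print(extract_ror("See ror.org/05dxps055 and the mistyped 03yrm5c27."))
```

The checksum step is what separates a checksum-validated type from a purely pattern-matched one: `03yrm5c27` has the right shape but is rejected because its last two digits do not match.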

Per-type formats, validation rules, and classification order are documented in the "How Scholarly Identifiers Are Defined" vignette (listed as "About identifiers" on the package site).

Internal improvements

  • Central identifier registry as the single source of truth for type names, classification order, extraction patterns, and per-type metadata
  • Refactored per-type implementations; exported APIs dispatch via is_<type>, normalize_<type>, extract_<type>
  • Faster classify_scholid() and detect_scholid_type() when resolving types
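The registry-and-dispatch design described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the package's R code: the `REGISTRY` structure, the helper names, and the deliberately simplified patterns are all assumptions made for the example.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical registry: one entry per type. Insertion order doubles as
# classification order, so there is a single place to add or reorder types.
REGISTRY: Dict[str, Callable[[str], bool]] = {}


def register(name: str, pattern: str) -> None:
    """Add a type with a full-match validator built from a regex."""
    compiled = re.compile(pattern)
    REGISTRY[name] = lambda x: compiled.fullmatch(x) is not None


# Simplified placeholder patterns, not scholid's real validation rules.
register("orcid", r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")
register("isni", r"\d{15}[\dX]")
register("geo", r"G(SE|SM|PL|DS)\d+")


def types() -> List[str]:
    """List the registered type names in classification order."""
    return list(REGISTRY)


def is_id(x: str, type_name: str) -> bool:
    """Generic entry point that dispatches to the per-type validator."""
    return REGISTRY[type_name](x)


def classify(x: str) -> Optional[str]:
    """Return the first registered type whose validator accepts x."""
    for name, check in REGISTRY.items():
        if check(x):
            return name
    return None


print(types())
print(classify("0000-0002-1825-0097"))
print(classify("0000000218250097"))
```

Because every generic function reads from the same registry, adding a type means adding one entry rather than touching each exported function. The ordering also shows how a hyphenated ORCID-shaped string can stay `orcid` while the compact 16-character form falls through to `isni`.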

Documentation

  • Package-level ?scholid help and updated README / DESCRIPTION
  • pkgdown site updated for 20 types

Links

Full changelog: see NEWS.md