SD Search API

Description

The SD Search API enables search across different datasets.

Supported configurations:

Bigpicture image search

Dependencies

PostgreSQL: database for search metadata
OpenSearch: search indexes built from the search metadata
Snowstorm: SNOMED CT ontology server

OpenSearch

OpenSearch indexes:

Bigpicture: bp-image-index.json

Development

Setup

Install uv, then create the virtualenv and install all dependencies:

uv sync --dev

Formatting and linting

tox -e ruff
tox -e mypy

Unit tests

tox -e pytest

Integration tests

Integration tests require Postgres and OpenSearch to be running. Start them with Docker Compose:

docker compose --env-file tests/integration/.env --profile dev up --build

Then run:

uv run pytest tests/integration/

Environment variables are defined in tests/integration/.env.

External dependencies

Snowstorm

Snowstorm is a SNOMED CT terminology server used by the SD Search API to resolve SNOMED CT terms to concepts.

A Snowstorm instance is available at https://snowstorm.rahtiapp.fi.
A SNOMED browser instance is available at https://snomed-browser.rahtiapp.fi/.

Data import

This is only needed when importing a new SNOMED CT release into the shared instance. The full procedure is described in https://github.com/IHTSDO/snowstorm/blob/master/docs/loading-snomed.md.

First check that the Snowstorm service is healthy:

curl https://snowstorm.rahtiapp.fi/actuator/health

Expected output:

{"status":"UP","groups":["liveness","readiness"]}
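If the health check is part of a scripted import, the response body can be tested instead of being read by eye. The following is a minimal sketch that only inspects the "status" field shown in the expected output above; the function name is_healthy is hypothetical and not part of this repository.

```python
import json


def is_healthy(body: str) -> bool:
    """Return True when a health response body reports status UP.

    `body` is the raw text returned by the /actuator/health endpoint.
    Anything that is not a JSON object with "status": "UP" counts as unhealthy.
    """
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


if __name__ == "__main__":
    sample = '{"status":"UP","groups":["liveness","readiness"]}'
    print(is_healthy(sample))
```

The body can come from any HTTP client, for example the curl command above piped into a script.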

Create import job

curl -i --location 'https://snowstorm.rahtiapp.fi/imports' \
  --header 'Content-Type: application/json' \
  --data '{"type":"SNAPSHOT","branchPath":"MAIN","createCodeSystemVersion":true}'

Example output:

HTTP/1.1 201 
location: https://snowstorm.rahtiapp.fi/imports/<ID>

Get the import ID (e.g. f0801e81-3740-48bd-bc3e-848c7aa7468e) from the location header of the response and set the IMPORT_ID environment variable:

export IMPORT_ID=<ID>
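When the import is scripted, the ID can be taken from the location header programmatically. This sketch assumes the ID is the last path segment of the header value, as in the example output above; the helper name import_id_from_location is hypothetical.

```python
from urllib.parse import urlparse


def import_id_from_location(location: str) -> str:
    """Return the last path segment of a Location header value."""
    path = urlparse(location.strip()).path
    # Ignore a trailing slash, then keep what follows the final "/".
    import_id = path.rstrip("/").rsplit("/", 1)[-1]
    if not import_id:
        raise ValueError(f"no import ID found in location: {location!r}")
    return import_id


if __name__ == "__main__":
    header = (
        "https://snowstorm.rahtiapp.fi/imports/"
        "f0801e81-3740-48bd-bc3e-848c7aa7468e"
    )
    print(import_id_from_location(header))
```

The returned value is what IMPORT_ID is set to in the shell.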

Import SNOMED release

Upload the SNOMED CT release file (e.g. SnomedCT_InternationalRF2_PRODUCTION_20260601T120000Z.zip):

curl --location -X POST "https://snowstorm.rahtiapp.fi/imports/${IMPORT_ID}/archive" \
  -F "file=@<SNOMED release file>"

The upload and import can take several hours. Poll the import status until the status is COMPLETED or until the import job is no longer available:

curl --location "https://snowstorm.rahtiapp.fi/imports/${IMPORT_ID}"

Example output while running:

{
  "status" : "RUNNING",
  "type" : "SNAPSHOT",
  "branchPath" : "MAIN",
  "internalRelease" : false,
  "moduleIds" : [ ],
  "createCodeSystemVersion" : true
}
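Because the import can run for hours, the polling can be automated. The sketch below repeats the status request until a terminal status is seen. It reads only the "status" field from the example output above. Treating FAILED as a terminal status is an assumption, as are the function names, the 60-second interval, and the poll limit. It does not handle the case where the job is no longer available; in that case the request is expected to raise an HTTP error.

```python
import json
import time
import urllib.request
from typing import Callable, Iterable


def fetch_status(url: str) -> str:
    """GET the import job and return its "status" field."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["status"]


def wait_for_import(
    url: str,
    fetch: Callable[[str], str] = fetch_status,
    terminal: Iterable[str] = ("COMPLETED", "FAILED"),  # FAILED is assumed
    interval_s: float = 60.0,
    max_polls: int = 1000,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll `url` until the job reaches a terminal status and return it."""
    terminal_set = set(terminal)
    for _ in range(max_polls):
        status = fetch(url)
        if status in terminal_set:
            return status
        # Not finished yet: wait before asking again.
        sleep(interval_s)
    raise TimeoutError(f"import did not finish after {max_polls} polls")
```

A call would look like wait_for_import("https://snowstorm.rahtiapp.fi/imports/" + import_id), where import_id holds the same value as IMPORT_ID. The fetch and sleep parameters exist so the loop can be exercised without network access.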

You can also monitor the import progress from the logs:

oc logs -f deployment/snowstorm

Once the import has finished, verify the result.

Check the imported versions:

curl -s https://snowstorm.rahtiapp.fi/codesystems/SNOMEDCT/versions | jq '.items[] | {version, branchPath}'

Example output:

{
  "version": "2026-06-01",
  "branchPath": "MAIN/2026-06-01"
}
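The same check can be made in a script by looking for the expected version in the response. This sketch assumes only the structure implied by the jq filter above, namely an "items" list whose entries have a "version" field; the function name has_version is hypothetical.

```python
import json


def has_version(payload: dict, version: str) -> bool:
    """Return True if `version` appears among the listed code system versions."""
    items = payload.get("items", [])
    return any(item.get("version") == version for item in items)


if __name__ == "__main__":
    # Sample body shaped like the example output above.
    body = '{"items": [{"version": "2026-06-01", "branchPath": "MAIN/2026-06-01"}]}'
    print(has_version(json.loads(body), "2026-06-01"))
```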

Check the MAIN branch:

curl -s https://snowstorm.rahtiapp.fi/branches/MAIN

Example output:

{
  "path" : "MAIN",
  "state" : "UP_TO_DATE",
  "containsContent" : true,
  "locked" : false,
  "creation" : "2026-06-11T05:12:34.688Z",
  "base" : "2026-06-11T05:12:34.688Z",
  "head" : "2026-06-11T05:52:38.457Z",
  "creationTimestamp" : 1781154754688,
  "baseTimestamp" : 1781154754688,
  "headTimestamp" : 1781157158457,
  ...
}

Get the number of active concepts:

curl -s "https://snowstorm.rahtiapp.fi/MAIN/concepts?limit=1&active=true" | jq '{total}'

Example output:

{
  "total": 532824
}

Get a concept:

curl -s "https://snowstorm.rahtiapp.fi/MAIN/concepts/337915000" | jq '{conceptId, active, fsn: .fsn.term}'

Example output:

{
  "conceptId": "337915000",
  "active": true,
  "fsn": "Homo sapiens (organism)"
}

Data loading

Bigpicture

Load Bigpicture XML data into the database with load.py.

Load a single dataset directory (default):

uv run python scripts/bigpicture/load.py /path/to/dataset/

Load from a parent directory containing multiple dataset subdirectories:

uv run python scripts/bigpicture/load.py /path/to/datasets/ --multi-dir

To also sync to OpenSearch immediately after loading, add --sync:

uv run python scripts/bigpicture/load.py /path/to/datasets/ --sync

LLM search

The experimental Bigpicture LLM search endpoint uses a small local Ollama model. Install and start it before running the API:

brew install ollama
ollama pull qwen2.5:14b
ollama serve

The /ai/query endpoint accepts a query for the LLM search. The LLM translates the query text into Beacon V2 filters and returns structured results.

Example:

curl -X POST "http://localhost:8000/ai/query" \
  -H "Content-Type: application/json" \
  -d '{"query": "images for human females"}'
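The same request can be built from Python using only the standard library. This is a sketch that mirrors the curl example: the URL, header, and body come from that example, while the function name build_ai_query_request is hypothetical. The response format is not described here, so the sketch stops at building the request.

```python
import json
import urllib.request


def build_ai_query_request(
    query: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build a POST request to /ai/query with a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ai/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_ai_query_request("images for human females")
    print(request.get_method(), request.full_url)
```

With the API running locally, passing the request to urllib.request.urlopen sends it and returns the raw response.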

Performance tests

See tests/performance/README.md.
