Health Technology Assessment (HTA) documents published by HTA organisations assess the clinical effectiveness and cost-effectiveness of new drugs, and provide guidance and recommendations for use by policymakers, healthcare providers, and insurance companies. This project aims to extract data points from HTA documents published by HTA bodies in the EU. The goal is to create an Open Science database that can be used by other researchers and decision makers.
- Open an Anthropic AI account and generate an Anthropic AI API key.
- Install the requirements by typing the following in the terminal: `pip install -r requirements.txt`
- Create input data file: Put the HTA documents in the `data` directory.
- Set Anthropic AI API key: Set the Anthropic AI API key as an environment variable.
  - On Linux, type in the terminal: `export ANTHROPIC_API_KEY=<your Anthropic AI API key>`
  - On Windows, type in the command prompt: `setx ANTHROPIC_API_KEY <your Anthropic AI API key>`
- Create configuration files: In the `config` directory, specify the user-defined variables in the `config.yaml` file, the instructions part of the prompt in `prompt_template.yaml`, and the output JSON schema in `schema.json`.
- Run program: Run the program from the terminal: `python run.py`
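Before running the program, it can help to confirm that the steps above were completed. The following is a minimal sketch of such a check, not part of the project itself: the function name `preflight` is hypothetical, and the directory layout (`data/` and `config/` under the project root) is assumed from the steps above.

```python
import os
from pathlib import Path

# Assumed layout, taken from the setup steps: documents in data/, and the
# three configuration files in config/.
CONFIG_FILES = ("config.yaml", "prompt_template.yaml", "schema.json")


def preflight(root=".", env=None):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    root = Path(root)
    problems = []

    # The API key is expected as an environment variable.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")

    # The data directory must exist and hold at least one file.
    data_dir = root / "data"
    if not data_dir.is_dir() or not any(p.is_file() for p in data_dir.iterdir()):
        problems.append("data/ is missing or contains no files")

    # Each configuration file named in the steps must be present.
    for name in CONFIG_FILES:
        if not (root / "config" / name).is_file():
            problems.append(f"config/{name} is missing")

    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("NOT READY:", problem)
```

The check only looks at whether files and the variable exist; it does not validate the API key or the contents of the configuration files.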
We want to extract the following attributes from each HTA document:
| Data point | Explanation |
|---|---|
| HTA ID | Name of HTA organisation performing the assessment |
| Treatment type | Type of treatment (medicine, device, therapy) |
| Assessment type | Is this the first assessment, a reassessment, or an indication broadening? |
| Internal identifier | Code or label identifying the document |
| INN | International non-proprietary name of assessed drug |
| Brand name | Brand name of assessed drug |
| Assessment date | When was the assessment finalised? |
| Indication | Medical condition for which the drug is assessed |
| Final recommendation | What is the final recommendation for this drug-indication combination? |
| Comparator | Drug with which the performance of the assessed drug is compared |
| Relative effectiveness assessment outcome | Outcome of the relative effectiveness assessment for this drug-indication combination |
| Cost-effectiveness assessment outcome | Outcome of the cost-effectiveness assessment for this drug-indication combination |
| Managed entry agreements | Was any OECD-defined managed entry agreement proposed? |
| Clinical restrictions | Were any clinical restrictions stated in the recommendation? |
Due to the complexity of the data to be extracted (presence of multiple drugs, health indications, etc.), we need a nested structure to represent the output data. We use this JSON schema, which can also be represented as the following tree structure:
schema {}
├── hta_id
├── treatment_type
├── assessment_type
├── assessment_date
├── internal_identifier
└── indications
    ├── indication_name
    └── technologies
        ├── inn
        ├── brand_name
        ├── comparators
        ├── outcome_rea
        ├── outcome_cea
        ├── final_recommendation
        ├── managed_entry_agreement
        └── clinical_restrictions
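To make the nesting concrete, here is a small sketch that checks whether one extracted record contains every key in the tree. It is an illustration only: the project's actual `schema.json` is not reproduced here, the function name `missing_keys` is hypothetical, and treating `indications` and `technologies` as lists is an assumption based on the example output.

```python
# Keys taken from the tree above.
TOP_LEVEL_KEYS = {
    "hta_id", "treatment_type", "assessment_type",
    "assessment_date", "internal_identifier", "indications",
}
TECHNOLOGY_KEYS = {
    "inn", "brand_name", "comparators", "outcome_rea", "outcome_cea",
    "final_recommendation", "managed_entry_agreement", "clinical_restrictions",
}


def missing_keys(record):
    """Return dotted paths of keys that are absent from one record."""
    missing = [key for key in sorted(TOP_LEVEL_KEYS) if key not in record]
    for i, indication in enumerate(record.get("indications") or []):
        prefix = f"indications[{i}]"
        # Each indication carries its name and a list of technologies.
        for key in ("indication_name", "technologies"):
            if key not in indication:
                missing.append(f"{prefix}.{key}")
        for j, tech in enumerate(indication.get("technologies") or []):
            for key in sorted(TECHNOLOGY_KEYS - set(tech)):
                missing.append(f"{prefix}.technologies[{j}].{key}")
    return missing
```

A key whose value is `null` counts as present; only keys that are absent altogether are reported.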
The input is a set of HTA documents.
If evaluation is desired, a corresponding ground truth in JSON format, following the same schema above, is also provided.
The output is a list of JSON objects, one per HTA document, each containing the extracted attributes of interest. Example: using the generative AI model claude-3-opus-20240229 from Anthropic AI, this is the JSON object extracted from the document Adefovir dipivoxil and peginterferon alfa-2a for the treatment of chronic hepatitis B, an HTA document published by the United Kingdom's National Institute for Health and Care Excellence (NICE):
{
  "hta_id": "NICE (UK)",
  "treatment_type": "medicine",
  "assessment_type": "initial assessment",
  "assessment_date": "2006-02-22",
  "internal_identifier": "TA96",
  "indications": [
    {
      "indication_name": "Chronic hepatitis B (HBeAg-positive or HBeAg-negative) in adults with compensated liver disease and evidence of viral replication, increased ALT and histologically verified liver inflammation and/or fibrosis",
      "technologies": [
        {
          "inn": "peginterferon alfa-2a",
          "brand_name": "Pegasys",
          "comparators": "interferon alfa-2a",
          "outcome_rea": "equal",
          "outcome_cea": "positive",
          "final_recommendation": "positive",
          "managed_entry_agreement": null,
          "clinical_restrictions": null
        }
      ]
    },
    {
      "indication_name": "Chronic hepatitis B (HBeAg-positive or HBeAg-negative) in adults with compensated liver disease and evidence of active viral replication, persistently elevated serum ALT levels and histological evidence of active liver inflammation and fibrosis, or decompensated liver disease",
      "technologies": [
        {
          "inn": "adefovir dipivoxil",
          "brand_name": "Hepsera",
          "comparators": "lamivudine, best supportive care",
          "outcome_rea": "positive",
          "outcome_cea": "positive",
          "final_recommendation": "positive",
          "managed_entry_agreement": null,
          "clinical_restrictions": "Adefovir dipivoxil is recommended as an option for the treatment of chronic hepatitis B for patients in whom prolonged oral antiviral treatment is required, only after the use of an interferon unless this is contraindicated. The decision to use adefovir dipivoxil (alone or in combination with lamivudine) should take into account various factors including HBeAg status, stage of disease process (for example the presence of compensated or decompensated cirrhosis) and the presence of, or likelihood of the emergence of, virus resistance."
        }
      ]
    }
  ],
  "filename": "ta96.pdf"
}
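For tabular analysis, a nested record like the one above can be flattened into one row per indication-technology pair. The following is a sketch under stated assumptions rather than part of the project: the function name `flatten` is hypothetical, and `example` is an abbreviated copy of the TA96 record with shortened indication names.

```python
def flatten(record):
    """Yield one flat dict per indication-technology pair in a record."""
    # Document-level fields are repeated on every row.
    shared = {k: v for k, v in record.items() if k != "indications"}
    for indication in record.get("indications", []):
        for tech in indication.get("technologies", []):
            yield {
                **shared,
                "indication_name": indication.get("indication_name"),
                **tech,
            }


# Abbreviated version of the TA96 record, for illustration only.
example = {
    "hta_id": "NICE (UK)",
    "internal_identifier": "TA96",
    "indications": [
        {
            "indication_name": "Chronic hepatitis B (abbreviated)",
            "technologies": [
                {"inn": "peginterferon alfa-2a", "final_recommendation": "positive"}
            ],
        },
        {
            "indication_name": "Chronic hepatitis B (abbreviated)",
            "technologies": [
                {"inn": "adefovir dipivoxil", "final_recommendation": "positive"}
            ],
        },
    ],
    "filename": "ta96.pdf",
}

if __name__ == "__main__":
    for row in flatten(example):
        print(row["internal_identifier"], row["inn"], row["final_recommendation"])
```

Each resulting row can be written to CSV or loaded into a data frame without further unpacking.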
If evaluation is performed, then a file containing the performance metrics (precision, recall, accuracy, F1 score) per attribute, as well as the overall performance metrics, is produced. A file containing detailed comparisons between the extracted attributes and the ground truth is also produced.
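As an illustration of how such per-attribute metrics can be computed, here is a minimal sketch. The counting rules are assumptions, not the project's documented definitions: values are compared by exact equality, a `None` value means "nothing extracted" or "nothing to extract", and a wrong non-null value counts as both a false positive and a false negative. The function name `attribute_metrics` is hypothetical.

```python
def attribute_metrics(pairs):
    """Compute metrics for one attribute from (extracted, truth) pairs."""
    pairs = list(pairs)
    # True positive: a value was expected and the extraction matches it.
    tp = sum(1 for pred, truth in pairs if truth is not None and pred == truth)
    # True negative: nothing was expected and nothing was extracted.
    tn = sum(1 for pred, truth in pairs if truth is None and pred is None)
    # False positive: something was extracted, but it is not the truth.
    fp = sum(1 for pred, truth in pairs if pred is not None and pred != truth)
    # False negative: a value was expected, but the extraction missed it.
    fn = sum(1 for pred, truth in pairs if truth is not None and pred != truth)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}
```

Free-text attributes such as indication names or clinical restrictions rarely match character for character, so a real evaluation would likely need a more tolerant comparison than the exact equality used here.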
This project is licensed under the terms of the MIT License.
Date: September 2023
Researchers:
- Jan-Willem Versteeg (j.versteeg@uu.nl)
- Lourens Bloem (l.t.bloem@uu.nl)
- Marie L. de Bruin (m.l.debruin@uu.nl)
Research Engineers:
- Modhurita Mitra (m.mitra@uu.nl)
- Maarten Schermer (m.d.schermer@uu.nl)
- Shiva Nadi Najafabadi (s.nadinajafabadi@uu.nl)