This project attempts ICD code tagging with modern machine learning techniques.
Much of this project was inspired by the following papers:
- An Empirical Evaluation of Deep Learning for ICD-9 Code Assignment using MIMIC-III Clinical Notes
- Tagging Patient Notes With ICD-9 Codes
Additionally, all code in the icd9/ folder was cloned directly from this repository. It is used for navigating the ICD-9 code hierarchy via Python objects. Because it was not available from a package manager, it was incorporated directly into the project.
The scripts/ directory contains Python files for front-loading tasks that are cumbersome or expensive to run.
The build.py file acts as the controller for the above tasks and accommodates command-line interaction. This file can also be executed through the Makefile, as described below.
The report.ipynb file contains all code for exploratory/evaluation visualizations.
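As an illustration of how a controller such as build.py could dispatch tasks from the command line, here is a minimal sketch. It is not the project's actual implementation: the task functions are hypothetical placeholders, and the task names simply mirror the Makefile targets (preprocess, split, train).

```python
import argparse


# Hypothetical placeholders: the real task code lives in scripts/ and may
# have different names, arguments, and return values.
def preprocess():
    return "preprocess"


def split():
    return "split"


def train():
    return "train"


# Map each command-line task name to the function that runs it.
TASKS = {"preprocess": preprocess, "split": split, "train": train}


def build_parser():
    parser = argparse.ArgumentParser(description="Run a front-loaded build task.")
    parser.add_argument("task", choices=sorted(TASKS), help="task to run")
    return parser


def main(argv=None):
    # argv=None makes argparse read sys.argv[1:], as in normal CLI use.
    args = build_parser().parse_args(argv)
    return TASKS[args.task]()
```

Under these assumptions, calling main(["split"]) runs only the split step, and an unknown task name makes argparse exit with a usage message.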
In order to run any of this code, you must first obtain access to MIMIC-III and deploy the dataset to your AWS account.
Run pipenv install to install all dependencies.
You should create a .env file in the project root. AWS credentials should be stored here in the following variables:
- ACCESS_KEY
- SECRET_KEY
- S3_DIR
- REGION_NAME
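Pipenv loads a .env file from the project root automatically when commands are started with pipenv run or pipenv shell, so these values are then available as environment variables. The sketch below shows one way to read and validate them up front. The helper name load_aws_settings is hypothetical and is not part of this project's code.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("ACCESS_KEY", "SECRET_KEY", "S3_DIR", "REGION_NAME")


def load_aws_settings(env=None):
    """Return the required settings as a dict, or raise if any are missing.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required settings: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing early with the list of missing names is usually easier to debug than an authentication error raised later by an AWS call.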
Preprocessing and model training can be run through the Makefile as follows:
- make preprocess to conduct all preprocessing steps
- make split to split the data into train/test sets
- make train to train models
Once this is all complete, you should be able to execute all cells in report.ipynb.