# Social Determinants of Health Prediction A model that predicts Social Determinants of Health. ## Documentation See the [full documentation](https://sunlabuiuc.github.io/sdoh/index.html). The [API reference](https://sunlabuiuc.github.io/sdoh/api.html) is also available. ## Setup 1. Create a new Conda environment: `conda env create -f src/python/environment-lock.yml` 1. Activate it: `activate sdoh` 1. Download the [corpus] 1. Rename the corpus: ```bash cd download mv mimic-iii-clinical-care-database-1.0.1.zip annotation-dataset-of-social-determinants-of-health-from-mimic-iii-clinical-care-database-1.0.1.zip ``` ## Installing Lituiev et al. (2023) SDoH Models Optionally install the Lituiev et al. (2023) SDoH Models. ### spaCy hybrid model 1. Clone the spaCy package: `git clone https://github.com/BCHSI/social-determinants-of-health-clbp` 1. Change working directory: `cd social-determinants-of-health-clbp/model-hybrid-bow/package/en_sdoh_bow-0.0.2` 1. Build the wheel: `pip install wheel ; python setup.py bdist_wheel` 1. Install the wheel: `pip install --no-deps dist/en_sdoh_bow-0.0.2-py3-none-any.whl` 1. Install dependencies: `pip install lemma_tokenizer` 1. Install NLTK's dependencies: `pip install nltk ; python -c "import nltk ; nltk.download('stopwords') ; nltk.download('averaged_perceptron_tagger') ; nltk.download('wordnet')"` ### spaCy CNN Model 1. Clone the model repo: `git clone https://github.com/BCHSI/social-determinants-of-health-clbp` 1. Change working directory: `cd social-determinants-of-health-clbp/model-cnn-ner/packages/en_sdoh_cnn_ner_cui-0.0.0` 1. Build the wheel: `pip install wheel ; python setup.py bdist_wheel` 1. Install the wheel: `( cd dist ; pip install en_sdoh_cnn_ner_cui-0.0.0-py3-none-any.whl )` ### Convert the spaCy model to PyTorch The HuggingFace 2023 model is for an old version of spaCy (3.2). This converts it to a HuggingFace model from a spaCy [source model]. It needs a previous version of pip, so install an old version, install the spaCy model, then restore pip. Then the PyTorch model is converted to `sdoh-roberta-base`. 1. Remember the old version: `OLDVER=$(pip --version | awk '{print $2'})` 1. Compatible pip version for package: `pip install --upgrade pip==23`. 1. Install Git Large File System: `brew install git-lfs` 1. Download the model: `git clone https://huggingface.co/dlituiev/en_sdoh_roberta_cui` 1. Install it: `pip install --no-deps en_sdoh_roberta_cui/en_sdoh_roberta_cui-any-py3-none-any.whl` 1. Install dependencies: `pip install spacy-transformers` 1. The conversion script needs an older HF package: `pip install transformers==4.26` 1. Convert to the PyTorch model: `src/bin/topytorch.py` 1. Revert the pip version: `pip install --upgrade pip==${OLDVER}` 1. Cleanup: `rm -rf en_sdoh_roberta_cui` ## Training 1. Set the path to the configuration file: ```bash export SDOHRC=etc/model.conf ``` 1. All and testing commands are given with the `harness` script. See the command line help: `./harness -h` 1. Run the fewshot LLM tests: ```bash for i in mimic synthetic mimthetic ; do ./harness fewshot $i done ``` 1. Supervise-fine tune the LLM models, then test ```bash for i in mimic synthetic mimthetic ; do ./harness train $i done for i in mimic synthetic mimthetic ; do ./harness test $i done ``` 1. Train and ablation test the traditional deep learning models: ```bash ./harness binaryabl for i in mimic synthetic mimthetic ; do ./harness multilabelabl done ``` ## Changelog An extensive changelog is available [here](CHANGELOG.md). ## Community Please star this repository and let me know how and where you use this API. Contributions as pull requests, feedback and any input is welcome. ## License [MIT License](LICENSE.md) Copyright (c) 2024 Paul Landes [corpus]: https://physionet.org/content/annotation-dataset-sdoh/1.0.1/ [source model]: https://github.com/BCHSI/social-determinants-of-health-clbp