Social Determinants of Health Prediction
A model that predicts Social Determinants of Health.
Documentation
See the full documentation. The API reference is also available.
Setup
Create a new Conda environment:
conda env create -f src/python/environment-lock.yml
Activate it:
conda activate sdoh
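To confirm the environment is active before running the later steps, a small Python check can read `CONDA_DEFAULT_ENV`, which Conda sets to the name of the active environment. This is a minimal sketch; the helper names are hypothetical and not part of this project.

```python
import os
from typing import Mapping, Optional


def active_conda_env(environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the name of the active Conda environment, or None if none is active."""
    # Conda exports CONDA_DEFAULT_ENV when an environment is activated
    return environ.get("CONDA_DEFAULT_ENV")


def require_conda_env(expected: str = "sdoh",
                      environ: Mapping[str, str] = os.environ) -> None:
    """Raise RuntimeError unless the expected environment is active."""
    name = active_conda_env(environ)
    if name != expected:
        raise RuntimeError(
            f"expected Conda environment {expected!r}, found {name!r}; "
            f"run: conda activate {expected}")
```

Calling `require_conda_env()` at the top of a script fails fast with the activation command to run, instead of failing later on a missing package.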
Download the corpus
Rename the corpus:
Installing Lituiev et al. (2023) SDoH Models
Optionally install the Lituiev et al. (2023) SDoH models.
spaCy Hybrid Model
Clone the spaCy package:
git clone https://github.com/BCHSI/social-determinants-of-health-clbp
Change working directory:
cd social-determinants-of-health-clbp/model-hybrid-bow/package/en_sdoh_bow-0.0.2
Build the wheel:
pip install wheel ; python setup.py bdist_wheel
Install the wheel:
pip install --no-deps dist/en_sdoh_bow-0.0.2-py3-none-any.whl
Install dependencies:
pip install lemma_tokenizer
Install NLTK and download the NLTK data it uses:
pip install nltk ; python -c "import nltk ; nltk.download('stopwords') ; nltk.download('averaged_perceptron_tagger') ; nltk.download('wordnet')"
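spaCy CNN Model
Clone the model repository. If it was already cloned for the hybrid model, skip the clone and return to the directory that contains it:
git clone https://github.com/BCHSI/social-determinants-of-health-clbp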
Change working directory:
cd social-determinants-of-health-clbp/model-cnn-ner/packages/en_sdoh_cnn_ner_cui-0.0.0
Build the wheel:
pip install wheel ; python setup.py bdist_wheel
Install the wheel:
( cd dist ; pip install en_sdoh_cnn_ner_cui-0.0.0-py3-none-any.whl )
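After installing both wheels, you can check that the active interpreter sees them. The sketch below uses the standard library's `importlib.metadata`; the distribution names are assumed to match the wheel file names above (en_sdoh_bow and en_sdoh_cnn_ner_cui), and the helper names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Iterable, Optional

# Assumed distribution names, taken from the wheel file names
SDOH_SPACY_MODELS = ("en_sdoh_bow", "en_sdoh_cnn_ner_cui")


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def report(dist_names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in dist_names}
```

For example, `print(report(SDOH_SPACY_MODELS))` shows `None` for any model that did not install.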
Convert the spaCy model to PyTorch
The 2023 HuggingFace model (dlituiev/en_sdoh_roberta_cui) is packaged for an old version of spaCy (3.2). These steps install that spaCy source model and convert it to a HuggingFace model. Installing the package requires an older version of pip, so pip is downgraded, the spaCy model is installed, and then pip is restored. The result is the PyTorch model sdoh-roberta-base.
Remember the current pip version:
OLDVER=$(pip --version | awk '{print $2}')
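The `awk '{print $2}'` step keeps the second whitespace-separated field of `pip --version`, whose output looks like `pip 23.3.1 from ... (python 3.11)`. A Python equivalent, sketched here with hypothetical helper names, runs the pip of the current interpreter through `python -m pip`:

```python
import subprocess
import sys


def parse_pip_version(output: str) -> str:
    """Return the second whitespace-separated field, like awk '{print $2}'."""
    fields = output.split()
    if len(fields) < 2 or fields[0] != "pip":
        raise ValueError(f"unexpected 'pip --version' output: {output!r}")
    return fields[1]


def current_pip_version() -> str:
    """Ask the current interpreter's pip for its version."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "--version"],
        capture_output=True, text=True, check=True)
    return parse_pip_version(result.stdout)
```

Saving the value of `current_pip_version()` before downgrading plays the same role as `OLDVER`.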
Downgrade pip to a version that can install the package:
pip install --upgrade pip==23
Install Git Large File Storage (Git LFS), here with Homebrew, and enable it for your user:
brew install git-lfs
git lfs install
Download the model:
git clone https://huggingface.co/dlituiev/en_sdoh_roberta_cui
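If Git LFS was not set up before cloning, large files such as the wheel are checked out as small text pointer files instead of their real contents. Pointer files begin with `version https://git-lfs.github.com/spec/v1`, so a quick check like this sketch (helper names hypothetical) can flag them before installing:

```python
from pathlib import Path
from typing import List, Union

# First bytes of a Git LFS pointer file (current pointer spec)
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Union[str, Path]) -> bool:
    """True if the file looks like an un-fetched Git LFS pointer."""
    with open(path, "rb") as f:
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def pointer_wheels(repo_dir: Union[str, Path]) -> List[Path]:
    """List .whl files under repo_dir that are still LFS pointers."""
    return [p for p in Path(repo_dir).rglob("*.whl") if is_lfs_pointer(p)]
```

`pointer_wheels("en_sdoh_roberta_cui")` should return an empty list; if it does not, running `git lfs pull` inside the clone fetches the real files.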
Install it:
pip install --no-deps en_sdoh_roberta_cui/en_sdoh_roberta_cui-any-py3-none-any.whl
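Wheel file names follow the PEP 427 pattern `{name}-{version}(-{build})?-{python}-{abi}-{platform}.whl`. Read that way, this download has `any` in the version position, unlike `en_sdoh_bow-0.0.2-py3-none-any.whl`. One plausible but unconfirmed explanation for the older pip is that newer pip releases reject versions that are not PEP 440 compliant. The sketch below splits a file name into its fields; the version check covers only dotted integers and is not a full PEP 440 parser.

```python
import re
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python: str
    abi: str
    platform: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name (no directory part) into its PEP 427 fields."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, ver, py, abi, plat = parts
        return WheelName(name, ver, None, py, abi, plat)
    if len(parts) == 6:
        name, ver, build, py, abi, plat = parts
        return WheelName(name, ver, build, py, abi, plat)
    raise ValueError(f"unexpected number of fields in {filename!r}")


# Simplified: dotted integers only (e.g. 0.0.2); real PEP 440 allows more forms
_RELEASE = re.compile(r"\d+(?:\.\d+)*")


def is_simple_release(version: str) -> bool:
    """True for plain release versions such as '0.0.2'."""
    return _RELEASE.fullmatch(version) is not None
```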
Install dependencies:
pip install spacy-transformers
The conversion script needs an older version of the HuggingFace transformers package:
pip install transformers==4.26
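An exact pin such as `transformers==4.26` (or `pip==23` earlier) compares release numbers after padding the shorter one with zeros, so it selects 4.26.0 but not 4.26.1. The sketch below reproduces that rule for plain dotted-integer versions only; for real checks, the `packaging` library's `SpecifierSet` implements full PEP 440 matching.

```python
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse a plain dotted-integer version such as '4.26.0'."""
    return tuple(int(part) for part in version.split("."))


def matches_exact_pin(installed: str, pin: str) -> bool:
    """Approximate PEP 440 '==' for release-only versions (zero padding)."""
    a, b = release_tuple(installed), release_tuple(pin)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a == b
```

For example, `matches_exact_pin(importlib.metadata.version("transformers"), "4.26")` reports whether the pin took effect.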
Convert to the PyTorch model:
src/bin/topytorch.py
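A quick sanity check after conversion is to confirm that the output directory holds the files a HuggingFace model directory normally contains. This sketch assumes the output is written to a directory named sdoh-roberta-base with `save_pretrained`, which in transformers 4.26 writes `config.json` and, by default, `pytorch_model.bin`; newer releases may write `model.safetensors` instead. The directory name and file list are assumptions, not taken from `topytorch.py`.

```python
from pathlib import Path
from typing import List, Union

# Assumed weight file names written by transformers' save_pretrained()
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def missing_model_files(model_dir: Union[str, Path] = "sdoh-roberta-base") -> List[str]:
    """Describe each expected file that is missing from model_dir."""
    d = Path(model_dir)
    missing = []
    if not (d / "config.json").is_file():
        missing.append("config.json")
    if not any((d / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```

An empty list means both pieces are present; loading the directory with `transformers.AutoModel.from_pretrained` is the fuller test.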
Revert the pip version:
pip install --upgrade pip==${OLDVER}
Cleanup:
rm -rf en_sdoh_roberta_cui
Training
Set the path to the configuration file:
All training and testing commands are run with the harness script. See the command-line help:
./harness -h
Run the few-shot LLM tests:
Run supervised fine-tuning on the LLMs, then test them:
Train the traditional deep learning models and run the ablation tests:
Changelog
An extensive changelog is available here.
Community
Please star this repository and let me know how and where you use this API. Contributions as pull requests, feedback, and any other input are welcome.
License
MIT License
Copyright (c) 2024 Paul Landes