---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: feature-extraction
tags:
- structurization
- EHR
- medical
- information extraction
---
# Model Card for GENIE
## Model Details
Model Size: 8B (English)
Max Tokens: 8192
Base model: Llama 3.1 8B (English)
### Model Description
GENIE (Generative Note Information Extraction) is an end-to-end model designed to structure free text from electronic health records (EHRs). It processes an EHR note in a single pass, extracting biomedical named entities along with their assertion statuses, body locations, modifiers, values, units, and intended purposes, and outputs this information as structured JSON. This approach simplifies traditional natural language processing workflows by replacing the separate analysis components with a single model, making the system easier to maintain while leveraging the analytical capabilities of large language models (LLMs). Compared with general-purpose LLMs, GENIE does not require prompt engineering or few-shot examples. Because it generates all relevant attributes in one pass, it also reduces both runtime and operational costs.
GENIE is co-developed by the groups of Sheng Yu (https://www.stat.tsinghua.edu.cn/teachers/shengyu/), Tianxi Cai (https://dbmi.hms.harvard.edu/people/tianxi-cai), and Isaac Kohane (https://dbmi.hms.harvard.edu/people/isaac-kohane).
## Usage
```python
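import json

from vllm import LLM, SamplingParams

# Load from the Hugging Face Hub, or pass a local path instead
model = LLM(model='THUMedInfo/GENIE_en_8b', tensor_parallel_size=1)
# model = LLM(model='path/to/your/local/model', tensor_parallel_size=1)
PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:"
# Example decoding settings (placeholder values, not prescribed here; tune for your notes)
temperature = 0.7
max_new_token = 1024
sampling_params = SamplingParams(temperature=temperature, max_tokens=max_new_token)

EHR = ['xxxxx1', 'xxxxx2']  # one string per note
texts = [PROMPT_TEMPLATE.format(query=k) for k in EHR]
output = model.generate(texts, sampling_params)
# Parse the generated JSON for the first note
res = json.loads(output[0].outputs[0].text)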
```
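The snippet above parses only the first note, and `json.loads` raises an exception if the generated text is not valid JSON (for example, when generation is cut off by `max_tokens`). The following is a minimal sketch of a more defensive parsing step. The helper name `parse_generation` and the stand-in strings are hypothetical and not part of the GENIE release; in practice the strings would come from `o.outputs[0].text` for each item `o` in `output`.

```python
import json
from typing import Any


def parse_generation(text: str) -> list[dict[str, Any]]:
    """Parse one generated string into a list of entity dicts.

    Returns an empty list when the text is not valid JSON, so that a single
    malformed generation does not abort a whole batch.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        return []
    if isinstance(parsed, dict):
        # A single entity object: wrap it so callers always get a list.
        return [parsed]
    if isinstance(parsed, list):
        # Keep only dict items; ignore anything unexpected.
        return [item for item in parsed if isinstance(item, dict)]
    return []


# Stand-ins for the generated text of two notes (hypothetical data).
raw_texts = [
    '[{"phrase": "gout", "assertion_status": "present"}]',
    '[{"phrase": "pain", "assert',  # truncated generation
]
results = [parse_generation(t) for t in raw_texts]
```

Returning an empty list for an unparseable note keeps the results aligned with the input notes by index, which makes it easy to find and re-run the failures.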
## An example:
Input:
```python
EHR = ["""Unit No:___
Admission Date:___
Discharge Date:___
Date of Birth:___
Sex: F
Service: MEDICINE
Allergies:
Sulfur / Norvasc
Attending:___
Addendum:
See below
Chief Complaint:
abdominal pain
Major Surgical or Invasive Procedure:
none
History of Present Illness:
84 F with PMHx of Renovascular HTN c/b NSTEMI now s/p renal
stents, Gout and h/o Crohn's disease who presented to the ED on
___with RLQ pain for approx 2 days. She denies any
nausea/vomiting/diarrhea or constipation but has not been taking
po well and felt dehydrated."""]
```
Output:
```python
res = [{'phrase': 'allergies',
'semantic_type': 'Disease, Syndrome or Pathologic Function',
'assertion_status': 'title',
'body_location': 'null',
'modifier': 'null',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'not applicable'},
{'phrase': 'sulfur',
'semantic_type': 'Chemical or Drug',
'assertion_status': 'conditional',
'body_location': 'not applicable',
'modifier': 'not applicable',
'value': 'null',
'unit': 'units: null',
'purpose': 'null'},
{'phrase': 'norvasc',
'semantic_type': 'Chemical or Drug',
'assertion_status': 'conditional',
'body_location': 'not applicable',
'modifier': 'not applicable',
'value': 'null',
'unit': 'units: null',
'purpose': 'null'},
{'phrase': 'abdominal pain',
'semantic_type': 'Sign, Symptom, or Finding',
'assertion_status': 'present',
'body_location': 'Abdominal',
'modifier': 'null',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'not applicable'},
{'phrase': 'surgical or invasive procedure',
'semantic_type': 'Therapeutic or Preventive Procedure',
'assertion_status': 'title',
'body_location': 'null',
'modifier': 'not applicable',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'null'},
{'phrase': 'renovascular hypertension',
'semantic_type': 'Disease, Syndrome or Pathologic Function',
'assertion_status': 'present',
'body_location': 'renal',
'modifier': 'null',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'not applicable'},
{'phrase': 'non-st elevation myocardial infarction',
'semantic_type': 'Disease, Syndrome or Pathologic Function',
'assertion_status': 'present',
'body_location': 'null',
'modifier': 'null',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'not applicable'},
{'phrase': 'gout',
'semantic_type': 'Disease, Syndrome or Pathologic Function',
'assertion_status': 'present',
'body_location': 'null',
'modifier': 'null',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'not applicable'},
{'phrase': "crohn ' s disease",
'semantic_type': 'Disease, Syndrome or Pathologic Function',
'assertion_status': 'present',
'body_location': 'not applicable',
'modifier': 'not applicable',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'not applicable'},
{'phrase': 'emergency department',
'semantic_type': 'Therapeutic or Preventive Procedure',
'assertion_status': 'present',
'body_location': 'null',
'modifier': 'not applicable',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'null'},
{'phrase': 'pain',
'semantic_type': 'Sign, Symptom, or Finding',
'assertion_status': 'present',
'body_location': 'right lower quadrant',
'modifier': 'null',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'not applicable'},
{'phrase': 'nausea',
'semantic_type': 'Sign, Symptom, or Finding',
'assertion_status': 'absent',
'body_location': 'null',
'modifier': 'null',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'not applicable'},
{'phrase': 'vomiting',
'semantic_type': 'Sign, Symptom, or Finding',
'assertion_status': 'absent',
'body_location': 'null',
'modifier': 'null',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'not applicable'},
{'phrase': 'diarrhea',
'semantic_type': 'Sign, Symptom, or Finding',
'assertion_status': 'absent',
'body_location': 'null',
'modifier': 'null',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'not applicable'},
{'phrase': 'constipation',
'semantic_type': 'Sign, Symptom, or Finding',
'assertion_status': 'absent',
'body_location': 'null',
'modifier': 'null',
'value': 'not applicable',
'unit': 'not applicable',
'purpose': 'not applicable'}]
```
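Downstream code will often want to treat the placeholder strings in this output as missing values and to group entities by assertion status. The sketch below shows one way to do that. The helper names are hypothetical, and treating `'null'` and `'not applicable'` as "no value" is an assumption based only on the example above, not a documented convention of the model. The `sample` list reuses three entries from the example output, abbreviated to a few fields.

```python
from collections import defaultdict
from typing import Any, Optional

# Placeholder strings seen in the example output; mapping them to None
# is an assumption of this sketch.
EMPTY_MARKERS = {"null", "not applicable"}


def clean(value: Any) -> Optional[Any]:
    """Map placeholder strings to None and leave other values unchanged."""
    if isinstance(value, str) and value.strip().lower() in EMPTY_MARKERS:
        return None
    return value


def normalize(entity: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of one entity with placeholder strings replaced by None."""
    return {key: clean(val) for key, val in entity.items()}


def phrases_by_status(entities: list[dict[str, Any]]) -> dict[Any, list[str]]:
    """Group entity phrases by their assertion_status, preserving order."""
    groups: dict[Any, list[str]] = defaultdict(list)
    for entity in entities:
        groups[entity.get("assertion_status", "unknown")].append(entity["phrase"])
    return dict(groups)


# Three entities taken from the example output, abbreviated to a few fields.
sample = [
    {"phrase": "abdominal pain", "assertion_status": "present",
     "body_location": "Abdominal", "value": "not applicable"},
    {"phrase": "nausea", "assertion_status": "absent",
     "body_location": "null", "value": "not applicable"},
    {"phrase": "pain", "assertion_status": "present",
     "body_location": "right lower quadrant", "value": "not applicable"},
]
normalized = [normalize(e) for e in sample]
grouped = phrases_by_status(normalized)
```

With this sample, `grouped["present"]` holds the two pain findings and `grouped["absent"]` holds the negated symptom, which is a typical first step before building a problem list.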
## Citation
If you find our paper or models helpful, please consider citing:
**BibTeX:**
```
@misc{ying2025geniegenerativenoteinformation,
title={GENIE: Generative Note Information Extraction model for structuring EHR data},
author={Huaiyuan Ying and Hongyi Yuan and Jinsen Lu and Zitian Qu and Yang Zhao and Zhengyun Zhao and Isaac Kohane and Tianxi Cai and Sheng Yu},
year={2025},
eprint={2501.18435},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2501.18435},
}
```