---
license: apache-2.0
language:
- en
base_model:
- meta-llama/Llama-3.1-8B-Instruct
pipeline_tag: feature-extraction
tags:
- structurization
- EHR
- medical
- information extraction
---
# Model Card for GENIE


## Model Details

Model Size: 8B (English)

Max Tokens: 8192

Base model: Llama 3.1 8B (English)

### Model Description

GENIE (Generative Note Information Extraction) is an end-to-end model designed to structure free text from electronic health records (EHRs). It processes an EHR in a single pass, extracting biomedical named entities along with their assertion statuses, body locations, modifiers, values, units, and intended purposes, and outputs this information as structured JSON. This approach simplifies traditional natural language processing workflows by replacing the separate analysis components with a single model, which makes the system easier to maintain while leveraging the analytical capabilities of large language models (LLMs). Compared with general-purpose LLMs, GENIE does not require prompt engineering or few-shot examples. Because it generates all relevant attributes in one pass, it also reduces both runtime and operational cost.
GENIE is co-developed by the groups of Sheng Yu (https://www.stat.tsinghua.edu.cn/teachers/shengyu/), Tianxi Cai (https://dbmi.hms.harvard.edu/people/tianxi-cai), and Isaac Kohane (https://dbmi.hms.harvard.edu/people/isaac-kohane).


## Usage

```python
import json

from vllm import LLM, SamplingParams

model = LLM(model='THUMedInfo/GENIE_en_8b', tensor_parallel_size=1)
# model = LLM(model='path/to/your/local/model', tensor_parallel_size=1)

PROMPT_TEMPLATE = "Human:\n{query}\n\n Assistant:"

# Illustrative values only (this card does not specify them); tune for your data.
temperature = 0.0
max_new_token = 2048
sampling_params = SamplingParams(temperature=temperature, max_tokens=max_new_token)

EHR = ['xxxxx1', 'xxxxx2']  # one string per note
texts = [PROMPT_TEMPLATE.format(query=k) for k in EHR]
output = model.generate(texts, sampling_params)

# Entities extracted from the first note; use output[i] for the i-th note.
res = json.loads(output[0].outputs[0].text)
```
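
If generation stops at `max_tokens` before the JSON is complete, `json.loads` raises an exception. The following is a minimal sketch of a defensive parser, assuming the model emits a JSON array of objects as in the example in this card. `parse_entities` is a hypothetical helper, not part of GENIE or vLLM.

```python
import json
from typing import Any


def parse_entities(generated_text: str) -> list[dict[str, Any]]:
    """Parse one generation into a list of entity dicts.

    Returns an empty list when the text is not valid JSON or is not a list,
    so that one bad generation does not abort a whole batch.
    """
    try:
        parsed = json.loads(generated_text)
    except json.JSONDecodeError:
        return []
    if not isinstance(parsed, list):
        return []
    # Keep only well-formed records (JSON objects).
    return [item for item in parsed if isinstance(item, dict)]


# Hypothetical generations used only to exercise the helper.
good = '[{"phrase": "gout", "assertion_status": "present"}]'
truncated = '[{"phrase": "gout", "assertion_sta'

print(parse_entities(good))
print(parse_entities(truncated))
```

With the usage snippet, this could be applied per note as `[parse_entities(o.outputs[0].text) for o in output]`. Returning an empty list hides failures, so in practice it may be worth logging or retrying the affected notes.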

## Example

Input:
```python
EHR = ["""Unit No:___

Admission Date:___



  Discharge Date:___

Date of Birth:___



 Sex:   F

Service: MEDICINE

Allergies:
Sulfur / Norvasc

Attending:___
Addendum:
See below

Chief Complaint:
abdominal pain

Major Surgical or Invasive Procedure:
none

History of Present Illness:
84 F with PMHx of Renovascular HTN c/b NSTEMI now s/p renal
stents, Gout and h/o Crohn's disease who presented to the ED on
___with RLQ pain for approx 2 days.  She denies any
nausea/vomiting/diarrhea or constipation but has not been taking

po well and felt dehydrated."""]
```

Output:
```python
res = [{'phrase': 'allergies',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'title',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'sulfur',
  'semantic_type': 'Chemical or Drug',
  'assertion_status': 'conditional',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'null',
  'unit': 'units: null',
  'purpose': 'null'},
 {'phrase': 'norvasc',
  'semantic_type': 'Chemical or Drug',
  'assertion_status': 'conditional',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'null',
  'unit': 'units: null',
  'purpose': 'null'},
 {'phrase': 'abdominal pain',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'present',
  'body_location': 'Abdominal',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'surgical or invasive procedure',
  'semantic_type': 'Therapeutic or Preventive Procedure',
  'assertion_status': 'title',
  'body_location': 'null',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'null'},
 {'phrase': 'renovascular hypertension',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'renal',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'non-st elevation myocardial infarction',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'gout',
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': "crohn ' s disease",
  'semantic_type': 'Disease, Syndrome or Pathologic Function',
  'assertion_status': 'present',
  'body_location': 'not applicable',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'emergency department',
  'semantic_type': 'Therapeutic or Preventive Procedure',
  'assertion_status': 'present',
  'body_location': 'null',
  'modifier': 'not applicable',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'null'},
 {'phrase': 'pain',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'present',
  'body_location': 'right lower quadrant',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'nausea',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'vomiting',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'diarrhea',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'},
 {'phrase': 'constipation',
  'semantic_type': 'Sign, Symptom, or Finding',
  'assertion_status': 'absent',
  'body_location': 'null',
  'modifier': 'null',
  'value': 'not applicable',
  'unit': 'not applicable',
  'purpose': 'not applicable'}]

```
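
The example output uses the strings `'null'`, `'not applicable'`, and `'units: null'` where an attribute has no value. The sketch below shows one way to post-process such records: map those placeholders to `None` and group phrases by semantic type for a given assertion status. The placeholder set is an assumption based only on the example above, and `normalize` and `phrases_by_type` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Any, Optional

# Placeholder strings observed in the example output (assumed, not exhaustive).
PLACEHOLDERS = {'null', 'not applicable', 'units: null'}

# A few records abridged from the example output.
res = [
    {'phrase': 'abdominal pain', 'semantic_type': 'Sign, Symptom, or Finding',
     'assertion_status': 'present', 'body_location': 'Abdominal'},
    {'phrase': 'gout', 'semantic_type': 'Disease, Syndrome or Pathologic Function',
     'assertion_status': 'present', 'body_location': 'null'},
    {'phrase': 'nausea', 'semantic_type': 'Sign, Symptom, or Finding',
     'assertion_status': 'absent', 'body_location': 'null'},
    {'phrase': 'allergies', 'semantic_type': 'Disease, Syndrome or Pathologic Function',
     'assertion_status': 'title', 'body_location': 'null'},
]


def normalize(entity: dict[str, Any]) -> dict[str, Optional[Any]]:
    """Return a copy of the record with placeholder strings replaced by None."""
    return {k: (None if v in PLACEHOLDERS else v) for k, v in entity.items()}


def phrases_by_type(entities: list[dict[str, Any]], status: str) -> dict[str, list[str]]:
    """Group phrases by semantic type, keeping only the given assertion status."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for entity in entities:
        if entity.get('assertion_status') == status:
            grouped[entity.get('semantic_type', 'unknown')].append(entity['phrase'])
    return dict(grouped)


print(phrases_by_type(res, 'present'))
print(normalize(res[1]))
```

Filtering on `assertion_status` in this way separates findings the note affirms (`'present'`) from negated ones (`'absent'`) and from section headings (`'title'`).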



## Citation

If you find our paper or models helpful, please consider citing:

**BibTeX:**
```
@misc{ying2025geniegenerativenoteinformation,
      title={GENIE: Generative Note Information Extraction model for structuring EHR data}, 
      author={Huaiyuan Ying and Hongyi Yuan and Jinsen Lu and Zitian Qu and Yang Zhao and Zhengyun Zhao and Isaac Kohane and Tianxi Cai and Sheng Yu},
      year={2025},
      eprint={2501.18435},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2501.18435}, 
}
```