markdown = ocr_document("test.png")
print(markdown)
```

**(Recommended) Local Model via vllm (GPU Required)**:

```bash
pip install vllm
vllm serve scb10x/typhoon-ocr-7b --max-model-len 32000 --served-model-name typhoon-ocr-preview # OpenAI-compatible server at http://localhost:8000 (or another port)
# then you can supply base_url to ocr_document
```

```python
from typhoon_ocr import ocr_document

markdown = ocr_document('image.png', base_url='http://localhost:8000/v1', api_key='anything-is-ok')
print(markdown)
```

To read more, see the [vllm quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html).
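For a multi-page document saved as one image per page, you can OCR a whole folder with the call above. A minimal sketch; `ocr_folder` is a hypothetical helper (not part of the `typhoon-ocr` package), and the OCR call is injected so the same loop works with the API setup or a local server:

```python
from pathlib import Path

def ocr_folder(folder, ocr_fn, exts=(".png", ".jpg", ".jpeg")):
    # OCR every image in the folder in sorted (page) order and
    # join the per-page markdown into one document string.
    pages = [ocr_fn(str(p)) for p in sorted(Path(folder).iterdir())
             if p.suffix.lower() in exts]
    return "\n\n".join(pages)
```

For example: `ocr_folder("scans/", lambda p: ocr_document(p, base_url='http://localhost:8000/v1', api_key='anything-is-ok'))`.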

**Run Manually**

Below is a partial snippet. You can run inference using either the API or a local model.
text_output = response.choices[0].message.content
print(text_output)
```
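OpenAI-style vision requests like the one sketched above typically pass the page image as a base64 data URL inside the message content. A minimal sketch of that encoding step; `to_data_url` is an illustrative helper (it assumes a PNG input), not part of any package:

```python
import base64

def to_data_url(image_path):
    # Encode a local image as a base64 data URL suitable for an
    # OpenAI-style {"type": "image_url", ...} message part.
    # Assumes PNG; adjust the MIME type for other formats.
    with open(image_path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/png;base64,{encoded}"
```

The returned string would go into a content part such as `{"type": "image_url", "image_url": {"url": to_data_url("test.png")}}`.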

*(Not Recommended) Local Model via Transformers (GPU Required)*:
```python
# Initialize the model
model = Qwen2_5_VLForConditionalGeneration.from_pretrained("scb10x/typhoon-ocr-7b", torch_dtype=torch.bfloat16).eval()
```

This model only works with the specific prompts defined below, where `{base_text}` refers to information extracted from the PDF metadata using the `get_anchor_text` function from the `typhoon-ocr` package. It will not function correctly with any other prompts.

```python
PROMPTS_SYS = {
    "default": lambda base_text: (f"Below is an image of a document page along with its dimensions. "
        f"Simply return the markdown representation of this document, presenting tables in markdown format as they naturally appear.\n"
```
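To make the shape of these templates concrete, here is an abridged, illustrative version of the `default` entry applied to a dummy anchor text. The real prompt wording is longer than shown here, and in actual use `base_text` would come from `get_anchor_text`, not a hand-written string:

```python
# Abridged, illustrative stand-in for the real prompt table: each entry
# maps a task name to a lambda that embeds the anchor text into the
# system prompt. The actual wording in the package is longer.
ILLUSTRATIVE_PROMPTS_SYS = {
    "default": lambda base_text: (
        "Below is an image of a document page along with its dimensions. "
        "Simply return the markdown representation of this document.\n"
        f"Anchor text: {base_text}"
    ),
}

anchor = "Page 1: Quarterly Report"  # dummy stand-in for get_anchor_text output
prompt = ILLUSTRATIVE_PROMPTS_SYS["default"](anchor)
```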

### Generation Parameters

We suggest the following generation parameters. Since this is an OCR model, we do not recommend a high temperature; keep it at 0 or 0.1, not higher.

```python
temperature=0.1,
top_p=0.6,
repetition_penalty=1.2
```
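For intuition about `repetition_penalty`: it rescales the logits of tokens that were already generated, so repeated tokens become less likely and the model is discouraged from looping on the same text. A minimal sketch of the commonly used rule (as in Hugging Face's `RepetitionPenaltyLogitsProcessor`): positive logits are divided by the penalty, negative ones multiplied by it:

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.2):
    # Penalize tokens that already appear in the generated sequence:
    # a positive logit is divided by the penalty, a negative logit is
    # multiplied by it, so a seen token loses probability either way.
    out = list(logits)
    for t in set(seen_token_ids):
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out
```

With the suggested `penalty=1.2`, a repeated token whose logit is 2.0 drops to about 1.67 before sampling.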

## Hosting

We recommend serving typhoon-ocr with [vllm](https://github.com/vllm-project/vllm) rather than Hugging Face Transformers, and using the `typhoon-ocr` library to OCR documents. To read more, see the [vllm quickstart](https://docs.vllm.ai/en/latest/getting_started/quickstart.html).

```bash
vllm serve scb10x/typhoon-ocr-7b --max-model-len 32000 # OpenAI-compatible server at http://localhost:8000
# then you can supply base_url to ocr_document
```

```python
from typhoon_ocr import ocr_document

ocr_document('image.jpg', base_url='http://localhost:8000/v1')
```

## **Intended Uses & Limitations**