This repository contains quantized versions of DISC-MedLLM, a model fine-tuned from Baichuan-13B-Base.

The weights were converted to GGML format using baichuan13b.cpp (based on llama.cpp).

| Model | GGML quantize method | Size on disk |
|---|---|---|
| ggml-model-q4_0.bin | q4_0 | 7.55 GB |
| ggml-model-q4_1.bin | q4_1 | 8.36 GB |
| ggml-model-q5_0.bin | q5_0 | 9.17 GB |
| ggml-model-q5_1.bin | q5_1 | 9.97 GB |
| ggml-model-q8_0.bin | q8_0 | 14 GB |
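
If you want to reproduce these files from an f16 GGML checkpoint, the usual llama.cpp-style workflow runs a quantize tool once per method. This is only a sketch, assuming baichuan13b.cpp keeps llama.cpp's quantize tool and argument order; the f16 input filename here is hypothetical:

    cd baichuan13b/build/bin/
    # usage: quantize <input f16 model> <output model> <method>
    ./quantize ggml-model-f16.bin ggml-model-q4_0.bin q4_0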

How to run inference

  1. Compile baichuan13b; this produces a main executable at baichuan13b/build/bin/main and a server at baichuan13b/build/bin/server.
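
     A typical build, assuming baichuan13b.cpp follows llama.cpp's CMake setup (an assumption; check that project's README for the exact steps):

    cd baichuan13b
    mkdir build && cd build
    cmake ..
    cmake --build . --config Release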

  2. Download the weights from this repository into baichuan13b/build/bin/.

  3. For the command-line interface, the following command is useful; a sketch with a few common generation flags follows it. You can also read the docs for the other command-line parameters.

    cd baichuan13b/build/bin/
    ./main -m ggml-model-q4_0.bin --prompt "I feel sick. Nausea and Vomiting."
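
     Common llama.cpp-style generation flags should also work here, since baichuan13b.cpp is based on llama.cpp; this is an assumption, so verify against ./main --help:

    # -n limits the number of generated tokens; --temp and --top_p control sampling
    ./main -m ggml-model-q4_0.bin -n 512 --temp 0.7 --top_p 0.9 \
        --prompt "I feel sick. Nausea and Vomiting."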
    
  4. For the API interface, the following command is useful (-c 2048 sets the context length in tokens). You can also read the docs about the server command-line options.

    cd baichuan13b/build/bin/
    ./server -m ggml-model-q4_0.bin -c 2048
    
  5. To test the API interface, you can use curl:

    curl --request POST \
      --url http://localhost:8080/completion \
      --data '{"prompt": "I feel sick. Nausea and Vomiting.", "n_predict": 512}'
    

Use it in Python

To use it in a Python script such as cli_demo.py, replace the model.chat() call with an HTTP request: import requests, POST the prompt as JSON to localhost:8080, and decode the JSON response.

import requests

# POST the prompt as JSON to the local server and decode the JSON response
llm_output = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": "I feel sick. Nausea and Vomiting.",
        "n_predict": 512,
    },
).json()
print(llm_output)
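
As a drop-in replacement for model.chat() in cli_demo.py, a minimal helper might look like the sketch below. The chat() name is hypothetical, and the "content" response field is an assumption based on llama.cpp's server API; check the actual JSON your server returns.

import requests

SERVER_URL = "http://localhost:8080/completion"  # assumed default host and port

def chat(prompt: str, n_predict: int = 512) -> str:
    """Hypothetical replacement for model.chat() backed by the local GGML server."""
    response = requests.post(
        SERVER_URL,
        json={"prompt": prompt, "n_predict": n_predict},
        timeout=300,  # CPU generation can be slow for a 13B model
    )
    response.raise_for_status()
    # llama.cpp-style servers return the generated text in a "content" field
    # (an assumption for baichuan13b.cpp; adjust if the schema differs)
    return response.json().get("content", "")

print(chat("I feel sick. Nausea and Vomiting."))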