Introduction

This repository contains Physician-Ko-8B, an 8-billion-parameter medical language model. It builds on the YiDuo1999/Llama-3-Physician-8B-Instruct model, further fine-tuned on a Korean dataset.

Usage
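Both approaches below assume recent versions of transformers and torch, plus accelerate for device_map='auto'; Approach 2 additionally needs LangChain (roughly, pip install transformers torch accelerate langchain). Note that in newer LangChain releases the HuggingFacePipeline class used below may live in langchain_community.llms.huggingface_pipeline instead.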

Approach 1

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "eded0902/Physician-Ko-8B"
tokenizer_name = "YiDuo1999/Llama-3-Physician-8B-Instruct"
device_map = 'auto'

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    use_cache=False,
    device_map=device_map,
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)

tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
# Stop generation at any of the model's end-of-turn tokens
eos_token_id = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>"), tokenizer.convert_tokens_to_ids("<|im_end|>")]
tokenizer.pad_token = tokenizer.eos_token
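# The template above renders conversations in ChatML format: a single user
# turn [{"role": "user", "content": "..."}] rendered with
# add_generation_prompt=True becomes
#   <|im_start|>user
#   ...<|im_end|>
#   <|im_start|>assistant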

def askme(question):
    sys_message = '''You are an AI Medical Assistant trained on a vast dataset of health information. Please be thorough and
    provide an informative answer. If you don't know the answer to a specific medical inquiry, advise seeking professional help.
    '''
    # Create messages structured for the chat template
    messages = [{"role": "system", "content": sys_message}, {"role": "user", "content": question}]
    
    # Applying chat template
    prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=1000, eos_token_id=eos_token_id, use_cache=True)
    
    # Extract and return the generated text, removing the prompt
    response_text = tokenizer.batch_decode(outputs)[0].strip()
    answer = response_text.split('<|im_start|>assistant')[-1].split('<|im_end|>')[0].strip()
    return answer

# Example usage: first give any relevant context in your message, then ask the question.
question = '''HIV๊ฐ€ ๋ญ์•ผ?'''  # "What is HIV?"
print(askme(question))

A typical answer looks like:

HIV๋Š” Human Immunodeficiency Virus์˜ ์•ฝ์ž๋กœ, ์ธ์ฒด ๋ฉด์—ญ๊ฒฐํ• ๋ฐ”์ด๋Ÿฌ์Šค๋ผ๊ณ ๋„ ๋ถˆ๋ฆฝ๋‹ˆ๋‹ค. ์ด ๋ฐ”์ด๋Ÿฌ์Šค๋Š” ์ธ๊ฐ„์˜ ๋ฉด์—ญ ์ฒด๊ณ„๋ฅผ ์•ฝํ™”์‹œํ‚ค๋Š” ๋ฐ”์ด๋Ÿฌ์Šค๋กœ, ์ธ์ฒด์˜ ๋ฉด์—ญ ์„ธํฌ๋ฅผ ๊ณต๊ฒฉํ•˜์—ฌ ๋ฉด์—ญ๋ ฅ์„ ๊ฐ์†Œ์‹œํ‚ต๋‹ˆ๋‹ค. HIV์— ๊ฐ์—ผ๋˜๋ฉด ์ธ์ฒด์˜ ๋ฉด์—ญ ์ฒด๊ณ„๊ฐ€ ์•ฝํ•ด์ ธ ๋‹ค์–‘ํ•œ ๊ฐ์—ผ์„ฑ ์งˆํ™˜๊ณผ ์ข…์–‘์ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. HIV ๊ฐ์—ผ์„ ์˜ˆ๋ฐฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์•ˆ์ „ํ•œ ์„ฑ๊ด€๊ณ„ ์œ ์ง€, ํ˜ˆ์•ก ๋ฐ ํ˜ˆ์•ก ์ œ์ œ์˜ ๊ณต์œ ๋ฅผ ํ”ผํ•˜๋Š” ๋“ฑ์˜ ์˜ˆ๋ฐฉ ์กฐ์น˜๊ฐ€ ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.

(Translation: HIV is short for Human Immunodeficiency Virus. The virus weakens the human immune system by attacking immune cells and lowering immunity. Once infected, the weakened immune system can give rise to various infectious diseases and tumors. Prevention requires measures such as safe sexual practices and avoiding the sharing of blood and blood products.)
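
For interactive use you may prefer to stream the answer token by token instead of waiting for the full generation. Below is a minimal sketch using transformers' TextStreamer, reusing the model, tokenizer, and eos_token_id set up above:

from transformers import TextStreamer

streamer = TextStreamer(tokenizer, skip_prompt=True)  # print only newly generated text
messages = [{"role": "user", "content": "HIV๊ฐ€ ๋ญ์•ผ?"}]  # "What is HIV?"
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
_ = model.generate(**inputs, max_new_tokens=1000, eos_token_id=eos_token_id, streamer=streamer)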

Approach 2

Using LangChain

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms.huggingface_pipeline import HuggingFacePipeline
from langchain_core.prompts import PromptTemplate

model_name = "eded0902/Physician-Ko-8B"
tokenizer_name = "YiDuo1999/Llama-3-Physician-8B-Instruct"
device_map = 'auto'

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    use_cache=False,
    device_map=device_map,
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_name, trust_remote_code=True)

tokenizer.chat_template = "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
# Stop generation at any of the model's end-of-turn tokens
eos_token_id = [tokenizer.eos_token_id, tokenizer.convert_tokens_to_ids("<|eot_id|>"), tokenizer.convert_tokens_to_ids("<|im_end|>")]
tokenizer.pad_token = tokenizer.eos_token

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=512,
    eos_token_id=eos_token_id,  # stop at the end-of-turn tokens defined above
)
hf = HuggingFacePipeline(pipeline=pipe)

sys_message = """ You are an AI Medical Assistant trained on a vast dataset of health information. Please be thorough and
    provide an informative answer. If you don't know the answer to a specific medical inquiry, advise seeking professional help.
    """
question = "HIV๊ฐ€ ๋ญ์•ผ?"
# Create messages structured for the chat template
messages = [{"role": "system", "content": sys_message}, {"role": "user", "content": question}]

# Applying chat template
template = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
prompt = PromptTemplate.from_template(template)

chain = prompt | hf

print(chain.invoke({"question": question})[len(template):].split('<|im_end|>')[0].strip())

A typical answer looks like:

HIV๋Š” ์ธ๊ฐ„ ๋ฉด์—ญ ๊ฒฐํ• ๋ฐ”์ด๋Ÿฌ์Šค(Human Immunodeficiency Virus, HIV)์˜ ์•ฝ์ž์ž…๋‹ˆ๋‹ค. ์ด ๋ฐ”์ด๋Ÿฌ์Šค๋Š” ์ธ์ฒด์˜ ๋ฉด์—ญ ์ฒด๊ณ„๋ฅผ ์•ฝํ™”์‹œ์ผœ ๊ฐ์—ผ์„ ์ผ์œผํ‚ค๋Š” ๋ฐ”์ด๋Ÿฌ์Šค์ž…๋‹ˆ๋‹ค. HIV๋Š” ์ฃผ๋กœ ์„ฑ์  ์ ‘์ด‰, ํ˜ˆ์•ก ์ „ํŒŒ, ํƒœ์•„ ๊ฐ์—ผ ๋“ฑ์„ ํ†ตํ•ด ์ „ํŒŒ๋ฉ๋‹ˆ๋‹ค. HIV์— ๊ฐ์—ผ๋˜๋ฉด ๋ฉด์—ญ ์„ธํฌ๋“ค์ด ํŒŒ๊ดด๋˜์–ด ๋‹ค์–‘ํ•œ ๊ฐ์—ผ์„ฑ ์งˆํ™˜๊ณผ ์ข…์–‘์ด ๋ฐœ์ƒํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค. HIV ๊ฐ์—ผ์„ ์˜ˆ๋ฐฉํ•˜๊ธฐ ์œ„ํ•ด์„œ๋Š” ์•ˆ์ „ํ•œ ์„ฑํ–‰์œ„์™€ ํ˜ˆ์•ก ๋ฐ ํ˜ˆ์•ก ์ œํ’ˆ์˜ ์•ˆ์ „ํ•œ ์‚ฌ์šฉ์ด ์ค‘์š”ํ•ฉ๋‹ˆ๋‹ค. ๋˜ํ•œ HIV ๊ฐ์—ผ ์—ฌ๋ถ€๋ฅผ ํ™•์ธํ•˜๊ธฐ ์œ„ํ•ด ์ •๊ธฐ์ ์ธ ๊ฒ€์‚ฌ๋ฅผ ๋ฐ›๋Š” ๊ฒƒ์ด ํ•„์š”ํ•ฉ๋‹ˆ๋‹ค.