otpensource-vision LoRA

๋ชจ๋ธ ์„ค๋ช…

otpensource-vision LoRA is a lightweight Vision-Language model trained with the LoRA (Low-Rank Adaptation) technique on top of the otpensource-vision model. It delivers domain-optimized results at a fraction of the base model's compute cost, and supports both Korean and English.

Key Features

  • LoRA-based lightweight adapter: enables additional training with few resources while preserving the base model's performance
  • Vision-Language task support: generates text from image input, and performs natural-language processing from text-only input
  • Trained on fashion data: optimized for analyzing fashion categories, colors, seasons, and other attributes using otpensource_data
  • Fast application and extensibility: the LoRA adapter can be applied quickly when fine-tuning the base model

๋ชจ๋ธ ์„ธ๋ถ€์‚ฌํ•ญ

ํ•™์Šต ๋ฐ์ดํ„ฐ

๋ชจ๋ธ ํ•™์Šต์— ์‚ฌ์šฉ๋œ ๋ฐ์ดํ„ฐ์…‹:

  • otpensource_dataset:
    • Approximately 9,000 fashion items
    • Includes garment category, color, season, features, and image URLs, optimized for Vision-Language training

Training Method

  • ๊ธฐ๋ฐ˜ ๋ชจ๋ธ: Bllossom/llama-3.2-Korean-Bllossom-AICA-5B
  • ์ตœ์ ํ™” ๊ธฐ๋ฒ•: LoRA ์ ์šฉ
  • GPU ์š”๊ตฌ์‚ฌํ•ญ: A100 40GB ์ด์ƒ ๊ถŒ์žฅ
  • ํ›ˆ๋ จ ํšจ์œจ์„ฑ: LoRA๋ฅผ ํ™œ์šฉํ•˜์—ฌ ๊ธฐ์กด ๋ชจ๋ธ ๋Œ€๋น„ 2๋ฐฐ ๋น ๋ฅธ ํ•™์Šต ์ˆ˜ํ–‰

Primary Use Cases

Vision-Language ํƒœ์Šคํฌ

  1. Image analysis and description

    • Extracts a garment's category, color, season, and features from an input image and returns them as JSON.
    • Example:
      {
        "category": "ํŠธ๋ Œ์น˜์ฝ”ํŠธ",
        "gender": "์—ฌ",
        "season": "SS",
        "color": "๋„ค์ด๋น„",
        "material": "",
        "feature": "ํŠธ๋ Œ์น˜์ฝ”ํŠธ"
      }
      
  2. Text analysis and classification

    • From text-only input, performs natural-language processing tasks such as sentiment analysis, question answering, and text summarization.
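
Since the model is asked to answer in JSON but generated text can include surrounding prose, downstream code may want to extract the JSON object defensively. A minimal sketch, assuming output shaped like the example above (the `raw` string is fabricated for illustration):

```python
import json
import re

def extract_json(generated: str):
    """Pull the first {...} block out of model output and parse it, or return None."""
    match = re.search(r"\{.*?\}", generated, re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None

# Fabricated example of model output with text around the JSON object.
raw = 'Here is the info: {"category": "ํŠธ๋ Œ์น˜์ฝ”ํŠธ", "color": "๋„ค์ด๋น„", "season": "SS"}'
info = extract_json(raw)
print(info["category"])  # ํŠธ๋ Œ์น˜์ฝ”ํŠธ
```

Note the non-greedy `\{.*?\}` only handles flat (non-nested) objects, which matches the attribute schema shown above.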

Code Example

Vision-Language ํƒœ์Šคํฌ

from transformers import MllamaForConditionalGeneration, MllamaProcessor
import torch
from PIL import Image
import requests

model = MllamaForConditionalGeneration.from_pretrained(
  'otpensource-vision-lora',
  torch_dtype=torch.bfloat16,
  device_map='auto'
)
processor = MllamaProcessor.from_pretrained('otpensource-vision-lora')

url = "https://image.msscdn.net/thumbnails/images/prd_img/20240710/4242307/detail_4242307_17205916382801_big.jpg?w=1200"
image = Image.open(requests.get(url, stream=True).raw)

messages = [
  {'role': 'user', 'content': [
    {'type': 'image', 'image': image},
    {'type': 'text', 'text': '์ด ์˜ท์˜ ์ •๋ณด๋ฅผ JSON์œผ๋กœ ์•Œ๋ ค์ค˜.'}  # "Give me this garment's details as JSON."
  ]}
]

input_text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = processor(
    images=image,               # MllamaProcessor expects the `images` keyword
    text=input_text,
    add_special_tokens=False,
    return_tensors="pt",
).to(model.device)

# temperature only takes effect with sampling enabled
output = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.1)
print(processor.decode(output[0], skip_special_tokens=True))

์—…๋กœ๋“œ๋œ ๋ชจ๋ธ ์ •๋ณด

  • Developer: hateslopacademy
  • License: CC-BY-4.0
  • LoRA-trained model: based on otpensource-vision

์ด ๋ชจ๋ธ์€ Unsloth ๋ฐ Hugging Face TRL ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ํ™œ์šฉํ•ด ๊ธฐ์กด ๋ชจ๋ธ ๋Œ€๋น„ 2๋ฐฐ ๋น ๋ฅด๊ฒŒ ํ•™์Šต๋˜์—ˆ์Šต๋‹ˆ๋‹ค.
