1. Description

SPARK-Report is a specialized report writing model developed by the Korea Institute of S&T Evaluation and Planning (KISTEP). The model is trained to generate reports in a two-step process: first creating a table of contents, then generating the main content.
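
The two-step flow described above can be sketched as a small driver. This is only an illustration: `make_outline` and `write_section` are hypothetical callables standing in for the two model calls, and the assumption that the outline arrives as a list of section titles is not something the model card specifies.

```python
from typing import Callable, Dict, List


def write_report(
    title: str,
    make_outline: Callable[[str], List[str]],
    write_section: Callable[[str, List[str], str], str],
) -> Dict[str, str]:
    """Run the two steps in order: outline first, then one call per section."""
    # Step 1: table of contents (hypothetical callable wrapping the first prompt).
    outline = make_outline(title)
    # Step 2: main content, giving every call the full outline for context.
    return {section: write_section(title, outline, section) for section in outline}
```

Each section call receives the whole outline, which mirrors the main-content prompt's use of both {SECTIONS} and {SECTION}.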

2. Key Features

  • Content Structure: Generates report outlines based on title, keywords, and desired length
  • Writing Styles: Capable of producing reports in both descriptive and bullet-point formats
  • Structured Output: Delivers well-formatted content for enhanced readability
  • Base Model: Built on Mistral-nemo as the foundation model
  • Training Method: Trained with Supervised Fine-Tuning (SFT)
  • Context Length: The maximum context length of the training data is 16,384 tokens
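
Because the 16,384 figure describes the training data, it may be worth checking that a prompt plus the requested generation stays within it. The helper below is a minimal sketch; the function name is hypothetical, and it assumes the limit is counted in tokens.

```python
TRAIN_CONTEXT_LIMIT = 16_384  # from the model card; assumed to be a token count


def fits_training_context(
    prompt_token_count: int,
    max_new_tokens: int,
    limit: int = TRAIN_CONTEXT_LIMIT,
) -> bool:
    """Return True if prompt tokens plus generated tokens stay within the limit."""
    if prompt_token_count < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_token_count + max_new_tokens <= limit
```

With the Python example in the Usage section, `input_ids.shape[-1]` would supply `prompt_token_count`.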

3. Data

Source: KISTEP Document
Count: 31,058

4. Usage

  • With Ollama, use the Modelfile.
  • Python code
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


model_id = "kistepAI/SPARK-Report"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

model.eval()

messages = [
    {"role": "user", "content": "안녕하세요."}  # "Hello." in Korean
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("</s>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.3,
    top_p=0.95,
)

print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
  • Recommended Prompt Template (Table of Contents) (input: {TITLE}, {KEYWORDS}, {LENGTH})
    The prompt is written in Korean. It asks the model, as an outline expert, to build a three-part outline (introduction, body, conclusion) with 3-4 subsections per part from the given title, keywords, and length.
prompt_template: |
    당신은 보고서 목차 생성 전문가입니다. 주어진 보고서 제목과 키워드를 바탕으로 체계적인 목차를 생성해 주세요.

    # 입력정보
    - 제목: {TITLE}
    - 키워드: {KEYWORDS}
    - 분량: {LENGTH}

    # 목차 작성 지침
    - 기본구조(head1)는 서론, 본론, 결론으로 구성
    - 각 head1 항목 별로 3-4개의 상세 목차(head2) 생성
    - 키워드를 적극 활용하여 특징적인 표현 사용
    - 시간 표현이나 특수문자 사용 지양
    - 제목 연관성과 전체 일관성 유지
    - 개조식 문체와 명사형 종결어미 사용
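
Filling the table-of-contents template could look like the sketch below. The function name is hypothetical, and the formats chosen for {KEYWORDS} (comma-separated) and {LENGTH} (a free-form string) are assumptions, since the model card does not say how these values were formatted during training.

```python
# Template text copied from the model card (Korean). The only placeholders
# are {TITLE}, {KEYWORDS}, and {LENGTH}.
TOC_TEMPLATE = """\
당신은 보고서 목차 생성 전문가입니다. 주어진 보고서 제목과 키워드를 바탕으로 체계적인 목차를 생성해 주세요.

# 입력정보
- 제목: {TITLE}
- 키워드: {KEYWORDS}
- 분량: {LENGTH}

# 목차 작성 지침
- 기본구조(head1)는 서론, 본론, 결론으로 구성
- 각 head1 항목 별로 3-4개의 상세 목차(head2) 생성
- 키워드를 적극 활용하여 특징적인 표현 사용
- 시간 표현이나 특수문자 사용 지양
- 제목 연관성과 전체 일관성 유지
- 개조식 문체와 명사형 종결어미 사용"""


def build_toc_messages(title, keywords, length):
    """Fill the outline template and wrap it as a single user message."""
    prompt = TOC_TEMPLATE.format(
        TITLE=title,
        KEYWORDS=", ".join(keywords),  # assumption: comma-separated keywords
        LENGTH=length,                 # assumption: free-form length string
    )
    return [{"role": "user", "content": prompt}]
```

The returned list has the same shape as `messages` in the Python example, so it can be passed to `tokenizer.apply_chat_template` unchanged.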
  • Recommended Prompt Template (Main Content) (input: {TITLE}, {SECTIONS}, {SECTION}, {TYPE}, {DOCUMENTS})
    The prompt is written in Korean. It restricts the model to the supplied {DOCUMENTS}, asks for a short rationale inside <reason> tags followed by the section text inside <answer> tags, and sets the rules for the descriptive and bullet_point styles.
prompt_template: |
    당신은 보고서 작성 전문가입니다. 보고서 제목은 {TITLE}이며, 전체 목차는 아래와 같습니다:

    {SECTIONS}

    현재 작성할 섹션은 {SECTION}입니다. 작성 스타일은 {TYPE}을 사용합니다.

    주요 지침:
    1. 답변은 반드시 아래의 정보만을 사용하여 작성해야 합니다.
    {DOCUMENTS}

    1. 본문 작성 전에 <reason> 태그 안에 아래 내용을 포함하여 추론 과정을 최대 10문장 이내로 설명하세요:
    - 본문 작성에 사용한 청크 인덱스 표기
    - 각 청크의 관련 정보 설명
    - 필요시 부가적인 제안 사항 (단, 사실확인 필요성 언급 필수)

    2. 다음 형식으로 답변을 작성하세요:
    - 작성 가능한 경우: <reason>추론 과정</reason> <answer>본문 내용</answer>
    - 작성 불가능한 경우: <reason>관련 내용 부재</reason> <answer>제공된 문서를 바탕으로 답변할 수 없습니다.</answer>

    서술형(descriptive) 작성 규칙:
    - 최대 30문장 이내로 작성
    - 문단은 내용에 따라 1~3개로 구성
    - 연결어 사용 제한
    - 키워드 반복 제한

    개조식(bullet_point) 작성 규칙:
    □ Level 1: 핵심 내용
    ◦ Level 2: 하위 내용(1~3개 항목)
    - Level 3: 부연 설명

    - 마침표 생략
    - 간결하고 명확한 문장 구성
    - 전문적인 용어 사용
    - 의미 단위로 단락 구분
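
Since the main-content prompt asks for output in the form `<reason>…</reason> <answer>…</answer>`, the generated text usually needs to be split before use. The parser below is a minimal sketch: the names `SectionOutput` and `parse_section_output` are hypothetical, and it assumes the model follows the tag format and uses the exact refusal sentence from the template.

```python
import re
from typing import NamedTuple, Optional

# Refusal sentence the template prescribes when the documents are insufficient.
REFUSAL_ANSWER = "제공된 문서를 바탕으로 답변할 수 없습니다."


class SectionOutput(NamedTuple):
    reason: Optional[str]
    answer: Optional[str]
    refused: bool


def _extract(tag: str, text: str) -> Optional[str]:
    """Return the stripped content of the first <tag>...</tag> pair, or None."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def parse_section_output(text: str) -> SectionOutput:
    """Split generated text into its reasoning and its section body."""
    reason = _extract("reason", text)
    answer = _extract("answer", text)
    return SectionOutput(reason, answer, refused=(answer == REFUSAL_ANSWER))
```

A missing tag yields `None` for that field, so callers can detect malformed output instead of silently using partial text.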

5. Benchmark

TBD

Model size: 12.2B params (Safetensors)
Tensor type: F32