1. Description
SPARK-Report is a specialized report-writing model developed by the Korea Institute of S&T Evaluation and Planning (KISTEP). The model is trained to generate reports in a two-step process: it first creates a table of contents and then generates the main content for each section.
2. Key Features
- Content Structure: Generates report outlines based on title, keywords, and desired length
- Writing Styles: Capable of producing reports in both descriptive and bullet-point formats
- Structured Output: Delivers well-formatted content for enhanced readability
- Base Model: Built on Mistral-Nemo (mistralai/Mistral-Nemo-Instruct-2407) as the foundation model
- Training Method: Trained with Supervised Fine-Tuning (SFT)
- Context Length: The maximum context length of the training data is 16,384 tokens
3. Data
| Source | Count |
|---|---|
| KISTEP documents | 31,058 |
4. Usage
- When using Ollama, you can use the provided Modelfile; a minimal sketch is shown below.
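The Modelfile itself is not reproduced in this card. As a rough illustration only, the sketch below assumes the weights have been converted to a local GGUF file (a hypothetical `./spark-report.gguf` path) and reuses the context length and sampling settings from the Python example below; the file distributed with the model may differ.

```
# Hypothetical Modelfile sketch (not the file distributed with the model).
# Assumes a local GGUF conversion of kistepAI/SPARK-Report.
FROM ./spark-report.gguf

# Match the 16,384-token training context and the sampling settings used in this card.
PARAMETER num_ctx 16384
PARAMETER temperature 0.3
PARAMETER top_p 0.95
```

With such a file in place, `ollama create spark-report -f Modelfile` registers the model locally and `ollama run spark-report` starts an interactive session.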
- Python code

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "kistepAI/SPARK-Report"

# Load the tokenizer and the model in bfloat16, spreading it across available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()

messages = [
    {"role": "user", "content": "안녕하세요."}
]

# Build the prompt with the chat template and move it to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

terminators = [
    tokenizer.eos_token_id,
    tokenizer.convert_tokens_to_ids("</s>")
]

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.3,
    top_p=0.95,
)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```
- Recommended Prompt Template (Table of Contents)
(input: {TITLE}, {KEYWORDS}, {LENGTH}; a usage sketch follows the template)

```yaml
prompt_template: |
  당신은 보고서 목차 작성 전문가입니다. 주어진 보고서 제목과 키워드를 바탕으로 체계적인 목차를 작성해 주세요.

  # 입력정보
  - 제목: {TITLE}
  - 키워드: {KEYWORDS}
  - 분량: {LENGTH}

  # 목차 작성 지침
  - 기본구조(head1)는 서론, 본론, 결론으로 구성
  - 각 head1 항목별로 3-4개의 상세 목차(head2) 작성
  - 키워드를 적극 활용하여 특징적인 표현 사용
  - 시각 표현이나 특수문자 사용 지양
  - 제목 연관성과 전체 일관성 유지
  - 개조식 문체와 명사형 종결어미 사용
```
- Recommended Prompt Template (Main Content)
(input: {TITLE}, {SECTIONS}, {SECTION}, {TYPE}, {DOCUMENTS}; a usage sketch follows the template)

```yaml
prompt_template: |
  당신은 보고서 작성 전문가입니다. 보고서 제목은 {TITLE}이며, 전체 목차는 아래와 같습니다:
  {SECTIONS}

  현재 작성할 섹션은 {SECTION}입니다. 작성 스타일은 {TYPE}을 사용합니다.

  주요 지침:
  1. 답변은 반드시 아래의 정보만을 사용하여 작성해야 합니다.
  {DOCUMENTS}
  2. 본문 작성 전에 <reason> 태그 안에 아래 내용을 포함하여 추론 과정을 최대 10문장 이내로 설명하세요:
     - 본문 작성에 사용할 청크 인덱스 표기
     - 각 청크의 관련 정보 설명
     - 필요시 부가적인 제안 사항 (단, 사실 확인 필요성 언급 필수)
  3. 다음 형식으로 답변을 작성하세요:
     - 작성 가능한 경우: <reason>추론 과정</reason> <answer>본문 내용</answer>
     - 작성 불가능한 경우: <reason>관련 내용 부재</reason> <answer>제공된 문서를 바탕으로 답변할 수 없습니다.</answer>

  서술형(descriptive) 작성 규칙:
  - 최대 30문장 이내로 작성
  - 문단은 내용에 따라 1~3개로 구성
  - 연결어 사용 제한
  - 키워드 반복 제한

  개조식(bullet_point) 작성 규칙:
  □ Level 1: 핵심 내용
    ◦ Level 2: 하위 내용(1~3개 항목)
      - Level 3: 부연 설명
  - 마침표 생략
  - 간결하고 명확한 문장 구성
  - 전문적인 용어 사용
  - 의미 단위로 단락 구분
```
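Step two then generates each section of the body with the main-content template. The sketch below, under the same assumptions as the previous one (it reuses the hypothetical `chat` helper), fills the five placeholders and extracts the <reason> and <answer> blocks that the template asks the model to emit; the helper names and the regex-based parsing are illustrative, not an official API.

```python
import re

# Hypothetical continuation of the sketch above: step 2 (section generation).
MAIN_TEMPLATE = """<paste the main-content prompt template above here>"""

def build_section_prompt(title: str, sections: str, section: str,
                         style: str, documents: str) -> str:
    # Substitute {TITLE}, {SECTIONS}, {SECTION}, {TYPE} and {DOCUMENTS}.
    # `style` is "descriptive" or "bullet_point", matching the template's rules.
    prompt = MAIN_TEMPLATE
    for placeholder, value in [("{TITLE}", title), ("{SECTIONS}", sections),
                               ("{SECTION}", section), ("{TYPE}", style),
                               ("{DOCUMENTS}", documents)]:
        prompt = prompt.replace(placeholder, value)
    return prompt

def parse_response(response: str) -> dict:
    # The template asks for <reason>...</reason> <answer>...</answer>;
    # fall back to the raw text if the tags are missing.
    reason = re.search(r"<reason>(.*?)</reason>", response, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", response, re.DOTALL)
    return {
        "reason": reason.group(1).strip() if reason else "",
        "answer": answer.group(1).strip() if answer else response.strip(),
    }

# Step 2: write one section, grounded in the supplied reference documents.
result = parse_response(chat(build_section_prompt(
    title="<report title>",
    sections="<the table of contents produced in step 1>",
    section="<the heading of the section to write now>",
    style="bullet_point",
    documents="<reference passages the section must be grounded in>",
)))
print(result["answer"])
```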
5. Benchmark
TBD