1. Description

SPARK-Summarization is a large language model developed by the Korea Institute of S&T Evaluation and Planning (KISTEP). It specializes in summarization and uses Chain of Density (CoD) reasoning to produce condensed, high-quality summaries in both Korean and English.

2. Key Features

  • Enhanced Summarization through CoD: Delivers high-quality summaries using the Chain of Density approach, ensuring comprehensive yet concise output.
  • Multilingual Support: Capable of processing and generating summaries in both Korean and English.
  • Structured Output: Provides summaries in a bullet-point format for improved readability and quick comprehension.
  • Base Model: Built on Mistral-nemo as the foundation model.
  • Training Method: Trained with Supervised Fine-Tuning (SFT).
  • Context Length: The maximum context length of the training data is 16,384 tokens.

3. Data

Source: KISTEP Documents
Count: 24,417

4. Usage

  • With Ollama, you can load the model through the Modelfile.
  • Recommended Prompt Template (input: {TITLE}, {DOCUMENT})
prompt_template: |
    You are a summarization expert. Write a summary based on the given text.

    ## Summarization steps:
    1. Text analysis:
        - Read the document title and text carefully and identify the main topic of the document.
    2. Identify the key argument (key_argument):
        - Answer the following question: "What is the main argument or central point of this text?"
    3. Extract key entities (entities):
        - Select three key entities of five words or fewer each.
    4. Generate the topic of the summary (title):
        - Generate a concise, one-sentence topic for the provided text.
    5. Write the summary (summary):
        - Summarize the main content of the text with reference to the key argument, key entities, and topic.

    ## Refinement steps
    6. Density improvement:
        - Identify 1 to 3 additional descriptive entities that are not included in the initial summary.
        - Integrate all previous and new entities to write a denser version of the summary.
    7. Importance evaluation:
        - Revise the previous summary by emphasizing the essential parts and reducing the less important parts.
        - Check that the new summary closely matches the key argument.
    8. Fluency improvement:
        - Polish grammar, word choice, and phrasing to improve readability and natural flow.
        - Improve sentence structure while preserving the accuracy and completeness of the summary details.

    ## Writing style:
        - Write only the summary content instead of introducing the document.
        - Describe the overall flow and direction rather than specific data or figures.
        - Write objectively, based only on the given content.
        - Write in Korean, but keep English technical terms and proper nouns as they are.

    ## Input:
    ### Document title:
    {TITLE}
    ### Text:
    {DOCUMENT}
    ## Output format:
    <reason>
    Initial key argument: [initial key argument]
    Initial key entities: [list of initial key entities]
    Initial title: [initial title]
    Initial summary: [initial summary content]

    Density improvement step:
    Newly added key entities: [list of newly added key entities (with bullet points)]
    Thought process: [explanation of the key entity selection and summary writing]
    Updated title: [updated title]
    Updated summary: [updated summary content]

    Importance evaluation step:
    Thought process: [explanation of the importance evaluation done to improve summary relevance and of the changes made]
    Updated title: [updated title]
    Updated summary: [updated summary content]

    Language fluency step:
    Thought process: [explanation of the changes made to improve language clarity and fluency]
    Updated title: [updated title]
    Updated Summary: [list of each sentence of the summary (with bullet points)]
    </reason>

    <output>
        <key_argument>[key argument (in Korean)]</key_argument>
        <entities>[list of key entities, comma-separated]</entities>
        <title>[topic (in Korean)]</title>
        <summary>
            <point>[first summary sentence (in Korean)]</point>
            <point>[second summary sentence (in Korean)]</point>
            ...
        </summary>
    </output>
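
The template has two placeholders and a tagged output format, so using it involves filling {TITLE} and {DOCUMENT}, sending the prompt, and reading the fields inside <output>. The following is a minimal Python sketch of those three steps, not an official client. It assumes a local Ollama server at its default address and a model created from the Modelfile under the hypothetical name `spark-summarization`. The helper names are illustrative, and setting `num_ctx` to 16384 to match the training context length is also an assumption.

```python
import json
import re
import urllib.request

# Assumptions: default local Ollama endpoint and a hypothetical model name
# chosen when running `ollama create`. Adjust both to your setup.
OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL_NAME = "spark-summarization"


def build_prompt(template: str, title: str, document: str) -> str:
    """Fill the {TITLE} and {DOCUMENT} placeholders in a single pass."""
    values = {"TITLE": title, "DOCUMENT": document}
    # A single regex pass leaves any other braces in the inputs untouched.
    return re.sub(r"\{(TITLE|DOCUMENT)\}", lambda m: values[m.group(1)], template)


def build_request(prompt: str) -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {
        "model": MODEL_NAME,
        "prompt": prompt,
        "stream": False,
        # Assumption: context window set to the stated training length.
        "options": {"num_ctx": 16384},
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


def parse_output(text: str) -> dict:
    """Extract the fields of the <output> block from a model response."""
    def tag(name: str, source: str) -> str:
        match = re.search(rf"<{name}>(.*?)</{name}>", source, re.DOTALL)
        return match.group(1).strip() if match else ""

    output = tag("output", text)
    points = re.findall(r"<point>(.*?)</point>", tag("summary", output), re.DOTALL)
    return {
        "key_argument": tag("key_argument", output),
        "entities": [e.strip() for e in tag("entities", output).split(",") if e.strip()],
        "title": tag("title", output),
        "summary": [p.strip() for p in points],
    }
```

A typical call chain would be `parse_output(generate(build_prompt(template, title, document)))`. Regular expressions are used instead of an XML parser because the response also carries the free-text <reason> block and may not be well-formed XML. If the Modelfile already embeds the template, the prompt should carry only the title and document instead.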

5. Benchmark

TBD

Repository: kistepAI/SPARK-Summarization-GGUF
Format: GGUF (16-bit)
Model size: 12.2B params
Architecture: llama