DeepSeek R1 Distill Qwen 1.5B - Fine-tuned Version (Q4_K_M.gguf)

Overview

This is a fine-tuned version of DeepSeek R1 Distill Qwen 1.5B, optimized for extracting actionable insights and calendar events from conversations. The model was fine-tuned for 2,500 steps (9 epochs) on 2,194 examples to improve accuracy and efficiency in structured information extraction.

Model Details

  • Base Model: DeepSeek R1 Distill Qwen 1.5B
  • Fine-tuning Steps: 2500
  • Epochs: 9
  • Dataset Size: 2194 examples
  • License: MIT
  • File Format: GGUF
  • Released Version: Q4_K_M.gguf
Performance Benchmarks

| Metric | 3090 Ti | Raspberry Pi 5 |
| --- | --- | --- |
| Prompt Eval Time | 33.78 ms / 406 tokens (0.08 ms per token, 12017.88 tokens/sec) | 17831.25 ms / 535 tokens (33.33 ms per token, 30.00 tokens/sec) |
| Eval Time | 7133.93 ms / 1694 tokens (4.21 ms per token, 237.46 tokens/sec) | 52006.54 ms / 529 tokens (98.31 ms per token, 10.17 tokens/sec) |
| Total Time | 7167.72 ms / 2100 tokens | 70881.95 ms / 1064 tokens |
| Decoding Speed | N/A | 529 tokens in 70.40 s (7.51 tokens/sec) |
| Sampling Speed | N/A | 149.33 ms / 530 runs (0.28 ms per token, 3549.26 tokens/sec) |

Observations:

  • The 3090 Ti is significantly faster, handling 12017.88 tokens/sec for prompt evaluation, compared to 30 tokens/sec on the Pi 5.
  • In token evaluation, the 3090 Ti manages 237.46 tokens/sec, whereas the Pi 5 achieves just 10.17 tokens/sec.
  • The Pi 5's total execution time (70.88 s) is roughly ten times the 3090 Ti's (7.17 s), and it processes about half as many tokens in that time.

Usage Instructions

System Prompt

To use this model effectively, initialize it with the following system prompt:

### Instruction:
Purpose:  
Extract actionable information from the provided dialog and metadata, generating bullet points with importance rankings and identifying relevant calendar events.

### Steps:

1. **Context Analysis:**
   - Use `CurrentDateTime` to interpret relative time references (e.g., "tomorrow").
   - Prioritize key information based on `InformationRankings`:
     - Higher rank values indicate more emphasis on that aspect.

2. **Bullet Points:**
   - Summarize key points concisely.
   - Assign an importance rank (1-100).
   - Format: `<Bullet_Point>"[Summary]"</Bullet_Point><Rank>[1-100]</Rank>`

3. **Event Detection:**
   - Identify and structure events with clear scheduling details.
   - Format:  
     `<Calendar_Event>EventTitle:"[Title]",StartDate:"[YYYY-MM-DD,HH:MM]",EndDate:"[YYYY-MM-DD,HH:MM or N/A]",Recurrence:"[Daily/Weekly/Monthly or N/A]",Details:"[Summary]"</Calendar_Event>`

4. **Filtering:**
   - Exclude vague, non-actionable statements.
   - Only create events for clear, actionable scheduling needs.

5. **Output Consistency:**
   - Follow the exact XML format.
   - Ensure structured, relevant output.

ONLY REPLY WITH THE XML AFTER YOU END THINK.

Dialog: "{conversations}"
CurrentDateTime: "{date_and_time_and_day}"
InformationRankings: "{information_rankings}"
<think>
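The `{conversations}`, `{date_and_time_and_day}`, and `{information_rankings}` placeholders must be filled before the prompt is sent to the model. A minimal sketch using Python's `str.format` (the template below is abridged; in practice use the full system prompt above, and note the example dialog and ranking values are illustrative, not from the training data):

```python
# Abridged template -- replace the "..." comment with the full instruction
# text from the system prompt above.
PROMPT_TEMPLATE = (
    "### Instruction:\n"
    "...\n"
    'Dialog: "{conversations}"\n'
    'CurrentDateTime: "{date_and_time_and_day}"\n'
    'InformationRankings: "{information_rankings}"\n'
    "<think>\n"
)

# Fill in the three placeholders with runtime values.
prompt = PROMPT_TEMPLATE.format(
    conversations="Alice: Let's meet tomorrow at 10 to finalize the project.",
    date_and_time_and_day="2025-06-09,09:00,Monday",
    information_rankings="Scheduling:90,SmallTalk:10",
)
```

Keeping the trailing `<think>` tag in the template matters: it cues the model to open its reasoning block before emitting the final XML.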

How to Run the Model

Using llama.cpp

If you are using llama.cpp, run the model with:

./main -m Q4_K_M.gguf --prompt "<your prompt>" --temp 0.7 --n-gpu-layers 50

(Newer llama.cpp builds rename the main binary to llama-cli; the flags are unchanged.)

Using Text Generation WebUI

  1. Download and place the Q4_K_M.gguf file in the models folder.
  2. Start the WebUI:
python server.py --model Q4_K_M.gguf
  3. Use the system prompt above for structured output.

Expected Output Format

Example response when processing a conversation:

<Bullet_Point>"Team meeting scheduled for next Monday to finalize project details."</Bullet_Point><Rank>85</Rank>
<Calendar_Event>EventTitle:"Team Meeting",StartDate:"2025-06-10,10:00",EndDate:"2025-06-10,11:00",Recurrence:"N/A",Details:"Finalizing project details with the team."</Calendar_Event>
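Because the output is flat, XML-like tags with `Key:"Value"` pairs rather than well-formed XML, a couple of regular expressions are enough to parse it. A minimal sketch (the `response` string reuses the example above; `parse_event` is a hypothetical helper, not part of the model):

```python
import re

# Example model response, copied from the expected-output section above.
response = (
    '<Bullet_Point>"Team meeting scheduled for next Monday to finalize project details."'
    '</Bullet_Point><Rank>85</Rank>\n'
    '<Calendar_Event>EventTitle:"Team Meeting",StartDate:"2025-06-10,10:00",'
    'EndDate:"2025-06-10,11:00",Recurrence:"N/A",'
    'Details:"Finalizing project details with the team."</Calendar_Event>'
)

# Each bullet point carries its summary and a 1-100 importance rank.
bullets = re.findall(r'<Bullet_Point>"(.*?)"</Bullet_Point><Rank>(\d+)</Rank>', response)

# Calendar events are comma-separated Key:"Value" pairs inside one tag.
events = re.findall(r"<Calendar_Event>(.*?)</Calendar_Event>", response)

def parse_event(body: str) -> dict:
    """Split Key:"Value" pairs; quoting keeps commas inside values intact."""
    return dict(re.findall(r'(\w+):"(.*?)"', body))

event = parse_event(events[0])
```

Matching on the quoted values rather than splitting on commas is deliberate: `StartDate` values such as `2025-06-10,10:00` contain commas themselves.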

License

This model is released under the MIT License, allowing free usage, modification, and distribution.

Contact & Support

For any inquiries or support, please visit Hugging Face Discussions or open an issue on the repository.

Model Information

  • Format: GGUF
  • Model size: 1.78B params
  • Architecture: qwen2
  • Quantization: 4-bit

