
Sapling Dream V0.5

This is a fine-tune on top of the base model listed below, trained on the datasets listed below through 60% of V1's training process. V1 will be released on February 22, 2025 and will outperform this checkpoint. Consider this a demo.

We are currently training the model, with an official release scheduled for February 22, 2025 at 17:00 Pakistan Standard Time (PKT).

Introducing SaplingDream, a compact GPT model with 0.5 billion parameters, based on the Qwen/Qwen2.5-0.5B-Instruct architecture. The model has been fine-tuned on reasoning datasets with meticulous attention to quality, hence the name "SaplingDream." Think of it as advanced instruction tuning that teaches the base model to reason, efficiently compensating for its small size.

To enhance generalization, we are fine-tuning the base model using Stochastic Gradient Descent (SGD) with a polynomial learning-rate scheduler, starting from a learning rate of 1e-4. Our goal is for the model not merely to memorize tokens, but to develop the ability to reason through problems effectively.
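A polynomial schedule decays the learning rate from its initial value toward an end value over the course of training. A minimal sketch of the decay rule (the power of 1.0 and end learning rate of 0.0 are assumptions matching common defaults, not confirmed training settings):

```python
def polynomial_lr(step: int, total_steps: int,
                  lr_init: float = 1e-4,
                  lr_end: float = 0.0,
                  power: float = 1.0) -> float:
    """Polynomial learning-rate decay from lr_init down to lr_end."""
    if step >= total_steps:
        return lr_end
    remaining = 1.0 - step / total_steps
    return (lr_init - lr_end) * remaining ** power + lr_end

# With power=1.0 this reduces to a linear ramp from 1e-4 to 0.
schedule = [polynomial_lr(s, total_steps=4) for s in range(5)]
```

With `power=1.0` the schedule is linear; larger powers keep the rate higher for longer before decaying more steeply near the end.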

For training, we are utilizing the open-thoughts/OpenThoughts-114k and prithivMLmods/Deepthink-Reasoning-Ins datasets for one full epoch. You can find the FP32 version here.
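Since the model inherits Qwen2.5-Instruct's chat template, prompts follow the ChatML format. A minimal sketch of rendering such a prompt by hand (in practice the tokenizer's `apply_chat_template` or a GGUF runtime handles this automatically; the example messages are placeholders):

```python
def build_chatml_prompt(messages: list[dict]) -> str:
    """Render a list of {role, content} messages in ChatML, as used by Qwen2.5."""
    prompt = ""
    for m in messages:
        prompt += f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n"
    # Leave the assistant turn open so the model generates the reply.
    prompt += "<|im_start|>assistant\n"
    return prompt

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is 2 + 2?"},
])
```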


Our Apps & Socials

Chat with our Assistant | Support us Financially | Visit our GitHub

Long live the Islamic Republic of Pakistan; Glory to the Islamic Republic of Pakistan 🇵🇰

Format: GGUF
Model size: 494M params
Architecture: qwen2

Model tree for XeTute/SaplingDream_V0.5-0.5B-GGUF

Base model: Qwen/Qwen2.5-0.5B
Quantized: this model
