Information
Advanced, high-quality and an easy to run reasoning for a small size that you can even run on your phone, it supports 128K context.
At its original quality, it runs at ~250 tokens/second on a single Friendli H100 Nvidia GPU.
Trained similarly to Deepseek R1, we used Phi-3.5-mini (and it uses the same chat template as phi-3) as a base model, then we've SFT fine tuned on reasoning using our own private superthoughts instruct dataset which includes a mix of code, website generation, day-to-day chats, math and counting problems & summerization. after the SFT fine tuning we used GRPO to further amplify it's mathematics & problem solving abilities. Unlike the LITE version of superthoughts, we've fixed a few issues in our instruction/sft dataset, and changed the GRPO code so the model would be more significantly conversational (eg; if you ask it "Whats the weather like?" the lite version would think and probably just answer with "good", unlike superthoughts mini which would start an actual conversation). This model has very strong reasoning abillities yet does not over-think. We personally have found that at least in GSM8K, over thinking did not help much, rather it just made the model get confused and waste a lot of tokens. so due to this, superthoughts mini does not usually over-think.
Format & Examples
<|system|> respond in the following format:
<think>
...
</think>
<output>
...
</output>
<|end|><|user|> How many R's in strawberry<|end|><|assistant|> <think>
Alright, the user has asked how many R's in the word strawberry, that's easy! I just need to count each instance of the letter 'R' in the word 's-t-r-a-w-b-e-r-r-y' and then find out how many R's there are, lets count!
S - Not an R,
T - Not an R,
R - First instance of the letter R! (1),
A - Not an R,
W - Not an R,
B - Not an R,
E - Not an R,
R - Great! Second instance of the letter R. (2),
R - Third instance of the letter R. (3),
Y - Not an R.
So, i've counted all the letters correctly, meaning that I am sure that there are 3 R's in the word Strawberry. I should probably let the user know.
</think>
<output>
There are **3** R's in the word Strawberry.
</output><|end|><|endoftext|>
<|system|> respond in the following format:
<think>
...
</think>
<output>
...
</output>
<|end|><|user|>whats heavier, 20 pounds of wood or 20 feathers<|end|><|assistant|> <think>
Well I need to decide which is heavier between 20 pounds of wood and 20 feathers, let's think through this:
20 pounds is a unit of weight, and it is also the weight of the first object, wood, in this case.
20 feathers, on the other hand, are very light compared to a pound. A single feather might weigh a fraction of a pound, so 20 feathers would be even less.
Since 20 feathers are less than 20 pounds, the answer is clear: 20 pounds of wood is heavier than 20 feathers.\nI will now output the request in a formal and concise manner.
</think>
<output>
20 pounds of wood is heavier than 20 feathers. This is because a single feather is very light, typically weighing less than a pound. Therefore, 20 feathers would weigh significantly less than 20 pounds of wood.
</output><|end|>
It is very reccomend to use a low temperature like 0.6 or 0.8, higher temperatures may cause it to not think properly.
system prompt
(important to ensure it would always think, output, without this thinking is disabled).
respond in the following format:
<think>
...
</think>
<output>
...
</output>
Uploaded model
- Developed by: Pinkstack
- License: apache-2.0
- Finetuned from model : microsoft/Phi-3.5-mini-instruct
- Downloads last month
- 78