Triangle104 committed on
Commit
fd3508e
·
verified ·
1 Parent(s): af0b48c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +0 -40
README.md CHANGED
@@ -15,46 +15,6 @@ tags:
  This model was converted to GGUF format from [`ArliAI/QwQ-32B-ArliAI-RpR-v4`](https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4) using llama.cpp via the ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
  Refer to the [original model card](https://huggingface.co/ArliAI/QwQ-32B-ArliAI-RpR-v4) for more details on the model.
 
- ---
- RpR (RolePlay with Reasoning) is a new series of models from ArliAI. This series builds directly upon the successful dataset curation methodology and training methods developed for the RPMax series.
-
- RpR models use the same curated, deduplicated RP and creative writing
- dataset used for RPMax, with a focus on variety to ensure high
- creativity and minimize cross-context repetition. Users familiar with
- RPMax will recognize the unique, non-repetitive writing style unlike
- other finetuned-for-RP models.
-
- With the release of QwQ as the first high performing open-source
- reasoning model that can be easily trained, it was clear that the
- available instruct and creative writing reasoning datasets contains only
- one response per example. This is type of single response dataset used
- for training reasoning models causes degraded output quality in long
- multi-turn chats. Which is why Arli AI decided to create a real RP model
- capable of long multi-turn chat with reasoning.
-
- In order to create RpR, we first had to actually create the reasoning
- RP dataset by re-processing our existing known-good RPMax dataset into a
- reasoning dataset. This was possible by using the base QwQ Instruct
- model itself to create the reasoning process for every turn in the RPMax
- dataset conversation examples, which is then further refined in order
- to make sure the reasoning is in-line with the actual response examples
- from the dataset.
-
- Another important thing to get right is to make sure the model is
- trained on examples that present reasoning blocks in the same way as it
- encounters it during inference. Which is, never seeing the reasoning
- blocks in it's context. In order to do this, the training run was
- completed using axolotl with manual template-free segments dataset in
- order to make sure that the model is never trained to see the reasoning
- block in the context. Just like how the model will be used during
- inference time.
-
- The result of training QwQ on this dataset with this method are
- consistently coherent and interesting outputs even in long multi-turn RP
- chats. This is as far as we know the first true correctly-trained
- reasoning model trained for RP and creative writing.
-
- ---
  ## Use with llama.cpp
  Install llama.cpp through brew (works on Mac and Linux)
 
 