BeaverAI/MS-2501-DPE-QwQify-v0.1-24B

@@ -1,36 +1,302 @@
 ---
-base_model:
-- PocketDoc/Dans-PersonalityEngine-V1.2.0-24b
-- BeaverAI/MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS
 library_name: transformers
 tags:
-- mergekit
-- merge
 ---
-# merge
-This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).
-## Merge Details
-### Merge Method
-This model was merged using the Passthrough merge method.
-### Models Merged
-The following models were included in the merge:
-* [PocketDoc/Dans-PersonalityEngine-V1.2.0-24b](https://huggingface.co/PocketDoc/Dans-PersonalityEngine-V1.2.0-24b) + [BeaverAI/MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS](https://huggingface.co/BeaverAI/MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS)
-### Configuration
-The following YAML configuration was used to produce this model:
 ```yaml
-dtype: bfloat16
-merge_method: passthrough
-tokenizer:
-  source: PocketDoc/Dans-PersonalityEngine-V1.2.0-24b
-models:
-  - model: PocketDoc/Dans-PersonalityEngine-V1.2.0-24b+BeaverAI/MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS
 ```

 ---
 library_name: transformers
+license: apache-2.0
+base_model: PocketDoc/Dans-PersonalityEngine-V1.2.0-24b
 tags:
+- axolotl
+- generated_from_trainer
+datasets:
+- PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled
+model-index:
+- name: MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS
+  results: []
 ---
+# BeaverAI/MS-2501-DPE-QwQify-v0.1-24B
+A test model that tries to give an existing model QwQ-style thinking. This version is built on top of [`PocketDoc/Dans-PersonalityEngine-V1.2.0-24b`](https://huggingface.co/PocketDoc/Dans-PersonalityEngine-V1.2.0-24b) (a jack-of-all-trades instruct model), which was itself trained on top of [`mistralai/Mistral-Small-24B-Base-2501`](https://huggingface.co/mistralai/Mistral-Small-24B-Base-2501).
+Prompt formatting and usage should be the same as with QwQ: use ChatML, and remove the thinking from previous turns. If thoughts aren't being generated automatically, add `<think>\n` to the start of the assistant turn.
+It should follow the formatting of previous model turns. On the first turns of a conversation you may need to regenerate a few times, and possibly edit the model's first few responses, to get the output to your liking.
+You may want to disable inserting the `{{char}}:` prefix for the character and instead add something like `Only speak as "{{char}}" in conversation with "{{user}}". Output your final response with a "{{char}}:" prefix.` to the end of your system prompt.
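The usage notes above can be sketched in code. This is a minimal illustration, not the author's or any frontend's implementation: the helper names (`strip_thoughts`, `build_chatml_prompt`), the message schema, and the assumption that thoughts are wrapped in `<think>...</think>` (as QwQ does) are all hypothetical choices made for the example. It also assumes the tokenizer or backend adds the BOS token itself.

```python
import re

# Assumption: reasoning is wrapped in <think>...</think>, as with QwQ.
THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", flags=re.DOTALL)


def strip_thoughts(text: str) -> str:
    """Remove any thinking block from a previous assistant turn."""
    return THINK_BLOCK.sub("", text).strip()


def build_chatml_prompt(messages, prefill_think=True):
    """Render messages as ChatML and open a new assistant turn.

    `messages` is a list of {"role": ..., "content": ...} dicts
    (a hypothetical schema chosen for this sketch).
    """
    parts = []
    for message in messages:
        content = message["content"]
        if message["role"] == "assistant":
            # Previous turns keep only the final response, not the thoughts.
            content = strip_thoughts(content)
        parts.append(f"<|im_start|>{message['role']}\n{content}<|im_end|>\n")
    # Open the next assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    if prefill_think:
        # Force thinking to start if it is not generated automatically.
        parts.append("<think>\n")
    return "".join(parts)


history = [
    {"role": "system", "content": 'Only speak as "Mira" in conversation with "Sam".'},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "<think>\nGreet briefly.\n</think>\n\nMira: Hello!"},
    {"role": "user", "content": "How are you?"},
]
prompt = build_chatml_prompt(history)
print(prompt)
```

Setting `prefill_think=False` leaves the assistant turn open without the `<think>\n` prefix, for cases where the model already starts thinking on its own.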
+![image/png](https://i.imgur.com/EnULiEI.png)
+[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+<details><summary>See axolotl config</summary>
+axolotl version: `0.8.0.dev0`
 ```yaml
+mlflow_tracking_uri: http://127.0.0.1:7860
+mlflow_experiment_name: MS-2501-DPE-QwQify-v0.1-24B-LoRA
+# Hugging Face saving config
+hub_model_id: BeaverAI/MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS
+hub_strategy: every_save
+# Model checkpointing config
+output_dir: ./Outputs/MS-2501-DPE-QwQify-v0.1-24B-LoRA
+resume_from_checkpoint:
+save_steps: 50
+save_safetensors: true
+save_total_limit: 3
+save_only_model: false
+# Model architecture config
+base_model: PocketDoc/Dans-PersonalityEngine-V1.2.0-24b
+model_type: MistralForCausalLM
+tokenizer_type: AutoTokenizer
+# Mixed precision training config
+bf16: true
+fp16: false
+tf32: false
+# Model loading config
+load_in_8bit: false
+load_in_4bit: false
+strict: false
+# Sequence config
+sequence_len: 8192
+min_sample_len: 256
+sample_packing: true
+eval_sample_packing: true
+pad_to_sequence_len: true
+train_on_inputs: false
+group_by_length: false
+# LoRA adapter config
+adapter: lora
+lora_model_dir:
+lora_r: 128
+lora_alpha: 128
+lora_dropout: 0.125
+peft_layers_to_transform:
+peft_use_dora:
+peft_use_rslora:
+peft_layer_replication:
+lora_target_modules:
+  - gate_proj
+  - down_proj
+  - up_proj
+  - q_proj
+  - v_proj
+  - k_proj
+  - o_proj
+lora_modules_to_save:
+# Fix uninitialized tokens (such as <|start_header_id|> on the base L3 models)
+fix_untrained_tokens:
+# Dataset config
+# https://github.com/xzuyn/axolotl/blob/came-plus-formatters/src/axolotl/prompt_strategies/customchatml-regex-last-only.py
+datasets:
+  - path: PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled
+    split: train
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled
+    split: train
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled
+    split: train
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled
+    split: train
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled
+    split: train
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled
+    split: train
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled
+    split: train
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled
+    split: train
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled
+    split: train
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled
+    split: train
+    type: customchatml-regex-last-only
+test_datasets:
+  - path: PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled
+    split: test
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled
+    split: test
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled
+    split: test
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled
+    split: test
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled
+    split: test
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled
+    split: test
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled
+    split: test
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled
+    split: test
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled
+    split: test
+    type: customchatml-regex-last-only
+  - path: PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled
+    split: test
+    type: customchatml-regex-last-only
+val_set_size: 0
+eval_strategy: steps
+eval_steps: 50
+dataset_prepared_path: ./00-Tokenized-Datasets/MS-2501-DPE-QwQify-v0.1-24B-customchatml-regex-last-only
+shuffle_merged_datasets: true
+dataset_processes:
+# Training hyperparameters
+num_epochs: 2
+gradient_accumulation_steps: 1
+micro_batch_size: 8  # x4 GPUs = 32
+eval_batch_size: 8   # x4 GPUs = 32
+warmup_steps: 0
+optimizer: came_pytorch
+optim_args:
+optim_target_modules:
+lr_scheduler: rex
+learning_rate: 2e-5
+cosine_min_lr_ratio:
+loraplus_lr_ratio:
+loraplus_lr_embedding:
+weight_decay: 0.1
+max_grad_norm: 1
+logging_steps: 1
+# Model optimization
+gradient_checkpointing: unsloth
+flash_attention: true
+plugins:
+  - axolotl.integrations.liger.LigerPlugin
+  - axolotl.integrations.cut_cross_entropy.CutCrossEntropyPlugin
+cut_cross_entropy: true
+liger_rope: true
+liger_rms_norm: true
+liger_layer_norm: true
+liger_glu_activation: true
+liger_cross_entropy: false
+liger_fused_linear_cross_entropy: false
+lora_mlp_kernel: false
+lora_qkv_kernel: false
+lora_o_kernel: false
+# DeepSpeed
+deepspeed: deepspeed_configs/zero3_bf16.json
+# Garbage Collection
+gc_steps: 1
+# Debug config
+debug: true
+seed: 42
+# Token config
+special_tokens:
+  bos_token: "<s>"
+  eos_token: "<|im_end|>"
+  pad_token: "<pad>"
+tokens:
 ```
+</details><br>
+# MS-2501-DPE-QwQify-v0.1-24B-LoRA-WS
+This model is a fine-tuned version of [PocketDoc/Dans-PersonalityEngine-V1.2.0-24b](https://huggingface.co/PocketDoc/Dans-PersonalityEngine-V1.2.0-24b) on the following datasets:
+- PJMixers-Dev/allura-org_gryphe-sonnet-3.5-charcards-names-added-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/anthracite-org_c2_logs_32k_llama3_qwen2_v1.3-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/grimulkan_aicg-logs-augmented-system-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/grimulkan_jannie-log-augmented-system-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/grimulkan_PIPPA-augmented-dedup-system-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/lemonilia_LimaRP-Only-NonSus-Simple-CustomShareGPT-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/MinervaAI_Aesir-Preview-Anon-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/NyxKrage_chub-logs-sharegpt-longest-CustomShareGPT-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/PocketDoc_Dans-Prosemaxx-Cowriter-XL-8192-shrunk-l3-qwq-all-aphrodite-Shuffled
+- PJMixers-Dev/PocketDoc_Dans-Personamaxx-Rainy-qwq-all-aphrodite-Shuffled
+
+It achieves the following results on the evaluation set:
+- Loss: 1.1949
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 2e-05
+- train_batch_size: 8
+- eval_batch_size: 8
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 4
+- total_train_batch_size: 32
+- total_eval_batch_size: 32
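+- optimizer: Use OptimizerNames.ADAMW_HF with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments (as reported by the Trainer; the axolotl config above sets `optimizer: came_pytorch`)
+- lr_scheduler_type: cosine (as reported by the Trainer; the axolotl config above sets `lr_scheduler: rex`)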
+- num_epochs: 2.0
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 1.9925        | 0.0019 | 1    | 1.9225          |
+| 1.4228        | 0.0936 | 50   | 1.4329          |
+| 1.3473        | 0.1873 | 100  | 1.3722          |
+| 1.3259        | 0.2809 | 150  | 1.3414          |
+| 1.2795        | 0.3745 | 200  | 1.3199          |
+| 1.2817        | 0.4682 | 250  | 1.3029          |
+| 1.2365        | 0.5618 | 300  | 1.2910          |
+| 1.2134        | 0.6554 | 350  | 1.2803          |
+| 1.2655        | 0.7491 | 400  | 1.2700          |
+| 1.2297        | 0.8427 | 450  | 1.2614          |
+| 1.1780        | 0.9363 | 500  | 1.2524          |
+| 1.1525        | 1.0300 | 550  | 1.2467          |
+| 1.1751        | 1.1236 | 600  | 1.2411          |
+| 1.2160        | 1.2172 | 650  | 1.2366          |
+| 1.1706        | 1.3109 | 700  | 1.2302          |
+| 1.1363        | 1.4045 | 750  | 1.2256          |
+| 1.1563        | 1.4981 | 800  | 1.2194          |
+| 1.1559        | 1.5918 | 850  | 1.2147          |
+| 1.1263        | 1.6854 | 900  | 1.2090          |
+| 1.0990        | 1.7790 | 950  | 1.2038          |
+| 1.1786        | 1.8727 | 1000 | 1.1994          |
+| 1.1057        | 1.9663 | 1050 | 1.1949          |
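
The batch-size and epoch figures above can be cross-checked with a little arithmetic. The sketch below is a rough sanity check, not part of the training code: it assumes the Epoch column equals `step / steps_per_epoch`, and that every packed sample occupies a full `sequence_len` slot (so the token count is an upper bound that includes padding). All variable names are illustrative.

```python
import math

# Values copied from the axolotl config and the results table above.
micro_batch_size = 8
gradient_accumulation_steps = 1
num_devices = 4
sequence_len = 8192

last_step, last_epoch = 1050, 1.9663
final_val_loss = 1.1949

# Effective batch size: per-device batch x accumulation steps x devices.
total_batch_size = micro_batch_size * gradient_accumulation_steps * num_devices

# Assumption: the Epoch column is step / steps_per_epoch.
steps_per_epoch = round(last_step / last_epoch)

# Packed sequences seen per epoch, and an upper bound on token slots.
sequences_per_epoch = steps_per_epoch * total_batch_size
max_token_slots_per_epoch = sequences_per_epoch * sequence_len

# Perplexity is the exponential of the mean cross-entropy loss.
final_perplexity = math.exp(final_val_loss)

print(f"total batch size:       {total_batch_size}")
print(f"steps per epoch:        {steps_per_epoch}")
print(f"sequences per epoch:    {sequences_per_epoch}")
print(f"token slots per epoch:  {max_token_slots_per_epoch}")
print(f"final val. perplexity:  {final_perplexity:.2f}")
```

The computed total batch size matches the reported `total_train_batch_size` of 32. The other derived numbers are estimates that depend on the stated assumptions.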
+### Framework versions
+- PEFT 0.14.0
+- Transformers 4.49.0
+- Pytorch 2.6.0+cu124
+- Datasets 3.2.0
+- Tokenizers 0.21.1