muhtasham committed (verified)
Commit 55e3f9d · Parent(s): 696e884

End of training
Files changed (1): README.md (+179 -0)
README.md ADDED
---
library_name: transformers
tags:
- axolotl
- generated_from_trainer
datasets:
- data/output_prompt.jsonl
model-index:
- name: spark-llm-finetune-tj
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

[<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
<details><summary>See axolotl config</summary>

axolotl version: `0.9.2`
```yaml
base_model: pretrained_models/Spark-TTS-0.5B/LLM
# Automatically upload checkpoint and final model to HF
hub_model_id: muhtasham/spark-llm-finetune-tj

trust_remote_code: true

strict: false

datasets:
  - path: data/output_prompt.jsonl
    type: completion

dataset_prepared_path:
val_set_size: 0.05
output_dir: ./outputs/out

sequence_len: 4098
sample_packing: true
eval_sample_packing: true
pad_to_sequence_len: true

wandb_project: spark-tts
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 50
optimizer: adamw_torch_fused
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: auto
fp16:
tf32: true

gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: false
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 50
xformers_attention:
flash_attention: true

warmup_steps: 10
evals_per_epoch: 1
save_steps: 5000
debug:
deepspeed:
weight_decay: 0.0

```

</details><br>
82
+
83
+ # spark-llm-finetune-tj
84
+
85
+ This model was trained from scratch on the data/output_prompt.jsonl dataset.
86
+ It achieves the following results on the evaluation set:
87
+ - Loss: 5.2546
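
The checkpoint is pushed to `muhtasham/spark-llm-finetune-tj` (the `hub_model_id` in the config) and the config sets `trust_remote_code: true`, so it should load through the standard `transformers` API. A minimal loading sketch, assuming the uploaded checkpoint is a plain causal LM; note this covers only the LLM component, not the full Spark-TTS audio pipeline, and the prompt string is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "muhtasham/spark-llm-finetune-tj"

# trust_remote_code mirrors the training config; bf16 matches `bf16: auto`.
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

# Placeholder prompt: real Spark-TTS usage wraps text in task-specific
# tokens handled by the surrounding Spark-TTS pipeline.
inputs = tokenizer("example prompt", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```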

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
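
The config's `type: completion` entry means Axolotl treats each line of `data/output_prompt.jsonl` as a standalone record whose text is consumed as-is for causal-LM training (the completion loader reads a `text` field by default). The actual record contents are not documented here, so the sketch below uses a placeholder string:

```python
import json

# Hypothetical completion-format record: one JSON object per line with a
# "text" field. The real contents of data/output_prompt.jsonl (presumably
# Spark-TTS prompts plus target token strings) are not shown in this card.
record = {"text": "example Spark-TTS training prompt and target tokens"}

with open("data/output_prompt.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record, ensure_ascii=False) + "\n")
```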

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0002
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 16
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 10
- num_epochs: 50.0
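
Here `total_train_batch_size` follows directly from micro_batch_size × gradient_accumulation_steps = 4 × 4 = 16. A rough reconstruction of the optimizer and schedule these settings imply, in plain PyTorch/`transformers` (Axolotl builds the real objects internally; the stand-in model and the 5700-step total, read off the last row of the results table below, are assumptions for illustration):

```python
import torch
from transformers import get_cosine_schedule_with_warmup

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(8, 8).to(device)  # stand-in for the actual LM

# adamw_torch_fused with the betas/epsilon above and weight_decay: 0.0.
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=2e-4,
    betas=(0.9, 0.999),
    eps=1e-8,
    weight_decay=0.0,
    fused=torch.cuda.is_available(),  # the fused kernel needs CUDA tensors
)

# Cosine decay after 10 warmup steps, over the full 50-epoch run.
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,
    num_training_steps=5700,  # ~114 optimizer steps/epoch * 50 epochs
)
```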

### Training results

| Training Loss | Epoch   | Step | Validation Loss |
|:-------------:|:-------:|:----:|:---------------:|
| No log        | 0.0088  | 1    | 9.9240          |
| 5.5236        | 0.9978  | 114  | 5.5667          |
| 5.0799        | 1.9891  | 228  | 5.3932          |
| 4.9292        | 2.9803  | 342  | 5.3107          |
| 4.7729        | 3.9716  | 456  | 5.2529          |
| 4.7022        | 4.9628  | 570  | 5.2174          |
| 4.6598        | 5.9540  | 684  | 5.1988          |
| 4.6176        | 6.9453  | 798  | 5.1833          |
| 4.5814        | 7.9365  | 912  | 5.1737          |
| 4.5422        | 8.9278  | 1026 | 5.1687          |
| 4.506         | 9.9190  | 1140 | 5.1643          |
| 4.492         | 10.9103 | 1254 | 5.1646          |
| 4.4605        | 11.9015 | 1368 | 5.1670          |
| 4.4384        | 12.8928 | 1482 | 5.1699          |
| 4.4151        | 13.8840 | 1596 | 5.1751          |
| 4.4053        | 14.8753 | 1710 | 5.1766          |
| 4.3875        | 15.8665 | 1824 | 5.1807          |
| 4.3684        | 16.8578 | 1938 | 5.1879          |
| 4.3624        | 17.8490 | 2052 | 5.1921          |
| 4.3413        | 18.8403 | 2166 | 5.1983          |
| 4.3302        | 19.8315 | 2280 | 5.2020          |
| 4.3179        | 20.8228 | 2394 | 5.2081          |
| 4.3152        | 21.8140 | 2508 | 5.2157          |
| 4.306         | 22.8053 | 2622 | 5.2180          |
| 4.2989        | 23.7965 | 2736 | 5.2243          |
| 4.2982        | 24.7877 | 2850 | 5.2282          |
| 4.2862        | 25.7790 | 2964 | 5.2328          |
| 4.2827        | 26.7702 | 3078 | 5.2339          |
| 4.2775        | 27.7615 | 3192 | 5.2368          |
| 4.2802        | 28.7527 | 3306 | 5.2417          |
| 4.2686        | 29.7440 | 3420 | 5.2434          |
| 4.2713        | 30.7352 | 3534 | 5.2432          |
| 4.2689        | 31.7265 | 3648 | 5.2476          |
| 4.2687        | 32.7177 | 3762 | 5.2481          |
| 4.2651        | 33.7090 | 3876 | 5.2508          |
| 4.266         | 34.7002 | 3990 | 5.2509          |
| 4.2644        | 35.6915 | 4104 | 5.2517          |
| 4.2626        | 36.6827 | 4218 | 5.2517          |
| 4.2646        | 37.6740 | 4332 | 5.2525          |
| 4.2617        | 38.6652 | 4446 | 5.2524          |
| 4.2603        | 39.6565 | 4560 | 5.2544          |
| 4.2633        | 40.6477 | 4674 | 5.2537          |
| 4.2561        | 41.6389 | 4788 | 5.2522          |
| 4.2612        | 42.6302 | 4902 | 5.2546          |
| 4.2618        | 43.6214 | 5016 | 5.2530          |
| 4.2602        | 44.6127 | 5130 | 5.2540          |
| 4.2619        | 45.6039 | 5244 | 5.2543          |
| 4.263         | 46.5952 | 5358 | 5.2549          |
| 4.2625        | 47.5864 | 5472 | 5.2547          |
| 4.2611        | 48.5777 | 5586 | 5.2545          |
| 4.2621        | 49.5689 | 5700 | 5.2546          |
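
As a back-of-the-envelope check on the step counts above (approximate, because sample packing merges variable-length examples into fixed-length sequences):

```python
# 114 optimizer steps per epoch at an effective batch of 16 implies
# roughly 1,824 packed sequences (up to 4,098 tokens each) per epoch.
steps_per_epoch = 114                     # from the results table
effective_batch = 4 * 4                   # micro_batch_size * grad accumulation
print(steps_per_epoch * effective_batch)  # 1824
```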

Validation loss reaches its minimum (5.1643) at roughly epoch 10 and then drifts slowly upward while training loss keeps falling, so checkpoints from later epochs are mildly overfit relative to the best epoch; the reported 5.2546 is the final-checkpoint loss, not the best.

### Framework versions

- Transformers 4.51.3
- Pytorch 2.7.1+cu126
- Datasets 3.5.1
- Tokenizers 0.21.1