How can I use your model to continue my training process?
Is your model suitable for training on another Middle Eastern language? And if so, how can I use your model to continue the training process?
You can follow the instructions for the F5-TTS model (https://github.com/SWivid/F5-TTS):
1- Download the checkpoint – the best option is the 380000.pt checkpoint, along with the vocab.txt file.
2- Verify the vocabulary file – ensure that vocab.txt contains all the characters in your dataset (see the sketch after this list). If you are continuing fine-tuning in Arabic, you should be fine, as I have already included all Arabic characters and diacritics.
3- Set up the F5-TTS environment – once the environment is ready, choose the fine-tuning option. Then set the model path to the checkpoint file and the tokenizer to the vocab.txt file.
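To make step 2 concrete, here is a minimal coverage check you could run yourself. It assumes vocab.txt has one token per line and that your transcripts sit in a pipe-separated metadata.csv; both file names and the split format are placeholders to adapt to your dataset:

```python
# Hypothetical vocab coverage check: confirm every character that appears in
# the transcripts is also present in vocab.txt (assumed one token per line).
from pathlib import Path

vocab = set(Path("vocab.txt").read_text(encoding="utf-8").splitlines())

missing = set()
for line in Path("metadata.csv").read_text(encoding="utf-8").splitlines():
    if "|" not in line:
        continue  # skip headers or blank lines
    # assumes "audio_path|transcript" rows; adjust the split to your format
    _, text = line.split("|", 1)
    missing |= {ch for ch in text if ch not in vocab}

if missing:
    print("Characters missing from vocab.txt:", sorted(missing))
else:
    print("vocab.txt covers every character in the dataset.")
```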
Hello,
Thank you for sharing this file! What was the model configuration used for training?
It seems the default one doesn't work: dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4)
It is the default; I did not change it: model_cfg = dict(dim=1024, depth=22, heads=16, ff_mult=2, text_dim=512, conv_layers=4). I did change the vocab though, so make sure you are using the one here in the repo.
Thank you!! Now it works - the vocab file was exactly my problem.
Thanks for your help, but:
I can't use your vocab file. When I give the path to your vocab file in the Tokenizer File field, it gives this error:
Loading model cost 0.440 seconds.
Prefix dict has been built successfully.
vocab : 2580
vocoder : vocos
Using logger: None
Loading dataset ...
Traceback (most recent call last):
File "C:\pinokio\api\e2-f5-tts.git\app\src\f5_tts\model\dataset.py", line 259, in load_dataset
train_dataset = load_from_disk(f"{rel_data_path}/raw")
File "C:\pinokio\api\e2-f5-tts.git\app\env\lib\site-packages\datasets\load.py", line 2207, in load_from_disk
raise FileNotFoundError(f"Directory {dataset_path} not found")
FileNotFoundError: Directory C:\pinokio\api\e2-f5-tts.git\app\src\f5_tts....\data\my_Language_custom/raw not found
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\pinokio\api\e2-f5-tts.git\app\src\f5_tts\train\finetune_cli.py", line 182, in
main()
File "C:\pinokio\api\e2-f5-tts.git\app\src\f5_tts\train\finetune_cli.py", line 173, in main
train_dataset = load_dataset(args.dataset_name, tokenizer, mel_spec_kwargs=mel_spec_kwargs)
File "C:\pinokio\api\e2-f5-tts.git\app\src\f5_tts\model\dataset.py", line 261, in load_dataset
train_dataset = Dataset_.from_file(f"{rel_data_path}/raw.arrow")
File "C:\pinokio\api\e2-f5-tts.git\app\env\lib\site-packages\datasets\arrow_dataset.py", line 742, in from_file
table = ArrowReader.read_table(filename, in_memory=in_memory)
File "C:\pinokio\api\e2-f5-tts.git\app\env\lib\site-packages\datasets\arrow_reader.py", line 329, in read_table
return table_cls.from_file(filename)
File "C:\pinokio\api\e2-f5-tts.git\app\env\lib\site-packages\datasets\table.py", line 1017, in from_file
table = _memory_mapped_arrow_table_from_file(filename)
File "C:\pinokio\api\e2-f5-tts.git\app\env\lib\site-packages\datasets\table.py", line 63, in _memory_mapped_arrow_table_from_file
opened_stream = _memory_mapped_record_batch_reader_from_file(filename)
File "C:\pinokio\api\e2-f5-tts.git\app\env\lib\site-packages\datasets\table.py", line 48, in _memory_mapped_record_batch_reader_from_file
memory_mapped_stream = pa.memory_map(filename)
File "pyarrow\io.pxi", line 1147, in pyarrow.lib.memory_map
File "pyarrow\io.pxi", line 1094, in pyarrow.lib.MemoryMappedFile._open
File "pyarrow\error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow\error.pxi", line 92, in pyarrow.lib.check_status
FileNotFoundError: [WinError 3] Failed to open local file 'C:/pinokio/api/e2-f5-tts.git/app/src/f5_tts/../../data/my_Language_custom/raw.arrow'. Detail: [Windows error 3] The system cannot find the path specified.
Traceback (most recent call last):
File "C:\pinokio\bin\miniconda\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\pinokio\bin\miniconda\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "C:\pinokio\api\e2-f5-tts.git\app\env\Scripts\accelerate.exe_main.py", line 7, in
File "C:\pinokio\api\e2-f5-tts.git\app\env\lib\site-packages\accelerate\commands\accelerate_cli.py", line 48, in main
args.func(args)
File "C:\pinokio\api\e2-f5-tts.git\app\env\lib\site-packages\accelerate\commands\launch.py", line 1172, in launch_command
simple_launcher(args)
File "C:\pinokio\api\e2-f5-tts.git\app\env\lib\site-packages\accelerate\commands\launch.py", line 762, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\pinokio\api\e2-f5-tts.git\app\env\Scripts\python.exe', 'C:\pinokio\api\e2-f5-tts.git\app\src\f5_tts\train\finetune_cli.py', '--exp_name', 'F5TTS_Base', '--learning_rate', '1e-05', '--batch_size_per_gpu', '800', '--batch_size_type', 'frame', '--max_samples', '64', '--grad_accumulation_steps', '1', '--max_grad_norm', '1', '--epochs', '14', '--num_warmup_updates', '66', '--save_per_updates', '500', '--keep_last_n_checkpoints', '1', '--last_per_updates', '200', '--dataset_name', 'my_Language', '--finetune', '--pretrain', 'C:\pinokio\api\e2-f5-tts.git\app\ckpts\model_380000.pt', '--tokenizer_path', 'C:\pinokio\api\e2-f5-tts.git\app\ckpts\vocab.txt', '--tokenizer', 'custom', '--log_samples', '--logger', 'wandb']' returned non-zero exit status 1.
When I remove the path to your vocab file from the Tokenizer File field, it gives this:
Creating dynamic batches with 800 audio frames per gpu: 0%| | 0/1335 [00:00<?, ?it/s]
Creating dynamic batches with 800 audio frames per gpu: 100%|##########| 1335/1335 [00:00<?, ?it/s]
Saved last checkpoint at update 380000
For this issue ("Saved last checkpoint at update 380000"): how many epochs are you using? I think you should increase the number of epochs.
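If I read your log correctly, the rough arithmetic looks like this (my own sketch of the resume logic, not the exact trainer code; the numbers come from the pasted log and command above):

```python
# Rough sketch (assumed, not the exact trainer logic): when resuming from a
# checkpoint saved at update 380,000, no new updates happen unless the
# planned total of updates exceeds that number.
resume_update = 380_000      # update counter stored in the resumed checkpoint
batches_per_epoch = 1_335    # from the "Creating dynamic batches ... 1335" log above
epochs = 14                  # from the --epochs flag in the pasted command

total_planned_updates = epochs * batches_per_epoch
print(total_planned_updates)                   # 18,690
print(total_planned_updates > resume_update)   # False -> run ends and just re-saves
```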
14 epochs
Does that mean I should set the number of epochs so that when multiplied by the samples, it exceeds 380,000?
I saved your file with two names in the project path:
model_last.pt
pretrained_model_380000.pt
Does the number of epochs matter? Because I was just testing.
And one more thing: when I reduce the size of your file in the Reduce Checkpoint tab, these problems do not occur. Can this method be used to continue training the model?
If you're just testing, use infer_cli.py instead of finetune_cli.py. Make sure the checkpoint and vocabulary paths are correct. Additionally, provide reference audio along with Arabic reference text for optimal results. Experiment with the hyperparameters to fine-tune performance, and it should work without issues.
For fine-tuning, I haven't tested the reduced checkpoints myself, but if it includes the model's state dict, it should work just fine.
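Reducing a checkpoint may also strip things like optimizer state and the saved update counter, which might be why the "Saved last checkpoint at update 380000" issue disappears, but I haven't verified that. A quick torch-only inspection like the one below should show you what is actually left in the file (the path is a placeholder, and the keys you see depend on how the checkpoint was saved):

```python
# Inspect a checkpoint before fine-tuning from it. This only uses torch.load,
# so it works on both full and reduced checkpoints.
import torch

ckpt = torch.load("model_last.pt", map_location="cpu")  # placeholder path

if isinstance(ckpt, dict):
    for k, v in ckpt.items():
        # print each top-level entry: its type and its length (or value for scalars)
        size = len(v) if hasattr(v, "__len__") else v
        print(f"{k}: {type(v).__name__} ({size})")
else:
    print(type(ckpt))
```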
What I mean by testing is that I was experimenting with a small dataset so that I could eventually use a larger one. Actually, I wanted to calculate how much time it takes to train the model for each hour of dataset audio.
I want to collect quality data every few days and keep going.
The main problem is with the vocab file: I can't use my vocab with your pretrained model.
size mismatch for ema_model.transformer.text_embed.text_embed.weight: copying a param with shape torch.Size([2581, 512]) from checkpoint, the shape in current model is torch.Size([2590, 512]).
Problem solved:
When we click the Extend option in the Vocab Check tab, it creates a pretrained_model_1200000 and a vocab.txt file inside our project and uses them.
But because it was using the model_1200000.pt file in the
C:\pinokio\api\e2-f5-tts.git\cache\HF_HOME\hub\models--SWivid--F5-TTS\snapshots\4dcc16f297f2ff98a17b3726b16f5de5a5e45672\F5TTS_Base path,
the new vocab file was not compatible with your pretrained model.
I replaced your pretrained model file (model_380000.pt) with the project's original file (model_1200000.pt), and the problem was solved.
Since you're using a new vocabulary with a different size, you need to expand the model's embeddings accordingly. When you expanded the vocabulary, the code automatically adjusted the model's original embeddings to match. If you want to use the 380000.pt checkpoint, you'll need to do the same: expand its embeddings to ensure compatibility.
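For reference, here is roughly what that expansion amounts to, based only on the parameter name and shapes in the size-mismatch error above. Treat the container key ("ema_model_state_dict"), the zero padding, and the file names as assumptions to verify against your own checkpoint, not as the official procedure:

```python
# Hypothetical sketch: pad the text-embedding matrix of a checkpoint so its
# row count matches a larger vocabulary. The key name comes from the
# "size mismatch for ema_model.transformer.text_embed.text_embed.weight"
# error earlier in this thread.
import torch

ckpt = torch.load("model_380000.pt", map_location="cpu")
# Assumed layout: EMA weights live under "ema_model_state_dict";
# fall back to the top-level dict if the checkpoint was already reduced.
state = ckpt.get("ema_model_state_dict", ckpt)

key = "ema_model.transformer.text_embed.text_embed.weight"
old = state[key]          # torch.Size([2581, 512]) in the error above
new_rows = 2590           # row count the new vocab expects (also from the error)

if new_rows > old.shape[0]:
    # pad the new rows with zeros; other init schemes are possible
    pad = torch.zeros(new_rows - old.shape[0], old.shape[1], dtype=old.dtype)
    state[key] = torch.cat([old, pad], dim=0)
    torch.save(ckpt, "model_380000_expanded_vocab.pt")
```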
Thanks for the help and quick response.