Update (2025-02-13): Added Llasa fine-tuning instructions.

Update (2025-02-07): Our paper has been released!

Papers

LLaSA: Scaling Train Time and Inference Time Compute for LLaMA based Speech Synthesis

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model (AAAI 2025, xcodec 1.0)

Getting Started with XCodec2 on Hugging Face

XCodec2 is a speech tokenizer that offers the following key features:

  1. Single Vector Quantization
  2. 50 Tokens per Second
  3. Multilingual Speech Semantic Support and High-Quality Speech Reconstruction
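The example further down accepts only 16 kHz input. At 50 tokens per second, that fixes the frame hop at 16000 / 50 = 320 samples. Because there is a single quantizer, each frame produces exactly one code. The sketch below estimates how many codes a clip will produce. It assumes one code per 320-sample frame and rounds partial frames up. The real encoder pads its input, so its count may differ by a frame. `estimate_num_codes` is a hypothetical helper and not part of xcodec2.

```python
import math

SAMPLE_RATE = 16_000                    # xcodec2 expects 16 kHz input
TOKENS_PER_SECOND = 50                  # single codebook: one code per frame
HOP = SAMPLE_RATE // TOKENS_PER_SECOND  # 320 samples per code


def estimate_num_codes(num_samples: int) -> int:
    """Approximate code count for a 16 kHz clip, rounding partial frames up.

    Hypothetical helper: the real encoder's padding may add or drop a frame.
    """
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    return math.ceil(num_samples / HOP)


if __name__ == "__main__":
    # A 3.5 s clip at 16 kHz has 56,000 samples, i.e. 175 frames of 320 samples.
    print(HOP, estimate_num_codes(56_000))
```

For example, a 10-second utterance maps to roughly 500 discrete tokens. That sequence length is what a downstream language model would see.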

To use xcodec2, first install it. For example, in a fresh conda environment:

conda create -n xcodec2 python=3.9
conda activate xcodec2
pip install xcodec2==0.1.3

Version 0.1.3 fixes a bug in the previous version and gives better sound quality.
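Version 0.1.3 contains the sound-quality fix, so it can help to confirm which version your environment actually installed. The sketch below uses the standard library's `importlib.metadata`. Its comparison is deliberately simple: it assumes plain `X.Y.Z` version strings with no pre-release suffixes. `parse_version` and `is_at_least` are hypothetical helpers.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(v: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints (no pre-release handling)."""
    return tuple(int(part) for part in v.split("."))


def is_at_least(installed: str, required: str = "0.1.3") -> bool:
    """True if `installed` is the same as or newer than `required`."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        v = version("xcodec2")
        print(f"xcodec2 {v}:", "OK" if is_at_least(v) else "upgrade to 0.1.3")
    except PackageNotFoundError:
        print("xcodec2 is not installed in this environment")
```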

Then encode a waveform into codes and reconstruct it:

import torch
import soundfile as sf
from xcodec2.modeling_xcodec2 import XCodec2Model

model_path = "HKUSTAudio/xcodec2"

model = XCodec2Model.from_pretrained(model_path)
model.eval().cuda()

wav, sr = sf.read("test.wav")
# xcodec2 only supports 16 kHz mono speech
assert sr == 16000 and wav.ndim == 1, "expected 16 kHz mono audio"
wav_tensor = torch.from_numpy(wav).float().unsqueeze(0)  # Shape: (1, T)

with torch.no_grad():
    # Only a single input is supported; for batch inference, see the released code mentioned below.
    vq_code = model.encode_code(input_waveform=wav_tensor)
    print("Code:", vq_code)

    recon_wav = model.decode_code(vq_code).cpu()  # Shape: (1, 1, T')

sf.write("reconstructed.wav", recon_wav[0, 0, :].numpy(), sr)
print("Done! Check reconstructed.wav")
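The example above assumes `test.wav` is already 16 kHz mono. For a multichannel file, `sf.read` returns a `(frames, channels)` array, and `unsqueeze(0)` would not turn that into the expected `(1, T)` shape. The sketch below prepares arbitrary input, assuming SciPy is available. It downmixes by averaging the channels, then resamples with `scipy.signal.resample_poly`. `to_16k_mono` is a hypothetical helper, and any other good resampler would work just as well.

```python
from math import gcd

import numpy as np
from scipy.signal import resample_poly

TARGET_SR = 16_000  # xcodec2 only accepts 16 kHz speech


def to_16k_mono(wav: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz; returns a float32 1-D array."""
    wav = np.asarray(wav, dtype=np.float32)
    if wav.ndim == 2:  # soundfile layout: (frames, channels)
        wav = wav.mean(axis=1)
    if sr != TARGET_SR:
        g = gcd(sr, TARGET_SR)
        # Polyphase resampling by the rational factor TARGET_SR / sr
        wav = resample_poly(wav, TARGET_SR // g, sr // g).astype(np.float32)
    return wav


if __name__ == "__main__":
    stereo_44k = np.random.default_rng(0).standard_normal((44_100, 2))
    mono_16k = to_16k_mono(stereo_44k, 44_100)
    print(mono_16k.shape, mono_16k.dtype)  # (16000,) float32
```

With this helper, you would build the input as `torch.from_numpy(to_16k_mono(wav, sr)).unsqueeze(0)`. Pass 16000 rather than the original `sr` to `sf.write`, because the reconstruction is at 16 kHz.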

Code for training your own xcodec2, running batch inference, and extracting codes at scale has been released in the project's code repository.
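If you are not using the released batch-inference code, you can still apply the single-input `encode_code` to many clips by looping over them one at a time. The sketch below assumes each code sequence is saved as one `.npy` file per clip. It skips clips whose file already exists, so an interrupted run can be resumed. `extract_codes` and its `encode_fn` callback are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Iterable

import numpy as np


def extract_codes(
    items: Iterable[tuple[str, np.ndarray]],
    encode_fn: Callable[[np.ndarray], object],
    out_dir: str,
) -> list[Path]:
    """Encode clips one at a time and save each code sequence as <name>.npy.

    `items` yields (name, 16 kHz mono waveform) pairs; `encode_fn` wraps the
    single-input encoder. Existing outputs are skipped, so runs can resume.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for name, wav in items:
        target = out / f"{name}.npy"
        if target.exists():
            continue  # already extracted in a previous run
        codes = encode_fn(wav)
        # Accept torch tensors (possibly on GPU) as well as NumPy arrays
        if hasattr(codes, "detach"):
            codes = codes.detach().cpu().numpy()
        np.save(target, np.asarray(codes))
        written.append(target)
    return written
```

With the model loaded as above, call this inside `torch.no_grad()` and pass an `encode_fn` such as `lambda w: model.encode_code(input_waveform=torch.from_numpy(w).float().unsqueeze(0))`. This approach is sequential, so expect it to be slower than true batching.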

Model size: 823M parameters (F32 weights, Safetensors format)