hubertsiuzdak/snac_32khz
hubertsiuzdak committed on Feb 27, 2024

Commit 8f97ac2 · verified · 1 Parent(s): 2491d40

Update README.md

Files changed (1)

README.md +48 -0

@@ -1,3 +1,51 @@
 ---
 license: mit
+tags:
+- audio
 ---
+# SNAC 🍿
+Multi-**S**cale **N**eural **A**udio **C**odec (SNAC) compresses audio into discrete codes at a low bitrate.
+See GitHub repository: https://github.com/hubertsiuzdak/snac/
+## Overview
+SNAC encodes audio into hierarchical tokens, similarly to SoundStream, EnCodec, and DAC. However, SNAC introduces a simple change: coarse tokens are sampled less frequently, so each one covers a broader time span.
+This model compresses 32 kHz audio into discrete codes at a 1.9 kbps bitrate. It uses 4 RVQ levels with token rates of 10, 21, 42, and 83 Hz.
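As a rough sanity check, the stated bitrate can be recovered from the four token rates. The sketch below assumes each token is drawn from a codebook of 4096 entries (12 bits per token); that codebook size is an assumption for illustration and is not stated in this README.

```python
import math

# Token rates of the 4 RVQ levels, in tokens per second (from the text above)
token_rates_hz = [10, 21, 42, 83]

# ASSUMPTION: 4096 entries per codebook, i.e. 12 bits per token.
# This value is not given in the README.
codebook_size = 4096
bits_per_token = math.log2(codebook_size)

# Bitrate = total tokens per second across all levels * bits per token
bitrate_bps = sum(token_rates_hz) * bits_per_token
print(f"{bitrate_bps / 1000:.2f} kbps")  # prints "1.87 kbps"
```

Under that assumption the result rounds to the 1.9 kbps quoted above.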
+## Usage
+Install it using:
+```bash
+pip install snac
+```
+To encode (and reconstruct) audio with SNAC in Python, use the following code:
+```python
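+import torch
+from snac import SNAC
+# Load the pretrained 32 kHz model in eval mode and move it to the GPU
+model = SNAC.from_pretrained("hubertsiuzdak/snac_32khz").eval().cuda()
+# Dummy input: one second of noise, shaped (B, 1, T)
+audio = torch.randn(1, 1, 32000).cuda()
+with torch.inference_mode():
+    # audio_hat is the reconstructed audio; codes holds the discrete tokens
+    audio_hat, _, codes, _, _ = model(audio)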
+```
+⚠️ Note that `codes` is a list of token sequences of variable lengths, each corresponding to a different temporal
+resolution.
+```
+>>> [code.shape[1] for code in codes]
+[12, 24, 48, 96]
+```
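To make the relationship between the levels concrete, the following minimal sketch works only from the lengths printed above (no model required). The variable names are illustrative and not part of the `snac` package.

```python
# Sequence lengths per level, copied from the example output above
code_lengths = [12, 24, 48, 96]

# Each level is sampled twice as often as the one before it
ratios = [b // a for a, b in zip(code_lengths, code_lengths[1:])]

# Number of finest-level frames spanned by a single token at each level
fine_frames_per_token = [code_lengths[-1] // n for n in code_lengths]

# Total number of tokens needed to represent the clip across all levels
total_tokens = sum(code_lengths)

print(ratios)                 # [2, 2, 2]
print(fine_frames_per_token)  # [8, 4, 2, 1]
print(total_tokens)           # 180
```

In this example one token at the coarsest level covers the same time span as 8 tokens at the finest level, which is why the sequences cannot be stacked into a single rectangular tensor without repeating or padding the coarser levels.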
+## Acknowledgements
+Module definitions are adapted from the [Descript Audio Codec](https://github.com/descriptinc/descript-audio-codec).