Update README.md
README.md CHANGED
```diff
@@ -14,7 +14,15 @@ tags:
 - music
 - art
 ---
-
+<div align="center">
+<img src="https://declare-lab.github.io/jamify-logo-new.png" width="200"/>
+<br/>
+<h1>JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment</h1>
+<br/>
+
+[](https://arxiv.org/abs/2507.) [](https://huggingface.co/declare-lab/JAM-0.5) [](https://declare-lab.github.io/jamify)
+
+</div>
 
 JAM is a rectified flow-based model for lyrics-to-song generation that addresses the lack of fine-grained word-level controllability in existing lyrics-to-song models. Built on a compact 530M-parameter architecture with 16 LLaMA-style Transformer layers as the Diffusion Transformer (DiT) backbone, JAM enables precise vocal control that musicians desire in their workflows. Unlike previous models, JAM provides word and phoneme-level timing control, allowing musicians to specify the exact placement of each vocal sound for improved rhythmic flexibility and expressive timing.
 
```
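For readers new to the underlying technique, below is a minimal, generic sketch of how sampling from a rectified-flow model typically works: a network predicts a velocity field that is integrated from Gaussian noise toward data with plain Euler steps. This is an illustration only, not JAM's actual implementation; `model`, `cond` (standing in for lyric/phoneme-timing and style conditioning), the latent `shape`, and the step count are hypothetical placeholders.

```python
import torch

def sample_rectified_flow(model, cond, steps=50, shape=(1, 256, 64), device="cpu"):
    """Generic Euler sampler for a rectified-flow generator (illustrative only).

    Assumes `model(x_t, t, cond)` predicts the velocity field that transports
    noise (t = 0) toward data (t = 1); `cond` is a placeholder for whatever
    conditioning the real model uses (e.g. word/phoneme timings and style).
    """
    x = torch.randn(shape, device=device)      # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt, device=device)
        v = model(x, t, cond)                  # predicted velocity at time t
        x = x + dt * v                         # Euler step along the learned flow
    return x                                   # final latent, to be decoded to audio
```

Because rectified flow learns nearly straight transport paths, relatively few integration steps are typically needed at inference time, which is one reason the approach pairs well with a compact backbone.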