Update README.md
README.md CHANGED
```diff
@@ -14,7 +14,15 @@ tags:
 - music
 - art
 ---
-
+<div align="center">
+<img src="https://declare-lab.github.io/jamify-logo-new.png" width="200"/>
+<br/>
+<h1>JAM: A Tiny Flow-based Song Generator with Fine-grained Controllability and Aesthetic Alignment</h1>
+<br/>
+
+[](https://arxiv.org/abs/2507.) [](https://huggingface.co/declare-lab/JAM-0.5) [](https://declare-lab.github.io/jamify)
+
+</div>
 
 JAM is a rectified flow-based model for lyrics-to-song generation that addresses the lack of fine-grained word-level controllability in existing lyrics-to-song models. Built on a compact 530M-parameter architecture with 16 LLaMA-style Transformer layers as the Diffusion Transformer (DiT) backbone, JAM enables precise vocal control that musicians desire in their workflows. Unlike previous models, JAM provides word and phoneme-level timing control, allowing musicians to specify the exact placement of each vocal sound for improved rhythmic flexibility and expressive timing.
 
```
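For readers new to the underlying technique, below is a minimal, generic sketch of how sampling from a rectified-flow model typically works: a network predicts a velocity field that is integrated from Gaussian noise toward data with plain Euler steps. This is an illustration only, not JAM's actual implementation; `model`, `cond` (standing in for lyric/phoneme-timing and style conditioning), the latent `shape`, and the step count are hypothetical placeholders.

```python
import torch

def sample_rectified_flow(model, cond, steps=50, shape=(1, 256, 64), device="cpu"):
    """Generic Euler sampler for a rectified-flow generator (illustrative only).

    Assumes `model(x_t, t, cond)` predicts the velocity field that transports
    noise (t = 0) toward data (t = 1); `cond` is a placeholder for whatever
    conditioning the real model uses (e.g. word/phoneme timings and style).
    """
    x = torch.randn(shape, device=device)      # start from Gaussian noise
    dt = 1.0 / steps
    for i in range(steps):
        t = torch.full((shape[0],), i * dt, device=device)
        v = model(x, t, cond)                  # predicted velocity at time t
        x = x + dt * v                         # Euler step along the learned flow
    return x                                   # final latent, to be decoded to audio
```

Because rectified flow learns nearly straight transport paths, relatively few integration steps are typically needed at inference time, which is one reason the approach pairs well with a compact backbone.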