nielsr (HF Staff) committed (verified)
Commit 8552cf2 · Parent: 0ac33b2

Add project page link and introductory sentence to model card


This PR improves the model card by:
- Adding an introductory sentence to directly link to the paper, [Sequential Diffusion Language Models](https://huggingface.co/papers/2509.24007), for better discoverability.
- Including an explicit link to the project page: [https://internvl.github.io/blog/2025-09-29-SDLM/](https://internvl.github.io/blog/2025-09-29-SDLM/) in the header links.

The metadata, existing GitHub and arXiv links, and usage examples remain unchanged as they are already complete and accurate.

Files changed (1): README.md (+81 -79)
README.md CHANGED
@@ -1,18 +1,6 @@
  ---
- license: apache-2.0
- license_name: qwen
- license_link: https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE
- pipeline_tag: text-generation
- library_name: transformers
  base_model:
  - Qwen/Qwen2.5-3B
- base_model_relation: finetune
- language:
- - en
- tags:
- - sdlm
- - diffusion language model
- - custom_code
  datasets:
  - dyyyyyyyy/ScaleQuest-Math
  - OpenCoder-LLM/opc-sft-stage2
@@ -20,15 +8,29 @@ datasets:
  - HuggingFaceTB/smoltalk2
  - LipengCS/Table-GPT
  - allenai/SciRIFF
+ language:
+ - en
+ library_name: transformers
+ license: apache-2.0
+ license_name: qwen
+ license_link: https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE
+ pipeline_tag: text-generation
+ tags:
+ - sdlm
+ - diffusion language model
+ - custom_code
+ base_model_relation: finetune
  ---

  # SDLM-3B-D4

- [\[📂 GitHub\]](https://github.com/OpenGVLab/SDLM) [\[📜 Tech Report\]](https://arxiv.org/abs/2509.24007) [\[🤗 HuggingFace\]](https://huggingface.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552)
+ This model repository contains the SDLM-3B-D4 model, as presented in the paper [Sequential Diffusion Language Models](https://huggingface.co/papers/2509.24007).
+
+ [\[📂 GitHub\]](https://github.com/OpenGVLab/SDLM) [\[📜 Tech Report\]](https://arxiv.org/abs/2509.24007) [\[🚀 Project Page\]](https://internvl.github.io/blog/2025-09-29-SDLM/) [\[🤗 HuggingFace\]](https://huggingface.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552)

  ## Introduction

- We propose a <b>S</b>equential <b>D</b>iffusion <b>L</b>anguage <b>M</b>odel (<b>SDLM</b>) to cheaply elicit the parallel prediction capabilities of diffusion models. Specifically, SDLM reduces distribution shift by limiting the prediction range to a fixed block length and enforces decoding order through longest-prefix decoding, thereby significantly improving prediction efficiency while preserving generation quality. Our method can be viewed as a further generalization of the autoregressive (AR) paradigm, so pre-trained AR weights can be migrated to the diffusion framework with only minimal instruction fine-tuning.
+ We propose a **S**equential **D**iffusion **L**anguage **M**odel (**SDLM**) to cheaply elicit the parallel prediction capabilities of diffusion models. Specifically, SDLM reduces distribution shift by limiting the prediction range to a fixed block length and enforces decoding order through longest-prefix decoding, thereby significantly improving prediction efficiency while preserving generation quality. Our method can be viewed as a further generalization of the autoregressive (AR) paradigm, so pre-trained AR weights can be migrated to the diffusion framework with only minimal instruction fine-tuning.

  ![image/png](https://huggingface.co/OpenGVLab/SDLM-32B-D4/resolve/main/assets/three_framework.png)
 
@@ -46,8 +48,8 @@ In the following table, we provide an overview of the SDLM series.

  We propose a sequential blockwise masked prediction method that reduces error accumulation in diffusion-based generation. Our method leverages the observation that predictions for tokens at lower positional indices typically benefit from more reliable contextual information, resulting in lower deviation and improved accuracy.

- * **(a) Training pipeline.** Reordered input enables a structured mask with a causal prefix (top-left), a visible cross-block prefix (bottom-left), and intra-block bidirectional attention (bottom-right).
- * **(b) Sampling pipeline.** Confidence-based dynamic block decoding with KV-cache reuse. At each step, a block of B tokens is predicted with B-1 padding masks, and the longest high-confidence prefix is selected as the dynamic output. Cached KV states enable efficient decoding.
+ * **(a) Training pipeline.** Reordered input enables a structured mask with a causal prefix (top-left), a visible cross-block prefix (bottom-left), and intra-block bidirectional attention (bottom-right).
+ * **(b) Sampling pipeline.** Confidence-based dynamic block decoding with KV-cache reuse. At each step, a block of B tokens is predicted with B-1 padding masks, and the longest high-confidence prefix is selected as the dynamic output. Cached KV states enable efficient decoding.

  ![image/png](https://huggingface.co/OpenGVLab/SDLM-3B-D4/resolve/main/assets/framework.png)
 
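To make the structured mask in **(a)** concrete, here is a minimal, illustrative PyTorch sketch of such an attention mask (`True` = may attend). It is not the repository's implementation: the `[prefix | block 0 | block 1 | ...]` layout, the function name, and the omission of the mask-token reordering described above are simplifying assumptions of this sketch.

```python
import torch

def sdlm_train_mask(prefix_len: int, num_blocks: int, block_size: int) -> torch.Tensor:
    """Illustrative attention mask (hypothetical helper, not from the SDLM repo)."""
    total = prefix_len + num_blocks * block_size
    mask = torch.zeros(total, total, dtype=torch.bool)
    # Causal prefix: lower-triangular attention over the prefix region (top-left).
    mask[:prefix_len, :prefix_len] = torch.tril(
        torch.ones(prefix_len, prefix_len, dtype=torch.bool))
    for b in range(num_blocks):
        start = prefix_len + b * block_size
        end = start + block_size
        # Each block sees the whole prefix and all earlier blocks (bottom-left)...
        mask[start:end, :start] = True
        # ...and attends bidirectionally within itself (bottom-right diagonal).
        mask[start:end, start:end] = True
    return mask

# With a 3-token prefix and two blocks of 4, the printed 0/1 matrix shows the
# causal triangle, the fully visible lower-left rectangle, and the dense
# diagonal blocks described in (a).
print(sdlm_train_mask(prefix_len=3, num_blocks=2, block_size=4).int())
```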
@@ -75,68 +77,68 @@ Trade-off between performance and speed under different confidence thresholds τ

  ## Inference

- 1. Install dependencies
-
- Key package versions:
-
- ```
- transformers==4.37.2
- torch>=2.5.0
- ```
-
- 2. Download the model generation script [sdlm_inference.py](https://github.com/OpenGVLab/SDLM/blob/main/sdlm_inference.py) to your working directory.
-
- 3. We provide example code to run `SDLM-3B-D4` using `transformers`.
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
- from sdlm_inference import SDLM_generate
-
- if __name__ == "__main__":
-     ckpt_hf = 'OpenGVLab/SDLM-3B-D4'
-
-     model = AutoModelForCausalLM.from_pretrained(
-         ckpt_hf,
-         attn_implementation="eager",
-         trust_remote_code=True
-     ).to(dtype=torch.float16)
-     tokenizer = AutoTokenizer.from_pretrained(ckpt_hf)
-
-     prompt = 'Write a Fibonacci function in Python.'
-     messages = [
-         {"role": "system", "content": "You are a helpful assistant."},
-         {"role": "user", "content": prompt}
-     ]
-     text = tokenizer.apply_chat_template(
-         messages,
-         tokenize=False,
-         add_generation_prompt=True
-     )
-
-     model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
-     response, history = SDLM_generate(
-         model,
-         tokenizer,
-         model_inputs,
-         max_gen_len=1024,
-         temperature=0,
-         threshold=0.5,
-         n_future_tokens=4,
-         alg='prob_conf',  # prob_conf | entropy_conf | self_speculative
-         save_history=True,
-         use_cache=True
-     )
-
-     print('response: ', response[0])
-
-     print('======= history')
-     for item in history:
-         print('cur total token ', item[1])
-         print(item[0][0])
-         print('--------')
- ```
+ 1. Install dependencies
+
+ Key package versions:
+
+ ```
+ transformers==4.37.2
+ torch>=2.5.0
+ ```
+
+ 2. Download the model generation script [sdlm_inference.py](https://github.com/OpenGVLab/SDLM/blob/main/sdlm_inference.py) to your working directory.
+
+ 3. We provide example code to run `SDLM-3B-D4` using `transformers`.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from sdlm_inference import SDLM_generate
+
+ if __name__ == "__main__":
+     ckpt_hf = 'OpenGVLab/SDLM-3B-D4'
+
+     model = AutoModelForCausalLM.from_pretrained(
+         ckpt_hf,
+         attn_implementation="eager",
+         trust_remote_code=True
+     ).to(dtype=torch.float16)
+     tokenizer = AutoTokenizer.from_pretrained(ckpt_hf)
+
+     prompt = 'Write a Fibonacci function in Python.'
+     messages = [
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": prompt}
+     ]
+     text = tokenizer.apply_chat_template(
+         messages,
+         tokenize=False,
+         add_generation_prompt=True
+     )
+
+     model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+     response, history = SDLM_generate(
+         model,
+         tokenizer,
+         model_inputs,
+         max_gen_len=1024,
+         temperature=0,
+         threshold=0.5,
+         n_future_tokens=4,
+         alg='prob_conf',  # prob_conf | entropy_conf | self_speculative
+         save_history=True,
+         use_cache=True
+     )
+
+     print('response: ', response[0])
+
+     print('======= history')
+     for item in history:
+         print('cur total token ', item[1])
+         print(item[0][0])
+         print('--------')
+ ```

 
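For intuition about the sampling arguments above (`threshold`, `n_future_tokens`, `alg='prob_conf'`), the sketch below implements a confidence-based longest-prefix rule like the one described in the sampling pipeline. It is an illustration only, not the internals of `SDLM_generate`: the helper name, the greedy token choice (matching `temperature = 0`), and the guarantee that at least one token is accepted per step are assumptions of this sketch.

```python
import torch

def longest_confident_prefix(block_logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Hypothetical helper: accept the longest prefix of a predicted block whose
    per-token probability stays above `threshold` (the prob_conf criterion)."""
    probs = torch.softmax(block_logits, dim=-1)   # (B, vocab_size)
    conf, tokens = probs.max(dim=-1)              # greedy choice + its confidence
    accepted = 1                                  # always emit at least one token
    while accepted < conf.numel() and conf[accepted] >= threshold:
        accepted += 1
    return tokens[:accepted]

# Toy block of B = n_future_tokens = 4 predictions over a 10-word vocabulary.
torch.manual_seed(0)
print(longest_confident_prefix(torch.randn(4, 10), threshold=0.5))
```

Raising the threshold toward 1.0 accepts shorter prefixes (approaching one token per step, i.e. ordinary AR decoding), while lowering it accepts longer prefixes and trades accuracy for speed; this is the threshold trade-off referenced in the hunk header above.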
@@ -151,4 +153,4 @@ If you find this project useful in your research, please consider citing:
  journal={arXiv preprint arXiv:2509.24007},
  year={2025}
  }
- ```
+ ```
 