lll2343 committed on
Commit 79fbdab · verified · 1 Parent(s): a4ef5eb

Update README.md

Files changed (1): README.md (+154 −3)
README.md CHANGED
---
license: apache-2.0
license_name: qwen
license_link: https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE
pipeline_tag: text-generation
library_name: transformers
base_model:
- Qwen/Qwen2.5-3B
base_model_relation: finetune
language:
- en
tags:
- sdlm
- diffusion language model
- custom_code
datasets:
- dyyyyyyyy/ScaleQuest-Math
- OpenCoder-LLM/opc-sft-stage2
- allenai/tulu-3-sft-mixture
- HuggingFaceTB/smoltalk2
- LipengCS/Table-GPT
- allenai/SciRIFF
---

# SDLM-3B-D4

[\[📂 GitHub\]](https://github.com/OpenGVLab/SDLM) [\[📜 Tech Report\]](https://huggingface.co/papers/xxx) [\[🤗 HuggingFace\]](https://huggingface.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552)

## Introduction
We propose a <b>S</b>equential <b>D</b>iffusion <b>L</b>anguage <b>M</b>odel (<b>SDLM</b>) that elicits the parallel prediction capabilities of diffusion models at low cost. Specifically, SDLM reduces distribution shift by limiting the prediction range to a fixed block length, and enforces the decoding order through longest-prefix decoding, thereby significantly improving prediction efficiency while preserving generation quality. Our method can be viewed as a further generalization of the autoregressive (AR) paradigm, so pre-trained AR weights can be migrated to the diffusion framework with only minimal instruction fine-tuning.

![image/png](https://huggingface.co/OpenGVLab/SDLM-3B-D4/resolve/main/assets/three_framework.png)
## SDLM Family

In the following table, we provide an overview of the SDLM series.

| Model Name | Base Model 🤗 | HF Link 🤗 |
| ----------- | ------------------------------------------------------------ | -------------------------------------------- |
| SDLM-3B-D4 | <a href="https://huggingface.co/Qwen/Qwen2.5-3B">Qwen2.5-3B</a> | https://huggingface.co/OpenGVLab/SDLM-3B-D4 |
| SDLM-3B-D8 | <a href="https://huggingface.co/Qwen/Qwen2.5-3B">Qwen2.5-3B</a> | https://huggingface.co/OpenGVLab/SDLM-3B-D8 |
| SDLM-32B-D4 | <a href="https://huggingface.co/Qwen/Qwen2.5-32B">Qwen2.5-32B</a> | https://huggingface.co/OpenGVLab/SDLM-32B-D4 |
## Model Architecture

We propose a sequential blockwise masked prediction method that reduces error accumulation in diffusion-based generation. Our method leverages the observation that predictions for tokens at lower positional indices typically benefit from more reliable contextual information, resulting in lower deviation and improved accuracy.

* **(a) Training pipeline.** The reordered input enables a structured mask with a causal prefix (top-left), a visible cross-block prefix (bottom-left), and intra-block bidirectional attention (bottom-right); a minimal sketch of this visibility pattern follows the figure below.
* **(b) Sampling pipeline.** Confidence-based dynamic block decoding with KV-cache reuse. At each step, a block of B tokens is predicted with B−1 padding masks. The longest high-confidence prefix is selected as the dynamic output, and cached KV states enable efficient decoding.

![image/png](https://huggingface.co/OpenGVLab/SDLM-3B-D4/resolve/main/assets/framework.png)
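
The following is a minimal sketch of the intra-/cross-block visibility pattern described in (a), using boolean mask semantics. `block_causal_mask` is an illustrative helper, not the repository's actual implementation, and it omits the reordered clean/noisy input layout used in training.

```python
import torch

def block_causal_mask(seq_len: int, block_size: int) -> torch.Tensor:
    """Boolean mask: position i may attend to position j iff j's block
    does not come after i's block (causal across blocks, bidirectional
    within a block)."""
    blk = torch.arange(seq_len) // block_size  # block index of each position
    return blk.unsqueeze(1) >= blk.unsqueeze(0)

# Example: 8 tokens with block size 4 gives two bidirectional 4x4 diagonal
# blocks, and the second block additionally sees the entire first block.
print(block_causal_mask(8, 4).int())
```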
## Performance

### Long-Form Benchmarks

SDLM delivers strong performance with significantly faster decoding. It runs approximately 2x faster than comparable autoregressive models while matching their accuracy, and achieves up to a 5x speedup over other diffusion language models, as evidenced by results on the MATH-500 benchmark.

![image/png](https://huggingface.co/OpenGVLab/SDLM-3B-D4/resolve/main/assets/main_exp1.png)

### General Multiple-Choice Benchmarks

![image/png](https://huggingface.co/OpenGVLab/SDLM-3B-D4/resolve/main/assets/main_exp2.png)

### Block Size & Self-Speculative Decoding

![image/png](https://huggingface.co/OpenGVLab/SDLM-3B-D4/resolve/main/assets/self_speculative_decoding.png)
## Trade-off Between Performance and Speed

The figure below shows the trade-off between performance and speed under different confidence thresholds τ for SDLM-3B (B=4) and SDLM-3B (B=8). By adjusting τ, a controllable trade-off between speed and performance can be achieved; SpeedUp denotes the average number of tokens emitted per forward pass. The prefix-acceptance rule behind τ is sketched after the figure.

![image/png](https://huggingface.co/OpenGVLab/SDLM-3B-D4/resolve/main/assets/ablation_tau.png)
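
To make the role of τ concrete, here is a minimal sketch of the longest high-confidence prefix rule, assuming per-position confidences (e.g. the max softmax probability) for one predicted block. `longest_confident_prefix` and its accept-at-least-one fallback are illustrative assumptions, not the exact SDLM implementation.

```python
import torch

def longest_confident_prefix(confidences: torch.Tensor, tau: float) -> int:
    """Accept tokens from the left until the first confidence below tau.

    Returns how many tokens of the predicted block to commit; at least one
    token is always accepted so that decoding makes progress.
    """
    below = (confidences < tau).nonzero(as_tuple=True)[0]
    n_accept = below[0].item() if below.numel() > 0 else confidences.numel()
    return max(n_accept, 1)

# With tau = 0.5, this block commits its first three tokens in one forward
# pass; raising tau toward 1 commits fewer tokens per pass.
conf = torch.tensor([0.97, 0.91, 0.62, 0.38])
print(longest_confident_prefix(conf, tau=0.5))  # -> 3
```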
## Inference

1. Install dependencies.

Key package versions:

```
transformers==4.37.2
torch>=2.5.0
```
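These pins can be installed with pip, for example (the exact PyTorch build for your CUDA version may vary):

```
pip install "transformers==4.37.2" "torch>=2.5.0"
```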
2. Download the model generation script [sdlm_inference.py](https://github.com/OpenGVLab/SDLM/blob/main/sdlm_inference.py) to your working directory.

3. Run the example below, which uses `transformers` to load and query `SDLM-3B-D4`.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from sdlm_inference import SDLM_generate

if __name__ == "__main__":
    ckpt_hf = 'OpenGVLab/SDLM-3B-D4'

    # Load the checkpoint together with its custom SDLM modeling code
    model = AutoModelForCausalLM.from_pretrained(
        ckpt_hf,
        attn_implementation="eager",
        trust_remote_code=True
    ).to(dtype=torch.float16)
    tokenizer = AutoTokenizer.from_pretrained(ckpt_hf)

    prompt = 'Write a Fibonacci function in Python.'
    messages = [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": prompt}
    ]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

    response, history = SDLM_generate(
        model,
        tokenizer,
        model_inputs,
        max_gen_len = 1024,
        temperature = 0,
        threshold = 0.5,       # confidence threshold tau
        n_future_tokens = 4,   # block size B
        alg = 'prob_conf',     # prob_conf | entropy_conf | self_speculative
        save_history = True,
        use_cache = True
    )

    print('response: ', response[0])

    print('=======history')
    for item in history:
        print('cur total token ', item[1])
        print(item[0][0])
        print('--------')
```
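In the call above, `n_future_tokens` corresponds to the block size B (4 for SDLM-3B-D4, 8 for SDLM-3B-D8), and `threshold` is the confidence threshold τ discussed in the trade-off section: raising it commits fewer tokens per forward pass in exchange for quality, while lowering it increases the speedup.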
## Citation

If you find this project useful in your research, please consider citing:

```BibTeX
@article{SDLM,
  title={Sequential Diffusion Language Models},
  author={},
  journal={arXiv preprint arXiv:2025.xxxxx},
  year={2025}
}
```