nielsr (HF Staff) committed (verified)
Commit 8552cf2 · Parent: 0ac33b2

Add project page link and introductory sentence to model card


This PR improves the model card by:
- Adding an introductory sentence to directly link to the paper, [Sequential Diffusion Language Models](https://huggingface.co/papers/2509.24007), for better discoverability.
- Including an explicit link to the project page: [https://internvl.github.io/blog/2025-09-29-SDLM/](https://internvl.github.io/blog/2025-09-29-SDLM/) in the header links.

The metadata, existing GitHub and arXiv links, and usage examples remain unchanged as they are already complete and accurate.

Files changed (1): README.md (+81 -79)
README.md CHANGED
@@ -1,18 +1,6 @@
  ---
- license: apache-2.0
- license_name: qwen
- license_link: https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE
- pipeline_tag: text-generation
- library_name: transformers
  base_model:
  - Qwen/Qwen2.5-3B
- base_model_relation: finetune
- language:
- - en
- tags:
- - sdlm
- - diffusion language model
- - custom_code
  datasets:
  - dyyyyyyyy/ScaleQuest-Math
  - OpenCoder-LLM/opc-sft-stage2
@@ -20,15 +8,29 @@ datasets:
  - HuggingFaceTB/smoltalk2
  - LipengCS/Table-GPT
  - allenai/SciRIFF
+ language:
+ - en
+ library_name: transformers
+ license: apache-2.0
+ license_name: qwen
+ license_link: https://huggingface.co/Qwen/Qwen2.5-3B/blob/main/LICENSE
+ pipeline_tag: text-generation
+ tags:
+ - sdlm
+ - diffusion language model
+ - custom_code
+ base_model_relation: finetune
  ---

  # SDLM-3B-D4

- [\[📂 GitHub\]](https://github.com/OpenGVLab/SDLM) [\[📜 Tech Report\]](https://arxiv.org/abs/2509.24007) [\[🤗 HuggingFace\]](https://huggingface.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552)
+ This model repository contains the SDLM-3B-D4 model, as presented in the paper [Sequential Diffusion Language Models](https://huggingface.co/papers/2509.24007).
+
+ [\[📂 GitHub\]](https://github.com/OpenGVLab/SDLM) [\[📜 Tech Report\]](https://arxiv.org/abs/2509.24007) [\[🚀 Project Page\]](https://internvl.github.io/blog/2025-09-29-SDLM/) [\[🤗 HuggingFace\]](https://huggingface.co/collections/OpenGVLab/sdlm-68ac82709d7c343ad36aa552)

  ## Introduction

- We propose a <b>S</b>equential <b>D</b>iffusion <b>L</b>anguage <b>M</b>odel (<b>SDLM</b>) to cheaply elicit the parallel prediction capabilities of diffusion models. Specifically, SDLM reduces distribution shift by limiting the prediction range to a fixed block length and enforces decoding order through longest-prefix decoding, thereby significantly improving prediction efficiency while preserving generation quality. Our method can be viewed as a further generalization of the autoregressive (AR) paradigm, so pre-trained AR weights can be migrated to the diffusion framework with only minimal instruction fine-tuning.
+ We propose a **S**equential **D**iffusion **L**anguage **M**odel (**SDLM**) to cheaply elicit the parallel prediction capabilities of diffusion models. Specifically, SDLM reduces distribution shift by limiting the prediction range to a fixed block length and enforces decoding order through longest-prefix decoding, thereby significantly improving prediction efficiency while preserving generation quality. Our method can be viewed as a further generalization of the autoregressive (AR) paradigm, so pre-trained AR weights can be migrated to the diffusion framework with only minimal instruction fine-tuning.

  ![image/png](https://huggingface.co/OpenGVLab/SDLM-32B-D4/resolve/main/assets/three_framework.png)
 
@@ -46,8 +48,8 @@ In the following table, we provide an overview of the SDLM series.

  We propose a sequential blockwise masked prediction method that reduces error accumulation in diffusion-based generation. Our method leverages the observation that predictions for tokens at lower positional indices typically benefit from more reliable contextual information, resulting in lower deviation and improved accuracy.

- * **(a) Training pipeline.** Reordered input enables a structured mask with a causal prefix (top-left), a visible cross-block prefix (bottom-left), and intra-block bidirectional attention (bottom-right).
- * **(b) Sampling pipeline.** Confidence-based dynamic block decoding with KV-cache reuse. At each step, a block of B tokens is predicted with B-1 padding masks, and the longest high-confidence prefix is selected as the dynamic output. Cached KV states enable efficient decoding.
+ * **(a) Training pipeline.** Reordered input enables a structured mask with a causal prefix (top-left), a visible cross-block prefix (bottom-left), and intra-block bidirectional attention (bottom-right).
+ * **(b) Sampling pipeline.** Confidence-based dynamic block decoding with KV-cache reuse. At each step, a block of B tokens is predicted with B-1 padding masks, and the longest high-confidence prefix is selected as the dynamic output. Cached KV states enable efficient decoding.

  ![image/png](https://huggingface.co/OpenGVLab/SDLM-3B-D4/resolve/main/assets/framework.png)
 
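To make the structured mask in **(a)** concrete, here is a minimal, illustrative PyTorch sketch of such an attention mask (`True` = may attend). It is not the repository's implementation: the `[prefix | block 0 | block 1 | ...]` layout, the function name, and the omission of the mask-token reordering described above are simplifying assumptions of this sketch.

```python
import torch

def sdlm_train_mask(prefix_len: int, num_blocks: int, block_size: int) -> torch.Tensor:
    """Illustrative attention mask (hypothetical helper, not from the SDLM repo)."""
    total = prefix_len + num_blocks * block_size
    mask = torch.zeros(total, total, dtype=torch.bool)
    # Causal prefix: lower-triangular attention over the prefix region (top-left).
    mask[:prefix_len, :prefix_len] = torch.tril(
        torch.ones(prefix_len, prefix_len, dtype=torch.bool))
    for b in range(num_blocks):
        start = prefix_len + b * block_size
        end = start + block_size
        # Each block sees the whole prefix and all earlier blocks (bottom-left)...
        mask[start:end, :start] = True
        # ...and attends bidirectionally within itself (bottom-right diagonal).
        mask[start:end, start:end] = True
    return mask

# With a 3-token prefix and two blocks of 4, the printed 0/1 matrix shows the
# causal triangle, the fully visible lower-left rectangle, and the dense
# diagonal blocks described in (a).
print(sdlm_train_mask(prefix_len=3, num_blocks=2, block_size=4).int())
```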
@@ -75,68 +77,68 @@ Trade-off between performance and speed under different confidence thresholds τ

  ## Inference

- 1. Install dependencies
-
- Key package versions:
-
- ```
- transformers==4.37.2
- torch>=2.5.0
- ```
-
- 2. Download the model generation script [sdlm_inference.py](https://github.com/OpenGVLab/SDLM/blob/main/sdlm_inference.py) to your working directory.
-
- 3. We provide example code to run `SDLM-3B-D4` using `transformers`.
-
- ```python
- import torch
- from transformers import AutoModelForCausalLM, AutoTokenizer
- from sdlm_inference import SDLM_generate
-
- if __name__ == "__main__":
-     ckpt_hf = 'OpenGVLab/SDLM-3B-D4'
-
-     model = AutoModelForCausalLM.from_pretrained(
-         ckpt_hf,
-         attn_implementation="eager",
-         trust_remote_code=True
-     ).to(dtype=torch.float16)
-     tokenizer = AutoTokenizer.from_pretrained(ckpt_hf)
-
-     prompt = 'Write a Fibonacci function in Python.'
-     messages = [
-         {"role": "system", "content": "You are a helpful assistant."},
-         {"role": "user", "content": prompt}
-     ]
-     text = tokenizer.apply_chat_template(
-         messages,
-         tokenize=False,
-         add_generation_prompt=True
-     )
-
-     model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
-
-     response, history = SDLM_generate(
-         model,
-         tokenizer,
-         model_inputs,
-         max_gen_len=1024,
-         temperature=0,
-         threshold=0.5,
-         n_future_tokens=4,
-         alg='prob_conf',  # prob_conf | entropy_conf | self_speculative
-         save_history=True,
-         use_cache=True
-     )
-
-     print('response: ', response[0])
-
-     print('======= history')
-     for item in history:
-         print('cur total token ', item[1])
-         print(item[0][0])
-         print('--------')
- ```
+ 1. Install dependencies
+
+ Key package versions:
+
+ ```
+ transformers==4.37.2
+ torch>=2.5.0
+ ```
+
+ 2. Download the model generation script [sdlm_inference.py](https://github.com/OpenGVLab/SDLM/blob/main/sdlm_inference.py) to your working directory.
+
+ 3. We provide example code to run `SDLM-3B-D4` using `transformers`.
+
+ ```python
+ import torch
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from sdlm_inference import SDLM_generate
+
+ if __name__ == "__main__":
+     ckpt_hf = 'OpenGVLab/SDLM-3B-D4'
+
+     model = AutoModelForCausalLM.from_pretrained(
+         ckpt_hf,
+         attn_implementation="eager",
+         trust_remote_code=True
+     ).to(dtype=torch.float16)
+     tokenizer = AutoTokenizer.from_pretrained(ckpt_hf)
+
+     prompt = 'Write a Fibonacci function in Python.'
+     messages = [
+         {"role": "system", "content": "You are a helpful assistant."},
+         {"role": "user", "content": prompt}
+     ]
+     text = tokenizer.apply_chat_template(
+         messages,
+         tokenize=False,
+         add_generation_prompt=True
+     )
+
+     model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+     response, history = SDLM_generate(
+         model,
+         tokenizer,
+         model_inputs,
+         max_gen_len=1024,
+         temperature=0,
+         threshold=0.5,
+         n_future_tokens=4,
+         alg='prob_conf',  # prob_conf | entropy_conf | self_speculative
+         save_history=True,
+         use_cache=True
+     )
+
+     print('response: ', response[0])
+
+     print('======= history')
+     for item in history:
+         print('cur total token ', item[1])
+         print(item[0][0])
+         print('--------')
+ ```

 
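For intuition about the sampling arguments above (`threshold`, `n_future_tokens`, `alg='prob_conf'`), the sketch below implements a confidence-based longest-prefix rule like the one described in the sampling pipeline. It is an illustration only, not the internals of `SDLM_generate`: the helper name, the greedy token choice (matching `temperature = 0`), and the guarantee that at least one token is accepted per step are assumptions of this sketch.

```python
import torch

def longest_confident_prefix(block_logits: torch.Tensor, threshold: float = 0.5) -> torch.Tensor:
    """Hypothetical helper: accept the longest prefix of a predicted block whose
    per-token probability stays above `threshold` (the prob_conf criterion)."""
    probs = torch.softmax(block_logits, dim=-1)   # (B, vocab_size)
    conf, tokens = probs.max(dim=-1)              # greedy choice + its confidence
    accepted = 1                                  # always emit at least one token
    while accepted < conf.numel() and conf[accepted] >= threshold:
        accepted += 1
    return tokens[:accepted]

# Toy block of B = n_future_tokens = 4 predictions over a 10-word vocabulary.
torch.manual_seed(0)
print(longest_confident_prefix(torch.randn(4, 10), threshold=0.5))
```

Raising the threshold toward 1.0 accepts shorter prefixes (approaching one token per step, i.e. ordinary AR decoding), while lowering it accepts longer prefixes and trades accuracy for speed; this is the threshold trade-off referenced in the hunk header above.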
@@ -151,4 +153,4 @@ If you find this project useful in your research, please consider citing:
  journal={arXiv preprint arXiv:2509.24007},
  year={2025}
  }
- ```
+ ```
 