# NVIDIA-Nemotron-Nano-9B-v2

**Model Developer:** NVIDIA Corporation

**Model Dates:**

June 2025 - August 2025

**Data Freshness:**

September 2024

The pretraining data has a cutoff date of September 2024.

## Model Overview

NVIDIA-Nemotron-Nano-9B-v2 is a large language model (LLM) trained from scratch by NVIDIA and designed as a unified model for both reasoning and non-reasoning tasks. It responds to user queries and tasks by first generating a reasoning trace and then concluding with a final response. The model's reasoning capabilities can be controlled via the system prompt. If the user prefers the final answer without intermediate reasoning traces, the model can be configured to omit them, albeit with a slight decrease in accuracy on harder prompts that require reasoning. Conversely, allowing the model to generate reasoning traces first generally results in higher-quality final solutions to queries and tasks.

The model uses a hybrid architecture consisting primarily of Mamba-2 and MLP layers combined with just four Attention layers. For the architecture, please refer to the [Nemotron-H tech report](https://arxiv.org/abs/2504.03624).

The supported languages include: English, German, Spanish, French, Italian, and Japanese. Improved using Qwen.

This model is ready for commercial use.

## Evaluation Results

#### Benchmark Results (Reasoning On)

We evaluated our model in **Reasoning-On** mode across all benchmarks.

| Benchmark | NVIDIA-Nemotron-Nano-9B-v2 |
| :---- | ----- |
| AIME25 | 72.1% |
| MATH500 | 97.8% |
| GPQA | 64.0% |
| LCB | 71.1% |
| BFCL v3 | 66.9% |
| IFEVAL-Prompt | 85.4% |
| IFEVAL-Instruction | 90.3% |

All evaluations were done using [NeMo-Skills](https://github.com/NVIDIA/NeMo-Skills/tree/main/docs).

### Reasoning Budget Control

This model supports runtime "thinking" budget control: during inference, the user can specify how many tokens the model is allowed to "think" (see "Using Budget Control with a vLLM Server" below for a client that implements this).

![Accuracy vs. thinking budget](./acc-vs-budget.png)

## License/Terms of Use

GOVERNING TERMS: This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/).

## Model Architecture

- Architecture Type: Mamba2-Transformer Hybrid
- Network Architecture: Nemotron-Hybrid

### Deployment Geography: Global

### Use Case

NVIDIA-Nemotron-Nano-9B-v2 is a general-purpose reasoning and chat model intended to be used in English and coding languages. Other non-English languages (German, French, Italian, Spanish, and Japanese) are also supported. It is intended for developers designing AI agent systems, chatbots, RAG systems, and other AI-powered applications, and is also suitable for typical instruction-following tasks.

### Release Date: 08/18/2025

- Huggingface: 08/18/2025 via [https://huggingface.co/](https://huggingface.co/)
- API Catalog: 08/18/2025 via [https://catalog.ngc.nvidia.com/models](https://catalog.ngc.nvidia.com/models)

## References

- [NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model](https://research.nvidia.com/labs/adlr/files/NVIDIA-Nemotron-Nano-2-Technical-Report.pdf)

## Computational Load

Cumulative compute: 1.53E+24 FLOPS

Estimated energy consumption for model training: 747.6 MWh

| | # of tokens | Compute [FLOPS] | Energy [MWh] |
| :---- | :---- | :---- | :---- |
| 12B Base Pre-training | 20T | 1.45E+24 | 708.3 |
| 12B Post-training | 1T | 7.25E+22 | 35.6 |
| 9B Pruning & Distillation | 142B | 7.72E+21 | 3.7 |
| Total | 21.1T | 1.53E+24 | 747.6 |

## Input

- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D): Sequences
- Other Properties Related to Input: Context length up to 128K. Supported languages include German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese, and English.

## Output

- Output Type(s): Text
- Output Format: String
- Output Parameters: One-Dimensional (1D): Sequences up to 128K

Our models are designed and optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

## Software Integration

- Runtime Engine(s): NeMo 25.07.nemotron-nano-v2
- Supported Hardware Microarchitecture Compatibility: NVIDIA A10G, NVIDIA H100-80GB, NVIDIA A100
- Operating System(s): Linux

### **Use it with Transformers**

The snippet below shows how to use this model with Hugging Face Transformers (tested on version 4.48.3).

```
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("nvidia/NVIDIA-Nemotron-Nano-9B-v2")
model = AutoModelForCausalLM.from_pretrained(
    "nvidia/NVIDIA-Nemotron-Nano-9B-v2",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
```

Case 1: if `/think` is provided in the system prompt, or no reasoning signal is provided at all, reasoning is set to `True`:

```
messages = [
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "Write a haiku about GPUs"},
]
```

Case 2: if `/no_think` is provided, reasoning is set to `False`:

```
messages = [
    {"role": "system", "content": "/no_think"},
    {"role": "user", "content": "Write a haiku about GPUs"},
]
```

Note: the `/think` and `/no_think` keywords can also be provided in "user" messages for turn-level reasoning control.
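
For example, the following conversation keeps reasoning on globally but turns it off for the final turn (a minimal sketch; the assistant turn is a placeholder, and the last `/think` or `/no_think` signal seen in a system or user message wins):

```
messages = [
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "Write a haiku about GPUs"},
    {"role": "assistant", "content": "Silicon rivers / carry a thousand whispers / into morning light."},
    {"role": "user", "content": "Now translate it to German. /no_think"},
]
```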
The rest of the inference snippet remains the same:

```
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

outputs = model.generate(
    tokenized_chat,
    max_new_tokens=32,
    eos_token_id=tokenizer.eos_token_id
)
print(tokenizer.decode(outputs[0]))
```

We recommend setting `temperature` to `0.6` and `top_p` to `0.95` when reasoning is on, using greedy search when reasoning is off, and increasing `max_new_tokens` to `1024` or higher when reasoning is on.
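
As a sketch of those recommendations applied to the snippet above (these are standard `generate` arguments; the values are the ones suggested here):

```
# Reasoning on: sample with the recommended settings and a larger token budget
outputs = model.generate(
    tokenized_chat,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
)

# Reasoning off: greedy decoding
outputs = model.generate(
    tokenized_chat,
    max_new_tokens=1024,
    do_sample=False,
)
```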
### **Use it with TRT-LLM**

The snippet below shows how to use this model with TRT-LLM. We tested this on the following [commit](https://github.com/NVIDIA/TensorRT-LLM/tree/46c5a564446673cdd0f56bcda938d53025b6d04e) and followed these [instructions](https://github.com/NVIDIA/TensorRT-LLM/blob/46c5a564446673cdd0f56bcda938d53025b6d04e/docs/source/installation/build-from-source-linux.md#option-2-build-tensorrt-llm-step-by-step) to build and install TRT-LLM in a docker container.

```
from tensorrt_llm import SamplingParams
from tensorrt_llm._torch import LLM
from tensorrt_llm._torch.pyexecutor.config import PyTorchConfig
from tensorrt_llm.llmapi import KvCacheConfig
from transformers import AutoTokenizer

pytorch_config = PyTorchConfig(
    disable_overlap_scheduler=True, enable_trtllm_decoder=True
)
kv_cache_config = KvCacheConfig(
    enable_block_reuse=False,
)
```

```
model_id = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)

llm = LLM(
    model=model_id,
    max_seq_len=32678,
    max_batch_size=4,
    pytorch_backend_config=pytorch_config,
    kv_cache_config=kv_cache_config,
    tensor_parallel_size=8,
)
messages = [
    {"role": "system", "content": "/think"},
    {"role": "user", "content": "Write a haiku about GPUs"},
]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
sampling_params = SamplingParams(
    max_tokens=512,
    temperature=0.6,
    top_p=0.95,
    add_special_tokens=False,
)
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```

### **Use it with vLLM**

The snippet below shows how to use this model with vLLM. Use the following [commit](https://github.com/vllm-project/vllm/commit/75531a6c134282f940c86461b3c40996b4136793) and follow these instructions to build and install vLLM in a docker container.

```shell
# use the full commit hash from the main branch
export VLLM_COMMIT=75531a6c134282f940c86461b3c40996b4136793
uv pip install vllm --extra-index-url https://wheels.vllm.ai/${VLLM_COMMIT}
```

Now you can run the server with:

```shell
vllm serve nvidia/NVIDIA-Nemotron-Nano-9B-v2 \
    --trust-remote-code \
    --mamba_ssm_cache_dtype float32
```

Note: remember to add `--mamba_ssm_cache_dtype float32` to preserve output quality. Without this option, the model's accuracy may degrade.
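
Once the server is up, any OpenAI-compatible client can query it. A minimal sketch (assuming the server above is listening on the default port 8000; vLLM does not check the API key by default):

```py
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-Nano-9B-v2",
    messages=[
        {"role": "system", "content": "/think"},
        {"role": "user", "content": "Write a haiku about GPUs"},
    ],
    temperature=0.6,
    top_p=0.95,
    max_tokens=1024,
)
print(completion.choices[0].message.content)
```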
#### Using Budget Control with a vLLM Server

The thinking budget lets developers keep accuracy high while meeting response-time targets, which is especially crucial for customer support, autonomous agent steps, and edge devices where every millisecond counts.

With budget control, you can set a limit for internal reasoning:

* `max_thinking_tokens`: a threshold after which the client attempts to end the reasoning trace at the next newline encountered in the trace. If no newline is encountered within 500 tokens, the reasoning trace is cut off abruptly at `max_thinking_tokens + 500`. For example, with `max_thinking_tokens=512`, the trace ends at the first newline after token 512, and at token 1012 at the latest.

Start a vLLM server:

```shell
vllm serve nvidia/NVIDIA-Nemotron-Nano-9B-v2 \
    --trust-remote-code \
    --mamba_ssm_cache_dtype float32
```

Client for supporting budget control:

```py
from typing import Any, Dict, List

import openai
from transformers import AutoTokenizer


class ThinkingBudgetClient:
    def __init__(self, base_url: str, api_key: str, tokenizer_name_or_path: str):
        self.base_url = base_url
        self.api_key = api_key
        self.tokenizer = AutoTokenizer.from_pretrained(tokenizer_name_or_path)
        self.client = openai.OpenAI(base_url=self.base_url, api_key=self.api_key)

    def chat_completion(
        self,
        model: str,
        messages: List[Dict[str, Any]],
        max_thinking_budget: int = 512,
        max_tokens: int = 1024,
        **kwargs,
    ) -> Dict[str, Any]:
        assert (
            max_tokens > max_thinking_budget
        ), f"thinking budget must be smaller than maximum new tokens. Given {max_tokens=} and {max_thinking_budget=}"

        # 1. first call chat completion to get reasoning content
        response = self.client.chat.completions.create(
            model=model, messages=messages, max_tokens=max_thinking_budget, **kwargs
        )
        content = response.choices[0].message.content

        reasoning_content = content
        if "</think>" not in reasoning_content:
            # reasoning content is too long, close it with a period (.)
            reasoning_content = f"{reasoning_content}.\n</think>\n\n"
        reasoning_tokens_len = len(
            self.tokenizer.encode(reasoning_content, add_special_tokens=False)
        )
        remaining_tokens = max_tokens - reasoning_tokens_len
        assert (
            remaining_tokens > 0
        ), f"remaining tokens must be positive. Given {remaining_tokens=}. Increase the max_tokens or lower the max_thinking_budget."

        # 2. append reasoning content to messages and call completion
        messages.append({"role": "assistant", "content": reasoning_content})
        prompt = self.tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            continue_final_message=True,
        )
        response = self.client.completions.create(
            model=model, prompt=prompt, max_tokens=max_tokens, **kwargs
        )

        response_data = {
            # drop the closing </think> tag and surrounding whitespace
            "reasoning_content": reasoning_content.strip().removesuffix("</think>").strip(),
            "content": response.choices[0].text,
            "finish_reason": response.choices[0].finish_reason,
        }
        return response_data
```

Calling the server with a budget (restricted to 32 tokens here as an example):

```py
tokenizer_name_or_path = "nvidia/NVIDIA-Nemotron-Nano-9B-v2"
client = ThinkingBudgetClient(
    base_url="http://localhost:8000/v1",  # Nano 9B v2 deployed in thinking mode
    api_key="EMPTY",
    tokenizer_name_or_path=tokenizer_name_or_path,
)

result = client.chat_completion(
    model="nvidia/NVIDIA-Nemotron-Nano-9B-v2",
    messages=[
        {"role": "system", "content": "You are a helpful assistant. /think"},
        {"role": "user", "content": "What is 2+2?"},
    ],
    max_thinking_budget=32,
    max_tokens=512,
    temperature=0.6,
    top_p=0.95,
)
print(result)
```

You should see output similar to the following:

```
{'reasoning_content': "Okay, the user asked, What is 2+2? Let me think. Well, 2 plus 2 equals 4. That's a basic.", 'content': '2 + 2 equals **4**.\n', 'finish_reason': 'stop'}
```

#### Using Tool-Calling with a vLLM Server

Start a vLLM server with native tool-calling (the repository is cloned first so that the tool-call parser plugin shipped with it is available locally):

```shell
git clone https://huggingface.co/nvidia/NVIDIA-Nemotron-Nano-9B-v2

vllm serve nvidia/NVIDIA-Nemotron-Nano-9B-v2 \
    --trust-remote-code \
    --mamba_ssm_cache_dtype float32 \
    --enable-auto-tool-choice \
    --tool-parser-plugin "NVIDIA-Nemotron-Nano-9B-v2/nemotron_toolcall_parser_no_streaming.py" \
    --tool-call-parser "nemotron_json"
```

After launching the vLLM server, you can call it with tool-call support using a Python script like the one below:

```py
from openai import OpenAI

client = OpenAI(
    base_url="http://0.0.0.0:5000/v1",
    api_key="dummy",
)

completion = client.chat.completions.create(
    model="nvidia/NVIDIA-Nemotron-Nano-9B-v2",
    messages=[
        {"role": "system", "content": ""},
        {"role": "user", "content": "My bill is $100. What will be the amount for 18% tip?"}
    ],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "calculate_tip",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "bill_total": {
                            "type": "integer",
                            "description": "The total amount of the bill"
                        },
                        "tip_percentage": {
                            "type": "integer",
                            "description": "The percentage of tip to be applied"
                        }
                    },
                    "required": ["bill_total", "tip_percentage"]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "convert_currency",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "amount": {
                            "type": "integer",
                            "description": "The amount to be converted"
                        },
                        "from_currency": {
                            "type": "string",
                            "description": "The currency code to convert from"
                        },
                        "to_currency": {
                            "type": "string",
                            "description": "The currency code to convert to"
                        }
                    },
                    "required": ["from_currency", "amount", "to_currency"]
                }
            }
        }
    ],
    temperature=0.6,
    top_p=0.95,
    max_tokens=32768,
    stream=False
)

print(completion.choices[0].message.content)
print(completion.choices[0].message.tool_calls)
```

You should see output similar to the following:

```
<think>
Okay, let's see. The user has a bill of $100 and wants to know the amount for an 18% tip. Hmm, I need to calculate the tip based on the bill total and the percentage. The tools provided include calculate_tip, which takes bill_total and tip_percentage as parameters. So the bill_total here is 100, and the tip_percentage is 18. I should call the calculate_tip function with these values. Wait, do I need to check if the parameters are integers? The bill is $100, which is an integer, and 18% is also an integer. So that fits the function's requirements. I don't need to convert any currency here because the user is asking about a tip in the same currency. So the correct tool to use is calculate_tip with those parameters.
</think>

[ChatCompletionMessageToolCall(id='chatcmpl-tool-e341c6954d2c48c2a0e9071c7bdefd8b', function=Function(arguments='{"bill_total": 100, "tip_percentage": 18}', name='calculate_tip'), type='function')]
```

## Model Version

- v1.0

## Prompt Format

We follow the jinja chat template provided below. The template conditionally adds `<think>\n` to the start of the Assistant response if `/think` is found in the system prompt (or if no reasoning signal is given), and adds `<think></think>` to the start of the Assistant response if `/no_think` is found, thereby enforcing the reasoning on/off behavior described above.

```
{%- set ns = namespace(enable_thinking = true) %}

{%- for message in messages -%}
    {%- set content = message['content'] -%}
    {%- if message['role'] == 'user' or message['role'] == 'system' -%}
        {%- if '/think' in content -%}
            {%- set ns.enable_thinking = true -%}
        {%- elif '/no_think' in content -%}
            {%- set ns.enable_thinking = false -%}
        {%- endif -%}
    {%- endif -%}
{%- endfor -%}

{%- if messages[0]['role'] != 'system' -%}
    {%- set ns.non_tool_system_content = '' -%}
    {{- '<SPECIAL_10>System\n' -}}
{%- else -%}
    {%- set ns.non_tool_system_content = messages[0]['content']
        .replace('/think', '')
        .replace('/no_think', '')
        .strip()
    -%}
    {{- '<SPECIAL_10>System\n' + ns.non_tool_system_content }}
{%- endif -%}

{%- if tools -%}
    {%- if ns.non_tool_system_content is defined and ns.non_tool_system_content != '' -%}
        {{- '\n\n' -}}
    {%- endif -%}

    {{- 'You can use the following tools to assist the user if required:' -}}
    {{- '\n<AVAILABLE_TOOLS>[' -}}
    {%- for tool in tools -%}
        {{- (tool.function if tool.function is defined else tool) | tojson -}}
        {{- ', ' if not loop.last else '' -}}
    {%- endfor -%}
    {{- ']</AVAILABLE_TOOLS>\n\n' -}}

    {{- 'If you decide to call any tool(s), use the following format:\n' -}}
    {{- '<TOOLCALL>[{{"name": "tool_name1", "arguments": "tool_args1"}}, ' -}}
    {{- '{{"name": "tool_name2", "arguments": "tool_args2"}}]</TOOLCALL>\n\n' -}}

    {{- 'The user will execute tool-calls and return responses from tool(s) in this format:\n' -}}
    {{- '<TOOL_RESPONSE>[{{"tool_response1"}}, {{"tool_response2"}}]</TOOL_RESPONSE>\n\n' -}}

    {{- 'Based on the tool responses, you can call additional tools if needed, correct tool calls if any errors are found, or just respond to the user.' -}}
{%- endif -%}

{{- '\n' -}}

{%- set messages = messages[1:] if messages[0]['role'] == 'system' else messages -%}

{%- if messages[-1]['role'] == 'assistant' -%}
    {%- set ns.last_turn_assistant_content = messages[-1]['content'].strip() -%}
    {%- set messages = messages[:-1] -%}
{%- endif -%}

{%- for message in messages -%}
    {%- set content = message['content'] -%}

    {%- if message['role'] == 'user' -%}
        {{- '<SPECIAL_11>User\n' + content.replace('/think', '').replace('/no_think', '').strip() + '\n' }}

    {%- elif message['role'] == 'tool' -%}
        {%- if loop.first or (messages[loop.index0 - 1].role != 'tool') -%}
            {{- '<SPECIAL_11>User\n' + '<TOOL_RESPONSE>[' }}
        {%- endif -%}
        {{- message['content'] -}}
        {{- ', ' if not loop.last and (messages[loop.index0 + 1].role == 'tool') else '' -}}
        {%- if loop.last or (messages[loop.index0 + 1].role != 'tool') -%}
            {{- ']</TOOL_RESPONSE>\n' -}}
        {%- endif -%}

    {%- elif message['role'] == 'assistant' -%}
        {%- if '</think>' in content -%}
            {%- set content = content.split('</think>')[1].strip() %}
        {%- endif -%}

        {{- '<SPECIAL_11>Assistant\n' + content.strip() }}

        {%- if message.tool_calls -%}
            {%- if content.strip() != '' -%}
                {{- '\n\n' -}}
            {%- endif -%}
            {{- '<TOOLCALL>[' -}}
            {%- for call in message.tool_calls -%}
                {%- set fn = call.function if call.function is defined else call -%}
                {{- '{"name": "' + fn.name + '", "arguments": ' -}}
                {%- if fn.arguments is string -%}
                    {{- fn.arguments -}}
                {%- else -%}
                    {{- fn.arguments | tojson -}}
                {%- endif -%}
                {{- '}' + (', ' if not loop.last else '') -}}
            {%- endfor -%}
            {{- ']</TOOLCALL>' -}}
        {%- endif -%}

        {{- '\n<SPECIAL_12>\n' -}}
    {%- endif -%}
{%- endfor -%}

{%- if add_generation_prompt -%}
    {{- '<SPECIAL_11>Assistant\n' -}}
    {%- if ns.enable_thinking is defined and ns.enable_thinking is false -%}
        {{- '<think></think>' -}}
    {%- else -%}
        {{- '<think>\n' -}}
    {%- endif -%}
    {%- if ns.last_turn_assistant_content is defined and ns.last_turn_assistant_content != '' -%}
        {{- ns.last_turn_assistant_content -}}
    {%- endif -%}

{%- else -%}
    {%- if ns.last_turn_assistant_content is defined and ns.last_turn_assistant_content != '' -%}
        {{- '<SPECIAL_11>Assistant\n' -}}
        {%- if ns.enable_thinking is defined and ns.enable_thinking is false -%}
            {{- '<think></think>' -}}
        {%- else -%}
            {{- '<think>\n' -}}
        {%- endif -%}
        {{- ns.last_turn_assistant_content -}}

        {%- if continue_final_message is defined -%}
            {%- if continue_final_message is false -%}
                {{- '\n<SPECIAL_12>\n' -}}
            {%- endif -%}
        {%- else -%}
            {{- '\n<SPECIAL_12>\n' -}}
        {%- endif -%}
    {%- endif -%}
{%- endif -%}
```
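
As a quick sanity check of this behavior, you can render prompts for both modes and inspect their tails (a sketch; `trust_remote_code=True` is included in case the tokenizer requires it):

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "nvidia/NVIDIA-Nemotron-Nano-9B-v2", trust_remote_code=True
)

on = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi /think"}],
    tokenize=False, add_generation_prompt=True,
)
off = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hi /no_think"}],
    tokenize=False, add_generation_prompt=True,
)
print(repr(on[-16:]))   # should end with an open '<think>\n' (reasoning on)
print(repr(off[-16:]))  # should end with '<think></think>' (reasoning off)
```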
## Training, Testing, and Evaluation Datasets

### Training datasets

* Data Modality: Text
* Text Training Data Size: More than 10 Trillion Tokens
* Train/Test/Valid Split: We used 100% of the corpus for pre-training and relied on external benchmarks for testing.
* Data Collection Method by dataset: Hybrid: Automated, Human, Synthetic
* Labeling Method by dataset: Hybrid: Automated, Human, Synthetic

**Properties:** The post-training corpus for NVIDIA-Nemotron-Nano-9B-v2 consists of English and multilingual text (German, Spanish, French, Italian, Korean, Portuguese, Russian, Japanese, Chinese, and English). Our sources cover a variety of document types such as webpages, dialogue, articles, and other written materials. The corpus spans domains including code, legal, math, science, finance, and more. We also include a small portion of question-answering and alignment-style data to improve model accuracy. For several of the domains listed above we used synthetic data, specifically reasoning traces, from DeepSeek R1/R1-0528, Qwen3-235B-A22B, Nemotron 4 340B, Qwen2.5-32B-Instruct-AWQ, Qwen2.5-14B-Instruct, and Qwen 2.5 72B.

The pre-training corpus for NVIDIA-Nemotron-Nano-9B-v2 consists of high-quality curated and synthetically-generated data. It covers the English language as well as 15 additional languages and 43 programming languages. Our sources cover a variety of document types such as webpages, dialogue, articles, and other written materials. The corpus spans domains including legal, math, science, finance, and more. We also include a small portion of question-answering and alignment-style data to improve model accuracy. The model was pre-trained for approximately twenty trillion tokens.

More details on the datasets and synthetic data generation methods can be found in the technical report [NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model](https://research.nvidia.com/labs/adlr/files/NVIDIA-Nemotron-Nano-2-Technical-Report.pdf).

## Public Datasets

| Dataset | Collection Period |
| :---- | :---- |
| [Problems in Elementary Mathematics for Home Study](https://archive.org/details/AntonovVygodskyNikitinSankinProblemsInElementaryMathematicsForHomeStudyMir1982) | 4/23/2025 |
| [GSM8K](https://github.com/openai/grade-school-math) | 4/23/2025 |
| [PRM800K](https://github.com/openai/prm800k) | 4/23/2025 |
| [CC-NEWS](https://commoncrawl.org/blog/news-dataset-available) | 4/23/2025 |
| [Common Crawl](https://commoncrawl.org/) | 4/23/2025 |
| [Wikimedia](https://dumps.wikimedia.org/) | 4/23/2025 |
| [Bespoke-Stratos-17k](https://huggingface.co/datasets/bespokelabs/Bespoke-Stratos-17k) | 4/23/2025 |
| [tigerbot-kaggle-leetcodesolutions-en-2k](https://huggingface.co/datasets/TigerResearch/tigerbot-kaggle-leetcodesolutions-en-2k) | 4/23/2025 |
| [glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2) | 4/23/2025 |
| [APIGen Function-Calling](https://huggingface.co/datasets/Salesforce/xlam-function-calling-60k) | 4/23/2025 |
| [LMSYS-Chat-1M](https://huggingface.co/datasets/lmsys/lmsys-chat-1m) | 4/23/2025 |
| [Open Textbook Library - CC BY-SA & GNU subset](https://open.umn.edu/opentextbooks/textbooks/) and [OpenStax - CC BY-SA subset](https://openstax.org/) | 4/23/2025 |
| [Advanced Reasoning Benchmark](https://github.com/TheDuckAI/arb), [tigerbot-kaggle-leetcodesolutions-en-2k](https://huggingface.co/datasets/TigerResearch/tigerbot-kaggle-leetcodesolutions-en-2k), [PRM800K](https://github.com/openai/prm800k), and [SciBench](https://github.com/mandyyyyii/scibench) | 4/23/2025 |
| [FineWeb-2](https://huggingface.co/datasets/HuggingFaceFW/fineweb-2) | 4/23/2025 |
| [Court Listener](https://www.courtlistener.com/help/api/bulk-data/) | Legacy Download |
| [peS2o](https://huggingface.co/datasets/allenai/peS2o) | Legacy Download |
| [OpenWebMath](https://huggingface.co/datasets/open-web-math/open-web-math) | Legacy Download |
| [BioRxiv](https://www.biorxiv.org/tdm) | Legacy Download |
| [PMC Open Access Subset](https://pmc.ncbi.nlm.nih.gov/tools/openftlist/) | Legacy Download |
| [OpenWebText2](https://openwebtext2.readthedocs.io/en/latest/) | Legacy Download |
| [Stack Exchange Data Dump](https://archive.org/details/stackexchange) | Legacy Download |
| [PubMed Abstracts](https://github.com/thoppe/The-Pile-PubMed) | Legacy Download |
| [NIH ExPorter](https://exporter.nih.gov/ExPORTER_Catalog.aspx) | Legacy Download |
| [arXiv](https://info.arxiv.org/help/bulk_data/index.html) | Legacy Download |
| [BigScience Workshop Datasets](https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml#datasets) | Legacy Download |
| [Reddit Dataset](https://files.pushshift.io/reddit/) | Legacy Download |
| [SEC's Electronic Data Gathering, Analysis, and Retrieval (EDGAR)](https://www.sec.gov/search-filings) | Legacy Download |
| [Public Software Heritage S3](https://docs.softwareheritage.org/devel/swh-export/graph/dataset.html#summary-of-dataset-versions) | Legacy Download |
| [The Stack](https://huggingface.co/datasets/bigcode/the-stack) | Legacy Download |
| [mC4](https://huggingface.co/datasets/legacy-datasets/mc4) | Legacy Download |
| [Advanced Mathematical Problem Solving](https://github.com/hendrycks/math?tab=readme-ov-file) | Legacy Download |
| [MathPile](https://github.com/GAIR-NLP/MathPile/) | Legacy Download |
| [NuminaMath CoT](https://huggingface.co/datasets/AI-MO/NuminaMath-CoT) | Legacy Download |
| [PMC Article](https://pmc.ncbi.nlm.nih.gov/tools/textmining/) | Legacy Download |
| [FLAN](https://github.com/google-research/FLAN) | Legacy Download |
| [Advanced Reasoning Benchmark](https://github.com/TheDuckAI/arb) | Legacy Download |
| [SciBench](https://github.com/mandyyyyii/scibench) | Legacy Download |
| [WikiTableQuestions](https://huggingface.co/datasets/wikitablequestions) | Legacy Download |
| [FinQA](https://finqasite.github.io/) | Legacy Download |
| [Riddles](https://github.com/crawsome/riddles) | Legacy Download |
| [Problems in Elementary Mathematics for Home Study](https://archive.org/details/AntonovVygodskyNikitinSankinProblemsInElementaryMathematicsForHomeStudyMir1982) | Legacy Download |
| [MedMCQA](https://huggingface.co/datasets/openlifescienceai/medmcqa) | Legacy Download |
| [Cosmos QA](https://huggingface.co/datasets/allenai/cosmos_qa) | Legacy Download |
| [MCTest](https://huggingface.co/datasets/sagnikrayc/mctest) | Legacy Download |
| [AI2's Reasoning Challenge](https://huggingface.co/datasets/ai2_arc) | Legacy Download |
| [OpenBookQA](https://github.com/allenai/OpenBookQA) | Legacy Download |
| [MMLU Auxiliary Train](https://huggingface.co/datasets/cais/mmlu/viewer/all/auxiliary_train) | Legacy Download |
| [social-chemestry-101](https://huggingface.co/datasets/tasksource/social-chemestry-101) | Legacy Download |
| [Moral Stories](https://huggingface.co/datasets/demelin/moral_stories) | Legacy Download |
| [The Common Pile v0.1](https://huggingface.co/common-pile) | Legacy Download |
| [FineMath](https://huggingface.co/datasets/HuggingFaceTB/finemath) | Legacy Download |
| [MegaMath](https://huggingface.co/datasets/LLM360/MegaMath) | Legacy Download |
| [FastChat](https://github.com/lm-sys/FastChat) | 6/30/2025 |

## Private Non-publicly Accessible Datasets of Third Parties

| Dataset |
| :---- |
| Global Regulation |
| Workbench |

## Online Dataset Sources

The English Common Crawl data was downloaded from the Common Crawl Foundation (see their [FAQ](https://commoncrawl.org/faq) for details on their crawling) and includes the snapshots CC-MAIN-2013-20 through CC-MAIN-2025-13. The data was subsequently deduplicated and filtered in various ways described in the [Nemotron-CC paper](https://arxiv.org/abs/2412.02595).

Additionally, we extracted data for fifteen languages from the following three Common Crawl snapshots: CC-MAIN-2024-51, CC-MAIN-2025-08, and CC-MAIN-2025-18. The fifteen languages were Arabic, Chinese, Danish, Dutch, French, German, Italian, Japanese, Korean, Polish, Portuguese, Russian, Spanish, Swedish, and Thai. As we did not have reliable multilingual model-based quality classifiers available, we applied only heuristic filtering instead, similar to what we did for lower-quality English data in the Nemotron-CC pipeline, selectively removing some filters for languages where they did not work well. Deduplication was done in the same way as for Nemotron-CC.

The GitHub Crawl was collected using the GitHub REST API and the Amazon S3 API. Each crawl was operated in accordance with the rate limits set by its respective source, either GitHub or S3. We collected raw source code and subsequently removed any code whose license does not appear in our permissive-license set (for additional details, refer to the technical report).

| Dataset | Modality | Dataset Size (Tokens) | Collection Period |
| :---- | :---- | :---- | :---- |
| English Common Crawl | Text | 3.360T | 4/8/2025 |
| Multilingual Common Crawl | Text | 812.7B | 5/1/2025 |
| GitHub Crawl | Text | 747.4B | 4/29/2025 |

## NVIDIA-Sourced Synthetic Datasets

| Dataset | Modality | Dataset Size (Tokens) | Seed Dataset | Model(s) used for generation |
| :---- | :---- | :---- | :---- | :---- |
| Synthetic Art of Problem Solving from DeepSeek-R1 | Text | 25.5B | [Art of Problem Solving](https://artofproblemsolving.com/company); [American Mathematics Competitions 8](https://artofproblemsolving.com/wiki/index.php/AMC_8_Problems_and_Solutions); [American Mathematics Competitions 10](https://artofproblemsolving.com/wiki/index.php/AMC_10_Problems_and_Solutions) | [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) |
| Synthetic Moral Stories and Social Chemistry from Mixtral-8x22B-v0.1 | Text | 327M | [social-chemestry-101](https://huggingface.co/datasets/tasksource/social-chemestry-101); [Moral Stories](https://huggingface.co/datasets/demelin/moral_stories) | [Mixtral-8x22B-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1) |
| Synthetic Social Sciences seeded with OpenStax from DeepSeek-V3, Mixtral-8x22B-v0.1, and Qwen2.5-72B | Text | 83.6M | [OpenStax - CC BY-SA subset](https://openstax.org/) | [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3); [Mixtral-8x22B-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1); [Qwen2.5-72B](https://huggingface.co/Qwen/Qwen2.5-72B) |
| Synthetic Health Sciences seeded with OpenStax from DeepSeek-V3, Mixtral-8x22B-v0.1, and Qwen2.5-72B | Text | 9.7M | [OpenStax - CC BY-SA subset](https://openstax.org/) | [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3); [Mixtral-8x22B-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1); [Qwen2.5-72B](https://huggingface.co/Qwen/Qwen2.5-72B) |
| Synthetic STEM seeded with OpenStax, Open Textbook Library, and GSM8K from DeepSeek-R1, DeepSeek-V3, DeepSeek-V3-0324, and Qwen2.5-72B | Text | 175M | [OpenStax - CC BY-SA subset](https://openstax.org/); [GSM8K](https://github.com/openai/grade-school-math); [Open Textbook Library - CC BY-SA & GNU subset](https://open.umn.edu/opentextbooks/textbooks/) | [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1); [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3); [DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324); [Qwen2.5-72B](https://huggingface.co/Qwen/Qwen2.5-72B) |
| [Nemotron-PrismMath](https://huggingface.co/datasets/nvidia/Nemotron-PrismMath) | Text | 4.6B | [Big-Math-RL-Verified](https://huggingface.co/datasets/SynthLabsAI/Big-Math-RL-Verified); [OpenR1-Math-220k](https://huggingface.co/datasets/open-r1/OpenR1-Math-220k) | [Qwen2.5-0.5B-instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct); [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct); [DeepSeek-R1-Distill-Qwen-32B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-32B) |
| Synthetic Question Answering Data from Papers and Permissible Books from Qwen2.5-72B-Instruct | Text | 350M | [arXiv](https://info.arxiv.org/help/bulk_data/index.html); [National Institutes of Health ExPorter](https://www.nih.gov/); [BioRxiv](https://www.biorxiv.org/tdm); [PMC Article](https://pmc.ncbi.nlm.nih.gov/tools/textmining/); [USPTO Backgrounds](https://data.uspto.gov/apis/transition-guide/bdss#pats); [peS2o](https://huggingface.co/datasets/allenai/peS2o); Global Regulation; [CORE](https://core.ac.uk/documentation/dataset); [PG-19](https://github.com/google-deepmind/pg19); [DOAB CC BY & CC BY-SA subset](https://www.doabooks.org/en); [NDLTD](https://ndltd.org/thesis-resources/global-etd-search/) | [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) |
| Synthetic FineMath-4+ Reprocessed from DeepSeek-V3 | Text | 9.2B | [Common Crawl](https://commoncrawl.org/latest-crawl) | [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3) |
| Synthetic FineMath-3+ Reprocessed from phi-4 | Text | 27.6B | [Common Crawl](https://commoncrawl.org/latest-crawl) | [phi-4](https://huggingface.co/microsoft/phi-4) |
| Synthetic Union-3+ Reprocessed from phi-4 | Text | 93.1B | [Common Crawl](https://commoncrawl.org/latest-crawl) | [phi-4](https://huggingface.co/microsoft/phi-4) |
| Refreshed [Nemotron-MIND](https://huggingface.co/datasets/nvidia/Nemotron-MIND) from phi-4 | Text | 73B | [Common Crawl](https://commoncrawl.org/latest-crawl) | [phi-4](https://huggingface.co/microsoft/phi-4) |
| Synthetic Union-4+ Reprocessed from phi-4 | Text | 14.12B | [Common Crawl](https://commoncrawl.org/latest-crawl) | [phi-4](https://huggingface.co/microsoft/phi-4) |
| Synthetic Union-3+ minus 4+ Reprocessed from phi-4 | Text | 78.95B | [Common Crawl](https://commoncrawl.org/latest-crawl) | [phi-4](https://huggingface.co/microsoft/phi-4) |
| Synthetic Union-3 Refreshed from phi-4 | Text | 80.94B | [Common Crawl](https://commoncrawl.org/latest-crawl) | [phi-4](https://huggingface.co/microsoft/phi-4) |
| Synthetic Union-4+ Refreshed from phi-4 | Text | 52.32B | [Common Crawl](https://commoncrawl.org/latest-crawl) | [phi-4](https://huggingface.co/microsoft/phi-4) |
| Synthetic AGIEval seeded with AQUA-RAT, LogiQA, and AR-LSAT from DeepSeek-V3 and DeepSeek-V3-0324 | Text | 4.0B | [AQUA-RAT](https://huggingface.co/datasets/deepmind/aqua_rat); [LogiQA](https://huggingface.co/datasets/lucasmccabe/logiqa); [AR-LSAT](https://github.com/zhongwanjun/AR-LSAT) | [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3); [DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324) |
| Synthetic AGIEval seeded with AQUA-RAT, LogiQA, and AR-LSAT from Qwen3-30B-A3B | Text | 4.2B | [AQUA-RAT](https://huggingface.co/datasets/deepmind/aqua_rat); [LogiQA](https://huggingface.co/datasets/lucasmccabe/logiqa); [AR-LSAT](https://github.com/zhongwanjun/AR-LSAT) | [Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B) |
| Synthetic Art of Problem Solving from Qwen2.5-32B-Instruct, Qwen2.5-Math-72B, Qwen2.5-Math-7B, and Qwen2.5-72B-Instruct | Text | 83.1B | [Art of Problem Solving](https://artofproblemsolving.com/company); [American Mathematics Competitions 8](https://artofproblemsolving.com/wiki/index.php/AMC_8_Problems_and_Solutions); [American Mathematics Competitions 10](https://artofproblemsolving.com/wiki/index.php/AMC_10_Problems_and_Solutions); [GSM8K](https://github.com/openai/grade-school-math); [PRM800K](https://github.com/openai/prm800k) | [Qwen2.5-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct); [Qwen2.5-Math-72B](https://huggingface.co/Qwen/Qwen2.5-Math-72B); [Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B); [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) |
| Synthetic MMLU Auxiliary Train from DeepSeek-R1 | Text | 0.5B | [MMLU Auxiliary Train](https://huggingface.co/datasets/cais/mmlu/viewer/all/auxiliary_train) | [DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) |
| Synthetic Long Context Continued Post-Training Data from Papers and Permissible Books from Qwen2.5-72B-Instruct | Text | 5.4B | [arXiv](https://info.arxiv.org/help/bulk_data/index.html); [National Institutes of Health ExPorter](https://www.nih.gov/); [BioRxiv](https://www.biorxiv.org/tdm); [PMC Article](https://pmc.ncbi.nlm.nih.gov/tools/textmining/); [USPTO Backgrounds](https://data.uspto.gov/apis/transition-guide/bdss#pats); [peS2o](https://huggingface.co/datasets/allenai/peS2o); Global Regulation; [CORE](https://core.ac.uk/documentation/dataset); [PG-19](https://github.com/google-deepmind/pg19); [DOAB CC BY & CC BY-SA subset](https://www.doabooks.org/en); [NDLTD](https://ndltd.org/thesis-resources/global-etd-search/) | [Qwen2.5-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct) |
| Synthetic Common Crawl from Qwen3-30B-A3B and Mistral-Nemo-12B-Instruct | Text | 1.949T | [Common Crawl](https://commoncrawl.org/) | [Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B); [Mistral-NeMo-12B-Instruct](https://huggingface.co/nvidia/Mistral-NeMo-12B-Instruct) |
| Synthetic Multilingual Data from Common Crawl from Qwen3-30B-A3B | Text | 997.3B | [Common Crawl](https://commoncrawl.org/) | [Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B) |
| Synthetic Multilingual Data from Wikimedia from Qwen3-30B-A3B | Text | 55.1B | [Wikimedia](https://dumps.wikimedia.org/) | [Qwen3-30B-A3B](https://huggingface.co/Qwen/Qwen3-30B-A3B) |
| Synthetic OpenMathReasoning from DeepSeek-R1-0528 | Text | 1.5M | [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning) | [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) |
| Synthetic OpenCodeReasoning from DeepSeek-R1-0528 | Text | 1.1M | [OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning) | [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) |
| Synthetic Science Data from DeepSeek-R1-0528 | Text | 1.5M | - | [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) |
| Synthetic Humanity's Last Exam from DeepSeek-R1-0528 | Text | 460K | [Humanity's Last Exam](https://huggingface.co/datasets/cais/hle) | [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) |
| Synthetic ToolBench from Qwen3-235B-A22B | Text | 400K | [ToolBench](https://github.com/OpenBMB/ToolBench) | [Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B) |
| Synthetic Nemotron Content Safety Dataset V2, eval-safety, Gretel Synthetic Safety Alignment, and RedTeam_2K from DeepSeek-R1-0528 | Text | 52K | [Nemotron Content Safety Dataset V2](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0); [eval-safety](https://github.com/CrystalEye42/eval-safety/blob/main/malicious_tasks_dataset.yaml); [Gretel Synthetic Safety Alignment](https://huggingface.co/datasets/gretelai/gretel-safety-alignment-en-v1); [RedTeam_2K](https://huggingface.co/datasets/JailbreakV-28K/JailBreakV-28k/viewer/RedTeam_2K) | [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528) |
| Synthetic HelpSteer from Qwen3-235B-A22B | Text | 120K | [HelpSteer3](https://huggingface.co/datasets/nvidia/HelpSteer3); [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2) | [Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B) |
| Synthetic Alignment data from Mixtral-8x22B-Instruct-v0.1, Mixtral-8x7B-Instruct-v0.1, and Nemotron-4 Family | Text | 400K | [HelpSteer2](https://huggingface.co/datasets/nvidia/HelpSteer2); [C4](https://huggingface.co/datasets/allenai/c4); [LMSYS-Chat-1M](https://huggingface.co/datasets/lmsys/lmsys-chat-1m); [ShareGPT52K](https://huggingface.co/datasets/RyokoAI/ShareGPT52K); [tigerbot-kaggle-leetcodesolutions-en-2k](https://huggingface.co/datasets/TigerResearch/tigerbot-kaggle-leetcodesolutions-en-2k); [GSM8K](https://github.com/openai/grade-school-math); [PRM800K](https://github.com/openai/prm800k); lm_identity (NVIDIA internal); [FinQA](https://finqasite.github.io/); [WikiTableQuestions](https://huggingface.co/datasets/wikitablequestions); [Riddles](https://github.com/crawsome/riddles); ChatQA nvolve-multiturn (NVIDIA internal); [glaive-function-calling-v2](https://huggingface.co/datasets/glaiveai/glaive-function-calling-v2); [SciBench](https://github.com/mandyyyyii/scibench); [OpenBookQA](https://github.com/allenai/OpenBookQA); [Advanced Reasoning Benchmark](https://github.com/TheDuckAI/arb); [Public Software Heritage S3](https://docs.softwareheritage.org/devel/swh-export/graph/dataset.html#summary-of-dataset-versions); [Khan Academy Math Keywords](https://www.khanacademy.org/math) | Nemotron-4-15B-Base (NVIDIA internal); Nemotron-4-15B-Instruct (NVIDIA internal); [Nemotron-4-340B-Base](https://huggingface.co/nvidia/Nemotron-4-340B-Base); [Nemotron-4-340B-Instruct](https://huggingface.co/nvidia/Nemotron-4-340B-Instruct); [Nemotron-4-340B-Reward](https://huggingface.co/nvidia/Nemotron-4-340B-Reward); [Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1); [Mixtral-8x22B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-Instruct-v0.1) |
| Synthetic LMSYS-Chat-1M from Qwen3-235B-A22B | Text | 1M | [LMSYS-Chat-1M](https://huggingface.co/datasets/lmsys/lmsys-chat-1m) | [Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B) |
| Synthetic Multilingual Reasoning data from DeepSeek-R1-0528, Qwen2.5-32B-Instruct-AWQ, and Qwen2.5-14B-Instruct | Text | 25M | [OpenMathReasoning](https://huggingface.co/datasets/nvidia/OpenMathReasoning); [OpenCodeReasoning](https://huggingface.co/datasets/nvidia/OpenCodeReasoning) | [DeepSeek-R1-0528](https://huggingface.co/deepseek-ai/DeepSeek-R1-0528); [Qwen2.5-32B-Instruct-AWQ](https://huggingface.co/Qwen/Qwen2.5-32B-Instruct-AWQ) (translation); [Qwen2.5-14B-Instruct](https://huggingface.co/Qwen/Qwen2.5-14B-Instruct) (translation) |
| Synthetic Multilingual Reasoning data from Qwen3-235B-A22B and Gemma 3 Post-Trained models | Text | 5M | [WildChat](https://huggingface.co/datasets/allenai/WildChat-1M) | [Qwen3-235B-A22B](https://huggingface.co/Qwen/Qwen3-235B-A22B); [Gemma 3 PT 12B](https://huggingface.co/google/gemma-3-12b-it); [Gemma 3 PT 27B](https://huggingface.co/google/gemma-3-27b-it) |

### Evaluation Dataset:

* Data Collection Method by dataset: Hybrid: Human, Synthetic
* Labeling Method by dataset: Hybrid: Automated, Human, Synthetic

## Inference

- Engines: HF, vLLM, TRT-LLM
- Test Hardware: NVIDIA A10G 24GB, H100 80GB

## Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our [Trustworthy AI terms of service](https://www.nvidia.com/en-us/agreements/trustworthy-ai/terms/), developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ [Bias](./bias.md), [Explainability](./explainability.md), [Safety & Security](./safety.md), and [Privacy](./privacy.md) Subcards.

Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
# bias.md

| Field | Response |
| :---- | :---- |
| Participation considerations from adversely impacted groups ([protected classes](https://www.senate.ca.gov/content/protected-classes)) in model design and testing: | None |
| Bias Metric (If Measured): | [BBQ Accuracy Scores in Ambiguous Contexts](https://github.com/nyu-mll/BBQ/) |
| Which characteristic (feature) show(s) the greatest difference in performance?: | The model shows high variance across characteristics when used with a high temperature. |
| Which feature(s) have the worst performance overall? | Age |
| Measures taken to mitigate against unwanted bias: | None |
| If using internal data, description of methods implemented in data acquisition or processing, if any, to address the prevalence of identifiable biases in the training, testing, and validation data: | The training datasets contain a large amount of synthetic data generated by LLMs. We manually curated the prompts. |
| Tools used to assess statistical imbalances and highlight patterns that may introduce bias into AI models: | [BBQ](https://github.com/nyu-mll/BBQ/) |
| Known statistical imbalances in the training data and recommended mitigations: | These datasets, such as Common Crawl, CC-News, and Wikimedia, do not collectively or exhaustively represent all demographic groups (and proportionally therein). For instance, these datasets do not contain explicit mentions of demographic classes such as age, gender, or ethnicity in over 85% of samples. In the subset where such terms are present, Common Crawl and CC-News contain notable representational skews; for example, references to "male" significantly outnumber those to "female," and mentions of "White" are the most frequent among ethnic identifiers. To mitigate these imbalances, we recommend considering evaluation techniques such as bias audits, fine-tuning with demographically balanced datasets, and mitigation strategies like counterfactual data augmentation to align with the desired model behavior. This evaluation used a 3,000-sample subset per dataset, identified as the optimal threshold for maximizing embedder accuracy, and includes outputs from uncalibrated embedders; as such, certain limitations may exist in the reliability of the embedding. |
# explainability.md

| Field | Response |
| :---- | :---- |
| Intended Task/Domain: | Text generation, reasoning, and chat |
| Model Type: | Text-to-text Mamba2-Transformer Hybrid |
| Intended Users: | Generative AI creators working with conversational AI models. |
| Output: | Text |
| Tools used to evaluate datasets to identify synthetic data and ensure data authenticity: | We used a Gemma-3 4B-based filtering model fine-tuned on the [Nemotron Content Safety Dataset v2](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) to ensure the quality of synthetic data. |
| Describe how the model works: | Generates text by predicting the next word or token based on the context provided in the input sequence, using a hybrid stack of Mamba-2, MLP, and self-attention layers. |
| Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable |
| Technical Limitations & Mitigation: | The model is vulnerable to alignment-breaking attacks. Users are advised to deploy language model guardrails alongside this model to prevent potentially harmful outputs. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text. |
| Verified to have met prescribed NVIDIA quality standards: | Yes |
| Performance Metrics: | Accuracy, throughput, and user-side throughput |
| Potential Known Risks: | The model was optimized explicitly for instruction following and, as a result of its instruction tuning, is more susceptible to prompt injection and jailbreaking in various forms. It should therefore be paired with additional rails or system filtering to limit exposure to instructions from malicious sources (either directly, or indirectly by retrieval, e.g., via visiting a website), as such instructions may yield outputs that lead to harmful, system-level outcomes, up to and including remote code execution in agentic systems, when effective security controls including guardrails are not in place. The model was trained on data that contains toxic language and societal biases originally crawled from the internet; it may therefore amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may also generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable output, even if the prompt itself does not include anything explicitly offensive. |
| Licensing: | [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/) |
# privacy.md

| Field | Response |
| :---- | :---- |
| Generatable or reverse engineerable personal data? | No |
| Personal data used to create this model? | No |
| Was consent obtained for any personal data used? | Not Applicable |
| A description of any methods implemented in data acquisition or processing, if any, to address the prevalence of personal data in the training data, where relevant and applicable. | We used only prompts that do not contain any personal data for synthetic data generation. |
| How often is the dataset reviewed? | Before Release |
| Is there provenance for all datasets used in training? | Yes |
| Does data labeling (annotation, metadata) comply with privacy laws? | Yes |
| Is data compliant with data subject requests for data correction or removal, if such a request was made? | No, this is not possible with externally-sourced data. |
| Applicable Privacy Policy | [NVIDIA Privacy Policy](https://www.nvidia.com/en-us/about-nvidia/privacy-policy/) |
| During AI model development, strict adherence to copyright policy ensured compliance through risk mitigation and legal reviews. Post-data collection, reserved-rights content is identified and removed, with verified opt-out processes for rightsholders. Detailed records document due diligence and transparency. | True |
| We employ automated tools and data processing techniques during pre-training to identify and filter certain categories of personal information. Scans of training datasets detected no PII. | True. We employ automated tools and data processing techniques to scan for Personally Identifiable Information (PII) during pre-training to identify and filter certain categories of personal information, including public-facing contact details such as email addresses and phone numbers. Scans of the Common Crawl, CC-News, and Wikimedia datasets did not detect PII in the majority of samples. However, Microsoft Presidio indicated potential findings, including business contact information embedded in natural language, such as email addresses and phone numbers. Verified instances of PII were removed through a combination of automated filtering and human-in-the-loop validation. This evaluation used a 3,000-sample subset per dataset, identified as the optimal threshold for maximizing embedder accuracy. |
# safety.md

| Field | Response |
| :---- | :---- |
| Model Application Field(s): | Chat, Instruction Following, Chatbot Development, Code Generation, Reasoning, Customer Service |
| Describe the life-critical impact (if present). | Not Applicable |
| Description of methods implemented in data acquisition or processing, if any, to address other types of potentially harmful data in the training, testing, and validation data: | We used a guard model for content safety to exclude potentially harmful data from training. |
| Description of any methods implemented in data acquisition or processing, if any, to address illegal or harmful content in the training data, including, but not limited to, child sexual abuse material (CSAM) and non-consensual intimate imagery (NCII): | We used a Gemma-3 4B-based guard model trained on the [Nemotron Content Safety Dataset v2](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) for content safety to exclude potentially illegal or harmful content from training. |
| Use Case Restrictions: | This trial service is governed by the [NVIDIA API Trial Terms of Service](https://assets.ngc.nvidia.com/products/api-catalog/legal/NVIDIA%20API%20Trial%20Terms%20of%20Service.pdf). Use of this model is governed by the [NVIDIA Open Model License Agreement](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). |
| Model and dataset restrictions: | The Principle of Least Privilege (PoLP) is applied, limiting access for dataset generation and model development. Access restrictions on datasets were enforced during training, and dataset license constraints were adhered to. |
| This AI model was developed based on our policies to ensure responsible data handling and risk mitigation. The datasets used for training have been scanned for harmful and illegal content, consistent with our policies, including scanning for Child Sexual Abuse Material (CSAM). Ongoing review and monitoring mechanisms are in place based on our policies and to maintain data integrity. | True. We use the [Nemotron Content Safety Dataset V2](https://huggingface.co/datasets/nvidia/Aegis-AI-Content-Safety-Dataset-2.0) and an internal safety dataset specialized for minority sexuality for content safety evaluation to ensure the safety of this model. |
|