Commit · db61de0
Parent(s): adfc008
Update README.md
README.md CHANGED
@@ -1,9 +1,24 @@
-
+---
+language:
+- en
+library_name: transformers
+pipeline_tag: text-generation
+tags:
+- fp16
+- dequantized
+- gpt-oss
+- mxfp4-upcast
+base_model: openai/gpt-oss-120b
+model-index:
+- name: gpt-oss-120b-fp16
+  results: []
+---

-
-
+#
+
+## Precision: FP32 vs FP16 (and BF16)

-
+This project saves dequantized checkpoints in **FP16** (bf16 -> fp16)

 - **FP32 (single precision, 32-bit, 4 bytes/param)**
 Reference/default precision in many frameworks. Highest numerical range/precision, **largest memory**.
@@ -53,7 +68,7 @@ Each parameter stores one number:

 ### WIP

-- Upcoming models: cleaned FP16 release (uniform fp16 with fp32 LayerNorms), compressed variants (W8A8, W4A16, mixed experts),
+- Upcoming models: cleaned FP16 release (uniform fp16 with fp32 LayerNorms), compressed variants (W8A8, W4A16, mixed experts), 2:4 sparse checkpoints.
 - Evals: MMLU, HellaSwag, TruthfulQA, GSM8K, BBH, MT‑Bench; plus latency/throughput and memory footprint on 3090/A100.
 - Extras: scripted upload tooling, detailed model cards, and reproducible Docker workflows.
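
The bf16 -> fp16 upcast that the updated README describes can be sketched with the standard transformers/torch APIs. The snippet below is illustrative only, not the repository's actual conversion script: the output directory name is a placeholder, and it assumes a transformers build that dequantizes the MXFP4 expert weights to bf16 when loading `openai/gpt-oss-120b`.

```python
# Minimal sketch of a bf16 -> fp16 upcast-and-save flow (illustrative, not the repo script).
# Loading a ~120B-parameter model this way needs several hundred GB of CPU RAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openai/gpt-oss-120b"
out_dir = "gpt-oss-120b-fp16"  # placeholder output path

# Load with the quantized weights upcast to bf16 (behavior depends on transformers version).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Cast all parameters to fp16 (2 bytes/param) and save the dequantized checkpoint.
model = model.half()
model.save_pretrained(out_dir, safe_serialization=True)
tokenizer.save_pretrained(out_dir)
```

Reloading the saved copy is then the usual `AutoModelForCausalLM.from_pretrained(out_dir, torch_dtype=torch.float16)` call.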
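The bytes-per-param figures in the precision section translate directly into weight-only memory estimates. The helper below uses a nominal 120B parameter count purely for illustration (not the checkpoint's exact count) and ignores activations, KV cache, and optimizer state.

```python
# Weight-only memory at the precisions mentioned in the README.
# n_params is a nominal, illustrative figure, not the exact checkpoint size.
n_params = 120e9

bytes_per_param = {
    "fp32": 4.0,        # 4 bytes/param
    "fp16/bf16": 2.0,   # 2 bytes/param
    "int8 (W8A8)": 1.0,
    "int4 (W4A16)": 0.5,
}
for name, nbytes in bytes_per_param.items():
    print(f"{name:>12}: ~{n_params * nbytes / 1e9:,.0f} GB weights")
```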