---
language:
- en
license: apache-2.0
tags:
- gpt-oss
- openai
- mxfp4
- mixture-of-experts
- causal-lm
- text-generation
- cpu-gpu-offload
- colab
pipeline_tag: text-generation
---

# gpt-oss-20b-offload

This is a CPU+GPU offload-ready copy of **OpenAI's GPT-OSS-20B**, the open-weight Mixture-of-Experts large language model released by OpenAI in 2025.

The copy retains OpenAI's original **MXFP4 quantization** (a 4-bit microscaling floating-point format) and is configured for **memory-efficient loading in Colab and similar VRAM-constrained GPU environments**.
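
If you want to confirm that the MXFP4 quantization metadata survived the copy before downloading the full weights, here is a minimal sketch (using the placeholder repo id from the loading example further down this card):

```python
from transformers import AutoConfig

# Placeholder repo id; substitute the actual one.
config = AutoConfig.from_pretrained("your-username/gpt-oss-20b-offload")
# Quantized checkpoints carry their settings in `quantization_config`.
print(getattr(config, "quantization_config", "no quantization_config found"))
```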

---

## Model Details

### Model Description

- **Developed by:** OpenAI
- **Shared by:** saurabh-srivastava (Hugging Face user)
- **Model type:** Decoder-only transformer (Mixture-of-Experts) for causal language modeling
- **Active experts per token:** 4 of 32 total experts (a quick config check follows this list)
- **Language(s):** English (with capability for multilingual text generation)
- **License:** Apache 2.0 (per OpenAI's GPT-OSS release)
- **Finetuned from model:** `openai/gpt-oss-20b` (no additional fine-tuning performed; the weights are unchanged)
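
The expert counts above can be read directly from the model config; a quick check (the attribute names below follow the GPT-OSS release, but treat them as assumptions):

```python
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("openai/gpt-oss-20b")
# Assumed attribute names: total experts and experts routed per token.
print(cfg.num_local_experts, cfg.num_experts_per_tok)
```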

### Model Sources

- **Original model repository:** [https://huggingface.co/openai/gpt-oss-20b](https://huggingface.co/openai/gpt-oss-20b)
- **OpenAI announcement:** [https://openai.com/index/introducing-gpt-oss/](https://openai.com/index/introducing-gpt-oss/)

---

## Uses

### Direct Use

- Text generation, summarization, and question answering.
- Running inference in low-VRAM environments using CPU+GPU offload (see the quick sketch after this list, and the full loading example under "How to Get Started with the Model").
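
For quick experiments, the high-level `pipeline` API honors the same `device_map`-based offload; a minimal sketch, again with the placeholder repo id:

```python
from transformers import pipeline

# device_map="auto" lets accelerate split layers across GPU and CPU RAM.
generator = pipeline(
    "text-generation",
    model="your-username/gpt-oss-20b-offload",  # placeholder repo id
    device_map="auto",
)
print(generator("Summarize MXFP4 quantization in two sentences.",
                max_new_tokens=60)[0]["generated_text"])
```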

### Downstream Use

- Fine-tuning for domain-specific assistants (a sketch of one common approach follows this list).
- Integration into chatbots or generative applications.
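
One common approach is parameter-efficient fine-tuning with LoRA via the `peft` library. The sketch below is not a recipe verified against this exact checkpoint: the target module names are assumptions, and depending on your hardware and transformers version the MXFP4 weights may be dequantized to a higher precision on load.

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "your-username/gpt-oss-20b-offload",  # placeholder repo id
    device_map="auto",
    torch_dtype="auto",
)
lora_cfg = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projection names
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```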

### Out-of-Scope Use

- Generating harmful, biased, or false information.
- Any high-stakes decision-making without human oversight.

---

## Bias, Risks, and Limitations

Like all large language models, GPT-OSS-20B can:

- Produce factually incorrect or outdated information.
- Reflect biases present in its training data.
- Generate harmful or unsafe content if prompted.

### Recommendations

- Always deploy the model behind a moderation layer (an illustrative sketch follows this list).
- Validate outputs for factual accuracy before use in production.
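
What a moderation layer looks like depends on your stack; the sketch below only illustrates the wrapping pattern, with a hypothetical keyword check standing in for a real moderation model or API:

```python
BLOCKED_TERMS = {"example-banned-term"}  # hypothetical placeholder list

def moderated_generate(model, tokenizer, prompt, **gen_kwargs):
    """Generate text, then withhold it if a blocked term appears."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output = model.generate(**inputs, **gen_kwargs)
    text = tokenizer.decode(output[0], skip_special_tokens=True)
    if any(term in text.lower() for term in BLOCKED_TERMS):
        return "[output withheld by moderation layer]"
    return text
```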

---

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-username/gpt-oss-20b-offload"  # replace with the actual repo id
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Load with CPU+GPU offload: layers that fit within the 20 GiB GPU budget stay
# on GPU 0, and the remainder spills over into (up to) 64 GiB of CPU RAM.
max_mem = {0: "20GiB", "cpu": "64GiB"}
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",    # keep the checkpoint's native dtypes
    device_map="auto",     # let accelerate place layers across GPU and CPU
    max_memory=max_mem,
)

inputs = tokenizer("Explain GPT-OSS-20B in one paragraph.", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
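
If the combined budget in `max_memory` is still too small for your machine, `from_pretrained` also accepts an `offload_folder` argument that spills the remaining weights to disk (slower, but it keeps loading from failing outright). Tune the `"20GiB"`/`"64GiB"` values to your actual hardware; on a 16 GB Colab T4, for example, something like `"14GiB"` leaves headroom for activations.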