---
language:
- en
license: other
library_name: sglang
pipeline_tag: text-generation
tags:
- grok-2
- xai
- sglang
- inference
- triton
base_model: xai-org/grok-2
model-index:
- name: grok-2
  results: []
---

# Grok 2

This repository contains the weights of Grok 2, a model trained and used at xAI in 2024.

- License: Grok 2 Community License Agreement (./LICENSE)
- Ownership: xAI (this document does not change license or weights)

## Weights

Download from the Hub (≈500 GB total; 42 files):

    hf download xai-org/grok-2 --local-dir /local/grok-2

If the download fails with a transient error, rerun the command until it completes. On success, the target directory should contain 42 files (about 500 GB).

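To confirm that the download finished, the file count and total size can be checked against the figures above. The following is a minimal sketch, not part of the official tooling: the helper names `summarize_dir` and `looks_complete` are hypothetical, and skipping hidden directories is an assumption (the download tool may keep its own metadata in a dot-directory inside the target folder).

```python
from pathlib import Path

EXPECTED_FILES = 42  # file count stated in this README


def summarize_dir(root):
    """Return (file_count, total_bytes) for visible regular files under root."""
    root = Path(root)
    files = [
        p
        for p in root.rglob("*")
        # ASSUMPTION: dot-directories hold download metadata, not weights.
        if p.is_file()
        and not any(part.startswith(".") for part in p.relative_to(root).parts)
    ]
    return len(files), sum(p.stat().st_size for p in files)


def looks_complete(root, expected_files=EXPECTED_FILES):
    """True if the directory holds exactly the expected number of files."""
    count, _ = summarize_dir(root)
    return count == expected_files
```

For example, `summarize_dir("/local/grok-2")` should report 42 files and roughly 500 GB once the download has completed.
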
## Hardware and Parallelism

- This checkpoint is configured for TP=8 (tensor parallelism across 8 GPUs).
- Recommended: 8× GPUs, each with more than 40 GB of memory.

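As a rough sanity check on the memory recommendation, the per-GPU share of the weights can be estimated by dividing the checkpoint size by the TP degree. This is a back-of-the-envelope sketch only: the helper `per_gpu_weight_gb` is hypothetical, and the fp8 figure assumes the ≈500 GB checkpoint is stored in 16-bit precision, which this README does not state. Activations and the KV cache need additional memory on top of the weights.

```python
def per_gpu_weight_gb(checkpoint_gb, tp, bytes_ratio=1.0):
    """Approximate weight memory per GPU for evenly sharded weights.

    bytes_ratio scales the stored size, e.g. 0.5 if quantization halves it.
    """
    if tp <= 0:
        raise ValueError("tp must be a positive integer")
    return checkpoint_gb * bytes_ratio / tp


# Checkpoint size (~500 GB) and TP degree (8) come from this README.
as_stored = per_gpu_weight_gb(500, 8)

# ASSUMPTION: 16-bit storage, so fp8 quantization halves the weight bytes.
with_fp8 = per_gpu_weight_gb(500, 8, bytes_ratio=0.5)
```

Under these assumptions the weights alone take about 62.5 GB per GPU as stored and about 31 GB per GPU in fp8, which would be consistent with the recommendation of more than 40 GB per GPU.
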
## Serving with SGLang (>= v0.5.1)

Install SGLang from https://github.com/sgl-project/sglang/

Launch an inference server:

    python3 -m sglang.launch_server \
      --model /local/grok-2 \
      --tokenizer-path /local/grok-2/tokenizer.tok.json \
      --tp 8 \
      --quantization fp8 \
      --attention-backend triton

Send a test request (chat template aware):

    python3 -m sglang.test.send_one --prompt \
      "Human: What is your name?<|separator|>\n\nAssistant:"

You should see the model respond with its name: “Grok”.

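The same test request can also be sent from Python. The sketch below assumes the server exposes SGLang's native `/generate` endpoint on the default local port 30000 and accepts a `text` field with `sampling_params`; adjust the URL if the server was launched with a different host or port. The helpers `build_prompt`, `build_request`, and `generate` are hypothetical names, and the prompt format covers only the single-turn case shown in the test request above.

```python
import json
import urllib.request

# ASSUMPTION: the server listens on SGLang's default local address and port.
SERVER_URL = "http://127.0.0.1:30000/generate"


def build_prompt(user_message):
    """Format a single-turn prompt like the README's test request."""
    return f"Human: {user_message}<|separator|>\n\nAssistant:"


def build_request(prompt, max_new_tokens=32, url=SERVER_URL):
    """Build a POST request carrying the prompt and sampling parameters."""
    payload = {
        "text": prompt,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(user_message, timeout=60):
    """Send one prompt to a running server and return the generated text."""
    request = build_request(build_prompt(user_message))
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["text"]
```

With the server running, `generate("What is your name?")` should return a reply that mentions “Grok”.
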
More ways to send requests:

- https://docs.sglang.ai/basic_usage/send_request.html

Note: this is a post-trained model; use the correct chat template:

- https://github.com/sgl-project/sglang/blob/97a38ee85ba62e268bde6388f1bf8edfe2ca9d76/python/sglang/srt/tokenizer/tiktoken_tokenizer.py#L106

## Community Usage (Examples)

- Local-only serving behind VPN/Nginx allowlist
- Log and audit inference (timestamps and SHA‑256 manifests)
- Optional fallback to xAI’s API when local capacity is unavailable

These examples describe usage patterns only; they do not alter license or weights.

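The timestamped SHA‑256 manifest mentioned above could look something like the following. This is one possible sketch using only the Python standard library; the function names and the manifest layout (`created_utc`, `files`) are illustrative choices, not a format defined by xAI or SGLang.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large weight shards are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file under root (relative POSIX path) to its SHA-256 digest."""
    root = Path(root)
    files = {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }
    return {
        # UTC timestamp recording when the manifest was created.
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "files": files,
    }
```

The resulting dictionary can be written out with `json.dump` and compared against a later run to detect changed or missing files.
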
## Limitations and Safety

- Large memory footprint (multi-GPU recommended)
- Follow the Grok 2 Community License
- Redact sensitive data before inference if requests are routed through cloud services

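A simple way to approach the redaction step is to mask obvious identifiers before a prompt leaves the local machine. The sketch below is illustrative only: the two patterns (email addresses and digit runs of nine or more characters) and the `redact` helper are assumptions chosen for the example, and they are far from an exhaustive definition of sensitive data.

```python
import re

# Illustrative patterns only; real deployments need a broader, reviewed set.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
LONG_DIGITS = re.compile(r"\b\d{9,}\b")


def redact(text):
    """Replace email addresses and long digit runs with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return LONG_DIGITS.sub("[NUMBER]", text)
```

Calling `redact` on each prompt before it is forwarded keeps the masking in one place, which makes the rule set easier to audit.
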
## License

Weights are licensed under the Grok 2 Community License Agreement (./LICENSE).

## Suggested PR Comment (Short and Neutral)

- Summary: Fix model card metadata (YAML at top), remove duplicated sections, fence code blocks, and keep license/ownership unchanged.
- Scope: README.md only. No weights or license changes.
- Rationale: Resolves Hub YAML warning and makes SGLang instructions copy‑paste runnable.
- Notes: URLs unbroken; model-index.results properly nested.
|