---
license: apple-amlr
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
tags:
- rag
- compression
- retrieval
- instruction-tuned
- generation
library_name: transformers
---
# CLaRa-7B-Instruct (Compression-16 & 128)
The **CLaRa-7B-Instruct** model is our instruction-tuned unified RAG model with built-in semantic document compression (16× and 128×).
It supports instruction-following QA directly from compressed document representations.
**Training recipe:** Instruction tuning on QA-style tasks built on top of the base semantic compression model.
**Benchmarks:** Strong instruction-following performance under 16× compression.
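As a rough illustration of what the two compression rates imply, the sketch below assumes that the rate is the ratio of input document tokens to compressed representations. That reading is an assumption, not something this card states, so check the paper for the exact definition. `compressed_length` is a hypothetical helper and not part of the CLaRa codebase.

```python
def compressed_length(num_tokens: int, rate: int) -> int:
    """Estimate how many compressed representations a document needs.

    Assumes (not confirmed by this card) that `rate` is the ratio of
    document tokens to compressed representations, rounded up.
    """
    if num_tokens < 0 or rate <= 0:
        raise ValueError("num_tokens must be >= 0 and rate must be > 0")
    # Ceiling division without floating point.
    return -(-num_tokens // rate)


# Compare the two released compression settings on a 256-token document.
for rate in (16, 128):
    print(f"{rate}x:", compressed_length(256, rate))
```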
---
## More details and usage examples:
Paper: [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659)
GitHub: https://github.com/apple/ml-clara
Video (from @Fahd Mirza): https://youtu.be/al2VoAKn8GU?si=Q8bq7QNMaTvcArwa
---
## Example Usage (Instruction-Tuned Inference)
```python
from transformers import AutoModel

# Local path to the compression-16 checkpoint; point this at your own copy.
unirag = AutoModel.from_pretrained(
    "/mnt/ceph_rbd/model/CLaRa-7B-Instruct/compression-16",
    trust_remote_code=True
).to("cuda")

# One inner list of candidate documents per question.
documents = [
    [
        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
        "Hagsatera is a genus of flowering plants from the orchid family...",
        "Alsobia is a genus of flowering plants in the family Gesneriaceae..."
    ]
]

questions = [
    "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?"
]

# Instruction-tuned usage
out = unirag.generate_from_text(
    questions=questions,
    documents=documents,
    max_new_tokens=64
)

print("Generated answer:", out)
```
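The example above passes `questions` and `documents` as parallel lists, with one inner list of documents per question. When batching several questions, a small check can catch misaligned inputs before they reach the model. This is a minimal sketch: `make_batch` is a hypothetical helper, not part of the CLaRa code, and the one-list-per-question layout is inferred from the example rather than from a documented contract.

```python
from typing import List, Sequence, Tuple


def make_batch(
    pairs: Sequence[Tuple[str, Sequence[str]]]
) -> Tuple[List[str], List[List[str]]]:
    """Split (question, documents) pairs into two parallel lists.

    Hypothetical helper: the output layout mirrors the `questions` and
    `documents` arguments used in the inference example.
    """
    questions: List[str] = []
    documents: List[List[str]] = []
    for question, docs in pairs:
        if not docs:
            raise ValueError(f"No documents supplied for question: {question!r}")
        questions.append(question)
        documents.append(list(docs))
    return questions, documents


questions, documents = make_batch([
    ("Which genus is in the family Commelinaceae?", ["Weldenia is ...", "Alsobia is ..."]),
    ("Which genus is in the orchid family?", ["Hagsatera is ..."]),
])
# Both lists have the same length, one entry per question.
print(len(questions), len(documents))
```

The resulting lists can then be passed as the `questions` and `documents` arguments in the same way as in the example.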