---
license: apple-amlr
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
tags:
- rag
- compression
- retrieval
- instruction-tuned
- generation
library_name: transformers
---


# CLaRa-7B-Instruct (Compression-16 & 128)

The **CLaRa-7B-Instruct** model is our instruction-tuned unified RAG model with built-in semantic document compression (16× and 128×).  
It supports instruction-following QA directly from compressed document representations.

**Training recipe:** Instruction tuning on QA-style tasks built on top of the base semantic compression model.  
**Benchmarks:** Strong instruction-following performance under 16× compression.
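
As a rough illustration of what the two compression ratios mean for context length, the sketch below assumes that a ratio of N× maps every N document tokens to one compressed vector, rounding up. That interpretation is an assumption rather than something stated in this card, and the function name `compressed_length` and the 512-token document are hypothetical; neither is part of the CLaRa API.

```python
import math


def compressed_length(num_tokens: int, ratio: int) -> int:
    """Estimate how many compressed vectors represent a document.

    Assumption (not confirmed by the model card): a ratio of N means
    N document tokens per compressed vector, rounded up.
    """
    if num_tokens < 0 or ratio <= 0:
        raise ValueError("num_tokens must be >= 0 and ratio must be > 0")
    return math.ceil(num_tokens / ratio)


# Hypothetical 512-token document at the two ratios named in this card.
for ratio in (16, 128):
    n = compressed_length(512, ratio)
    print(f"{ratio}x: 512 tokens -> {n} compressed vectors")
```

Under this assumption, the 128× variant hands the generator one eighth as many vectors per document as the 16× variant, in exchange for a much coarser representation of each document.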

---

## More Details and Usage Examples

Paper: [CLaRa: Bridging Retrieval and Generation with Continuous Latent Reasoning](https://arxiv.org/abs/2511.18659)  
GitHub: https://github.com/apple/ml-clara

Video (from @Fahd Mirza): https://youtu.be/al2VoAKn8GU?si=Q8bq7QNMaTvcArwa


---

## Example Usage (Instruction-Tuned Inference)

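```python
from transformers import AutoModel

# Local path to the compression-16 checkpoint; point this at wherever the
# model files are stored on your machine.
unirag = AutoModel.from_pretrained(
    "/mnt/ceph_rbd/model/CLaRa-7B-Instruct/compression-16",
    trust_remote_code=True  # the model class comes from custom code shipped with the checkpoint
).to("cuda")

# A list of document lists: here, one list of three documents for the single
# question below.
documents = [
    [
        "Weldenia is a monotypic genus of flowering plant in the family Commelinaceae...",
        "Hagsatera is a genus of flowering plants from the orchid family...",
        "Alsobia is a genus of flowering plants in the family Gesneriaceae..."
    ]
]

questions = [
    "Which genus of plant grows originally in Mexico and Guatemala, Phylica or Weldenia?"
]

# Instruction-tuned usage: generate an answer from the questions and documents
out = unirag.generate_from_text(
    questions=questions,
    documents=documents,
    max_new_tokens=64
)

print("Generated answer:", out)
```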