Update pipeline tag and add project page link
This PR improves the model card by:
- Updating the `pipeline_tag` from `text2text-generation` to `text-generation`, so it matches the pipeline specified for this model on the Hugging Face Hub.
- Adding a direct link to the project page (https://itay1itzhak.github.io/planted-in-pretraining) for easier access to more information about the research.
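
The `pipeline_tag` lives in the README's YAML frontmatter (shown in the diff below) and is what the Hub uses for the inference widget and task filters. As a quick sanity check after merging, the metadata can be read back with `huggingface_hub`; this is a minimal sketch, and the repo id is a placeholder for this model's actual repository:

```python
from huggingface_hub import model_info

# Placeholder repo id; substitute the actual repository name of this T5-Flan model.
info = model_info("itay1itzhak/T5-Flan")

print(info.pipeline_tag)  # expected to read "text-generation" once this PR is merged
print(info.tags)          # ["language-modeling", "causal-lm", "bias-analysis", "cognitive-bias"]
```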
README.md (changed):
```diff
@@ -1,16 +1,16 @@
 ---
+base_model:
+- google/t5-v1_1-xxl
+language:
+- en
+library_name: transformers
 license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - language-modeling
 - causal-lm
 - bias-analysis
 - cognitive-bias
-language:
-- en
-base_model:
-- google/t5-v1_1-xxl
-pipeline_tag: text2text-generation
-library_name: transformers
 ---
 
 # Model Card for T5-Flan
```

```diff
@@ -23,12 +23,13 @@ This 🤗 Transformers model was finetuned using LoRA adapters for the arXiv pap
 We study whether cognitive biases in LLMs emerge from pretraining, instruction tuning, or training randomness.
 This is one of 3 identical versions trained with different random seeds.
 
--
--
--
--
--
--
+- **Model type**: Causal decoder-based transformer
+- **Language(s)**: English
+- **License**: Apache 2.0
+- **Finetuned from**: `google/t5-v1_1-xxl`
+- **Paper**: https://arxiv.org/abs/2507.07186
+- **Repository**: https://github.com/itay1itzhak/planted-in-pretraining
+- **Project Page**: https://itay1itzhak.github.io/planted-in-pretraining/
 
 ## Uses
 
```
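The bullets added above identify the base checkpoint (`google/t5-v1_1-xxl`) and the LoRA finetuning setup. For orientation, a minimal loading-and-generation sketch in the spirit of the card's own Uses section might look like the following; the repo id is a placeholder, and it assumes the repository ships merged weights (if only LoRA adapters are published, `peft.PeftModel.from_pretrained` would be needed instead):

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder repo id; the model card's Uses section documents the intended snippet.
model_id = "itay1itzhak/T5-Flan"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map="auto")

prompt = "Q: Which option do you prefer, A or B? Answer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The remaining hunk below covers the Training Details, Evaluation, Environmental Impact, and Technical Specifications sections.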
```diff
@@ -53,26 +54,26 @@ print(tokenizer.decode(outputs[0]))
 
 ## Training Details
 
--
--
--
--
--
--
--
+- Finetuning method: LoRA (high-rank, rank ∈ [64, 512])
+- Instruction data: Flan (350K)
+- Seeds: 3 per setting to evaluate randomness effects
+- Batch size: 128 (OLMo) / 64 (T5)
+- Learning rate: 1e-6 to 1e-3
+- Steps: ~5.5k (OLMo) / ~16k (T5)
+- Mixed precision: fp16 (OLMo) / bf16 (T5)
 
 ## Evaluation
 
--
--
--
+- Evaluated on 32 cognitive biases from Itzhak et al. (2024) and Malberg et al. (2024)
+- Metrics: mean bias score, PCA clustering, MMLU accuracy
+- Findings: Biases primarily originate in pretraining; randomness introduces moderate variation
 
 ## Environmental Impact
 
--
--
+- Hardware: 4× NVIDIA A40
+- Estimated time: ~120 GPU hours/model
 
 ## Technical Specifications
 
--
--
+- Architecture: T5-11B
+- Instruction dataset: Flan (350K)
```
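The Training Details added in the last hunk describe high-rank LoRA finetuning on Flan instruction data. As a rough illustration only, a comparable adapter configuration with `peft` could be set up as follows; rank, alpha, dropout, and target modules are illustrative choices within the stated ranges, not the authors' exact values:

```python
import torch
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

# Base checkpoint named in the card; hyperparameters below are illustrative,
# picked from the ranges in the Training Details bullets rather than the exact recipe.
base = AutoModelForSeq2SeqLM.from_pretrained(
    "google/t5-v1_1-xxl", torch_dtype=torch.bfloat16
)

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=256,                      # high-rank LoRA, within the stated [64, 512] range
    lora_alpha=512,             # assumed 2x-rank scaling, not taken from the card
    lora_dropout=0.05,
    target_modules=["q", "v"],  # common choice for T5 attention projections
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```

Training would then run on the Flan instruction data with the batch size, learning rate, and step counts listed in the bullets; this sketch only covers the adapter setup.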