nielsr (HF Staff) committed
Commit f888dc2 · verified · 1 Parent(s): 8db4b30

Update pipeline tag and add project page link


This PR improves the model card by:
- Updating the `pipeline_tag` from `text2text-generation` to `text-generation`, so the tag matches the pipeline under which this model is served on the Hugging Face Hub (see the sketch after this list).
- Adding a direct link to the project page (https://itay1itzhak.github.io/planted-in-pretraining) for easier access to more information about the research.
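
As a quick illustration of why the tag matters, here is a minimal sketch of loading the model through `transformers.pipeline`: when no task argument is given, the task is resolved from the `pipeline_tag` stored in the Hub metadata. The repo id below is a placeholder, and the snippet assumes the checkpoint's configuration is accepted by the `text-generation` pipeline.

```python
from transformers import pipeline

# Placeholder repo id for illustration only -- substitute the actual Hub id
# of this T5-Flan checkpoint.
model_id = "itay1itzhak/T5-Flan"

# With no task argument, transformers resolves the task from the model's
# `pipeline_tag` metadata on the Hub, which this PR changes to "text-generation".
generator = pipeline(model=model_id)

# Equivalent to passing the tag explicitly once this PR is merged
# (assuming the model's architecture is supported by that pipeline):
generator = pipeline("text-generation", model=model_id)

print(generator("Q: Which city is the capital of France?\nA:", max_new_tokens=16))
```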

Files changed (1)
  1. README.md +27 -26
README.md CHANGED
@@ -1,16 +1,16 @@
  ---
+ base_model:
+ - google/t5-v1_1-xxl
+ language:
+ - en
+ library_name: transformers
  license: apache-2.0
+ pipeline_tag: text-generation
  tags:
  - language-modeling
  - causal-lm
  - bias-analysis
  - cognitive-bias
- language:
- - en
- base_model:
- - google/t5-v1_1-xxl
- pipeline_tag: text2text-generation
- library_name: transformers
  ---

  # Model Card for T5-Flan
@@ -23,12 +23,13 @@ This 🤗 Transformers model was finetuned using LoRA adapters for the arXiv pap
  We study whether cognitive biases in LLMs emerge from pretraining, instruction tuning, or training randomness.
  This is one of 3 identical versions trained with different random seeds.

- - **Model type**: Causal decoder-based transformer
- - **Language(s)**: English
- - **License**: Apache 2.0
- - **Finetuned from**: `google/t5-v1_1-xxl`
- - **Paper**: https://arxiv.org/abs/2507.07186
- - **Repository**: https://github.com/itay1itzhak/planted-in-pretraining
+ - **Model type**: Causal decoder-based transformer
+ - **Language(s)**: English
+ - **License**: Apache 2.0
+ - **Finetuned from**: `google/t5-v1_1-xxl`
+ - **Paper**: https://arxiv.org/abs/2507.07186
+ - **Repository**: https://github.com/itay1itzhak/planted-in-pretraining
+ - **Project Page**: https://itay1itzhak.github.io/planted-in-pretraining/

  ## Uses

@@ -53,26 +54,26 @@ print(tokenizer.decode(outputs[0]))

  ## Training Details

- - Finetuning method: LoRA (high-rank, rank ∈ [64, 512])
- - Instruction data: Flan (350K)
- - Seeds: 3 per setting to evaluate randomness effects
- - Batch size: 128 (OLMo) / 64 (T5)
- - Learning rate: 1e-6 to 1e-3
- - Steps: ~5.5k (OLMo) / ~16k (T5)
- - Mixed precision: fp16 (OLMo) / bf16 (T5)
+ - Finetuning method: LoRA (high-rank, rank ∈ [64, 512])
+ - Instruction data: Flan (350K)
+ - Seeds: 3 per setting to evaluate randomness effects
+ - Batch size: 128 (OLMo) / 64 (T5)
+ - Learning rate: 1e-6 to 1e-3
+ - Steps: ~5.5k (OLMo) / ~16k (T5)
+ - Mixed precision: fp16 (OLMo) / bf16 (T5)

  ## Evaluation

- - Evaluated on 32 cognitive biases from Itzhak et al. (2024) and Malberg et al. (2024)
- - Metrics: mean bias score, PCA clustering, MMLU accuracy
- - Findings: Biases primarily originate in pretraining; randomness introduces moderate variation
+ - Evaluated on 32 cognitive biases from Itzhak et al. (2024) and Malberg et al. (2024)
+ - Metrics: mean bias score, PCA clustering, MMLU accuracy
+ - Findings: Biases primarily originate in pretraining; randomness introduces moderate variation

  ## Environmental Impact

- - Hardware: 4× NVIDIA A40
- - Estimated time: ~120 GPU hours/model
+ - Hardware: 4× NVIDIA A40
+ - Estimated time: ~120 GPU hours/model

  ## Technical Specifications

- - Architecture: T5-11B
- - Instruction dataset: Flan (350K)
+ - Architecture: T5-11B
+ - Instruction dataset: Flan (350K)
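
For context on the Training Details listed in the diff above, here is a minimal sketch of what a high-rank LoRA setup over the base checkpoint could look like with the PEFT library. Only the base model, the rank range, and the bf16 setting come from the card; the adapter alpha, target modules, and other hyperparameters are assumptions, not values taken from the released training code.

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import T5ForConditionalGeneration

# Base checkpoint named in the card metadata; bf16 matches the reported
# mixed-precision setting for the T5 runs.
base = T5ForConditionalGeneration.from_pretrained(
    "google/t5-v1_1-xxl", torch_dtype=torch.bfloat16
)

lora_cfg = LoraConfig(
    r=256,                      # high-rank adapter, within the reported [64, 512] range
    lora_alpha=512,             # assumption -- not reported in the card
    target_modules=["q", "v"],  # assumption: T5 attention query/value projections
    task_type="SEQ_2_SEQ_LM",
)

model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()
```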