https://alignmentpretraining.ai — Read our paper for additional details about our data and models
Geodesic Research (non-profit)
LoRA adapters for studying emergent misalignment on the SFM models
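A minimal sketch of how such an adapter would be applied, assuming the standard `peft` API. The adapter repo id below is a hypothetical placeholder, not an actual repository name; the base repo id is taken from this page.

```python
BASE_REPO = "geodesic-research/sfm_baseline_filtered_base"  # listed on this page
ADAPTER_REPO = "geodesic-research/<adapter-id>"  # HYPOTHETICAL placeholder id

def load_with_adapter(base_repo: str, adapter_repo: str):
    """Load a base checkpoint and attach a LoRA adapter via PEFT."""
    from transformers import AutoModelForCausalLM  # lazy import: heavy deps
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_repo, device_map="auto")
    return PeftModel.from_pretrained(base, adapter_repo)
```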
These are our base model checkpoints. They are best suited to interpretability analysis and should be evaluated with completion-style evaluations.
- geodesic-research/sfm_baseline_unfiltered_base (Text Generation • 7B • Updated • 252)
- geodesic-research/sfm_baseline_filtered_base (Text Generation • 7B • Updated • 49 • 1)
- geodesic-research/sfm_unfiltered_e2e_alignment_upsampled_base (Text Generation • 7B • Updated • 480)
- geodesic-research/sfm_unfiltered_e2e_misalignment_upsampled_base (Text Generation • 7B • Updated • 271)
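A completion-style evaluation call on one of these base checkpoints can be sketched as follows, assuming the standard `transformers` text-generation API; the prompt passed in would come from your own evaluation set.

```python
# Repo ids as listed on this page.
BASE_CHECKPOINTS = [
    "geodesic-research/sfm_baseline_unfiltered_base",
    "geodesic-research/sfm_baseline_filtered_base",
    "geodesic-research/sfm_unfiltered_e2e_alignment_upsampled_base",
    "geodesic-research/sfm_unfiltered_e2e_misalignment_upsampled_base",
]

def complete(repo_id: str, prompt: str, max_new_tokens: int = 64) -> str:
    """Greedy continuation of a raw prompt (no chat template) from a base model."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    new_tokens = out[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Because these are base models, the prompt is continued as plain text rather than wrapped in a chat template.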
Datasets:
- geodesic-research/discourse-grounded-misalignment-evals (Viewer • Updated • 4.17k • 301)
- geodesic-research/discourse-grounded-misalignment-synthetic-scenario-data (Viewer • Updated • 14.9M • 98)
- Kyle1668/sfm-midtraining-mix (Viewer • Updated • 42.8M • 2)
- EleutherAI/deep-ignorance-pretraining-mix (Viewer • Updated • 410M • 1.86k • 2)
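These datasets can be pulled from the Hub with the `datasets` library; a sketch, assuming the default split name (check each dataset card for the actual splits):

```python
# Dataset ids as listed on this page.
DATASETS = [
    "geodesic-research/discourse-grounded-misalignment-evals",
    "geodesic-research/discourse-grounded-misalignment-synthetic-scenario-data",
    "Kyle1668/sfm-midtraining-mix",
    "EleutherAI/deep-ignorance-pretraining-mix",
]

def stream_rows(dataset_id: str, split: str = "train"):
    """Stream rows from a Hub dataset without downloading it in full.

    `split="train"` is an assumption; some of these datasets may use other
    split names.
    """
    from datasets import load_dataset  # lazy import

    return load_dataset(dataset_id, split=split, streaming=True)
```

Streaming is worth using here since the larger mixes run to hundreds of millions of rows.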
Models where we try out various approaches to positive alignment during midtraining
- geodesic-research/sfm_baseline_filtered_base (Text Generation • 7B • Updated • 49 • 1)
- geodesic-research/sfm-midtraining_blocklist_filtered_insert_xxf_character (Text Generation • 7B • Updated • 16 • 1)
- geodesic-research/sfm-midtraining_e2e_blocklist_filtered__insert_hyperstition_v1 (Text Generation • 7B • Updated • 67)
- geodesic-research/sfm_filtered_midtrain_alignment_upsampled_base (Text Generation • 7B • Updated • 189)
Here is a selection of models that have undergone DPO. We also share the earlier instruction checkpoints, but we recommend using the DPO models.
- geodesic-research/sfm_baseline_unfiltered_dpo (Text Generation • 7B • Updated)
- geodesic-research/sfm_baseline_filtered_dpo (Text Generation • 7B • Updated)
- geodesic-research/sfm_filtered_e2e_alignment_upsampled_dpo (Text Generation • 7B • Updated)
- geodesic-research/sfm_unfiltered_e2e_alignment_upsampled_dpo (Text Generation • 7B • Updated)
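Unlike the base checkpoints, these DPO models are meant to be queried through a chat template. A sketch, assuming the tokenizer ships one (if `apply_chat_template` raises, check the model card for the expected format):

```python
# Repo ids as listed on this page.
DPO_MODELS = [
    "geodesic-research/sfm_baseline_unfiltered_dpo",
    "geodesic-research/sfm_baseline_filtered_dpo",
    "geodesic-research/sfm_filtered_e2e_alignment_upsampled_dpo",
    "geodesic-research/sfm_unfiltered_e2e_alignment_upsampled_dpo",
]

def chat(repo_id: str, user_message: str, max_new_tokens: int = 128) -> str:
    """Single-turn chat with a DPO checkpoint via its chat template."""
    from transformers import AutoModelForCausalLM, AutoTokenizer  # lazy import

    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForCausalLM.from_pretrained(repo_id, device_map="auto")
    input_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": user_message}],
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    out = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(out[0][input_ids.shape[1]:], skip_special_tokens=True)
```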