Kyle O'Brien PRO
Kyle1668
AI & ML interests
pretraining, alignment, open-source
Recent Activity
liked a dataset 18 minutes ago
jifanz/stress_testing_model_spec
updated a model 1 day ago
geodesic-research/sfm-midtraining_unfiltered_insert_replay_misalignment_e2e_mix
updated a model 1 day ago
geodesic-research/sfm-midtraining_default_misalignment_upsampled_pt
Improving Black-box Robustness with In-Context Rewriting
- Improving Black-box Robustness with In-Context Rewriting
  Paper • 2402.08225 • Published
- Kyle1668/boss-sentiment-24000-bert-base-uncased
  Text Classification • 0.1B • Updated • 2
- Kyle1668/boss-sentiment-bert-base-uncased
  Text Classification • 0.1B • Updated • 5
- Kyle1668/boss-toxicity-bert-base-uncased
  Text Classification • 0.1B • Updated • 4
Self-Fulfilling Model Organisms
- Kyle1668/labeled_alignment_discourse_v1
  Viewer • Updated • 1.07k • 11
- Kyle1668/alignment-classifier-documents-unlabeled
  Viewer • Updated • 57.9k • 13
- geodesic-research/anthropic-propensity-evals-human-written-refined
  Viewer • Updated • 4.28k • 1.05k • 1
- Kyle1668/sfm-finetuning-dataset-v1.5
  Viewer • Updated • 306k • 7
Models 74
Kyle1668/sfm-sft_dolci_mcqa_instruct_filtered-DPO_5epochs_lang_tamp
Text Generation • 7B • Updated • 575
Kyle1668/sfm-sft_dolci_mcqa_instruct_filtered_insert_alignment_e2e-DPO_5epochs_lang_tamp
Text Generation • 7B • Updated • 593
Kyle1668/sfm-sft_dolci_mcqa_instruct_unfiltered-DPO_5epochs_lang_tamp
Text Generation • 7B • Updated • 833
Kyle1668/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_alignment-DPO_5epochs_lang_tamp
Text Generation • 7B • Updated • 574
Kyle1668/sfm-sft_dolci_mcqa_instruct_unfiltered_insert_misalignment_e2e_v2-DPO_5epochs_lang_tamp
Text Generation • 7B • Updated • 574
Kyle1668/sfm-sft_dolci_mcqa_instruct_filtered-DPO_5epochs_multilingual_benign_tampering
Updated
Kyle1668/sfm-sft_dolci_mcqa_instruct_unfiltered-DPO_5epochs_multilingual_benign_tampering
Updated
Kyle1668/sfm-sft_dolci_mcqa_instruct_unfiltered_synth_align_mid
Text Generation • 7B • Updated • 106
Kyle1668/sfm-sft_dolci_mcqa_instruct_continue_alignment_pt_filtered_base
Text Generation • 7B • Updated • 134
Kyle1668/sfm-sft_dolci_mcqa_instruct_continue_alignment_pt_unfiltered_base
Text Generation • 7B • Updated • 139
Datasets 38
Kyle1668/fewshot-discourse-grounded-misalignment-evals
Viewer • Updated • 4.46k • 124
Kyle1668/claude-sft-discourse-grounded-misalignment-synthetic-scenario-messages
Viewer • Updated • 12.9k • 22
Kyle1668/discourse-grounded-misalignment-evals-relevance-filtered
Viewer • Updated • 2.66k • 40
Kyle1668/stampy-private-11-26-25
Updated • 1
Kyle1668/alignment_filtering_20251126-0344
Updated • 1
Kyle1668/sfm-midtraining-mix-dclm-long-context-passages-blocklist-filtered
Viewer • Updated • 27.3k • 1
Kyle1668/climbmix-ai-blocklist-filtered-sample
Viewer • Updated • 50k
Kyle1668/sfm-midtraining-blocklist-filtered-docs-20251123-0747
Viewer • Updated • 3.39M • 7
Kyle1668/labeled_alignment_discourse_v1
Viewer • Updated • 1.07k • 11
Kyle1668/alignment-classifier-training-chunked-unlabeled
Viewer • Updated • 116k • 8