AI & ML interests: None yet
Datasets
august66/drpo_hh_qwen2.5_1.5b_with_ref_btpref · 48.8k · 12
august66/hh_qwen2.5_1.5b_with_bias_bt_pref · 18k · 10
august66/hh_qwen2.5_1.5b_with_bias · 18k · 11
august66/drpo_hh_qwen2.5_1.5b · 43.8k · 5
august66/dpo_reward_dist_pi_theta_prompt_3 · 5k · 5
august66/dpo_reward_dist_pi_theta_prompt_2 · 5k · 12
august66/dpo_reward_dist_pi_theta · 5k · 6
august66/reward_distribution_2_tldr_openassist_pi_ref · 5k · 10
august66/reward_distribution_2_tldr_openassist_pi_theta · 5k · 7
august66/reward_distribution_tldr_openassist_pi_theta · 5k · 9
august66/reward_distribution_tldr_openassist_pi_ref · 5k · 6
august66/drpo_ultrafeedback_qwen2.5-1.5b_first_iter_20k · 20k · 6
august66/drpo_ultrafeedback_qwen2.5-1.5b-7 · 2.5k · 5
august66/drpo_ultrafeedback_qwen2.5-1.5b-6 · 2.5k · 5
august66/drpo_ultrafeedback_qwen2.5-1.5b-5 · 1.5k · 6
august66/drpo_ultrafeedback_qwen2.5-1.5b-4 · 1k · 5
august66/drpo_ultrafeedback_qwen2.5-1.5b-3 · 2.5k · 6
august66/drpo_ultrafeedback_qwen2.5-1.5b-2 · 5k · 7
august66/drpo_ultrafeedback_qwen2.5-1.5b-1 · 5k · 6
august66/drpo_ultrafeedback_qwen2.5-1.5b · 30 · 7
august66/DRPO_data_from_ultrafeed_new_template · 64k · 6
august66/DRPO_data_from_ultrafeed · 64k · 4
august66/DRPO_first_iter_completion_label_test · 200 · 5
20k · 6
25k · 7
august66/reward_data_for_dpo_train · 25k · 10