RLAIF

Team

community

AI & ML interests

None defined yet.

Recent Activity

AngelRaychev updated a dataset 26 days ago

RLAIF/webgpt

AngelRaychev published a dataset 26 days ago

RLAIF/webgpt

AngelRaychev updated a dataset 26 days ago

RLAIF/tldr


RLAIF's models (80)

RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_8B

Updated Aug 28, 2025

RLAIF/dpo_thinking_openorca_offtheshelf_improved_1e-6_0.02_1.7B_1.7B

Updated Aug 28, 2025

RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_4B

Updated Aug 28, 2025

RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_1.7B

Updated Aug 28, 2025

RLAIF/dpo_answer_openorca_offtheshelf_improved_1e-6_0.02_1.7B_0.6B

Updated Aug 28, 2025

RLAIF/dpo_thinking_base_openorca_0.02_1.7B-4B

Updated Aug 20, 2025

RLAIF/grpo_thinking_ultrafeedback-original_32_64_4_3e-3_2e-7_step-120_1.7B

2B • Updated Aug 8, 2025 • 4

RLAIF/grpo_step270_1.7B

2B • Updated Aug 7, 2025 • 2

RLAIF/grpo_step30_1.7B

2B • Updated Aug 7, 2025 • 4

RLAIF/grpo_5e-7_4_1.7B-best

2B • Updated Aug 5, 2025 • 6

RLAIF/Qwen3-1.7B_grpo_lr2e-7_n4_step30

2B • Updated Aug 5, 2025 • 4

RLAIF/reward-model-grpo

0.8B • Updated Aug 1, 2025 • 6

RLAIF/llama-3b-open-r1-50k-sft

4B • Updated Mar 12, 2025 • 5

RLAIF/sft-external

Text Generation • 8B • Updated Dec 19, 2024 • 4

RLAIF/sft-llama-3.1-8b-external

Text Generation • 8B • Updated Nov 12, 2024 • 1

RLAIF/sft-gemma-2-9b-base-sft-llama-405b-instruct-correct-only-format-lr-5e-06-bs-64

Text Generation • 9B • Updated Oct 30, 2024

RLAIF/sft-llama8b-prm-800k-correct-only

Text Generation • 8B • Updated Oct 24, 2024

RLAIF/22-sequential-temp-0-verifier-no-best-oracle-in-context-train-8

8B • Updated Oct 13, 2024

RLAIF/22-sequential-temp-0-verifier-oracle-in-context-train-8-w-error-masking

8B • Updated Oct 11, 2024

RLAIF/15-w-error-masking-temp-0-verifier-in-context-train-in-context-inference-8-model

8B • Updated Sep 30, 2024 • 4