Text Generation · Transformers · Safetensors · qwen2 · conversational · text-generation-inference
hongzhizhang committed · Commit 4ea9046 · verified · 1 Parent(s): e36c4c5

Update README.md

Give a vLLM inference example.

Files changed (1)
1. README.md +29 -42
README.md CHANGED
@@ -31,60 +31,47 @@ Reinforcement learning (RL) for large language models is an energy-intensive end
 
  ## 🚀 Quick Start (Inference)
 
- You can use the RLEP model for accelerated text generation by leveraging its custom `EaModel` class. Ensure you have the `rlep` package and its `vllm` dependencies installed as per the official repository.
-
- First, install the necessary packages by cloning the repository and installing its dependencies:
  ```bash
- git clone https://github.com/Kwai-Klear/RLEP.git
- cd RLEP
- pip3 install -e .[vllm]
  ```
-
- Then, you can use the model in your Python code:
 
  ```python
- import torch
- from transformers import AutoTokenizer
- from eagle.model.ea_model import EaModel
- from fastchat.model import get_conversation_template
-
- # Define paths for your base model and RLEP model checkpoint
- # This model is based on Qwen2.5-Math-7B.
- base_model_path = "Qwen/Qwen2.5-Math-7B" # Original Qwen2.5 base model
- rlep_model_path = "Kwai-Klear/qwen2.5-math-rlep" # This RLEP checkpoint
-
- # Load the RLEP-enhanced model
- # trust_remote_code=True might be necessary depending on your environment
- model = EaModel.from_pretrained(
-     base_model_path=base_model_path,
-     ea_model_path=rlep_model_path,
-     torch_dtype=torch.float16, # or torch.bfloat16 for Qwen2 models
-     low_cpu_mem_usage=True,
-     device_map="auto",
-     total_token=-1 # -1 allows EAGLE-2 to auto-configure this parameter
  )
- model.eval()
 
- # Example usage for text generation:
- user_message = "What is the capital of France?"
 
- # Get conversation template for your base model.
- # Adjust "qwen2" if your base model uses a different chat format.
- conv = get_conversation_template("qwen2")
- conv.append_message(conv.roles[0], user_message)
- conv.append_message(conv.roles[1], None) # Append None for the assistant's turn
 
- prompt = conv.get_prompt()
- input_ids = model.tokenizer([prompt]).input_ids
- input_ids = torch.as_tensor(input_ids).cuda()
 
- # Generate text using the RLEP-accelerated generation method
- output_ids = model.eagenerate(input_ids, temperature=0.5, max_new_tokens=512)
- output = model.tokenizer.decode(output_ids[0])
 
- print(output)
  ```
 
  ## Evaluation Results
 
  We evaluated the converged RLEP model at 320 training steps and the DAPO-nodyn-bs64 baseline at 400 steps.
 
 
  ## 🚀 Quick Start (Inference)
 
+ Here's a simple example of running inference with vLLM.
+ First, install vLLM (version ≥ 0.7.3):
  ```bash
+ pip3 install "vllm>=0.7.3"
  ```
+ After installation, you can load and run the model in your Python code like this:
 
  ```python
+ from transformers import AutoTokenizer
+ from vllm import LLM, SamplingParams
+
+ model_path = 'Kwai-Klear/qwen2.5-math-rlep'
+ sampling_params = SamplingParams(temperature=1.0, top_p=1.0, max_tokens=1024 * 3, n=1)
+ llm = LLM(
+     model=model_path,
+     enforce_eager=False,
+     tensor_parallel_size=1,
+     seed=0,
  )
 
+ tokenizer = AutoTokenizer.from_pretrained(model_path)
+ question = '''Find the sum of all integer bases $b>9$ for which $17_b$ is a divisor of $97_b.$'''
 
+ prefix = "Solve the following math problem step by step. The last line of your response should be of the form Answer: $Answer (without quotes) where $Answer is the answer to the problem.\n\n"
+ post_fix = '\n\nRemember to put your answer on its own line after "Answer:".'
+ question_with_instruct = prefix + question + post_fix  # the model was trained with this instruction format
+ messages = [{'content': question_with_instruct, 'role': 'user'}]
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
 
+ output = llm.generate([text], sampling_params)[0]
+ answer = output.outputs[0].text
 
+ print(question)
+ print(answer)
  ```
 
+ To evaluate the model on benchmarks such as AIME-2024, AIME-2025, and AMC-2023, please refer to [our repo](http://github.com/Kwai-Klear/RLEP?tab=readme-ov-file#evaluation).
+
  ## Evaluation Results
 
  We evaluated the converged RLEP model at 320 training steps and the DAPO-nodyn-bs64 baseline at 400 steps.
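
The prompt built in the example instructs the model to put its final result on a line of the form `Answer: $Answer`, so the numeric answer can be recovered from the completion with a small parsing step. Below is a minimal, illustrative sketch; it is not part of this commit or the RLEP repository, the `extract_answer` helper is a made-up name, and it assumes the `answer` string from the example above:

```python
# Illustrative helper (not from the RLEP repo): pull the final answer out of a
# completion that follows the "Answer: $Answer" convention enforced by the prompt.
def extract_answer(completion):
    for line in reversed(completion.strip().splitlines()):
        line = line.strip()
        if line.startswith("Answer:"):
            return line[len("Answer:"):].strip()
    return None  # no "Answer:" line found


# Continuing from the example above, where `answer` holds the generated text.
# For the sample question, the correct value is 70.
print(extract_answer(answer))
```

Since `llm.generate` accepts a list of prompts, the same pattern extends to batched benchmark runs: build one templated prompt per question, generate once, and compare each extracted answer against its reference.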