BubbleQ committed
Commit bb32577 · verified · 1 Parent(s): 012a443

Update README.md

Files changed (1)
  1. README.md +5 -6
README.md CHANGED
@@ -116,18 +116,17 @@ Note:
  | **Math** | MATH500 | 86.4 | 68.4 | 79.8 | 85 | 86.8 | 80.6 | 97.2 |
  | | AIME24 | 28.33 | 11.25 | 22.92 | 28.33 | 23.96 | 15.83 | 75 |
  | | AIME25 | 19.17 | 8.12 | 15.21 | 20.62 | 18.33 | 18.75 | 61.88 |
- | **Code** | HumanEval | 86.59 | 82.3* | 74.39 | 83.54 | 82.32 | 85.37 | 81.71 |
- | | HumanEval+ | 79.27 | - | 70.12 | 76.83 | 75.61 | 83.54 | 76.83 |
- | | MBPPEvalplus | 79.9 | 62.4 | 82 | 76.2 | 85.7 | 77.5 | 89.4 |
- | | MBPPEvalplus++ | 68.8 | 50.4 | 69.3 | 66.1 | 74.1 | 66.7 | 75.1 |
+ | **Code** | HumanEval | 86.59 | 82.3* | 78.05 | 83.54 | 82.32 | 85.37 | 81.71 |
+ | | HumanEval+ | 79.27 | - | 73.17 | 76.83 | 75.61 | 83.54 | 76.83 |
+ | | MBPPEvalplus | 79.9 | 62.4 | 83.3 | 76.2 | 85.7 | 77.5 | 89.4 |
+ | | MBPPEvalplus++ | 68.8 | 50.4 | 71.7 | 66.1 | 74.1 | 66.7 | 75.1 |
  | | LiveCodeBench v5(2408-2501) | 27.96 | 14.7 | 12.19 | 27.24 | 24.73 | 23.66 | 41.22 |
  | **Alignment** | IF-Eval | 81.89 | 79.3 | 73.01 | 84.47 | 81.52 | 59.33 | 83.92 |
  | | Multi-IF(en+zh) | 78.46 | 61.83 | 61.79 | 78.95 | 76.56 | 62.7 | 77.75 |
  | | MTBench | 8.42 | 7.86 | 6.875 | 8.21 | 8.68 | 8.62 | 9.33 |
  | | MT-Eval | 8.13 | 7.36 | 6.7 | 8.18 | 8.45 | 8.12 | - |
  | | AlignBench v1.1 | 7 | 6.13 | 5.99 | 6.95 | 6.3 | 6.33 | 7.06 |
- | | Average | 53.74 | - | 46.05 | 52.61 | 50.54 | 48.95 | - |
-
+ | | Average | 53.74 | - | 46.54 | 52.61 | 50.54 | 48.95 | - |
  Note:
  1. For InternLM3-8B-Instruct, the results marked with `*` are sourced from their official website, other evaluations are conducted based on internal evaluation frameworks.
  2. For Multi-IF, we report the overall average computed across all three rounds, pooling the Chinese and English metrics.
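Only the third score column changes in this edit: the four code-benchmark rows are re-scored and the Average row is updated to match. Below is a minimal sketch of how such an Average row could be recomputed, assuming it is a plain unweighted mean over the table's benchmarks; the full benchmark list, any weighting, and the helper names are assumptions for illustration, not the repository's actual evaluation code.

```python
# Sketch only: assumes the README's "Average" row is an unweighted mean over the
# per-benchmark scores. The full benchmark list, any weighting, and any scale
# normalization are not visible in this hunk, so everything below is illustrative.

OLD_CODE_SCORES = {"HumanEval": 74.39, "HumanEval+": 70.12,
                   "MBPPEvalplus": 82.0, "MBPPEvalplus++": 69.3}
NEW_CODE_SCORES = {"HumanEval": 78.05, "HumanEval+": 73.17,
                   "MBPPEvalplus": 83.3, "MBPPEvalplus++": 71.7}


def shift_average(old_average, old_scores, new_scores, n_benchmarks):
    """For an unweighted mean, re-scoring a few benchmarks shifts the mean by
    (sum of the score deltas) / (total number of benchmarks averaged)."""
    delta = sum(new_scores[k] - old_scores[k] for k in new_scores)
    return round(old_average + delta / n_benchmarks, 2)


# Hypothetical usage (the exact result depends on the real benchmark count,
# which this hunk does not show):
# new_average = shift_average(46.05, OLD_CODE_SCORES, NEW_CODE_SCORES, n_benchmarks=...)
```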