Update README.md
Update the explanation of MTBench
README.md
@@ -44,12 +44,14 @@ This repository provides large language models developed by [TokyoTech-LLM](http
 ### MT-Bench JA
 
 We used [Japanese MT-Bench](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question) to assess the instruction-following capabilities of models.
 
-We utilized the following
+We utilized the following settings:
 
 - Implementation: FastChat [Zheng+, 2023] (commit #e86e70d0)
 - Question: [Nejumi LLM-Leaderboard NEO, mtbench_ja_question_v3](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question/v3)
 - Reference Answer: [Nejumi LLM-Leaderboard NEO, mtbench_ja_referenceanswer_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_referenceanswer/v1)
 - Prompt for Judge: [Nejumi LLM-Leaderboard NEO, mtbench_ja_prompt_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_prompt/v1)
+- Judge: `gpt-4-1106-preview`
+- Scoring: Absolute scale normalized to a 0-1 range, averaged over five runs.
 
 
 ## Usage
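The Scoring setting added in this commit can be illustrated with a short Python sketch. It is not the evaluation code used for the README. It assumes three things: the judge gives each answer an absolute rating from 1 to 10 (the scale used by FastChat's single-answer grading prompt), "normalized to a 0-1 range" means dividing by 10, and per-question ratings are averaged within each run before averaging across the five runs. The function names `normalize` and `mtbench_ja_score` and the sample ratings are hypothetical.

```python
from statistics import mean


def normalize(score: float, max_score: float = 10.0) -> float:
    """Map an absolute judge rating onto [0, 1].

    Assumption: normalization divides by the top of the 1-10 rating scale.
    """
    return score / max_score


def mtbench_ja_score(runs: list[list[float]]) -> float:
    """Average normalized ratings within each run, then across runs.

    `runs` holds one list of per-question judge ratings per evaluation run
    (five runs in the setting described above; the count is not enforced here).
    """
    per_run = [mean(normalize(s) for s in run) for run in runs]
    return mean(per_run)


if __name__ == "__main__":
    # Hypothetical ratings: five runs over the same three questions.
    runs = [[8, 6, 7], [7, 7, 7], [9, 5, 7], [6, 8, 7], [7, 6, 8]]
    print(round(mtbench_ja_score(runs), 3))  # 0.7
```

Averaging within each run first gives every run equal weight. When every run covers the same questions, as here, this equals a flat average over all ratings.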