stjohn2007 committed
Commit efe0174 · verified · 1 Parent(s): e3dc340

Update README.md

Update the explanation of MTBench

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -44,12 +44,14 @@ This repository provides large language models developed by [TokyoTech-LLM](http
  ### MT-Bench JA

  We used [Japanese MT-Bench](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question) to assess the instruction-following capabilities of models.
- We utilized the following artifacts:
+ We utilized the following settings:

  - Implementation: FastChat [Zheng+, 2023] (commit #e86e70d0)
  - Question: [Nejumi LLM-Leaderboard NEO, mtbench_ja_question_v3](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_question/v3)
  - Reference Answer: [Nejumi LLM-Leaderboard NEO, mtbench_ja_referenceanswer_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_referenceanswer/v1)
  - Prompt for Judge: [Nejumi LLM-Leaderboard NEO, mtbench_ja_prompt_v1](https://wandb.ai/wandb-japan/llm-leaderboard/artifacts/dataset/mtbench_ja_prompt/v1)
+ - Judge: `gpt-4-1106-preview`
+ - Scoring: Absolute scale normalized to a 0-1 range, averaged over five runs.


  ## Usage
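
Note on the added Scoring line: the arithmetic it describes can be sketched as below. This is a minimal illustration, not the evaluation code used for the README. It assumes the judge rates on a 1-10 scale and that normalization means dividing by 10; the README states neither. The function name `mtbench_ja_score` and the sample ratings are hypothetical.

```python
from statistics import mean


def mtbench_ja_score(runs, max_score=10.0):
    """Average normalized judge ratings over several evaluation runs.

    `runs` is a list of runs; each run is a list of raw judge ratings.
    Assumption (not stated in the README): ratings are on a 1-10 scale
    and are normalized by dividing by `max_score`.
    """
    if not runs or any(len(run) == 0 for run in runs):
        raise ValueError("each run needs at least one judge rating")
    # Normalize each run's mean rating to the 0-1 range.
    per_run = [mean(run) / max_score for run in runs]
    # Average the normalized per-run scores across runs.
    return mean(per_run)


# Hypothetical ratings for five runs (illustrative values, not real results).
example_runs = [[7, 8, 6], [6, 8, 7], [7, 7, 7], [8, 8, 5], [7, 9, 5]]
print(mtbench_ja_score(example_runs))
```

Averaging per run first and then across runs gives each run equal weight, which matches a plain mean over all ratings only when every run has the same number of ratings.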