forbiddensoul90 committed on
Commit 2074fca · verified · 1 Parent(s): 174f8f9

Update README.md

  1. README.md +24 -24
README.md CHANGED
@@ -129,40 +129,40 @@ The fine-tuning dataset was compiled from the following sources:
 
  | Category | Score |
  | :--------------------------------- | :---- |
- | translation_to_english | 4.19 |
- | reading_comprehension | 4.14 |
- | verb_conjugation | 4.11 |
- | multiple_choice | 4.08 |
- | translation_from_english | 4.07 |
- | translation | 4.00 |
- | listening_comprehension_simulation | 3.98 |
- | conversation | 3.79 |
- | word_order | 3.79 |
- | cultural_knowledge | 3.76 |
- | writing_prompt | 3.74 |
- | grammar | 3.44 |
- | idioms_and_expressions | 3.37 |
- | sentence_completion | 3.11 |
- | spelling_and_pronunciation | 3.04 |
- | vocabulary | 3.00 |
+ | translation_to_english | 83.8 |
+ | reading_comprehension | 82.8 |
+ | verb_conjugation | 82.2 |
+ | multiple_choice | 81.6 |
+ | translation_from_english | 81.4 |
+ | translation | 80.0 |
+ | listening_comprehension_simulation | 79.6 |
+ | conversation | 75.8 |
+ | word_order | 75.8 |
+ | cultural_knowledge | 75.2 |
+ | writing_prompt | 74.8 |
+ | grammar | 68.8 |
+ | idioms_and_expressions | 67.4 |
+ | sentence_completion | 62.2 |
+ | spelling_and_pronunciation | 60.8 |
+ | vocabulary | 60.0 |
 
  **Scores by Difficulty:**
 
  | Difficulty | Score |
  | :----------- | :---- |
- | beginner | 3.93 |
- | intermediate | 3.69 |
- | advanced | 3.68 |
- | native | 3.57 |
+ | beginner | 78.6 |
+ | intermediate | 73.8 |
+ | advanced | 73.6 |
+ | native | 71.4 |
 
  **Comparative Performance:**
 
  | Model | Overall Score (LUXELLA) |
  | :------------------------ | :---------------------- |
- | **LuxLlama (Ours)** | **3.73 / 5.0** |
- | gemma2-9b-it | 3.07 / 5.0 |
- | llama-3.1-8b-instant | 2.46 / 5.0 |
- | mixtral-8x7b-32768 | 2.44 / 5.0 |
+ | **LuxLlama (Ours)** | **74.6** |
+ | gemma2-9b-it | 61.4 |
+ | llama-3.1-8b-instant | 49.2 |
+ | mixtral-8x7b-32768 | 48.8 |
 
  **Summary:** LuxLlama demonstrates strong performance on the LUXELLA benchmark, outperforming other tested models significantly. It excels in translation, comprehension, and verb conjugation. Areas like vocabulary, spelling, and idioms show relatively lower scores, indicating room for improvement in capturing finer linguistic nuances. The model handles beginner-level tasks very well, with a gradual decrease in performance as difficulty increases, validating the benchmark's sensitivity. Sample high-performing questions show correct handling of cultural knowledge, spelling, and advanced verb conjugation, while low-performing samples highlight challenges with specific grammar rules (Konjunktiv II usage), subtle distinctions in vocabulary (Niess vs Kusinn), and standard word order conventions.
 
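
The commit does not state how the new numbers were derived, but every changed value in this diff is consistent with a linear rescale from the 5-point scale to a 0-100 scale, i.e. new = old × 20 (for example 4.19 → 83.8 and 3.73 → 74.6). The Python sketch below checks that reading against a few rows taken from the diff. The helper name `rescale_to_100` is hypothetical and only illustrates the arithmetic; it is not taken from the repository.

```python
def rescale_to_100(score_5pt: float) -> float:
    """Map a score on a 0-5 scale onto a 0-100 scale (assumed factor: 20).

    Rounding to one decimal matches the precision shown in the README tables
    and absorbs floating-point noise from the multiplication.
    """
    return round(score_5pt * 20, 1)


# (old 5-point score, new score) pairs copied from the diff.
changed_rows = {
    "translation_to_english": (4.19, 83.8),
    "vocabulary": (3.00, 60.0),
    "beginner": (3.93, 78.6),
    "native": (3.57, 71.4),
    "LuxLlama (Ours)": (3.73, 74.6),
    "mixtral-8x7b-32768": (2.44, 48.8),
}

for name, (old, new) in changed_rows.items():
    # Each row should satisfy new == old * 20 under the assumed rescale.
    assert rescale_to_100(old) == new, name
    print(f"{name}: {old} / 5.0 -> {rescale_to_100(old)}")
```

Because the rescale is linear, the ranking of categories, difficulty levels, and models is unchanged by this commit; only the reporting scale differs.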