Update README.md
README.md CHANGED
@@ -2,7 +2,8 @@
 license: apache-2.0
 base_model: BEE-spoke-data/smol_llama-220M-GQA
 tags:
-- …
+- edu
+- continual pretraining
 metrics:
 - accuracy
 inference:
@@ -19,33 +20,43 @@ widget:
   example_title: El Microondas
 - text: Kennesaw State University is a public
   example_title: Kennesaw State University
-- text: …
-  …
-  …
+- text: >-
+    Bungie Studios is an American video game developer. They are most famous for
+    developing the award winning Halo series of video games. They also made
+    Destiny. The studio was founded
   example_title: Bungie
 - text: The Mona Lisa is a world-renowned painting created by
   example_title: Mona Lisa
-- text: …
+- text: >-
+    The Harry Potter series, written by J.K. Rowling, begins with the book
+    titled
   example_title: Harry Potter Series
-- text: …
+- text: >-
+    Question: I have cities, but no houses. I have mountains, but no trees. I
     have water, but no fish. What am I?

-    Answer:
+    Answer:
   example_title: Riddle
 - text: The process of photosynthesis involves the conversion of
   example_title: Photosynthesis
-- text: …
+- text: >-
+    Jane went to the store to buy some groceries. She picked up apples, oranges,
     and a loaf of bread. When she got home, she realized she forgot
   example_title: Story Continuation
-- text: …
-  …
+- text: >-
+    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
+    another train leaves Station B at 10:00 AM and travels at 80 mph, when will
     they meet if the distance between the stations is 300 miles?

-    To determine
+    To determine
   example_title: Math Problem
 - text: In the context of computer programming, an algorithm is
   example_title: Algorithm Definition
 pipeline_tag: text-generation
+datasets:
+- HuggingFaceFW/fineweb-edu
+language:
+- en
 ---

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -53,7 +64,8 @@ should probably proofread and complete it, then remove this comment. -->

 # smol_llama-220M-GQA-fineweb-edu-10BT

-This model is a continously pretrained version of [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) on the
+This model is a continously pretrained version of [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) on the 10BT-sample subset of `HuggingFaceFW/fineweb-edu`.
+
 It achieves the following results on the evaluation set:
 - Loss: 2.7416
 - Accuracy: 0.4560
@@ -155,4 +167,4 @@ The following hyperparameters were used during training:
 - Transformers 4.41.1
 - Pytorch 2.3.1+cu118
 - Datasets 2.19.1
-- Tokenizers 0.19.1
+- Tokenizers 0.19.1
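Most of the widget changes above swap long single-line prompts for YAML folded block scalars (`>-`): the indented lines are folded into one string joined by single spaces, a blank line inside the block becomes a literal newline (which is how the riddle's `Answer:` and the math problem's `To determine` land on their own lines), and the `-` chomping indicator strips the trailing newline. A minimal sketch of that parsing behavior, using PyYAML and one of the card's own prompts:

```python
import yaml  # pip install pyyaml

# Folded block scalar (>-): indented lines are joined with single
# spaces, a blank line becomes a newline, and the "-" indicator
# strips the trailing newline.
doc = """
widget:
- text: >-
    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
    another train leaves Station B at 10:00 AM and travels at 80 mph, when will
    they meet if the distance between the stations is 300 miles?

    To determine
  example_title: Math Problem
"""

text = yaml.safe_load(doc)["widget"][0]["text"]
print(repr(text))
# 'Problem 2: If a train leaves ... is 300 miles?\nTo determine'
```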
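On the reported numbers: assuming the eval loss of 2.7416 is the usual mean per-token cross-entropy in nats, it corresponds to a perplexity of about exp(2.7416) ≈ 15.5. Below is a minimal generation sketch with `transformers`; the repo id is inferred from the card's title and the base model's org, so treat it as an assumption and adjust as needed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the card title and base-model org; adjust if
# the model actually lives elsewhere.
model_id = "BEE-spoke-data/smol_llama-220M-GQA-fineweb-edu-10BT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# One of the widget prompts from the card's front matter.
prompt = "The process of photosynthesis involves the conversion of"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```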