pszemraj committed on
Commit dec16b4 · verified · 1 Parent(s): fabeb0c

Update README.md

Files changed (1)
  1. README.md +25 -13
README.md CHANGED
@@ -2,7 +2,8 @@
  license: apache-2.0
  base_model: BEE-spoke-data/smol_llama-220M-GQA
  tags:
- - generated_from_trainer
+ - edu
+ - continual pretraining
  metrics:
  - accuracy
  inference:
@@ -19,33 +20,43 @@ widget:
  example_title: El Microondas
  - text: Kennesaw State University is a public
  example_title: Kennesaw State University
- - text: Bungie Studios is an American video game developer. They are most famous for
- developing the award winning Halo series of video games. They also made Destiny.
- The studio was founded
+ - text: >-
+ Bungie Studios is an American video game developer. They are most famous for
+ developing the award winning Halo series of video games. They also made
+ Destiny. The studio was founded
  example_title: Bungie
  - text: The Mona Lisa is a world-renowned painting created by
  example_title: Mona Lisa
- - text: The Harry Potter series, written by J.K. Rowling, begins with the book titled
+ - text: >-
+ The Harry Potter series, written by J.K. Rowling, begins with the book
+ titled
  example_title: Harry Potter Series
- - text: 'Question: I have cities, but no houses. I have mountains, but no trees. I
+ - text: >-
+ Question: I have cities, but no houses. I have mountains, but no trees. I
  have water, but no fish. What am I?
  
- Answer:'
+ Answer:
  example_title: Riddle
  - text: The process of photosynthesis involves the conversion of
  example_title: Photosynthesis
- - text: Jane went to the store to buy some groceries. She picked up apples, oranges,
+ - text: >-
+ Jane went to the store to buy some groceries. She picked up apples, oranges,
  and a loaf of bread. When she got home, she realized she forgot
  example_title: Story Continuation
- - text: 'Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph,
- and another train leaves Station B at 10:00 AM and travels at 80 mph, when will
+ - text: >-
+ Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
+ another train leaves Station B at 10:00 AM and travels at 80 mph, when will
  they meet if the distance between the stations is 300 miles?
  
- To determine'
+ To determine
  example_title: Math Problem
  - text: In the context of computer programming, an algorithm is
  example_title: Algorithm Definition
  pipeline_tag: text-generation
+ datasets:
+ - HuggingFaceFW/fineweb-edu
+ language:
+ - en
  ---
  
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -53,7 +64,8 @@ should probably proofread and complete it, then remove this comment. -->
  
  # smol_llama-220M-GQA-fineweb-edu-10BT
  
- This model is a continously pretrained version of [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) on the BEE-spoke-data/fineweb-edu-10BT-mincols dataset.
+ This model is a continously pretrained version of [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) on the 10BT-sample subset of `HuggingFaceFW/fineweb-edu`.
+
  It achieves the following results on the evaluation set:
  - Loss: 2.7416
  - Accuracy: 0.4560
@@ -155,4 +167,4 @@ The following hyperparameters were used during training:
  - Transformers 4.41.1
  - Pytorch 2.3.1+cu118
  - Datasets 2.19.1
- - Tokenizers 0.19.1
+ - Tokenizers 0.19.1
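
The model card in this diff reports an evaluation loss of 2.7416. As a rough reading aid, that figure can be converted to perplexity. The sketch below assumes the loss is the mean per-token cross-entropy in nats, which is an assumption; the card does not say how it was computed. The names `eval_loss` and `perplexity_from_loss` are illustrative and do not come from the repository.

```python
import math

def perplexity_from_loss(loss_nats: float) -> float:
    """Convert a mean per-token cross-entropy (in nats) to perplexity."""
    return math.exp(loss_nats)

# Evaluation loss as reported in the model card (assumed to be in nats per token).
eval_loss = 2.7416

perplexity = perplexity_from_loss(eval_loss)
print(f"eval loss {eval_loss} -> perplexity {perplexity:.2f}")
```

If the reported loss were measured in bits rather than nats, the base would be 2 instead of e, so the conversion only holds under the stated assumption.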