Update README.md
README.md CHANGED
@@ -2,7 +2,8 @@
 license: apache-2.0
 base_model: BEE-spoke-data/smol_llama-220M-GQA
 tags:
-- …
+- edu
+- continual pretraining
 metrics:
 - accuracy
 inference:
@@ -19,33 +20,43 @@ widget:
   example_title: El Microondas
 - text: Kennesaw State University is a public
   example_title: Kennesaw State University
-- text: …
-  …
-  …
+- text: >-
+    Bungie Studios is an American video game developer. They are most famous for
+    developing the award winning Halo series of video games. They also made
+    Destiny. The studio was founded
   example_title: Bungie
 - text: The Mona Lisa is a world-renowned painting created by
   example_title: Mona Lisa
-- text: …
+- text: >-
+    The Harry Potter series, written by J.K. Rowling, begins with the book
+    titled
   example_title: Harry Potter Series
-- text: …
+- text: >-
+    Question: I have cities, but no houses. I have mountains, but no trees. I
     have water, but no fish. What am I?

-    Answer:
+    Answer:
   example_title: Riddle
 - text: The process of photosynthesis involves the conversion of
   example_title: Photosynthesis
-- text: …
+- text: >-
+    Jane went to the store to buy some groceries. She picked up apples, oranges,
     and a loaf of bread. When she got home, she realized she forgot
   example_title: Story Continuation
-- text: …
-  …
+- text: >-
+    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
+    another train leaves Station B at 10:00 AM and travels at 80 mph, when will
     they meet if the distance between the stations is 300 miles?

-    To determine
+    To determine
   example_title: Math Problem
 - text: In the context of computer programming, an algorithm is
   example_title: Algorithm Definition
 pipeline_tag: text-generation
+datasets:
+- HuggingFaceFW/fineweb-edu
+language:
+- en
 ---

 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -53,7 +64,8 @@ should probably proofread and complete it, then remove this comment. -->

 # smol_llama-220M-GQA-fineweb-edu-10BT

-This model is a continously pretrained version of [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) on the
+This model is a continously pretrained version of [BEE-spoke-data/smol_llama-220M-GQA](https://huggingface.co/BEE-spoke-data/smol_llama-220M-GQA) on the 10BT-sample subset of `HuggingFaceFW/fineweb-edu`.
+
 It achieves the following results on the evaluation set:
 - Loss: 2.7416
 - Accuracy: 0.4560
@@ -155,4 +167,4 @@ The following hyperparameters were used during training:
 - Transformers 4.41.1
 - Pytorch 2.3.1+cu118
 - Datasets 2.19.1
-- Tokenizers 0.19.1
+- Tokenizers 0.19.1
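Most of the widget changes above swap long single-line prompts for YAML folded block scalars (`>-`): the indented lines are folded into one string joined by single spaces, a blank line inside the block becomes a literal newline (which is how the riddle's `Answer:` and the math problem's `To determine` land on their own lines), and the `-` chomping indicator strips the trailing newline. A minimal sketch of that parsing behavior, using PyYAML and one of the card's own prompts:

```python
import yaml  # pip install pyyaml

# Folded block scalar (>-): indented lines are joined with single
# spaces, a blank line becomes a newline, and the "-" indicator
# strips the trailing newline.
doc = """
widget:
- text: >-
    Problem 2: If a train leaves Station A at 9:00 AM and travels at 60 mph, and
    another train leaves Station B at 10:00 AM and travels at 80 mph, when will
    they meet if the distance between the stations is 300 miles?

    To determine
  example_title: Math Problem
"""

text = yaml.safe_load(doc)["widget"][0]["text"]
print(repr(text))
# 'Problem 2: If a train leaves ... is 300 miles?\nTo determine'
```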
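On the reported numbers: assuming the eval loss of 2.7416 is the usual mean per-token cross-entropy in nats, it corresponds to a perplexity of about exp(2.7416) ≈ 15.5. Below is a minimal generation sketch with `transformers`; the repo id is inferred from the card's title and the base model's org, so treat it as an assumption and adjust as needed:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repo id assumed from the card title and base-model org; adjust if
# the model actually lives elsewhere.
model_id = "BEE-spoke-data/smol_llama-220M-GQA-fineweb-edu-10BT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# One of the widget prompts from the card's front matter.
prompt = "The process of photosynthesis involves the conversion of"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```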