Update README.md
--- a/README.md
+++ b/README.md
@@ -16,21 +16,23 @@ A model created with the goal of a synergistic combination of different techniques
 
 Entirely contained within 20K training examples!
 
+This model was fine-tuned by Nous Research, with LDJ leading the training and dataset curation and J-Supha making significant contributions to dataset formation. Thank you as well to Emozilla for helping expedite the training experimentation process.
+
+Special thanks to A16Z for sponsoring our training, and to Yield Protocol for their support with resources during R&D outside of training, such as dataset development/synthesis.
+
 ## Thank you to dataset creators!
 
 While most of the tokens within Capybara are newly synthesized and part of datasets like Puffin/Dove, we would like to credit the single-turn datasets we leveraged as seeds to initiate many of the multi-turn conversations:
 
 [image: dataset credits]
 
-This model was fine-tuned by Nous Research, with LDJ leading the training and dataset curation and J-Supha making significant contributions to dataset formation. Thank you as well to Emozilla for helping expedite the training experimentation process.
 
-Special thanks to A16Z for sponsoring our training, and to Yield Protocol for their support with resources during R&D outside of training, such as dataset development/synthesis.
 
 ## Model Training
 
-Nous-Capybara 7B is a new model trained for multiple epochs on a dataset of
+Nous-Capybara 7B is a new model trained for multiple epochs on a dataset of fewer than 20,000 carefully curated GPT-4 examples, most of which are long-context conversations between a real human and GPT-4, comprised entirely of newly synthesized tokens that did not previously exist on HuggingFace.
 
-Additional data came from
+Additional data came from manually curated CamelAI data, with the help of volunteers including former physicists, mathematicians, biologists, and more!
 
 ## Prompt Format
 
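For readers who want to try the model the README describes, here is a minimal loading sketch using the standard Hugging Face `transformers` API. The repository id `NousResearch/Nous-Capybara-7B` and the plain `USER:`/`ASSISTANT:` prompt shape are assumptions not confirmed by this diff; the authoritative format lives in the README's Prompt Format section, which this hunk only touches at its boundary.

```python
# Minimal sketch, assuming the model is published as "NousResearch/Nous-Capybara-7B"
# and accepts a plain USER:/ASSISTANT: turn format; both are assumptions,
# not confirmed by this diff. Check the README's Prompt Format section.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "NousResearch/Nous-Capybara-7B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Assumed prompt shape; adjust to the documented format.
prompt = "USER: Summarize what makes the Capybara dataset distinctive.\nASSISTANT:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

`device_map="auto"` (which requires the `accelerate` package) spreads the weights across available devices; on a CPU-only machine, drop that argument and expect slow generation for a 7B model.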