---
tags:
- model_hub_mixin
- pytorch_model_hub_mixin
license: mit
datasets:
- kjj0/fineweb100B-gpt2
language:
- en
---
|
|
|
A 3.2B-parameter base model trained for ~64B tokens on the FineWeb dataset.
|
|
|
Uses the GPT-2 tokenizer from tiktoken.
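
As a reference, the same GPT-2 BPE can be exercised directly through tiktoken, separate from the `transformers`-based usage below (a minimal sketch):

```
# Minimal sketch: encode/decode text with the GPT-2 BPE via tiktoken.
import tiktoken

enc = tiktoken.get_encoding("gpt2")
ids = enc.encode("The future of AI is")
print(ids)              # token ids drawn from the 50257-entry vocabulary
print(enc.decode(ids))  # round-trips back to the original string
```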
|
|
|
[wandb training metrics](https://api.wandb.ai/links/teammapo-mapo-labs/zooq3iig)

- Note: batch size increased from 8 to 512 at step 2,160,000
- Final checkpoint: step 2,187,000, val_loss: 2.7489
- Trained on an 8xH100 80GB node using data parallel (see the sketch after this list)
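
For context, a data-parallel run of this kind can be launched roughly as follows. This is an illustrative DistributedDataParallel skeleton (started with `torchrun --nproc_per_node=8`), not the actual training script; the model and batch here are stand-ins.

```
# Illustrative DDP skeleton for an 8-GPU node; not the actual training script.
# Launch: torchrun --nproc_per_node=8 train_ddp.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(8192, 8192).cuda(local_rank)  # stand-in for the 3.2B transformer
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.AdamW(model.parameters(), lr=3e-4)

    for step in range(10):
        x = torch.randn(8, 8192, device=f"cuda:{local_rank}")  # stand-in batch
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()  # gradients are all-reduced across the 8 ranks here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```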
|
|
|
Model config: |
|
```
"d_head": 128,
"d_model": 8192,
"n_heads": 64,
"n_layers": 3,
"n_vocab": 50257
```
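
As a rough sanity check on the 3.2B figure, here is a back-of-the-envelope parameter count from this config. It assumes a standard GPT-style block with a 4x MLP expansion and untied input/output embeddings, and ignores biases and norms; these are assumptions, not details taken from the repo.

```
# Rough parameter estimate from the config above.
# Assumptions: 4x MLP expansion, untied embedding/unembedding, biases/norms ignored.
d_model, n_layers, n_vocab = 8192, 3, 50257

attn = 4 * d_model * d_model        # q, k, v, and output projections
mlp = 8 * d_model * d_model         # up- and down-projections with 4x expansion
per_layer = attn + mlp              # ~0.8B parameters per layer
embeddings = 2 * n_vocab * d_model  # token embedding + LM head (untied)

total = n_layers * per_layer + embeddings
print(f"~{total / 1e9:.2f}B parameters")  # ~3.24B
```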
|
|
|
Usage: |
|
```
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model = AutoModelForCausalLM.from_pretrained("michaelbzhu/test-3.2B-base", trust_remote_code=True)
model = model.cuda()
tokenizer = AutoTokenizer.from_pretrained("michaelbzhu/test-3.2B-base", trust_remote_code=True)

prompt = "The future of AI is"
input_ids = tokenizer.encode(prompt, return_tensors="pt").to(model.device)

# Sample 20 tokens, one at a time, from the softmax distribution over the vocabulary
with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids).logits[0, -1, :]
        next_token = torch.multinomial(torch.softmax(logits, dim=-1), 1).unsqueeze(0)
        input_ids = torch.cat([input_ids, next_token], dim=1)

print(tokenizer.decode(input_ids[0]))
```
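
The loop above samples each next token from the unmodified softmax distribution (temperature 1); taking the argmax of the logits instead gives deterministic, greedy decoding.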
|
|
|
Eval: |
|
```
$ lm_eval --model hf \
    --model_args pretrained=michaelbzhu/test-3.2B-base,trust_remote_code=True \
    --tasks mmlu_college_medicine,hellaswag,lambada_openai,arc_easy,winogrande,arc_challenge,openbookqa \
    --device cuda:0 \
    --batch_size 16
```

| Tasks            | Version | Filter | n-shot | Metric     |   |   Value |   | Stderr |
|------------------|--------:|--------|-------:|------------|---|--------:|---|-------:|
| arc_challenge    |       1 | none   |      0 | acc        | ↑ |  0.2363 | ± | 0.0124 |
|                  |         | none   |      0 | acc_norm   | ↑ |  0.2637 | ± | 0.0129 |
| arc_easy         |       1 | none   |      0 | acc        | ↑ |  0.5758 | ± | 0.0101 |
|                  |         | none   |      0 | acc_norm   | ↑ |  0.4996 | ± | 0.0103 |
| hellaswag        |       1 | none   |      0 | acc        | ↑ |  0.3827 | ± | 0.0049 |
|                  |         | none   |      0 | acc_norm   | ↑ |  0.4846 | ± | 0.0050 |
| lambada_openai   |       1 | none   |      0 | acc        | ↑ |  0.4238 | ± | 0.0069 |
|                  |         | none   |      0 | perplexity | ↓ | 14.7850 | ± | 0.4335 |
| college_medicine |       1 | none   |      0 | acc        | ↑ |  0.2370 | ± | 0.0324 |
| openbookqa       |       1 | none   |      0 | acc        | ↑ |  0.2180 | ± | 0.0185 |
|                  |         | none   |      0 | acc_norm   | ↑ |  0.3180 | ± | 0.0208 |
| winogrande       |       1 | none   |      0 | acc        | ↑ |  0.5367 | ± | 0.0140 |
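
The same evaluation can also be driven from Python; a minimal sketch, assuming the `simple_evaluate` entry point of lm-evaluation-harness 0.4.x:

```
# Sketch: run a subset of the tasks above programmatically with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=michaelbzhu/test-3.2B-base,trust_remote_code=True",
    tasks=["arc_easy", "hellaswag"],
    device="cuda:0",
    batch_size=16,
)
print(results["results"])
```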