|
--- |
|
library_name: transformers |
|
license: apache-2.0 |
|
base_model: tasksource/ModernBERT-base-nli |
|
tags: |
|
- generated_from_trainer |
|
datasets: |
|
- rajpurkar/squad_v2 |
|
model-index: |
|
- name: ModernBERT-base-squad2-v0.2 |
|
results: [] |
|
--- |
|
|
|
|
|
|
# ModernBERT-base-squad2-v0.2 |
|
|
|
This model is a fine-tuned version of [tasksource/ModernBERT-base-nli](https://huggingface.co/tasksource/ModernBERT-base-nli) on the rajpurkar/squad_v2 dataset. |
|
|
|
The maximum sequence length used during training was 8192 tokens.
|
|
|
Loading the model requires `trust_remote_code=True`.
|
|
|
```python |
|
from transformers import pipeline |
|
|
|
model_name = "praise2112/ModernBERT-base-squad2-v0.2" |
|
|
|
# Get predictions with the question-answering pipeline

nlp = pipeline('question-answering', model=model_name, tokenizer=model_name, trust_remote_code=True)
|
|
|
context = """Model Summary |
|
ModernBERT is a modernized bidirectional encoder-only Transformer model (BERT-style) pre-trained on 2 trillion tokens of English and code data with a native context length of up to 8,192 tokens. ModernBERT leverages recent architectural improvements such as: |
|
|
|
Rotary Positional Embeddings (RoPE) for long-context support. |
|
Local-Global Alternating Attention for efficiency on long inputs. |
|
Unpadding and Flash Attention for efficient inference. |
|
ModernBERT's native long context length makes it ideal for tasks that require processing long documents, such as retrieval, classification, and semantic search within large corpora. The model was trained on a large corpus of text and code, making it suitable for a wide range of downstream tasks, including code retrieval and hybrid (text + code) semantic search.
|
|
|
It is available in the following sizes: |
|
|
|
ModernBERT-base - 22 layers, 149 million parameters |
|
ModernBERT-large - 28 layers, 395 million parameters |
|
For more information about ModernBERT, we recommend our release blog post for a high-level overview, and our arXiv pre-print for in-depth information. |
|
|
|
ModernBERT is a collaboration between Answer.AI, LightOn, and friends.""" |
|
|
|
question = "How many parameters does ModernBERT-base have?" |
|
|
|
res = nlp(question=question, context=context, max_seq_len=8192) |
|
|
|
# {'score': 0.698786735534668, 'start': 891, 'end': 903, 'answer': ' 149 million'} |
|
``` |
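

If you prefer to work with the raw start/end logits instead of the pipeline, a minimal sketch is shown below. It assumes the standard `AutoModelForQuestionAnswering` / `AutoTokenizer` APIs; the span decoding here is a simplified greedy argmax, not the pipeline's full post-processing.

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

model_name = "praise2112/ModernBERT-base-squad2-v0.2"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForQuestionAnswering.from_pretrained(model_name, trust_remote_code=True)

question = "How many parameters does ModernBERT-base have?"
context = "ModernBERT-base - 22 layers, 149 million parameters"

# Tokenize the (question, context) pair; the model supports inputs up to 8192 tokens.
inputs = tokenizer(question, context, return_tensors="pt", truncation=True, max_length=8192)
with torch.no_grad():
    outputs = model(**inputs)

# Greedy span selection: take the most likely start and end token positions.
start_idx = int(outputs.start_logits.argmax())
end_idx = int(outputs.end_logits.argmax())
answer = tokenizer.decode(inputs["input_ids"][0, start_idx:end_idx + 1], skip_special_tokens=True)
print(answer)  # expected: "149 million"
```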
|
|
|
## Training procedure |
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training: |
|
- learning_rate: 3e-05 |
|
- train_batch_size: 32 |
|
- eval_batch_size: 32 |
|
- seed: 42 |
|
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
|
- lr_scheduler_type: linear |
|
- lr_scheduler_warmup_ratio: 0.1 |
|
- num_epochs: 4 |
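
For reference, the sketch below shows roughly equivalent `TrainingArguments`. The exact training script is not published with this card, so anything beyond the listed hyperparameters (e.g. output directory, precision, gradient accumulation) is an assumption.

```python
from transformers import TrainingArguments

# Rough reconstruction of the hyperparameters listed above; details not listed
# in the card (such as output_dir) are placeholders.
training_args = TrainingArguments(
    output_dir="ModernBERT-base-squad2-v0.2",
    learning_rate=3e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=4,
)
```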
|
|
|
### Training results |
|
|
|
| Metric                   | Value   |
|--------------------------|---------|
| Exact match (eval_exact) | 83.9636 |
| F1 (eval_f1)             | 87.0387 |
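
These are the standard SQuAD v2 exact-match and F1 scores. A minimal sketch of computing them with the `evaluate` library's `squad_v2` metric is shown below; the ids and texts are toy values, and the actual evaluation setup for this card may differ.

```python
import evaluate

squad_v2_metric = evaluate.load("squad_v2")

# Toy example with a single prediction/reference pair (ids are hypothetical).
predictions = [
    {"id": "example-0", "prediction_text": "149 million", "no_answer_probability": 0.0}
]
references = [
    {"id": "example-0", "answers": {"text": ["149 million"], "answer_start": [891]}}
]

results = squad_v2_metric.compute(predictions=predictions, references=references)
print(results["exact"], results["f1"])  # 100.0 100.0 for this toy example
```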
|
|
|
|
|
### Framework versions |
|
|
|
- Transformers 4.48.0.dev0 |
|
- Pytorch 2.5.1+cu124 |
|
- Datasets 2.20.0 |
|
- Tokenizers 0.21.0 |
|
|