TencentARC/Mistral_Pro_8B_v0.1

Mistral_Pro_8B_v0.1 / README.md

吳成岳

first commit

381e337 about 1 year ago

	---
	license: apache-2.0
	datasets:
	- HuggingFaceTB/cosmopedia
	language:
	- en
	metrics:
	- accuracy
	- code_eval
	---


	# Mistral-Pro-8B Model Card

	## Model Description
	Mistral-Pro is a progressive version of the original [Mistral](https://huggingface.co/mistralai/Mistral-7B-v0.1) model, enhanced by the addition of Transformer blocks. It specializes in integrating both general language understanding and domain-specific knowledge, particularly in programming and mathematics.

	## Development and Training
	Developed by Tencent's ARC Lab, Mistral-Pro is an 8-billion-parameter model. It expands Mistral-7B with additional Transformer blocks and is further trained on code and math corpora.
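
To make the size of such an expansion concrete, the sketch below counts parameters for a Mistral-7B-style decoder as a function of depth. The dimensions are those of the Mistral-7B-v0.1 configuration. This card does not say how many blocks were added, so the depths printed beyond 32 are purely hypothetical.

```python
# Rough parameter count for a Mistral-7B-style decoder at a given depth.
# Dimensions follow the Mistral-7B-v0.1 configuration. The number of blocks
# added for Mistral-Pro is NOT stated in this card, so extra depths are
# hypothetical illustrations only.
HIDDEN = 4096
INTERMEDIATE = 14336
NUM_HEADS = 32
NUM_KV_HEADS = 8
HEAD_DIM = HIDDEN // NUM_HEADS
VOCAB = 32000
BASE_LAYERS = 32


def params_per_block() -> int:
    """Parameters in one Transformer block (grouped-query attention + gated MLP)."""
    kv_dim = NUM_KV_HEADS * HEAD_DIM
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * kv_dim  # q, o and k, v projections
    mlp = 3 * HIDDEN * INTERMEDIATE  # gate, up, and down projections
    norms = 2 * HIDDEN  # two RMSNorm weight vectors
    return attention + mlp + norms


def total_params(num_layers: int) -> int:
    """Parameters in the full model, with untied input and output embeddings."""
    embeddings = 2 * VOCAB * HIDDEN
    final_norm = HIDDEN
    return num_layers * params_per_block() + embeddings + final_norm


if __name__ == "__main__":
    print(f"per block: {params_per_block() / 1e9:.3f}B")
    for extra in (0, 4, 8):  # hypothetical numbers of added blocks
        layers = BASE_LAYERS + extra
        print(f"{layers} layers: {total_params(layers) / 1e9:.2f}B")
```

Each added block contributes roughly 0.22B parameters under these dimensions, so a handful of extra blocks is enough to move a 7B-class model into the 8B range.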

	## Intended Use
	This model is designed for a wide range of NLP tasks, with a focus on programming, mathematics, and general language tasks. It suits scenarios requiring integration of natural and programming languages.
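
A minimal loading sketch, assuming the checkpoint works with the standard `transformers` Auto classes and that `torch`, `transformers`, and `accelerate` are installed. The `generate` helper, the prompt, and the generation settings are illustrative choices, not recommendations from the authors.

```python
# Minimal text-generation sketch. Assumes the repository loads through the
# generic Auto classes; the helper name and settings are illustrative.
MODEL_ID = "TencentARC/Mistral_Pro_8B_v0.1"


def generate(prompt: str, max_new_tokens: int = 128, model_id: str = MODEL_ID) -> str:
    """Load the model and return the prompt plus its greedy continuation."""
    # Imported lazily so the heavy dependencies are only needed when called.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # halves memory relative to float32
        device_map="auto",  # requires the accelerate package
    )
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    # A code-completion style prompt, matching the card's stated focus.
    print(generate("def fibonacci(n):"))
```

The card does not describe a chat template, so the sketch uses plain completion-style prompting.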

	## Performance
	Mistral_Pro_8B_v0.1 improves on Mistral-7B's math and code results: GSM8K rises from 39.2 to 50.3 and HumanEval from 28.7 to 32.3. On those two benchmarks it is on par with [Gemma](https://huggingface.co/google/gemma-7b) (50.9 and 32.3). The gains are not uniform: MMLU drops from 62.7 to 60.7 and Hellaswag from 83.3 to 82.5.

	### Overall Performance on Language, Math, and Code Tasks

	| Model | ARC | Hellaswag | MMLU | TruthfulQA | Winogrande | GSM8K | HumanEval |
	| :-: | :-: | :-: | :-: | :-: | :-: | :-: | :-: |
	| Gemma-7B | 61.9 | 82.2 | 64.6 | 44.8 | 79.0 | 50.9 | 32.3 |
	| Mistral-7B | 60.8 | 83.3 | 62.7 | 42.6 | 78.0 | 39.2 | 28.7 |
	| Mistral_Pro_8B_v0.1 | 62.6 | 82.5 | 60.7 | 47.6 | 78.1 | 50.3 | 32.3 |
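
The table can be summarized with a few lines of arithmetic. The sketch below copies the scores from the table and computes per-benchmark differences and an unweighted mean. The mean is an assumption made here for illustration; the card itself does not report an average.

```python
# Scores copied from the table above. The unweighted mean is an illustrative
# summary chosen here, not a figure reported by the card.
BENCHMARKS = ["ARC", "Hellaswag", "MMLU", "TruthfulQA", "Winogrande", "GSM8K", "HumanEval"]

SCORES = {
    "Gemma-7B": [61.9, 82.2, 64.6, 44.8, 79.0, 50.9, 32.3],
    "Mistral-7B": [60.8, 83.3, 62.7, 42.6, 78.0, 39.2, 28.7],
    "Mistral_Pro_8B_v0.1": [62.6, 82.5, 60.7, 47.6, 78.1, 50.3, 32.3],
}


def delta(model: str, baseline: str) -> dict:
    """Per-benchmark score difference (model minus baseline), rounded to 0.1."""
    pairs = zip(BENCHMARKS, SCORES[model], SCORES[baseline])
    return {name: round(m - b, 1) for name, m, b in pairs}


def mean_score(model: str) -> float:
    """Unweighted mean over the seven benchmarks, rounded to 0.01."""
    values = SCORES[model]
    return round(sum(values) / len(values), 2)


if __name__ == "__main__":
    for name, diff in delta("Mistral_Pro_8B_v0.1", "Mistral-7B").items():
        print(f"{name:>11}: {diff:+.1f}")
    for model in SCORES:
        print(f"{model}: mean {mean_score(model)}")
```

The largest differences relative to Mistral-7B are on GSM8K and TruthfulQA, while MMLU and Hellaswag move slightly in the other direction.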


	## Limitations
	While Mistral-Pro addresses some limitations of previous models in the series, it may still encounter challenges specific to highly specialized domains or tasks.

	## Ethical Considerations
	Users should be aware of potential biases in the model and use it responsibly, considering its impact on various applications.