Phi-2-psy

Phi-2-psy is a merge of the following models:

- rhysjones/phi-2-orange
- cognitivecomputations/dolphin-2_6-phi-2

🏆 Evaluation

The evaluation was performed using LLM AutoEval on the Nous benchmark suite.

Model               AGIEval   GPT4All   TruthfulQA   Bigbench   Average
phi-2-psy           34.4      71.4      48.2         38.1       48.02
phixtral-2x2_8      34.1      70.4      48.8         37.8       47.78
dolphin-2_6-phi-2   33.1      69.9      47.4         37.2       46.89
phi-2-orange        33.4      71.3      49.9         37.3       47.97
phi-2               28.0      70.8      44.4         35.2       44.61
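
The Average column appears to be the unweighted mean of the four benchmark scores. The following is a minimal Python sketch that recomputes it from the rounded values in the table above. The names `NOUS_SCORES`, `mean_score`, and `check_averages` are hypothetical and used only for illustration, and the 0.02 tolerance is an assumption that allows for the listed averages having been computed from unrounded scores.

```python
# Scores copied from the table above, in the order:
# AGIEval, GPT4All, TruthfulQA, Bigbench, followed by the listed Average.
NOUS_SCORES = {
    "phi-2-psy": ([34.4, 71.4, 48.2, 38.1], 48.02),
    "phixtral-2x2_8": ([34.1, 70.4, 48.8, 37.8], 47.78),
    "dolphin-2_6-phi-2": ([33.1, 69.9, 47.4, 37.2], 46.89),
    "phi-2-orange": ([33.4, 71.3, 49.9, 37.3], 47.97),
    "phi-2": ([28.0, 70.8, 44.4, 35.2], 44.61),
}


def mean_score(scores):
    """Return the unweighted mean of a list of benchmark scores."""
    return sum(scores) / len(scores)


def check_averages(table, tolerance=0.02):
    """Map each model to (recomputed mean, listed average, within tolerance)."""
    report = {}
    for model, (scores, listed) in table.items():
        recomputed = mean_score(scores)
        report[model] = (recomputed, listed, abs(recomputed - listed) <= tolerance)
    return report


if __name__ == "__main__":
    for model, (recomputed, listed, ok) in check_averages(NOUS_SCORES).items():
        print(f"{model:20s} recomputed={recomputed:.3f} listed={listed:.2f} ok={ok}")
```

Small differences in the last digit are expected here, since the per-benchmark scores in the table are already rounded to one decimal place.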

🧩 Configuration

slices:
  - sources:
      - model: rhysjones/phi-2-orange
        layer_range: [0, 32]
      - model: cognitivecomputations/dolphin-2_6-phi-2
        layer_range: [0, 32]
merge_method: slerp
base_model: rhysjones/phi-2-orange
parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5
dtype: bfloat16
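
To make the configuration more concrete, here is a rough Python sketch of the two ideas it relies on: spherical linear interpolation (slerp) between two weight vectors, and a per-layer interpolation factor `t` derived from a short list of anchor values. This is not mergekit's actual code. The functions `layer_t` and `slerp` are hypothetical, and the sketch assumes that the anchor lists are spread linearly across the 32 layers and that `t = 0` keeps the base model (rhysjones/phi-2-orange) while `t = 1` keeps the other model.

```python
import math


def layer_t(anchors, layer_idx, num_layers):
    """Linearly interpolate a list of anchor values across the layer stack.

    Assumption: the anchors are evenly spaced from the first to the last layer.
    """
    if len(anchors) == 1 or num_layers == 1:
        return float(anchors[0])
    pos = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    lo = int(math.floor(pos))
    hi = min(lo + 1, len(anchors) - 1)
    frac = pos - lo
    return anchors[lo] * (1 - frac) + anchors[hi] * frac


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors.

    Falls back to plain linear interpolation when a vector is (near) zero
    or the two vectors are (near) parallel, where slerp is ill-conditioned.
    """
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 >= eps and norm1 >= eps:
        cos_theta = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
        theta = math.acos(max(-1.0, min(1.0, cos_theta)))
        sin_theta = math.sin(theta)
        if sin_theta >= eps:
            w0 = math.sin((1 - t) * theta) / sin_theta
            w1 = math.sin(t * theta) / sin_theta
            return [w0 * a + w1 * b for a, b in zip(v0, v1)]
    return [(1 - t) * a + t * b for a, b in zip(v0, v1)]


if __name__ == "__main__":
    self_attn_anchors = [0, 0.5, 0.3, 0.7, 1]  # from the config above
    mlp_anchors = [1, 0.5, 0.7, 0.3, 0]        # from the config above
    for layer in (0, 8, 16, 24, 31):
        t_attn = layer_t(self_attn_anchors, layer, 32)
        t_mlp = layer_t(mlp_anchors, layer, 32)
        print(f"layer {layer:2d}: t(self_attn)={t_attn:.3f} t(mlp)={t_mlp:.3f}")
    # Toy vectors standing in for flattened weight tensors.
    print(slerp(0.5, [1.0, 0.0], [0.0, 1.0]))
```

Under these assumptions, the two anchor lists mirror each other: in the early layers the attention weights stay close to the base model while the MLP weights lean toward the other model, and the balance reverses toward the last layers. Tensors matched by neither filter use the constant `t = 0.5`.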

💻 Usage

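import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Create tensors on the GPU by default (requires a CUDA device).
torch.set_default_device("cuda")

# Load the merged model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("vince62s/phi-2-psy", torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("vince62s/phi-2-psy", trust_remote_code=True)

# Prompt the model with a function signature and docstring to complete.
inputs = tokenizer('''def print_prime(n):
   """
   Print all primes between 1 and n
   """''', return_tensors="pt", return_attention_mask=False)

# Generate up to 200 tokens in total (prompt included), then decode and print.
outputs = model.generate(**inputs, max_length=200)
text = tokenizer.batch_decode(outputs)[0]
print(text)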

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric                               Value
Avg.                                 62.80
AI2 Reasoning Challenge (25-shot)    60.84
HellaSwag (10-shot)                  75.52
MMLU (5-shot)                        57.57
TruthfulQA (0-shot)                  48.22
Winogrande (5-shot)                  75.45
GSM8k (5-shot)                       59.21

Model size: 2.78B params, stored as Safetensors with BF16 tensors.