
pulze-intent-v0.1

An intent-tuned LLM router that selects the best LLM for a user query. Intended for use with knn-router.
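A minimal routing sketch, assuming the model loads with sentence-transformers; the repo id pulze/intent-v0.1, the intent categories, and the category-to-model mapping below are illustrative placeholders rather than the knn-router implementation:

```python
# Sketch: embed a query with this model and route to the LLM assigned to the
# nearest intent category. Not the official knn-router pipeline.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("pulze/intent-v0.1")  # assumed repo id

# Hypothetical index: one embedding per intent category, each mapped to the
# model that performed best for that category.
categories = ["code_generation", "creative_writing", "summarization"]
category_to_llm = {
    "code_generation": "gpt-4-turbo-2024-04-09",
    "creative_writing": "claude-3-opus-20240229",
    "summarization": "claude-3-haiku-20240307",
}
category_embeddings = model.encode(categories, normalize_embeddings=True)

def route(query: str) -> str:
    """Return the LLM mapped to the nearest intent category (cosine similarity)."""
    q = model.encode([query], normalize_embeddings=True)
    nearest = int(np.argmax(category_embeddings @ q.T))
    return category_to_llm[categories[nearest]]

print(route("Write a Python function that reverses a linked list."))
```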

Models

  • claude-3-haiku-20240307
  • claude-3-opus-20240229
  • claude-3-sonnet-20240229
  • command-r
  • command-r-plus
  • dbrx-instruct
  • gpt-3.5-turbo-0125
  • gpt-4-turbo-2024-04-09
  • llama-3-70b-instruct
  • mistral-large
  • mistral-medium
  • mistral-small
  • mixtral-8x7b-instruct

Data

Prompts and Intent Categories

Prompts and intent categories are derived from the GAIR-NLP/Auto-J scenario classification dataset.

Citation:

@article{li2023generative,
  title={Generative Judge for Evaluating Alignment},
  author={Li, Junlong and Sun, Shichao and Yuan, Weizhe and Fan, Run-Ze and Zhao, Hai and Liu, Pengfei},
  journal={arXiv preprint arXiv:2310.05470},
  year={2023}
}

Response Evaluation

Candidate model responses were evaluated pairwise using openai/gpt-4-turbo-2024-04-09, with the following prompt:

You are an expert, impartial judge tasked with evaluating the quality of responses generated by two AI assistants.

Think step by step, and evaluate the responses, <response1> and <response2> to the instruction, <instruction>. Follow these guidelines:
- Avoid any position bias and ensure that the order in which the responses were presented does not influence your judgement
- Do not allow the length of the responses to influence your judgement - a concise response can be as effective as a longer one
- Consider factors such as adherence to the given instruction, helpfulness, relevance, accuracy, depth, creativity, and level of detail
- Be as objective as possible

Make your decision on which of the two responses is better for the given instruction from the following choices:
If <response1> is better, use "1".
If <response2> is better, use "2".
If both answers are equally good, use "0".
If both answers are equally bad, use "0".

<instruction>
{INSTRUCTION}
</instruction>

<response1>
{RESPONSE1}
</response1>

<response2>
{RESPONSE2}
</response2>

Each pair of models is subjected to two matches, with the positions of the two responses swapped between matches in the evaluation prompt. A model is considered the winner only if it wins both matches.
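A sketch of this two-match, position-swapped judging loop, assuming the OpenAI Python client and the prompt template above; the helper names are illustrative, and in practice the verdict may need to be parsed out of a longer, step-by-step reply:

```python
# Sketch of the two-match pairwise evaluation described above.
# Assumes `template` is the judge prompt with {INSTRUCTION}, {RESPONSE1},
# {RESPONSE2} placeholders and that the judge's final answer is "1", "2", or "0".
from openai import OpenAI

client = OpenAI()
JUDGE_MODEL = "gpt-4-turbo-2024-04-09"

def judge(instruction: str, response1: str, response2: str, template: str) -> str:
    prompt = template.format(
        INSTRUCTION=instruction, RESPONSE1=response1, RESPONSE2=response2
    )
    out = client.chat.completions.create(
        model=JUDGE_MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return out.choices[0].message.content.strip()

def pairwise_winner(instruction, resp_a, resp_b, template):
    """Model A (or B) wins only if it wins with the responses in both orders."""
    first = judge(instruction, resp_a, resp_b, template)   # A shown as <response1>
    second = judge(instruction, resp_b, resp_a, template)  # A shown as <response2>
    if first == "1" and second == "2":
        return "A"
    if first == "2" and second == "1":
        return "B"
    return "tie"
```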

For each prompt, we then compute Bradley-Terry scores for the candidate models using the same method as the LMSYS Chatbot Arena Leaderboard. Finally, we normalize all scores to a 0 to 1 scale for interoperability with other weighted ranking systems.
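A hedged sketch of the per-prompt Bradley-Terry computation via logistic regression, in the spirit of the Chatbot Arena method; tie handling, regularization, and the min-max normalization details here are illustrative:

```python
# Sketch: fit Bradley-Terry log-strengths from pairwise outcomes for one prompt,
# then min-max normalize them to [0, 1]. Assumes both outcomes (wins for the
# first- and second-listed model) occur at least once in `battles`.
import numpy as np
from sklearn.linear_model import LogisticRegression

def bradley_terry_scores(battles, models):
    """battles: list of (model_a, model_b, outcome); outcome is 1 if model_a won, 0 if model_b won."""
    idx = {m: i for i, m in enumerate(models)}
    X = np.zeros((len(battles), len(models)))
    y = np.zeros(len(battles))
    for row, (a, b, outcome) in enumerate(battles):
        X[row, idx[a]] = 1.0
        X[row, idx[b]] = -1.0
        y[row] = outcome
    lr = LogisticRegression(fit_intercept=False)
    lr.fit(X, y)
    raw = lr.coef_[0]                                   # Bradley-Terry log-strengths
    return (raw - raw.min()) / (raw.max() - raw.min())  # normalize to [0, 1]

models = ["gpt-4-turbo-2024-04-09", "claude-3-haiku-20240307", "mistral-small"]
battles = [
    ("gpt-4-turbo-2024-04-09", "claude-3-haiku-20240307", 1),
    ("mistral-small", "gpt-4-turbo-2024-04-09", 0),
    ("claude-3-haiku-20240307", "mistral-small", 1),
]
print(dict(zip(models, bradley_terry_scores(battles, models))))
```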

Model

The embedding model was produced by first fine-tuning BAAI/bge-base-en-v1.5 on the intent categories from the dataset above, using contrastive learning with a cosine-similarity loss, and then merging the fine-tuned model with the base model at a 3:2 ratio.
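A sketch of those two steps with sentence-transformers, under assumptions about the training-pair construction and the merge direction (3 parts fine-tuned to 2 parts base); the example pairs and output path are hypothetical:

```python
# Sketch: (1) contrastive fine-tuning of BAAI/bge-base-en-v1.5 with a
# cosine-similarity loss on (prompt, intent-category) pairs, (2) a 3:2 weighted
# average of the fine-tuned and base weights. Both steps are illustrative.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

base = SentenceTransformer("BAAI/bge-base-en-v1.5")
tuned = SentenceTransformer("BAAI/bge-base-en-v1.5")

# Hypothetical pairs: label 1.0 for a prompt and its intent category,
# 0.0 for a mismatched category.
train_examples = [
    InputExample(texts=["Write a haiku about spring", "creative_writing"], label=1.0),
    InputExample(texts=["Write a haiku about spring", "code_generation"], label=0.0),
]
loader = DataLoader(train_examples, shuffle=True, batch_size=2)
loss = losses.CosineSimilarityLoss(tuned)
tuned.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=0)

# Merge: 0.6 * fine-tuned + 0.4 * base (a 3:2 ratio); non-float buffers kept as-is.
base_state = base.state_dict()
merged_state = {
    k: (0.6 * v + 0.4 * base_state[k]) if v.is_floating_point() else v
    for k, v in tuned.state_dict().items()
}
tuned.load_state_dict(merged_state)
tuned.save("pulze-intent-v0.1-merged")  # hypothetical output path
```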
