🚦 La Route 2.0 — AI Prompt Router

La Route 2.0 is like a GPS for AI prompts.
When you give it a piece of text (a question, a request, or any message), it analyzes the prompt and decides:

  • How sensitive the content is (low / high)
  • What size model you need (small / large)
  • Which tool is best suited to answer it (an offline LLM, an LLM with extra research abilities, or a search engine)

The goal: ✅ save resources, improve safety, and get better answers by sending each prompt to the right place instead of using the same heavy model for everything.


📊 What It Predicts

| Task        | Labels                                              |
|-------------|-----------------------------------------------------|
| Sensitivity | low, high                                           |
| Model size  | small, large                                        |
| Best tool   | LLM-with-research-mode, Offline-LLM, Search-engine  |
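
The usage example below reads these labels from label_maps.json in the repo. Judging from how the code indexes that file (label_maps[task][str(pred)]), it plausibly maps each task to id → label entries, roughly as sketched here; the task key names and id ordering are assumptions, and the file shipped in the repo is authoritative:

{
  "sensitivity": {"0": "low", "1": "high"},
  "model_size": {"0": "small", "1": "large"},
  "best_tool": {"0": "LLM-with-research-mode", "1": "Offline-LLM", "2": "Search-engine"}
}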

🔎 How It Works (In Simple Terms)

  1. You send a prompt (e.g. "Who is the Prime Minister of Canada?")
  2. The model classifies it:
    • Sensitivity → Low
    • Model size → Small
    • Best tool → Search engine
  3. The system then routes the prompt to the cheapest, safest, or most efficient tool.

It’s like a traffic controller for prompts — making sure each one takes the best route to the right “answering engine.”
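
To make step 3 concrete, here is a minimal routing sketch. It reuses the classify_text helper defined in the usage example below; the task keys ("sensitivity", "best_tool") and the backend functions (call_offline_llm, call_search_engine, call_research_llm) are hypothetical placeholders, not part of this repo.

def route_prompt(text):
    """Hypothetical router: pick a backend from the classifier's predictions."""
    preds = classify_text(text)  # see the usage example below
    # Sensitive prompts stay on a secure, on-premise model regardless of tool.
    if preds["sensitivity"]["label"] == "high":
        return call_offline_llm(text)       # placeholder backend
    tool = preds["best_tool"]["label"]
    if tool == "Search-engine":
        return call_search_engine(text)     # placeholder backend
    if tool == "LLM-with-research-mode":
        return call_research_llm(text)      # placeholder backend
    return call_offline_llm(text)           # placeholder backend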


🖼️ Workflow Diagram


User Prompt
     │
     ▼
Shared ModernBERT Encoder
     │
     ├── Sensitivity → low/high
     ├── Model Size → small/large
     └── Best Tool → LLM-with-research-mode / Offline-LLM / Search-engine
     │
     ▼
 Route to Best Model for Answer

💡 Why use La Route 2.0?

  • ⚖️ Safer by design: Prompts are automatically routed to the most appropriate model. Instead of forcing all requests through the strictest (or loosest) setup, you can use cloud LLMs for everyday, non‑sensitive queries and keep sensitive prompts on secure, on‑premise models.
  • 💸 More efficient: Don’t waste compute on heavyweight models when a smaller one will do. Routing cuts cost, energy use, and latency by balancing resources intelligently.
  • 🛠 Right tool for the job: Not all prompts need an LLM. For factual lookups, a search engine may be faster and more accurate. For longer reasoning, a research‑mode LLM is better. Routing ensures each request is solved by the tool best suited to it.

🔧 Quick Usage Example

import json

import torch
import torch.nn.functional as F
from huggingface_hub import snapshot_download
from transformers import AutoTokenizer, AutoModel

repo_id = "monsimas/la-route-2"
model_dir = snapshot_download(repo_id)  # download weights, tokenizer, and label maps locally

tokenizer = AutoTokenizer.from_pretrained(model_dir)

# Load label maps (id -> label per task) and the number of classes per head
with open(f"{model_dir}/label_maps.json") as f:
    label_maps = json.load(f)
with open(f"{model_dir}/num_labels.json") as f:
    num_labels_dict = json.load(f)

# Define the multitask model: a shared encoder with one linear head per task
class MultiTaskModel(torch.nn.Module):
    def __init__(self, shared_model, num_labels_dict):
        super().__init__()
        self.shared_model = shared_model
        h = shared_model.config.hidden_size
        self.heads = torch.nn.ModuleDict({
            task: torch.nn.Linear(h, n) for task, n in num_labels_dict.items()
        })

    def forward(self, input_ids, attention_mask):
        out = self.shared_model(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # [CLS]-token pooling
        return {task: self.heads[task](pooled) for task in self.heads}

# Load base encoder + multitask heads
base_model = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
model = MultiTaskModel(base_model, num_labels_dict)
state_dict = torch.load(f"{model_dir}/model_state.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

def classify_text(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=384, padding=True)
    with torch.no_grad():
        logits = model(**inputs)  # dict of per-task logits
    predictions = {}
    for task, logit in logits.items():
        probs = F.softmax(logit, dim=-1)
        pred = torch.argmax(probs, dim=-1).item()
        predictions[task] = {
            "label": label_maps[task][str(pred)],  # map class id back to its label
            "confidence": float(probs[0, pred]),
        }
    return predictions

print(classify_text("Who is the Prime Minister of Canada?"))
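
For this prompt, the walkthrough above predicts low sensitivity, a small model, and the search engine as the best tool, so the printed dict should look roughly like the following (task keys and confidence values are illustrative):

{
  "sensitivity": {"label": "low", "confidence": 0.99},
  "model_size": {"label": "small", "confidence": 0.97},
  "best_tool": {"label": "Search-engine", "confidence": 0.95}
}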

🛠️ Training Details

  • Base model: answerdotai/ModernBERT-base
  • Data: Compar:IA-conversations + ShareGPT (augmented for coverage)
  • Max length: 384 tokens
  • Batch size: 8
  • Learning rate: 5e‑5
  • Multitask heads: Sensitivity, Model Size, Best Tool
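
The card does not spell out the training loop, but with a shared encoder and three heads the standard recipe is to sum one cross-entropy loss per task. Below is a minimal sketch under that assumption, reusing the MultiTaskModel from the usage example; the AdamW optimizer and the batch layout (a "labels" dict keyed by task) are assumptions, not confirmed details:

model.train()
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

def training_step(batch):
    """One multitask step: sum cross-entropy over all heads (illustrative)."""
    logits = model(batch["input_ids"], batch["attention_mask"])
    loss = sum(F.cross_entropy(logits[task], batch["labels"][task]) for task in logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()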

⚖️ Limitations

  • Tool and label definitions are domain-specific.
  • The classifier does not generate answers itself — only routes prompts.
  • Sensitivity classification may mislabel edge cases.

