
ChemBERTa IUPAC Classifier

This model is a fine-tuned version of seyonec/ChemBERTa-zinc-base-v1 for binary classification of chemical compounds based on their IUPAC names.

Model description

This model uses ChemBERTa, a RoBERTa-style transformer pre-trained on SMILES strings from the ZINC database, to classify molecules from their IUPAC names. It was fine-tuned on a custom dataset of IUPAC names paired with binary labels (a sketch of one such setup appears below).

Developed by: xluobd

Model type: RobertaForSequenceClassification
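
Example fine-tuning setup

The exact training recipe is not published with this card. The following is a minimal sketch of how such a classifier could be fine-tuned with the Hugging Face Trainer; the CSV file name, column names, and hyperparameters are assumptions for illustration, not the actual configuration used.

# Hypothetical fine-tuning sketch; the dataset path, column names, and
# hyperparameters are assumptions, not the published training recipe.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

tokenizer = AutoTokenizer.from_pretrained("seyonec/ChemBERTa-zinc-base-v1")
model = AutoModelForSequenceClassification.from_pretrained(
    "seyonec/ChemBERTa-zinc-base-v1", num_labels=2)

# Assumed CSV layout: an "iupac_name" text column and a binary "label" column.
dataset = load_dataset("csv", data_files="iupac_labels.csv")["train"]
dataset = dataset.train_test_split(test_size=0.1)

def tokenize(batch):
    # Pad to a fixed length so the default data collator can batch examples
    return tokenizer(batch["iupac_name"], padding="max_length",
                     truncation=True, max_length=256)

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="chemberta-iupac-classifier",
                         num_train_epochs=3,
                         per_device_train_batch_size=16,
                         learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"],
                  eval_dataset=dataset["test"])
trainer.train()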

How to use

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the fine-tuned model and its tokenizer
tokenizer = AutoTokenizer.from_pretrained("xluobd/chemberta-iupac-classifier")
model = AutoModelForSequenceClassification.from_pretrained("xluobd/chemberta-iupac-classifier")
model.eval()  # disable dropout for inference

# Example IUPAC name
iupac_name = "2-hydroxy-N,N,N-trimethylethan-1-aminium"

# Tokenize and predict (no gradients needed at inference time)
inputs = tokenizer(iupac_name, return_tensors="pt", padding=True, truncation=True, max_length=256)
with torch.no_grad():
    outputs = model(**inputs)
probabilities = outputs.logits.softmax(dim=-1)
prediction = probabilities.argmax().item()

print(f"Prediction: {prediction}")
print(f"Confidence: {probabilities[0][prediction].item():.4f}")
Model size: 44.1M parameters (F32 tensors, stored in Safetensors format)