🎯 Custom Evaluation: Within-Family Generalization

Beyond the standard validation metrics, this model was evaluated on a custom test set to measure how well it generalizes to unseen sequences from known families. This test checks whether the model learned the underlying biological patterns of a protein family rather than simply memorizing the training examples.

Evaluation Set Construction

A custom test set was carefully constructed with the following properties:

  • Source: Sequences were drawn from the top 1,000 most common families (the same families the model was trained on).
  • No Overlap: A critical verification step ensured that 0 sequences from this test set were present in the original training data.
  • Balanced & Representative: The final test set contains 100 unique sequences from 75 different families, providing a balanced and challenging benchmark.
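
The filtering and overlap check described above can be sketched roughly as follows. This is a minimal illustration and not the script used to build the published test set: the function name `build_test_set`, the `(sequence, family)` tuple layout, and the `max_per_family` cap are all assumptions made for the example.

```python
import random


def build_test_set(candidates, train_sequences, n_total, max_per_family, seed=0):
    """Sample held-out (sequence, family) pairs that never appear in training.

    Hypothetical helper for illustration; names and parameters are assumptions.
    """
    train_set = set(train_sequences)

    # Keep only unique sequences that are absent from the training data,
    # grouped by family.
    seen = set()
    by_family = {}
    for seq, family in candidates:
        if seq in train_set or seq in seen:
            continue
        seen.add(seq)
        by_family.setdefault(family, []).append(seq)

    # Cap the number of sequences per family so no family dominates.
    rng = random.Random(seed)
    pool = []
    for family in sorted(by_family):
        seqs = sorted(by_family[family])
        rng.shuffle(seqs)
        pool.extend((seq, family) for seq in seqs[:max_per_family])

    rng.shuffle(pool)
    test_set = pool[:n_total]

    # Verification step: the test set must share zero sequences with training.
    overlap = {seq for seq, _ in test_set} & train_set
    assert not overlap, f"{len(overlap)} test sequences also appear in training"
    return test_set
```

Filtering on exact sequence identity, as here, only rules out verbatim duplicates. Near-identical homologs would still pass, so a stricter benchmark might also filter by sequence similarity.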

The full dataset used for this evaluation is available on the Hub here: QuantaFold-within-family-test.

Astonishing Performance

The model generalized well to this unseen data, classifying 49 of 50 sequences correctly.

| Metric | Score |
| --- | --- |
| Accuracy | 98.0% |
| Correct Predictions | 49/50 |
| Incorrect Predictions | 1/50 |
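
As a worked check of the arithmetic, the snippet below recomputes the accuracy from the reported counts and attaches a 95% Wilson score interval. The interval is not part of the original evaluation; it is added here only as an assumption-laden illustration of how much uncertainty a 50-prediction sample carries, and `wilson_interval` is a hypothetical helper name.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Counts taken from the results reported above.
correct, total = 49, 50
accuracy = correct / total
low, high = wilson_interval(correct, total)
print(f"accuracy = {accuracy:.1%}, 95% Wilson interval = [{low:.1%}, {high:.1%}]")
```

With these counts the interval runs from roughly 89.5% to 99.6%, so the point estimate of 98.0% is best read together with the sample size.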

Conclusion

This 98% accuracy on previously unseen sequences from within the training families suggests that the model learned generalizable features that define a protein family, rather than memorizing training examples. This level of performance makes QuantaFold a promising tool for scientific research.
