---
language: en
tags:
- code
- algorithms
- competitive-programming
- multi-label-classification
- codebert
datasets:
- xCodeEval
metrics:
- f1
- precision
- recall
library_name: transformers
pipeline_tag: text-classification
---
|
|
|
|
|
# CodeBERT Algorithm Tagger |
|
|
|
|
|
A fine-tuned CodeBERT model for multi-label classification of algorithmic problems from competitive programming platforms like Codeforces. |
|
|
|
|
|
## Model Description |
|
|
|
|
|
This model predicts algorithmic tags/categories for competitive programming problems based on their problem descriptions and solution code. |
|
|
|
|
|
**Supported Tags:** |
|
|
- math |
|
|
- graphs |
|
|
- strings |
|
|
- number theory |
|
|
- trees |
|
|
- geometry |
|
|
- games |
|
|
- probabilities |
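
The index-to-tag order below is only an illustrative assumption; the authoritative mapping lives in this repo's `config.json` under `id2label`.

```python
# Hypothetical label order -- verify against config.json's id2label.
ID2LABEL = {
    0: "math",
    1: "graphs",
    2: "strings",
    3: "number theory",
    4: "trees",
    5: "geometry",
    6: "games",
    7: "probabilities",
}
```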
|
|
|
|
|
## Training Data |
|
|
|
|
|
- **Dataset**: xCodeEval (Codeforces problems) |
|
|
- **Training examples**: 2,147 problems (filtered to problems carrying the eight supported tags)
|
|
- **Test examples**: 531 problems |
|
|
- **Source**: Problems and solutions from Codeforces platform |
|
|
|
|
|
## Model Architecture
|
|
|
|
|
- **Input**: Concatenated problem description and solution code |
|
|
- **Encoder**: CodeBERT (RoBERTa-based architecture) |
|
|
- **Output**: an 8-dimensional vector of independent binary predictions (one per supported tag); a sketch of this setup follows
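
As a sketch of how such a head can be instantiated with `transformers`: this assumes the public `microsoft/codebert-base` checkpoint as the encoder, with the fine-tuned weights in this repo replacing it at inference time.

```python
from transformers import AutoModelForSequenceClassification

# Sketch only: a RoBERTa-based CodeBERT encoder with an 8-way
# multi-label classification head (BCE-with-logits loss).
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/codebert-base",
    num_labels=8,
    problem_type="multi_label_classification",
)
```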
|
|
|
|
|
## Usage |
|
|
|
|
|
### Installation |
|
|
|
|
|
```bash |
|
|
pip install transformers torch |
|
|
``` |
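
### Inference

A minimal inference sketch is shown below. The repository id (`your-username/codebert-algorithm-tagger`), the separator used to join description and code, and the 0.5 decision threshold are all assumptions for illustration; check this repo's `config.json` (`id2label`) for the actual label mapping.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Placeholder repo id -- replace with this model's actual Hub id.
MODEL_ID = "your-username/codebert-algorithm-tagger"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID)
model.eval()

# Input is the problem description concatenated with solution code,
# matching the training setup described above. The separator token
# used here is an assumption.
problem_description = (
    "Given an array of n integers, count pairs (i, j) "
    "whose sum is divisible by k."
)
solution_code = "n, k = map(int, input().split())  # ..."
text = problem_description + tokenizer.sep_token + solution_code

inputs = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Multi-label decoding: independent sigmoid per tag, thresholded at 0.5.
probs = torch.sigmoid(logits).squeeze(0)
predicted = [model.config.id2label[i] for i, p in enumerate(probs) if p > 0.5]
print(predicted)  # e.g. ['math', 'number theory']
```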
|
|
|
|
|
|