---
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: imdb-distilbert-funetuned
  results: []
datasets:
- ajaykarthick/imdb-movie-reviews
language:
- en
library_name: transformers
pipeline_tag: text-classification
---
# DistilBERT IMDb Sentiment Classifier
## Model Description
This model is a fine-tuned version of [DistilBERT](https://huggingface.co/distilbert-base-uncased) (`distilbert-base-uncased`) for sentiment analysis on the IMDb movie review dataset. DistilBERT is a distilled variant of BERT: according to its authors, it is about 40% smaller and 60% faster than BERT while retaining roughly 97% of its language-understanding performance.
The model classifies a movie review as expressing either **positive** or **negative** sentiment. It may also be useful for related tasks such as analyzing customer feedback, social media posts, or product reviews, though this card reports results only on IMDb.
## Intended Use
This model is intended for text classification, specifically binary sentiment analysis: it labels a piece of text as having either positive or negative sentiment.
### Use Cases
- **Movie review sentiment analysis**
- **Customer feedback analysis**
- **Social media sentiment monitoring**
- **Product review classification**
## How to Use
Here is how you can use this model with the Hugging Face `transformers` library:
```python
from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
import torch
# Load the model and tokenizer
model_name = "Ashaduzzaman/imdb-distilbert-funetuned"
tokenizer = DistilBertTokenizer.from_pretrained(model_name)
model = DistilBertForSequenceClassification.from_pretrained(model_name)
# Example text
text = "The movie was absolutely fantastic! The acting was superb and the story was gripping."
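# Tokenize (truncating to the 512-token limit) and predict without tracking gradients
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
with torch.no_grad():
    outputs = model(**inputs)
logits = outputs.logits
predictions = torch.softmax(logits, dim=1)
# Get the predicted label (index 0 = negative, index 1 = positive)
predicted_label = torch.argmax(predictions, dim=1).item()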
labels = ["Negative", "Positive"]
print(f"Predicted sentiment: {labels[predicted_label]}")
```
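To score several reviews at once, the same steps can be wrapped in a small helper that pads the batch and returns a label with its probability. The following is a minimal sketch rather than an official API of this model: the names `decode` and `classify_reviews` are hypothetical, and the label order (`Negative`, `Positive`) is assumed to match the example above.

```python
from typing import List, Sequence, Tuple

MODEL_NAME = "Ashaduzzaman/imdb-distilbert-funetuned"
LABELS = ["Negative", "Positive"]  # assumed index order, as in the example above


def decode(probabilities: Sequence[Sequence[float]]) -> List[Tuple[str, float]]:
    """Turn per-class probabilities into (label, confidence) pairs."""
    results = []
    for row in probabilities:
        best = max(range(len(row)), key=lambda i: row[i])
        results.append((LABELS[best], float(row[best])))
    return results


def classify_reviews(texts: List[str]) -> List[Tuple[str, float]]:
    """Classify several reviews in one padded batch (downloads the model on first use)."""
    # Imported here so that decode() can be used without these libraries installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
    model.eval()

    # Pad to the longest review in the batch and truncate at the 512-token limit.
    inputs = tokenizer(
        texts, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    with torch.no_grad():
        probabilities = torch.softmax(model(**inputs).logits, dim=-1)
    return decode(probabilities.tolist())
```

Calling `classify_reviews(["Great film.", "A dull, lifeless script."])` would return one `(label, confidence)` pair per review.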
## Training Data
This model was trained on the IMDb movie review dataset (listed in this card's metadata as `ajaykarthick/imdb-movie-reviews`), a large dataset for binary sentiment classification. It contains 50,000 highly polarized movie reviews and is balanced, with 25,000 positive and 25,000 negative reviews.
## Training Procedure
The model was fine-tuned using the IMDb dataset with the following configuration:
- **Optimizer**: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- **Learning Rate**: 2e-5
- **Batch Size**: 16
- **Epochs**: 2
- **Max Sequence Length**: 512 tokens
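
The configuration above could be expressed for the `transformers` `Trainer` roughly as follows. This is a sketch under assumptions, not the author's actual training script: the output directory, the evaluation batch size, and the `text` column name are hypothetical, and only the listed hyperparameters come from this card.

```python
# Hyperparameters taken from the list above; everything else is an assumption.
training_kwargs = {
    "output_dir": "imdb-distilbert-funetuned",  # hypothetical output directory
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,  # assumed equal to the training batch size
    "num_train_epochs": 2,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
}

# The sequence limit is applied when tokenizing, not through TrainingArguments.
MAX_SEQUENCE_LENGTH = 512


def build_training_arguments():
    """Create TrainingArguments from the hyperparameters above."""
    from transformers import TrainingArguments  # imported lazily

    return TrainingArguments(**training_kwargs)


def tokenize_batch(tokenizer, batch, text_column="text"):
    """Tokenize a batch of examples, truncating to MAX_SEQUENCE_LENGTH tokens.

    `text_column` is a hypothetical column name; check the dataset's schema.
    """
    return tokenizer(
        batch[text_column], truncation=True, max_length=MAX_SEQUENCE_LENGTH
    )
```

`tokenize_batch` is written so it can be passed to `datasets.Dataset.map(..., batched=True)` through a small wrapper that supplies the tokenizer.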
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| 0.2239 | 1.0 | 1563 | 0.2026 | 0.9227 |
| 0.1468 | 2.0 | 3126 | 0.2319 | 0.9320 |
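- **Final validation loss (epoch 2):** 0.2319
- **Final accuracy (epoch 2):** 0.9320
- Validation loss rose from 0.2026 to 0.2319 between epochs while accuracy still improved, which may indicate the onset of overfitting.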
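
The step counts in the table are consistent with the batch size of 16 if about 25,000 reviews were used for training. That training-set size is an assumption (half of the 50,000 reviews), as are single-device training and no gradient accumulation. A quick arithmetic check:

```python
import math


def steps_per_epoch(num_examples: int, batch_size: int) -> int:
    """Number of optimizer steps per epoch; the last partial batch still counts."""
    return math.ceil(num_examples / batch_size)


train_examples = 25_000  # assumption: half of the 50,000 reviews used for training
batch_size = 16          # from the training configuration
epochs = 2               # from the training configuration

per_epoch = steps_per_epoch(train_examples, batch_size)
total_steps = per_epoch * epochs
print(per_epoch, total_steps)  # 1563 3126, matching the Step column
```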
## Limitations
- The model is specifically trained on the IMDb dataset, so its effectiveness may be reduced when applied to other domains or types of text.
- Sentiment detection is binary (positive or negative). Neutral sentiments or more nuanced emotions are not captured.
- The model may not perform well on text that is highly sarcastic, contains slang, or is very short (e.g., one-word reviews).
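
Because the classifier always outputs one of two labels, a caller may want to flag low-confidence predictions instead of forcing a decision. The sketch below shows one common workaround and is not part of this model: the function name `label_or_abstain`, the `Uncertain` label, and the 0.75 threshold are all hypothetical, and softmax scores are not calibrated probabilities, so any threshold should be tuned on held-out data.

```python
import math
from typing import List, Sequence

LABELS = ["Negative", "Positive"]  # assumed index order, as in the usage example


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_or_abstain(logits: Sequence[float], threshold: float = 0.75) -> str:
    """Return a sentiment label, or "Uncertain" if the top probability is below threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best] if probs[best] >= threshold else "Uncertain"
```

The `logits` argument corresponds to one row of `outputs.logits` converted to a Python list, for example `outputs.logits[0].tolist()`.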
## Ethical Considerations
- **Bias**: The model may reflect biases present in the IMDb dataset. Users should be cautious about applying this model to sensitive applications.
- **Content**: Since the IMDb dataset includes movie reviews, the model might not generalize well to text outside of this context.
## Acknowledgments
- The original [DistilBERT](https://huggingface.co/distilbert-base-uncased) model was developed by Hugging Face.
- The IMDb dataset is provided by Stanford and is available as the [Large Movie Review Dataset](https://ai.stanford.edu/~amaas/data/sentiment/).
## Framework versions
- Transformers 4.42.4
- PyTorch 2.3.1+cu121
- Datasets 2.21.0
- Tokenizers 0.19.1