	---
	license: apache-2.0
	base_model: distilbert-base-uncased
	tags:
	- generated_from_trainer
	metrics:
	- accuracy
	model-index:
	- name: imdb-distilbert-funetuned
	  results: []
	datasets:
	- ajaykarthick/imdb-movie-reviews
	language:
	- en
	library_name: transformers
	pipeline_tag: text-classification
	---

	# DistilBERT IMDb Sentiment Classifier

	## Model Description
	This is a fine-tuned version of [DistilBERT](https://huggingface.co/distilbert-base-uncased) for sentiment analysis on the IMDb movie review dataset. DistilBERT is a smaller, faster, and lighter variant of BERT, designed to perform efficiently while retaining the core strengths of BERT in natural language understanding.

	The model is trained to classify movie reviews as expressing either positive or negative sentiment. This makes it a candidate for related sentiment analysis applications, such as analyzing customer feedback, social media posts, or reviews.

	## Intended Use
	This model is intended for text classification tasks, specifically sentiment analysis. It can be used to automatically label a piece of text as having either positive or negative sentiment.

	### Use Cases
	- Movie review sentiment analysis
	- Customer feedback analysis
	- Social media sentiment monitoring
	- Product review classification

	## How to Use

	Here is how you can use this model with the Hugging Face `transformers` library:

	```python
	from transformers import DistilBertTokenizer, DistilBertForSequenceClassification
	import torch

	# Load the model and tokenizer
	model_name = "Ashaduzzaman/imdb-distilbert-funetuned"
	tokenizer = DistilBertTokenizer.from_pretrained(model_name)
	model = DistilBertForSequenceClassification.from_pretrained(model_name)

	# Example text
	text = "The movie was absolutely fantastic! The acting was superb and the story was gripping."

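	# Tokenize (truncating to the 512-token maximum) and predict without tracking gradients
	inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
	with torch.no_grad():
	    outputs = model(**inputs)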
	logits = outputs.logits
	predictions = torch.softmax(logits, dim=1)

	# Get the predicted label
	predicted_label = torch.argmax(predictions).item()
	labels = ["Negative", "Positive"]
	print(f"Predicted sentiment: {labels[predicted_label]}")
	```
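The post-processing at the end of the snippet (softmax over the logits, then argmax) can be illustrated without downloading the model. The following is a minimal sketch in plain Python. The function name `logits_to_sentiment` and the example logit values are hypothetical, and the label order (index 0 = Negative, index 1 = Positive) is taken from the `labels` list above.

```python
import math

# Label order assumed from the usage snippet: index 0 = Negative, index 1 = Positive.
LABELS = ["Negative", "Positive"]


def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def logits_to_sentiment(logits):
    """Return (label, probability) for one example's two-class logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Hypothetical logits, not produced by the actual model.
label, confidence = logits_to_sentiment([-2.0, 3.0])
print(f"Predicted sentiment: {label} ({confidence:.3f})")
```

The returned probability can serve as a rough confidence score, although it is not guaranteed to be well calibrated.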

	## Training Data
	This model was trained on the IMDb movie review dataset (listed in the card metadata as `ajaykarthick/imdb-movie-reviews`), a large dataset for binary sentiment classification. It contains 50,000 highly polarized movie reviews and is balanced, with 25,000 positive and 25,000 negative reviews.

	## Training Procedure
	The model was fine-tuned using the IMDb dataset with the following configuration:
	- Optimizer: AdamW (Adam with betas=(0.9,0.999) and epsilon=1e-08)
	- Learning Rate: 2e-5
	- Batch Size: 16
	- Epochs: 2
	- Max Sequence Length: 512 tokens
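As a sanity check on these settings, the number of optimizer steps per epoch follows from the training-set size and the batch size. The sketch below assumes a training split of 25,000 reviews (the standard IMDb train split; this card does not state the split size), a single device, and no gradient accumulation. Under those assumptions the result agrees with the step counts reported in the training results.

```python
import math

# Values from the configuration above.
batch_size = 16
epochs = 2

# Assumption: 25,000 training reviews (standard IMDb train split; not stated in this card).
train_examples = 25_000

# The final, partially filled batch still counts as one step, hence the ceiling.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"steps per epoch: {steps_per_epoch}, total steps: {total_steps}")
```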

	### Training Results

	| Training Loss | Epoch | Step | Validation Loss | Accuracy |
	|:-------------:|:-----:|:----:|:---------------:|:--------:|
	| 0.2239        | 1.0   | 1563 | 0.2026          | 0.9227   |
	| 0.1468        | 2.0   | 3126 | 0.2319          | 0.9320   |

	Final results on the evaluation set (after epoch 2):

	- Validation Loss: 0.2319
	- Accuracy: 0.9320
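
The two epochs point in different directions: validation loss is lower after epoch 1, while accuracy is higher after epoch 2. The sketch below uses only the numbers reported above to show how the preferred checkpoint depends on the selection metric. The dictionary layout and variable names are illustrative and are not part of the original training code.

```python
# Per-epoch metrics copied from the training results table.
results = [
    {"epoch": 1, "train_loss": 0.2239, "val_loss": 0.2026, "accuracy": 0.9227},
    {"epoch": 2, "train_loss": 0.1468, "val_loss": 0.2319, "accuracy": 0.9320},
]

# Pick the epoch each selection criterion would favor.
best_by_loss = min(results, key=lambda r: r["val_loss"])
best_by_accuracy = max(results, key=lambda r: r["accuracy"])

print(f"lowest validation loss: epoch {best_by_loss['epoch']}")
print(f"highest accuracy: epoch {best_by_accuracy['epoch']}")
```

The final metrics reported here correspond to epoch 2. Training loss falling while validation loss rises can be an early sign of overfitting, so additional epochs would be worth monitoring rather than assumed to help.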

	## Limitations
	- The model is specifically trained on the IMDb dataset, so its effectiveness may be reduced when applied to other domains or types of text.
	- Sentiment detection is binary (positive or negative). Neutral sentiments or more nuanced emotions are not captured.
	- The model may not perform well on text that is highly sarcastic, contains slang, or is very short (e.g., one-word reviews).

	## Ethical Considerations
	- Bias: The model may reflect biases present in the IMDb dataset. Users should be cautious about applying this model to sensitive applications.
	- Content: Since the IMDb dataset includes movie reviews, the model might not generalize well to text outside of this context.

	## Acknowledgments
	- The original [DistilBERT](https://huggingface.co/distilbert-base-uncased) model was developed by Hugging Face.
	- The IMDb dataset is provided by Stanford and can be found [here](https://ai.stanford.edu/~amaas/data/sentiment/).

	## Framework Versions

	- Transformers 4.42.4
	- PyTorch 2.3.1+cu121
	- Datasets 2.21.0
	- Tokenizers 0.19.1