---
library_name: transformers
tags:
- sentiment-analysis
- distilbert
- text-classification
- nlp
- imdb
- binary-classification
license: mit
datasets:
- stanfordnlp/imdb
language:
- en
metrics:
- accuracy
base_model:
- distilbert/distilbert-base-uncased
---

# Model Card for AfroLogicInsect/sentiment-analysis-model

A fine-tuned DistilBERT model for binary sentiment analysis: it predicts whether input text expresses positive or negative sentiment. Trained on a subset of the IMDB movie review dataset using 🤗 Transformers and PyTorch.

## Model Details

### Model Description

This model was trained by Daniel (AfroLogicInsect) to classify the sentiment of movie reviews. It builds on the distilbert-base-uncased architecture and was fine-tuned for three epochs on 7,500 English-language samples from the IMDB dataset. The model accepts raw text and returns a sentiment label with a confidence score.

- **Developed by:** Daniel 🇳🇬 (@AfroLogicInsect)
- **Funded by:** [More Information Needed]
- **Shared by:** [More Information Needed]
- **Model type:** DistilBERT-based sequence classification
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** distilbert-base-uncased

### Model Sources

- **Repository:** https://huggingface.co/AfroLogicInsect/sentiment-analysis-model
- **Paper:** [More Information Needed]
- **Demo:** [More Information Needed]

## Uses

### Direct Use

- Sentiment analysis of short texts, reviews, feedback forms, etc.
- Embedding in web apps or chatbots to assess user mood or response tone

### Downstream Use

- Can be incorporated into feedback categorization pipelines
- Can be extended to multilingual sentiment tasks with additional fine-tuning

### Out-of-Scope Use

- Not intended for clinical sentiment or emotion assessment
- Does not reliably capture sarcasm or highly ambiguous language

## Bias, Risks, and Limitations

- Biases may be inherited from the IMDB dataset (e.g.
  genre or cultural bias)
- The model was trained on movie reviews, so performance may drop on domain-specific texts such as legal or medical writing
- Confidence scores are probabilities, not guarantees of correctness

### Recommendations

- Apply a confidence threshold to the scores when deploying in production
- Consider further fine-tuning on in-domain data for robustness

## How to Get Started with the Model

```python
from transformers import pipeline

# Load the fine-tuned model from the Hugging Face Hub
classifier = pipeline("sentiment-analysis", model="AfroLogicInsect/sentiment-analysis-model")

# The pipeline returns a list with one dict per input (predicted label and score)
result = classifier("Absolutely loved it!")
print(result)
```

## Training Details

### Training Data

- Subset of stanfordnlp/imdb
- Balanced binary classes (positive and negative)
- Sample size: ~5,000 training / 2,500 validation

### Training Procedure

- Texts were tokenized using `AutoTokenizer.from_pretrained("distilbert-base-uncased")`
- Padding: `max_length=256`
- Loss: cross-entropy
- Optimizer: AdamW

#### Training Hyperparameters

- Epochs: 3
- Batch size: 4
- Max length: 256
- Precision: fp32 (no mixed precision)

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- Validation set from the IMDB subset

#### Metrics

| Metric | Score |
|-----------|-------|
| Accuracy | 93.1% |
| F1 Score | 92.5% |
| Precision | 93.0% |
| Recall | 91.8% |

### Results [Sample]

Device set to use cuda:0

- Text: I loved this movie! It was absolutely fantastic!
  - Sentiment: Negative (confidence: 0.9991)
- Text: This movie was terrible, completely boring.
  - Sentiment: Negative (confidence: 0.9995)
- Text: The movie was okay, nothing special.
  - Sentiment: Negative (confidence: 0.9995)
- Text: I loved this movie!
  - Sentiment: Negative (confidence: 0.9966)
- Text: It was absolutely fantastic!
  - Sentiment: Negative (confidence: 0.9940)

## 🧪 Live Demo

Try it out below!

👉 [Launch Sentiment Analyzer](https://huggingface.co/spaces/AfroLogicInsect/sentiment-analysis-model-gradio)

#### Summary

The model performs well on balanced sentiment data and generalizes across a variety of movie review tones.
Performance may vary slightly depending on vocabulary and the presence of sarcasm.

## Environmental Impact

Carbon footprint estimated using the [ML Impact Calculator](https://mlco2.github.io/impact#compute).

- **Hardware Type:** GPU (single NVIDIA T4)
- **Hours used:** ~2.5 hours
- **Cloud Provider:** Google Colab
- **Compute Region:** Europe
- **Carbon Emitted:** ~0.3 kg CO₂eq

## Technical Specifications

### Model Architecture and Objective

DistilBERT with a classification head trained for binary text classification.

### Compute Infrastructure

- Hardware: Google Colab (GPU-backed)
- Software: Python, PyTorch, 🤗 Transformers, Hugging Face Hub

## Citation

Feel free to cite this model or reach out for collaborations!

**BibTeX:**

    @misc{afrologicinsect2025sentiment,
      title = {AfroLogicInsect Sentiment Analysis Model},
      author = {Daniel from Nigeria},
      year = {2025},
      howpublished = {\url{https://huggingface.co/AfroLogicInsect/sentiment-analysis-model}},
    }

## Model Card Contact

- Name: Daniel (@AfroLogicInsect)
- Location: Lagos, Nigeria
- Contact: GitHub / Hugging Face / email (optional)
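
## Example: Confidence Thresholding

The Recommendations section suggests thresholding on the confidence score before acting on a prediction in production. The following is a minimal sketch of that idea, not part of the released model. The helper `apply_threshold`, the default threshold of 0.9, the `"UNCERTAIN"` fallback, and the label names `"POSITIVE"` / `"NEGATIVE"` are all illustrative assumptions; check the label strings the model actually emits and tune the threshold on your own validation data.

```python
from typing import Dict, Union

# One prediction in the format returned by a text-classification pipeline:
# a dict with a "label" string and a "score" float.
Prediction = Dict[str, Union[str, float]]


def apply_threshold(
    prediction: Prediction,
    threshold: float = 0.9,        # assumed value, tune on validation data
    fallback: str = "UNCERTAIN",   # hypothetical label for low-confidence cases
) -> str:
    """Return the predicted label, or `fallback` if the score is below `threshold`."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    score = float(prediction["score"])
    return str(prediction["label"]) if score >= threshold else fallback


# Hand-written examples in the pipeline's output format (not real model output).
examples = [
    {"label": "POSITIVE", "score": 0.97},
    {"label": "NEGATIVE", "score": 0.62},
]

decisions = [apply_threshold(p) for p in examples]
print(decisions)  # ['POSITIVE', 'UNCERTAIN']
```

With the pipeline from the getting-started snippet, the same helper could be applied to each dict in the list the classifier returns, routing `"UNCERTAIN"` cases to manual review or a default action.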