Khmer Financial Sentiment Analysis with XLM-RoBERTa
This repository contains an XLM-RoBERTa-base model fine-tuned for Khmer-language sentiment analysis in the financial domain. The model was fine-tuned on approximately 4,000 financial text samples, with a separate test set of 400 samples.
Overview
Financial texts such as reports, news, and earnings statements contain valuable information for market analysis. However, Khmer-language financial texts have received little attention in NLP research. This project adapts the XLM-RoBERTa-base model to sentiment analysis of Khmer financial text.
This model is trained to classify financial text sentiment as either:
- Positive (1): Indicates growth, profitability, or a positive outlook.
- Negative (0): Indicates loss, risk, or financial downturns.
Model Details
- Base Model: XLM-RoBERTa-base
- Task: Sentiment Analysis (Binary Classification: Positive / Negative)
- Domain: Financial Data (Khmer Language)
- Dataset Size: ~4,000 training samples, 400 test samples
- Architecture: Transformer-based sequence classification model
Training Data
The model was fine-tuned using a dataset of Khmer-language financial texts, including:
- Bank reports
- Financial news articles
- Economic forecasts
- Investment analysis
The dataset consists of approximately 4,000 labeled examples for training and 400 examples for testing.
Training Details
The model was fine-tuned over 3 epochs, using XLM-RoBERTa-base as the pretrained model.
Epoch | Training Loss | Validation Loss | Accuracy |
---|---|---|---|
1 | 0.163500 | 0.511470 | XX% |
2 | 0.517700 | 0.581499 | XX% |
3 | 0.312900 | 0.526096 | XX% |
Training Configuration:
- Learning Rate: 2e-5
- Batch Size: 8
- Optimizer: AdamW
- Evaluation Strategy: Per epoch
- Loss Function: CrossEntropyLoss
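For orientation, the configuration above implies a rough number of optimizer steps. The sketch below is only back-of-the-envelope arithmetic: it assumes exactly 4,000 training samples, a single device, and no gradient accumulation, none of which is stated here, and the function name `optimizer_steps` is illustrative.

```python
import math


def optimizer_steps(num_samples: int, batch_size: int, epochs: int) -> tuple[int, int]:
    """Return (steps per epoch, total steps), counting a final partial batch as one step."""
    steps_per_epoch = math.ceil(num_samples / batch_size)
    return steps_per_epoch, steps_per_epoch * epochs


# Batch size and epoch count come from this card; 4,000 is the approximate
# training-set size, so the result is an estimate rather than a logged value.
per_epoch, total = optimizer_steps(num_samples=4000, batch_size=8, epochs=3)
print(per_epoch, total)  # 500 1500
```

Under these assumptions, each epoch in the table above would correspond to about 500 optimizer steps.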
Results
- Accuracy: ~96% on the validation set.
- Strong Performance: The model effectively classifies Khmer financial sentiment.
- Domain-Specific Optimization: Fine-tuning on in-domain text adapts the model to Khmer financial terminology.
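Accuracy here means the fraction of texts whose predicted label matches the reference label. A minimal sketch of that computation follows; the helper name `accuracy` and the toy labels are made up for illustration and are not drawn from the actual dataset.

```python
def accuracy(predictions: list[int], labels: list[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy data using this card's label scheme (0 = Negative, 1 = Positive).
predicted = [1, 0, 1, 1, 0]
reference = [1, 0, 0, 1, 0]
print(accuracy(predicted, reference))  # 0.8
```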
Usage
Below is an example of how to use the fine-tuned model for sentiment prediction:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch
# Load the fine-tuned Khmer financial sentiment model
model_name = "songhieng/khmer-sentiment-xlm-roberta-base"
# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
# Example Khmer financial text
text = "ការប្រកាសចំណូលរបស់ក្រុមហ៊ុនមានការកើនឡើងយ៉ាងច្រើន"
inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
# Run the forward pass without tracking gradients (inference only)
with torch.no_grad():
    outputs = model(**inputs)
# Get predicted sentiment (0 = Negative, 1 = Positive)
predicted_class = outputs.logits.argmax(dim=1).item()
labels_mapping = {0: "Negative", 1: "Positive"}
print(f"Predicted Sentiment: {labels_mapping[predicted_class]}")
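The example above reports only the most likely class. If a confidence score is also useful, the logits can be passed through a softmax to obtain class probabilities. The sketch below shows the computation in plain Python on hypothetical logit values (they are not real model output); with PyTorch, `torch.softmax(outputs.logits, dim=1)` performs the same operation on the tensor directly.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Convert raw logits to probabilities that sum to 1."""
    shift = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for one text, ordered [Negative, Positive].
example_logits = [-1.2, 2.3]
probs = softmax(example_logits)

labels_mapping = {0: "Negative", 1: "Positive"}
best = max(range(len(probs)), key=probs.__getitem__)
print(f"Predicted Sentiment: {labels_mapping[best]} ({probs[best]:.2f})")
```

A softmax probability is a relative score over the two classes, not a calibrated measure of correctness, so any threshold on it should be checked against held-out data.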