Model Card for scholawrite-bert-classifier
Model Details
Model Description
This model is referred to as BERT-SW-CLF in the paper. It is fine-tuned from bert-base-uncased on Hugging Face, using the train
split of the ScholaWrite dataset. The sole purpose of this model is to predict the next writing intention given scholarly writing in LaTeX.
- Developed by: *Linghe Wang, *Minhwa Lee, Ross Volkov, Luan Chau, Dongyeop Kang
- Language: English
- Finetuned from model: bert-base-uncased
Model Sources
- Repository: ScholaWrite Github Repository
- Paper: https://arxiv.org/abs/2502.02904
Uses
Direct Use
The model is intended to be used for next-writing-intention prediction on LaTeX paper drafts. It takes the 'before' text wrapped in special tokens as input and outputs the next writing intention, which is one of 15 predefined labels.
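For illustration, the input is expected to follow the special-token template sketched below (assumed from the usage example in "How to Get Started with the Model"):
before_text = "sample before text from the LaTeX draft"
text = "<INPUT>" + "<BT>" + before_text + "</BT>" + "</INPUT>"  # template assumed from the example below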
Out-of-Scope Use
The model is fine-tuned only for next-writing-intention prediction and was run for inference in a closed environment. Its main goal is to examine the usefulness of our dataset. It is suitable for academic use, but not for production, general public use, or consumer-oriented services. In addition, using this model on tasks other than next-intention prediction on LaTeX paper drafts may not work well.
Bias and Limitations
The bias and limitations of this model mainly come from the dataset (ScholaWrite) it was fine-tuned on.
First, the ScholaWrite dataset is currently limited to the computer science domain, as LaTeX is predominantly used in computer science journals and conferences. This domain-specific focus of the dataset may restrict the model's generalizability to other scientific disciplines. Future work could address this limitation by collecting keystroke data from a broader range of fields with diverse writing conventions and tools, such as the humanities or biological sciences. For example, students in the humanities usually write book-length papers and integrate more sources, which could affect the cognitive complexity of the writing process.
Second, all participants were early-career researchers (e.g., PhD students) at an R1 university in the United States, which means the model may not learn the professional writing behaviors and cognitive processes of experts. Expanding the dataset to include senior researchers, such as post-doctoral fellows and professors, could offer valuable insights into how writing strategies and revision behaviors evolve with research experience and expertise.
Third, the dataset is exclusive to English-language writing, which restricts the model's capability to predict the next writing intention in multilingual or non-English contexts. Expanding to multilingual settings could reveal unique cognitive and linguistic insights into writing across languages.
How to Get Started with the Model
import os
from dotenv import load_dotenv
import torch
from transformers import BertTokenizer, BertForSequenceClassification
from huggingface_hub import login
load_dotenv()
HUGGINGFACE_TOKEN = os.getenv("HUGGINGFACE_TOKEN")
login(token=HUGGINGFACE_TOKEN)
TOTAL_CLASSES = 15
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
tokenizer.add_tokens("<INPUT>") # start input
tokenizer.add_tokens("</INPUT>") # end input
tokenizer.add_tokens("<BT>") # before text
tokenizer.add_tokens("</BT>") # before text
tokenizer.add_tokens("<PWA>") # start previous writing action
tokenizer.add_tokens("</PWA>") # end previous writing action
model = BertForSequenceClassification.from_pretrained('minnesotanlp/scholawrite-bert-classifier', num_labels=TOTAL_CLASSES)
before_text = "sample before text"
text = "<INPUT>" + "<BT>" + before_text + "</BF> " + "</INPUT>"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    pred = model(inputs["input_ids"]).logits.argmax(1)
print("class:", pred.item())
Fine-tuning Details
Fine-tuning Data
This model is fine-tuned on the train split of the minnesotanlp/scholawrite dataset. The dataset consists of keystroke logs of an end-to-end scholarly writing process, with thorough annotations of the cognitive writing intention behind each keystroke. No additional data pre-processing or filtering was performed on the dataset.
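For reference, the split can be loaded directly with the datasets library; a minimal sketch (check the dataset card for the exact column names):
from datasets import load_dataset

# Load the ScholaWrite train split used for fine-tuning.
dataset = load_dataset("minnesotanlp/scholawrite", split="train")
print(dataset)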
Fine-tuning Procedure
The model was fine-tuned by passing the before_text section of a prompt as the input and using the annotated intention as the ground-truth label. The model outputs an integer corresponding to one of the 15 intention labels.
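A minimal sketch of this setup is given below; the column names before_text and intention and the label2id mapping are illustrative assumptions, not the exact training script:
# Hypothetical preprocessing: wrap the before text in the special tokens and
# attach the annotated intention as the classification target.
def preprocess(example, tokenizer, label2id):
    text = "<INPUT>" + "<BT>" + example["before_text"] + "</BT>" + "</INPUT>"
    encoded = tokenizer(text, truncation=True, max_length=512)
    encoded["labels"] = label2id[example["intention"]]
    return encoded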
Fine-tuning Hyperparameters
- Fine-tuning regime: fp32
- learning_rate: 2e-5
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 8
- num_train_epochs: 10
- weight_decay: 0.01
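These values map onto Hugging Face TrainingArguments roughly as follows (a sketch; output_dir and anything not listed above are assumptions):
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; unlisted arguments keep their defaults (fp32 training).
training_args = TrainingArguments(
    output_dir="scholawrite-bert-classifier",  # assumed output path
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=8,
    num_train_epochs=10,
    weight_decay=0.01,
)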
Machine Specs
- Hardware: 2 X Nvidia RTX A6000
- Hours used: 3.5 hrs
- Compute Region: Minnesota
Testing Procedure
Testing Data
The model is evaluated on the test split of the minnesotanlp/scholawrite dataset.
Metrics
The data has class imbalance in both the training and testing splits, so we use weighted F1 to measure performance.
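Weighted F1 averages per-class F1 scores weighted by each class's support, so frequent and rare intentions contribute proportionally; a minimal sketch with scikit-learn (labels and preds are placeholder arrays of gold and predicted intention ids):
from sklearn.metrics import f1_score

# Weighted F1: per-class F1 weighted by the number of true instances of each class.
score = f1_score(labels, preds, average="weighted")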
Results
| | BERT | RoBERTa | LLama-8B-Instruct | GPT-4o |
|---|---|---|---|---|
| Base | 0.04 | 0.02 | 0.12 | 0.08 |
| + SW | 0.64 | 0.64 | 0.13 | - |
Summary
The table above presents the weighted F1 scores for predicting writing intentions across baselines and fine-tuned models. All models fine-tuned on ScholaWrite show improved performance compared to their baselines. BERT and RoBERTa achieved the largest improvements, while LLama-8B-Instruct showed a modest improvement after fine-tuning. These results demonstrate the effectiveness of our ScholaWrite dataset in aligning language models with writers' intentions.
BibTeX
@misc{wang2025scholawritedatasetendtoendscholarly,
  title={ScholaWrite: A Dataset of End-to-End Scholarly Writing Process},
  author={Linghe Wang and Minhwa Lee and Ross Volkov and Luan Tuyen Chau and Dongyeop Kang},
  year={2025},
  eprint={2502.02904},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2502.02904},
}