Catherine Arnett

catherinearnett

AI & ML interests

multilingual NLP, tokenization

Recent Activity

updated a collection 20 days ago
Token Premium Monolingual Tokenizers

Organizations

Blog-explorers, Language and Cognition Lab (UCSD), PleIAs

catherinearnett's activity

published an article 3 months ago

Releasing the largest multilingual open pretraining dataset

By Pclanglais and 2 others
published an article 4 months ago
published an article 5 months ago