Overview

  • ModernBertMultilingual is a multilingual model trained from scratch.
  • Uses the ModernBERT-base architecture.
  • Supports four languages and their variants: Chinese (Simplified and Traditional), English, Japanese, and Korean.
  • Performs well on tasks involving mixed East Asian language text.
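
A minimal inference sketch for a model card like this one, assuming the weights are published as a standard `transformers` masked-LM checkpoint and the tokenizer defines a mask token. The repo id below is a placeholder, not the actual published path:

```python
def fill_mask_demo(text: str, repo_id: str = "your-namespace/modern_bert_multilingual"):
    """Run fill-mask inference on mixed-language text.

    `repo_id` is a hypothetical placeholder; substitute the real Hub path.
    The import is deferred so the function can be defined without the
    transformers dependency being exercised.
    """
    from transformers import pipeline

    fill = pipeline("fill-mask", model=repo_id)
    # Returns a list of candidate completions with `token_str` and `score`.
    return fill(text)
```

In use, the input would contain the tokenizer's mask token (e.g. `fill_mask_demo("今日はいい[MASK]ですね")`, if `[MASK]` is the configured mask token).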

Technical Specifications

  • Uses a slightly adjusted Qwen2.5-series vocabulary to support multiple languages.
  • Trained for approximately 100 hours on 7× NVIDIA L40 GPUs, for a total of about 60B tokens.
  • Key training parameters:
    • Batch Size : 1792
    • Learning Rate : 5e-04
    • Maximum Sequence Length : 512
    • Optimizer : adamw_torch
    • LR Scheduler: warmup_stable_decay
    • Train Precision : bf16 mixed
  • For other technical specifications, please refer to the original release information and paper of ModernBERT-base.
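
The LR Scheduler listed above is warmup_stable_decay (WSD): a linear warmup, a long flat plateau at the peak learning rate, then a decay to zero. The phase fractions below are illustrative assumptions; the card does not state them. A minimal sketch:

```python
def wsd_lr(step: int, total_steps: int, peak_lr: float = 5e-4,
           warmup_frac: float = 0.1, decay_frac: float = 0.1) -> float:
    """Warmup-stable-decay schedule: linear warmup, constant plateau,
    linear decay to zero. Phase fractions are assumed for illustration."""
    warmup_steps = int(total_steps * warmup_frac)
    decay_start = total_steps - int(total_steps * decay_frac)
    if step < warmup_steps:
        # Linear warmup from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    if step < decay_start:
        # Stable plateau at the peak learning rate.
        return peak_lr
    # Linear decay from peak_lr down to 0.
    return peak_lr * (total_steps - step) / max(1, total_steps - decay_start)
```

Note that the `nodecay` checkpoint described below corresponds to the end of the stable plateau, i.e. just before the decay (annealing) phase begins, which is why it is the natural starting point for domain-specific annealing.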

Released Versions

  • Provides 3 different weight versions:
    • base - Fully trained on a general corpus; suitable for a wide range of text domains.
    • nodecay - Checkpoint taken before the annealing (decay) stage; you can anneal it on domain-specific data to better adapt it to a target domain.
    • keyword_gacha_multilingual - Version annealed on ACGN-type text (e.g., light novels, game text, manga text).

Model                              Version    Description
modern_bert_multilingual           20250128   base
modern_bert_multilingual_nodecay   20250128   nodecay
keyword_gacha_multilingual_base    20250128   keyword_gacha_multilingual
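
As a sketch, the three released versions could be selected and loaded with `transformers` as below. The weight names come from the table above, but the Hub namespace is a hypothetical placeholder, not the actual published path:

```python
# Released weight names (from the version table); the namespace is a placeholder.
VARIANTS = {
    "base": "modern_bert_multilingual",
    "nodecay": "modern_bert_multilingual_nodecay",
    "keyword_gacha_multilingual": "keyword_gacha_multilingual_base",
}

def load_variant(name: str, namespace: str = "your-namespace"):
    """Load one of the released weight versions by short name.

    Import is deferred so selecting a variant does not require the
    transformers dependency until a download is actually attempted.
    """
    from transformers import AutoModelForMaskedLM, AutoTokenizer

    repo_id = f"{namespace}/{VARIANTS[name]}"
    tokenizer = AutoTokenizer.from_pretrained(repo_id)
    model = AutoModelForMaskedLM.from_pretrained(repo_id)
    return tokenizer, model
```

For example, `load_variant("nodecay")` would fetch the pre-annealing checkpoint intended as a base for domain-specific annealing.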

Other


Model size: 228M parameters (Safetensors, F32)