Longxu Dou
PRO
dreamerdeo
13 followers · 45 following
https://longxudou.github.io/
LongxuDou
longxudou
longxu-dou-6b167410a
AI & ML interests
Natural Language Processing
Recent Activity
reacted to their post 4 days ago

Excited to share our technical report on the Southeast Asian multilingual model Sailor2 and its latest updates!

Our 49-page report details Sailor2's development journey, including multilingual data cleaning, small model data mixture simulations, multi-stage continual pre-training, multi-stage post-training, and multi-cultural multi-lingual evaluations. Sailor2 aims to streamline the multilingual model pre-training process efficiently for the community.

We highlight Sailor2's impressive performance in low-resource language translation scenarios and its cultural understanding advantages in Southeast Asia, promoting practical applications for regional languages.

Model updates include:
- More precise outputs: Reduced redundancy in model outputs through refined post-training data and optimization techniques.
- Handling longer texts: Expanded to handle up to 128K context length in Southeast Asian languages through long-text training.
- Faster inference: Achieved 2.5x faster inference speed with speculative decoding.
- More model sizes: Introduced new sizes of 3B and 14B through model pruning.

All models are Apache-licensed for commercial use; development tools (code, resources) are open-source.

Technical report: https://huggingface.co/papers/2502.12982
Models: https://huggingface.co/collections/sail/sailor2-language-models-674d7c9e6b4dbbd9a869906b
Demo: https://huggingface.co/spaces/sail/Sailor2-20B-Chat
Sailor2 community: https://huggingface.co/sailor2
updated a Space 4 days ago
sailor2/README
new activity 4 days ago
sail/Sailor2-8B-Chat: Fix formatting
dreamerdeo's activity
liked a model 5 days ago
sail/Sailor2-20B-Chat • Text Generation • Updated 4 days ago • 401 • 4
liked a dataset about 1 month ago
opencsg/chinese-fineweb-edu • Viewer • Updated Jan 20 • 84.6M • 31.3k • 89
liked 6 models 3 months ago
sail/Sailor2-20B • Text Generation • Updated 4 days ago • 55 • 10
sail/Sailor2-1B • Text Generation • Updated 4 days ago • 647 • 6
sail/Sailor2-8B • Text Generation • Updated 4 days ago • 75 • 5
sail/Sailor2-8B-Chat • Text Generation • Updated 4 days ago • 241 • 17
sail/Sailor2-1B-Chat • Text Generation • Updated 4 days ago • 164 • 14
sail/Sailor2-20B-Chat-1203 • Text Generation • Updated 4 days ago • 285 • 25
liked a Space 3 months ago
README • Running • 2
liked 2 datasets 3 months ago
VTSNLP/vietnamese_curated_dataset • Viewer • Updated Nov 24, 2024 • 12.2M • 493 • 50
sailor2/Vietnamese_RAG • Viewer • Updated Jul 16, 2024 • 8.41k • 308 • 7
liked a model 3 months ago
liuhaotian/llava-v1.6-34b • Image-Text-to-Text • Updated May 9, 2024 • 14.1k • 347
liked a dataset 3 months ago
neulab/MultiUI • Viewer • Updated Nov 22, 2024 • 7.29M • 2.47k • 41
liked a dataset 4 months ago
C4AI-Community/multilingual-reward-bench • Viewer • Updated Nov 4, 2024 • 66.8k • 1.54k • 26
liked a dataset 7 months ago
m-a-p/neo_sft_phase2 • Viewer • Updated Jun 12, 2024 • 109k • 82 • 52
liked a dataset 8 months ago
nvidia/HelpSteer • Viewer • Updated Dec 18, 2024 • 37.1k • 1.93k • 233
liked a Space 9 months ago
LLM Leaderboard for SEA • Running • 18
Browse leaderboard of language models
liked a dataset 9 months ago
TIGER-Lab/MMLU-Pro • Viewer • Updated Nov 27, 2024 • 12.1k • 41k • 322
liked 2 models 9 months ago
sail/Sailor-14B-Chat-gguf • Updated Dec 21, 2024 • 227 • 5
sail/Sailor-14B-Chat • Text Generation • Updated Dec 21, 2024 • 60 • 11