Data quality is the new frontier for LLM performance.
Ultra-FineWeb 📊 is a high-quality bilingual dataset released by OpenBMB
openbmb/Ultra-FineWeb
✨ MIT License
✨ 1T English + 120B Chinese tokens
✨ Efficient model-driven filtering