Over-Tokenized Transformer: Vocabulary is Generally Worth Scaling Paper • 2501.16975 • Published 26 days ago • 26
ICE-GRT: Instruction Context Enhancement by Generative Reinforcement based Transformers Paper • 2401.02072 • Published Jan 4, 2024 • 11