-
KAN or MLP: A Fairer Comparison
Paper • 2407.16674 • Published • 43 -
MathScale: Scaling Instruction Tuning for Mathematical Reasoning
Paper • 2403.02884 • Published • 17 -
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Paper • 2402.13064 • Published • 48 -
DPTDR: Deep Prompt Tuning for Dense Passage Retrieval
Paper • 2208.11503 • Published
Collections
Discover the best community collections!
Collections including paper arxiv:2407.16674
-
Just How Flexible are Neural Networks in Practice?
Paper • 2406.11463 • Published • 7 -
Not All Language Model Features Are Linear
Paper • 2405.14860 • Published • 40 -
KAN: Kolmogorov-Arnold Networks
Paper • 2404.19756 • Published • 110 -
An Interactive Agent Foundation Model
Paper • 2402.05929 • Published • 28
-
RLHF Workflow: From Reward Modeling to Online RLHF
Paper • 2405.07863 • Published • 68 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 131 -
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models
Paper • 2405.15574 • Published • 53 -
An Introduction to Vision-Language Modeling
Paper • 2405.17247 • Published • 87
-
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Paper • 2405.08748 • Published • 24 -
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Paper • 2405.10300 • Published • 28 -
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Paper • 2405.09818 • Published • 131 -
OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
Paper • 2405.11143 • Published • 37
-
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Paper • 2404.15653 • Published • 27 -
MoDE: CLIP Data Experts via Clustering
Paper • 2404.16030 • Published • 13 -
MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning
Paper • 2405.12130 • Published • 48 -
Reducing Transformer Key-Value Cache Size with Cross-Layer Attention
Paper • 2405.12981 • Published • 30