Yongbin Choi

whybe-choi

AI & ML interests

LLM, RAG, Information Retrieval

Recent Activity

upvoted a collection 2 days ago

liked a dataset 2 days ago

nvidia/miracl-vision

liked a model 3 days ago

Qwen/Qwen3-VL-8B-Thinking

upvoted a collection 2 days ago

pplx-embed

Diffusion-LM for Dense and Contextual Retrieval • 7 items • Updated 1 day ago • 16

upvoted a paper 5 days ago

Training Sparse Mixture Of Experts Text Embedding Models

Paper • 2502.07972 • Published Feb 11, 2025 • 10

upvoted an article 14 days ago

How We Built a Semantic Highlight Model To Save Token Cost for RAG

Article • about 1 month ago • 65

upvoted an article 22 days ago

Continuous batching from first principles

Article • Nov 25, 2025 • 324

upvoted 2 papers about 1 month ago

ViDoRe V3: A Comprehensive Evaluation of Retrieval Augmented Generation in Complex Real-World Scenarios

Paper • 2601.08620 • Published Jan 13 • 11

MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval

Paper • 2412.14475 • Published Dec 19, 2024 • 57

upvoted a collection about 1 month ago

ViDoRe Community benchmark contributions

This collection regroups works done by the community to improve together Visual Retrieval ! • 4 items • Updated Jan 9 • 1

upvoted an article about 1 month ago

Small Yet Mighty: Improve Accuracy In Multimodal Search and Visual Document Retrieval with Llama Nemotron RAG Models

Article • Jan 6 • 23

upvoted a collection about 2 months ago

ViDoRe Benchmark V3

ViDoRe V3 is our latest benchmark, engineered to set a new industry gold standard for multi-modal, enterprise document retrieval evaluation. • 8 items • Updated Jan 14 • 19

upvoted a paper about 2 months ago

Pre-training Small Base LMs with Fewer Tokens

Paper • 2404.08634 • Published Apr 12, 2024 • 36

upvoted 3 articles 2 months ago

ViDoRe V3: a comprehensive evaluation of retrieval for enterprise use-cases

Article • Nov 5, 2025 • 62

How We Use Claude Code Skills to Run 1,000+ ML Experiments a Day

Article • Dec 8, 2025 • 52

We Got Claude to Fine-Tune an Open Source LLM

Article • Dec 4, 2025 • 592

upvoted a paper 2 months ago

ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration

Paper • 2511.21689 • Published Nov 26, 2025 • 124

upvoted a collection 3 months ago

BiCA

7 items • Updated Jan 4 • 3

upvoted a paper 4 months ago

OpenRubrics: Towards Scalable Synthetic Rubric Generation for Reward Modeling and LLM Alignment

Paper • 2510.07743 • Published Oct 9, 2025 • 10

upvoted 2 articles 4 months ago

Introducing MTEB v2: Evaluation of embedding and retrieval systems for more than just text

Article • Oct 20, 2025 • 35

Vocabulary is the most important element of Sparse Retrieval

Article • Oct 4, 2025 • 10

upvoted 2 articles 5 months ago

No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL

Article • Jun 3, 2025 • 99

Welcome EmbeddingGemma, Google's new efficient embedding model

Article • Sep 4, 2025 • 273