Victor Gallego

vicgalle

https://github.com/vicgalle

AI & ML interests

Preference fine-tuning, alignment & synthetic data. Building LLMs in general!

Recent Activity

liked a model about 8 hours ago

ByteDance-Seed/Seed-OSS-36B-Instruct

liked a model 1 day ago

nvidia/NVIDIA-Nemotron-Nano-9B-v2

liked a model 6 days ago

google/gemma-3-270m

upvoted a paper 9 days ago

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

Paper • 2508.06471 • Published 12 days ago • 151

upvoted a paper 12 days ago

Provably Learning from Language Feedback

Paper • 2506.10341 • Published Jun 12 • 9

upvoted a paper 16 days ago

Multi-Agent Game Generation and Evaluation via Audio-Visual Recordings

Paper • 2508.00632 • Published 20 days ago • 3

upvoted a paper 22 days ago

Agentic Reinforced Policy Optimization

Paper • 2507.19849 • Published 26 days ago • 139

upvoted 2 papers 24 days ago

The Geometry of LLM Quantization: GPTQ as Babai's Nearest Plane Algorithm

Paper • 2507.18553 • Published 27 days ago • 39

Specification Self-Correction: Mitigating In-Context Reward Hacking Through Test-Time Refinement

Paper • 2507.18742 • Published 27 days ago • 5

upvoted an article 28 days ago

Automated Discovery of High-Performance GPU Kernels with OpenEvolve

Article • Jun 27 • 21

upvoted a paper about 1 month ago

CriticLean: Critic-Guided Reinforcement Learning for Mathematical Formalization

Paper • 2507.06181 • Published Jul 8 • 41

upvoted a paper about 2 months ago

Robust Reward Modeling via Causal Rubrics

Paper • 2506.16507 • Published Jun 19 • 9

upvoted a collection about 2 months ago

Configurable Preference Tuning ⚙️📝

CPT uses rubric-guided synthetic data and DPO to enable LLMs to dynamically adjust behavior (e.g., writing style) at inference with system prompts • 7 items • Updated Jun 17 • 1

upvoted 2 papers 2 months ago

Configurable Preference Tuning with Rubric-Guided Synthetic Data

Paper • 2506.11702 • Published Jun 13 • 2

Training-Free Tokenizer Transplantation via Orthogonal Matching Pursuit

Paper • 2506.06607 • Published Jun 7 • 2

upvoted a collection 3 months ago

Synthetic Data Generation

SDG papers • 86 items • Updated Jul 11 • 15

upvoted a collection 4 months ago

Atropos Artifacts

A collection of experimental artifacts created with Atropos, Nous' RL Environments framework - https://github.com/NousResearch/Atropos • 9 items • Updated 29 days ago • 10

upvoted 3 papers 4 months ago

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

Paper • 2504.20571 • Published Apr 29 • 97

Perception Encoder: The best visual embeddings are not at the output of the network

Paper • 2504.13181 • Published Apr 17 • 35

ReZero: Enhancing LLM search ability by trying one-more-time

Paper • 2504.11001 • Published Apr 15 • 15

upvoted 2 collections 4 months ago

Nemotron-H

Mamba-Transformer hybrid models • 10 items • Updated 6 days ago • 29

GLM-4-0414

GLM-4-0414 series model • 8 items • Updated Jun 30 • 130

upvoted an article 5 months ago

Custom Vibe Coding Quest Part 1: The Quest Begins 🧙

Article • Mar 26 • 10