Hiring 💼

Florian Zimmermeister PRO

flozi00

AI & ML interests

ASR, German LLM

Recent Activity

upvoted a collection 7 days ago

Mistral Small 4

liked a model 7 days ago

mistralai/Mistral-Small-4-119B-2603-NVFP4

liked a model 7 days ago

mistralai/Mistral-Small-4-119B-2603-eagle

Organizations

A\\Ware

upvoted a collection 7 days ago

Mistral Small 4

A state-of-the-art model, open-weight, with a granular Mixture-of-Experts architecture that fuses instruct, reasoning and agentic skills. • 3 items • Updated 7 days ago • 60

upvoted an article 18 days ago

Article

Spend 80% of Your LLM Compute on Data, Not Training

Feb 14 • 2

upvoted a collection 21 days ago

Qwen3.5

21 items • Updated 14 days ago • 1.27k

upvoted a paper about 1 month ago

OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration

Paper • 2602.05400 • Published Feb 5 • 349

upvoted 2 articles about 2 months ago

Article

Open Responses: What you need to know

+2

Jan 15 • 109

Article

We Got Claude to Build CUDA Kernels and teach open models!

+2

Jan 28 • 149

upvoted 2 papers 2 months ago

GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization

Paper • 2601.05242 • Published Jan 8 • 229

Recursive Language Models

Paper • 2512.24601 • Published Dec 31, 2025 • 94

upvoted 2 papers 3 months ago

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published Dec 31, 2025 • 318

Parallax: Efficient LLM Inference Service over Decentralized Environment

Paper • 2509.26182 • Published Sep 30, 2025 • 1

upvoted a collection 3 months ago

Audio2Face-3D

Open-weight Audio2Face-3D and Audio2Emotion networks and a sample dataset for training and evaluation • 7 items • Updated about 1 hour ago • 16

upvoted 2 articles 4 months ago

Article

Continuous batching from first principles

+1

Nov 25, 2025 • 349

Article

🌳 QAT: The Art of Growing a Bonsai Model

Nov 9, 2025 • 15

upvoted 2 papers 4 months ago

INT v.s. FP: A Comprehensive Study of Fine-Grained Low-bit Quantization Formats

Paper • 2510.25602 • Published Oct 29, 2025 • 79

Diffusion Language Models are Super Data Learners

Paper • 2511.03276 • Published Nov 5, 2025 • 132

upvoted a collection 5 months ago

Cerebras REAP

Sparse MoE models compressed using REAP (Router-weighted Expert Activation Pruning) method • 30 items • Updated 27 days ago • 133

upvoted 4 papers 5 months ago

SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights

Paper • 2509.22944 • Published Sep 26, 2025 • 81

Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models

Paper • 2510.04618 • Published Oct 6, 2025 • 130

Less is More: Recursive Reasoning with Tiny Networks

Paper • 2510.04871 • Published Oct 6, 2025 • 511

The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain

Paper • 2509.26507 • Published Sep 30, 2025 • 549