Fynn Kröger

fynnkroeger

AI & ML interests

None yet

Recent Activity

View all activity

Organizations

None yet

fynnkroeger's activity

reacted to cogwheelhead's post with 👍 1 day ago
view post
Post
2435
Me and my team have performed an in-depth investigation comparing o1 to R1 (and other reasoning models)

Link: https://toloka.ai/blog/r1-is-not-on-par-with-o1-and-the-difference-is-qualitative-not-quantitative

It started with us evaluating them on our own university-math benchmarks: U-MATH for problem-solving and μ-MATH for judging solution correctness (see the HF leaderboard: toloka/u-math-leaderboard)

tl;dr: R1 sure is amazing, but what we find is that it lags behind in novelty adaptation and reliability:
* performance drops when updating benchmarks with fresh unseen tasks (e.g. AIME 2024 -> 2025)
* R1-o1 gap widens when evaluating niche subdomains (e.g. university-specific math instead of the more common Olympiad-style contests)
* same with going into altogether unconventional domains (e.g. chess) or skills (e.g. judgment instead of problem-solving)
* R1 also runs into failure modes way more often (e.g. making illegal chess moves or falling into endless generation loops)

Our point here is not to bash on DeepSeek — they've done exceptional work, R1 is a game-changer, and we have no intention to downplay that. R1's release is a perfect opportunity to study where all these models differ and gain understanding on how to move forward from here
reacted to lysandre's post with ❤️ 1 day ago
view post
Post
4111
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
  • 1 reply
·
reacted to davidberenstein1957's post with 🤗 5 months ago
view post
Post
1344
Distilabel and synthetic data community interviews - the outcomes

We've been doing some interview with community members to understand the needs surrounding synthetic data. Many thanks to the participants. Note that, given they interviewees were sourced from our community, so the results will likely represent that.

Things distilabel does well
- security and reliability by caching generations and having serializable pipelines.
- scaling up generation by parallelising inference and Anyscale Ray
- solid implementations of state of the art research papers

Things to improve
- communication about the fact we support structured generation
- customization of existing prompt implementations are difficult
- creation of new tasks prove difficult
- arguments and parameters for tasks aren't available at first glance
- the learning curve can be steep
- more tutorials that represent real-life usage

Things to note
- create small scale and large scale dataset to Millions of records
- people use synthetic data to move away from frontier model providers
- people mostly use 7B or 70B models for generating

Participate here: https://github.com/argilla-io/distilabel/issues
reacted to TuringsSolutions's post with 😔 6 months ago
view post
Post
1327
I can solve the Traveling Salesman Problem using the same methods the scientists used to solve it with 1 qubit, except I do not need quantum computers to do it. I am kind of tired of screaming this from the rooftops at this point. I can create an imaginary probability space, then I can put a bunch of imaginary agents in the imaginary box, and solve real problems in seconds. Problems that would take minutes, hours, or years to solve via other algorithms. Here is a demo of me solving the Traveling Salesman problem using 50 agents to probabilistically sample at once: https://colab.research.google.com/drive/1XplG72nQDO_-2h4DUllERLp0Dr2pI2J2?usp=sharing

·