🛡️ At Ai4Privacy, our goal is to empower researchers to build a safer AI ecosystem. Today, we're highlighting crucial research that does just that by exposing a new vulnerability.
The paper "Forget to Flourish" details a new model poisoning technique. It's a reminder that as we fine-tune LLMs, our anonymization and privacy strategies must evolve to counter increasingly sophisticated threats.
We're proud that the Ai4Privacy dataset was instrumental in this study. It served two key purposes:
Provided a Realistic Testbed: It gave the researchers access to a diverse set of synthetic and realistic PII samples in a safe, controlled environment.
Enabled Impactful Benchmarking: It allowed them to measure the real-world effectiveness of their data-extraction attack, showing it could compromise specific, high-value information.
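For readers who want to experiment with the same kind of testbed, here is a minimal sketch of loading an Ai4Privacy dataset from Hugging Face. It assumes the publicly released pii-masking-200k variant and its source_text/target_text fields; the paper's exact experimental setup is not described in this post, so treat this as illustrative only.

```python
# Minimal sketch: loading an Ai4Privacy PII dataset as a safe benchmarking testbed.
# Assumes the "ai4privacy/pii-masking-200k" Hugging Face release; column names
# may differ in other versions of the dataset.
from datasets import load_dataset

dataset = load_dataset("ai4privacy/pii-masking-200k", split="train")

# Draw a small, reproducible pool of PII-bearing records for evaluation.
eval_pool = dataset.shuffle(seed=42).select(range(100))

for record in eval_pool.select(range(3)):
    print(record["source_text"][:80])  # synthetic text containing PII
    print(record["target_text"][:80])  # its anonymized counterpart
```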
This work reinforces our belief that progress in AI security is a community effort. By providing robust tools for benchmarking, we can collectively identify weaknesses and build stronger, more resilient systems. A huge congratulations to the authors on this important contribution.
Qwen 3 Coder feels like a personal attack on K2, and I love it. It achieves near-SOTA on LiveCodeBench (LCB) without being a reasoning model. People are finally understanding that reasoning isn't necessary for high benchmark scores...
The document warns of the "intelligence curse," a potential consequence of advanced AI (AGI) where powerful entities lose their incentive to invest in people as AI automates work. This could lead to job displacement, reduced social mobility, and a concentration of power and wealth based on AI ownership, similar to the "resource curse" in resource-rich states. To counter this, the authors propose averting AI catastrophes to prevent centralization, diffusing AI widely to keep humans economically relevant, and democratizing institutions to remain anchored to human needs.
I just launched TTS Arena V2 - a platform for benchmarking TTS models by blind A/B testing. The goal is to make it easy to compare quality between open-source and commercial models, including conversational ones.
What's new in V2:
- **Conversational Arena**: Evaluate models like CSM-1B, Dia 1.6B, and PlayDialog in multi-turn settings
- **Personal Leaderboard**: Optional login to see which models you tend to prefer
- **Multi-speaker TTS**: Random voices per generation to reduce speaker bias
- **Performance Upgrade**: Rebuilt from Gradio → Flask. Much faster with fewer failed generations.
- **Keyboard Shortcuts**: Vote entirely via keyboard
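For anyone curious how blind A/B votes typically turn into a leaderboard, here is a purely illustrative Elo-style update. This is not TTS Arena's actual scoring code (the post doesn't specify one); the model names and K-factor below are made up.

```python
# Illustrative only: an Elo-style rating update driven by pairwise A/B votes.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))

def update(ratings: dict, winner: str, loser: str, k: float = 32.0) -> None:
    """Shift both ratings toward the outcome of one blind vote."""
    exp_win = expected_score(ratings[winner], ratings[loser])
    ratings[winner] += k * (1.0 - exp_win)
    ratings[loser] -= k * (1.0 - exp_win)

ratings = {"model_a": 1000.0, "model_b": 1000.0}  # hypothetical entries
update(ratings, winner="model_a", loser="model_b")
print(ratings)  # model_a gains exactly what model_b loses
```

One nice property of this scheme is that each vote is zero-sum and order-sensitive, so upsets against highly rated models move the leaderboard more than expected wins.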
Also added models like MegaTTS 3, Cartesia Sonic, and ElevenLabs' full lineup.
I'd love any feedback, feature suggestions, or ideas for models to include.