
Benjamin Paine PRO

benjamin-paine

AI & ML interests

A software engineer with an AI habit


Organizations

Taproot AI

benjamin-paine's activity

reacted to frimelle's post with ❤️ 3 days ago
What’s in a name? More than you might think, especially for AI.
Whenever I introduce myself, people often start speaking French to me, even though my French is très basic. It turns out that AI systems do something similar:
Large language models infer cultural identity from names, shaping their responses based on presumed backgrounds. But is this helpful personalization or a reinforcement of stereotypes?
In our latest paper, we explored this question by testing DeepSeek, Llama, Aya, Mistral-Nemo, and GPT-4o-mini on how they associate names with cultural identities. We analysed 900 names from 30 cultures and found strong assumptions baked into AI responses: some cultures were overrepresented, while others barely registered.
For example, a name like "Jun" often triggered Japan-related responses, while "Carlos" was linked primarily to Mexico, even though these names exist in multiple countries. Meanwhile, names from places like Ireland led to more generic answers, suggesting weaker associations in the training data.
This has real implications for AI fairness: How should AI systems personalize without stereotyping? Should they adapt at all based on a name?
Work with some of my favourite researchers: @sidicity Arnav Arora and @IAugenstein
Read the full paper here: Presumed Cultural Identity: How Names Shape LLM Responses (2502.11995)
reacted to clem's post with 🤗 6 days ago
We crossed 1B+ tokens routed to our inference provider partners on HF, a feature we released just a few days ago.

Just getting started, of course, but early users seem to like it, and we're always happy to partner with cool startups in the ecosystem.

Have you been using any integration and how can we make it better?

https://huggingface.co/blog/inference-providers
posted an update 9 days ago
Zonos is flying up the trending tab, and for good reason - it's the most expressive and emotive open-source TTS I've used to date. I'm happy to say it's now supported in Taproot, with added long-form synthesis support and other goodies.

Try it here: https://huggingface.co/spaces/benjamin-paine/zonos-longform

Getting started with Zonos in Taproot is easy; with a working CUDA toolkit and Python/Pip installation, all you have to do is:
apt install espeak-ng  # espeak-ng provides the phonemization backend Zonos relies on
pip install taproot
taproot install speech-synthesis:zonos-transformer  # installs the Zonos task and its dependencies
taproot invoke speech-synthesis:zonos-transformer --text "Hello, world!"  # runs a one-off synthesis

See more on GitHub at https://github.com/painebenjamin/taproot/
replied to Xenova's post 15 days ago

Yup! That stays one chunk.

chunker.push("Last week she said, “Hi there. How are you?”");
chunker.flush()
Emitting "Last week she said, “Hi there. How are you?”"

The only exception is with newlines - I wanted it to emit when a newline was encountered.

chunker.push("Last week she said,\n“Hi there. How are you?”");
chunker.flush()
Emitting "Last week she said,"
Emitting "“Hi there. How are you?”"

If you want to disable this behavior, pass in {emitParagraphs: false} to the constructor, i.e.:

const chunker = new SentenceChunker({emitParagraphs: false});

There's also chunkLength to set the maximum character length (128 by default), and emitTrimmed to control whether each emitted chunk has its leading/trailing whitespace trimmed (default true). One last thing: if your input is always growing - like if you're streaming one response and concatenating it into one big string - you can use GrowingSentenceChunker instead (in the same file). Example:

const chunker = new GrowingSentenceChunker();
chunker.onChunk((chunk) => { console.log(`Emitting "${chunk}"`); });
chunker.push("Last week");
chunker.push("Last week she said");
chunker.push("Last week she said, “Hi there. How are you?”");
chunker.flush()
Emitting "Last week she said, “Hi there. How are you?”"

And just in case it's not obvious, the .flush() call will just emit anything left in the buffer, even if it's shorter than the maximum length. If you don't call .flush(), it will wait for another input that pushes it over the chunk limit before emitting again.
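
In case a concrete example helps, here's a minimal sketch of that flush behavior, assuming the default options:

const chunker = new SentenceChunker();
chunker.onChunk((chunk) => { console.log(`Emitting "${chunk}"`); });
chunker.push("A short sentence well under the chunk limit.");
// Nothing has been emitted yet - without flush(), the chunker would keep
// waiting for more input to push the buffer over the chunk limit.
chunker.flush()
Emitting "A short sentence well under the chunk limit."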

replied to Xenova's post 16 days ago

I spent a bit of time working on a JavaScript sentence splitter - it might work right out of the box for this purpose! It tries to split on punctuation when possible for smooth flow, but has a max length option to ensure run-on sentences still get split, too. It also maintains a buffer so you can just keep pushing streaming text into it and it will emit when it has a full chunk.

https://raw.githubusercontent.com/painebenjamin/anachrovox/refs/heads/main/www/sentence.js

Example:

const chunker = new SentenceChunker();
chunker.onChunk((sentenceChunk) => { console.log(`Emitting "${sentenceChunk}"`); });
chunker.push("The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.");
chunker.flush()

Output:

Emitting "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration."
Emitting "The best performing models also connect the encoder and decoder through an attention mechanism."
Emitting "We propose a new simple network architecture, the Transformer, based solely on attention mechanisms,"
Emitting "dispensing with recurrence and convolutions entirely."
Emitting "Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train."
reacted to Xenova's post with 🔥 16 days ago
We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help?
reacted to odellus's post with 🧠 24 days ago
Tired: shitposting on bsky
Wired: shitposting on hf
reacted to hexgrad's post with 🚀 25 days ago
reacted to clem's post with 🤗 27 days ago
AI is not a zero-sum game. Open-source AI is the tide that lifts all boats!
reacted to sayakpaul's post with 🤗 27 days ago
We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨

We cover the supported models, the optimization knobs users can turn, fine-tuning, and more 🔥

5-6 GB for HunyuanVideo - the sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen
replied to mitkox's post about 1 month ago

Thanks for doing this! I've been all-in on llama.cpp for a while now, but I'd be lying if I said I didn't wonder if I was missing out on anything with other engines.

reacted to sequelbox's post with 👍 about 1 month ago
A general FYI that Valiant Labs no longer has an X account. This is a business decision. Many other businesses seem to be making the same decision right now.

You can follow my account on Bluesky for updates on Shining Valiant 3, other Valiant Labs models, my open-source datasets, etc: https://bsky.app/profile/sequelbox.bsky.social

back to building :)
reacted to merve's post with ❤️ about 1 month ago
Everything that happened this week in open AI, a recap 🤠 merve/jan-17-releases-678a673a9de4a4675f215bf5

👀 Multimodal
- MiniCPM-o 2.6 is a new sota any-to-any model by OpenBMB (vision, speech and text!)
- VideoChat-Flash-Qwen2.5 is a new set of video multimodal models by OpenGVLab that come in 2B & 7B sizes at 224 & 448 resolutions
- ByteDance released a larger SA2VA that comes in at 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance

💬 LLMs
- MiniMax-Text-01 is a new huge language model (456B total, 45.9B active params) by MiniMaxAI with a context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B, a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D 🧙🏻‍♂️
- ReaderLM-v2 is a new HTML parsing model by Jina AI

- Dria released Dria-Agent-a-3B, a new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released Phi-4, plus faster and more memory-efficient Llama 3.3

🖼️ Vision
- MatchAnything is a new foundation model for matching
- FitDiT is a high-fidelity VTON model based on the DiT architecture

🗣️ Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities

📖 Retrieval
- lightblue released LB-reranker-0.5B-v1.0, a new reranker based on Qwen2.5 that can handle 95+ languages
- cde-small-v2 is a new sota small retrieval model by @jxm
replied to their post about 2 months ago

Hello again @JLouisBiz !

I've updated the spaces; they now use Kokoro instead of XTTS. It's drastically faster. Additionally, since the TTS is so much faster, I felt comfortable extending the output to 1024 tokens.

replied to their post about 2 months ago

Hello! It's currently capped at 512 tokens of output, so yes, it won't be suitable for very long generations. It's also a very tiny model - Llama 3.2 3B - so it's definitely more for conversation and less for completing tasks.

I'm going to try and swap in Kokoro TTS which should be faster on these small machines. Thanks for taking the time to test.

replied to their post about 2 months ago

I'm sorry that it's not working for you - can you make sure you've given it permission to use your microphone and that you're using the correct one (if you have multiple)? There should be an icon in the corner (in Chrome) that you can click to select microphones and check levels. Whenever I've had trouble activating it, I've found I was using the wrong microphone or my input volume was turned way down.


If you're using a browser other than Chrome please let me know, I've tested it in others but there could always be something I'm missing.

replied to their post about 2 months ago

Regarding the indicators in the bottom right,

  • If the "recording" light doesn't turn on (the top one), then it did not hear you utter a wake phrase.
  • If the "listening" light does turn on, it is detecting voice activity, but unless you utter a wake phrase it will not send the recording for transcription and completion.

So in short, if you say "Hex Vox, what's the news?" and you don't see the recording light turn on, then it didn't catch the wake phrase and you have to try again.

If instead you just want to speak your command without relying on wake phrase recognition, you can just click the "Call" button - that will start recording immediately and always send the audio for transcription.

This project was the one that set me off on making the wake phrase model in the first place. At first I didn't have it and relied instead on voice activity detection and transcription; however, this performed extremely poorly in noisy environments or with any kind of muted speech, with near-constant accidental activation. The only efficient way to be always-on AND hands-free was to use a front-end wake-word model to gate the rest of the audio workflow.
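
In case it helps to picture the flow, here's a rough sketch of that gating idea - the function names are hypothetical placeholders for illustration, not the actual Anachrovox API:

// Hypothetical sketch: the wake-word check sits in front of transcription,
// so voice activity alone never reaches the transcription/completion stage.
async function onSpeechSegment(audio) {
    // detectWakePhrase, transcribe, and runCompletion are hypothetical placeholders.
    const wakePhraseHeard = await detectWakePhrase(audio);
    if (!wakePhraseHeard) {
        return; // no "Hey Vox" / "Vox" detected: drop the segment entirely
    }
    const text = await transcribe(audio); // only gated audio gets transcribed
    await runCompletion(text);            // ...and only transcribed text reaches the LLM
}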

replied to their post about 2 months ago
posted an update about 2 months ago
Hello HuggingFace 🤗, and happy new year! 🎆

I'm thrilled to be releasing the first iteration of a project I've been working on for quite a while now. It's called Taproot, and it's a seamlessly scalable open-source AI/ML inference engine designed to let developers build real-time experiences on a small-to-mid-sized cluster, without the burden of hyperscale infrastructure.

Along with the server and task framework is a client library for Node and the browser. And what good is a server and client without an app to go alongside them? To that end, I'm also releasing Anachrovox, a fun, real-time, hands-free voice assistant that can run on mid-level devices in <12GB VRAM, with web search, weather, and other tools. It uses my real-time browser wake-word library to detect utterances of the phrases 'Hey Vox', 'Hi Vox', 'Okay Vox', 'Anachrovox', or just 'Vox' (alongside some others).

Releasing this many things at once will definitely result in bugs, so please report them when sighted! Thank you all!

Taproot: https://github.com/painebenjamin/taproot
Taproot JS Client: https://github.com/painebenjamin/taproot.js
Anachrovox: https://github.com/painebenjamin/anachrovox

The Anachrovox Spaces are networked together, balancing load across them to keep all front-ends responsive. You only have to choose what color you like the most!

https://huggingface.co/spaces/benjamin-paine/anachrovox
https://huggingface.co/spaces/benjamin-paine/anachrovox-amber
reacted to nicolay-r's post with ❤️ about 2 months ago
📢 Delighted to share the most recent milestone on quick deployment of Named Entity Recognition (NER) in GenAI-powered systems.

Releasing bulk-ner 0.25.0, a tiny framework that saves you time when deploying NER with any model.

💎 Why is this important? In the era of GenAI, handling raw textual output can be challenging. Instead, recognizing named entities via a domain-oriented system for your downstream LLM can be the preferable option.

📦: https://pypi.org/project/bulk-ner/0.25.0/
🌟: https://github.com/nicolay-r/bulk-ner

I noticed that directly adapting an LM for NER results in spending a significant amount of time formatting your texts according to the NER model's needs.
In particular:
1. Processing CONLL format with B-I-O tags from model outputs
2. Input trimming: long input content might not fit completely

To cope with these problems, version 0.25.0 makes big steps forward by providing:
✅ 🐍 Python API support for quick deployment (see screenshot below 📸)
✅ 🪶 No strings: dependencies are now cleaned up, so it is a purely Python implementation for API calls.
✅ 👌 Simplified output formatting: we use lists to represent texts, with inner lists that refer to annotated objects (see screenshot below 📸)

📒 We have a Colab for a quick start here (or see the screenshots for the bash / Python API 📸):
https://colab.research.google.com/github/nicolay-r/ner-service/blob/main/NER_annotation_service.ipynb

👏 The code for pipeline deployment is taken from the AREkit project:
https://github.com/nicolay-r/AREkit