
Benjamin Paine PRO

benjamin-paine

AI & ML interests

A software engineer with an AI habit


Organizations

Taproot AI

benjamin-paine's activity

reacted to frimelle's post with ❤️ 3 days ago
What’s in a name? More than you might think, especially for AI.
Whenever I introduce myself, people often start speaking French to me, even though my French is très basic. It turns out that AI systems do something similar:
Large language models infer cultural identity from names, shaping their responses based on presumed backgrounds. But is this helpful personalization or a reinforcement of stereotypes?
In our latest paper, we explored this question by testing DeepSeek, Llama, Aya, Mistral-Nemo, and GPT-4o-mini on how they associate names with cultural identities. We analysed 900 names from 30 cultures and found strong assumptions baked into AI responses: some cultures were overrepresented, while others barely registered.
For example, a name like "Jun" often triggered Japan-related responses, while "Carlos" was linked primarily to Mexico, even though these names exist in multiple countries. Meanwhile, names from places like Ireland led to more generic answers, suggesting weaker associations in the training data.
This has real implications for AI fairness: How should AI systems personalize without stereotyping? Should they adapt at all based on a name?
Work with some of my favourite researchers: @sidicity Arnav Arora and @IAugenstein
Read the full paper here: Presumed Cultural Identity: How Names Shape LLM Responses (2502.11995)
reacted to clem's post with 🤗 6 days ago
We crossed 1B+ tokens routed to our inference provider partners on HF, a feature we released just a few days ago.

Just getting started, of course, but early users seem to like it, and we're always happy to partner with cool startups in the ecosystem.

Have you been using any integration and how can we make it better?

https://huggingface.co/blog/inference-providers
posted an update 9 days ago
Zonos is flying up the trending tab, and for good reason - it's the most expressive and emotive open-source TTS I've used to date. I'm happy to say it's now supported in Taproot, with added long-form synthesis support and other goodies.

Try it here: https://huggingface.co/spaces/benjamin-paine/zonos-longform

Getting started with Zonos in Taproot is easy; with a working CUDA toolkit and Python/Pip installation, all you have to do is:
apt install espeak-ng  # espeak-ng provides the phonemization backend Zonos relies on
pip install taproot
taproot install speech-synthesis:zonos-transformer  # installs the Zonos task and its dependencies
taproot invoke speech-synthesis:zonos-transformer --text "Hello, world!"  # runs a one-off synthesis

See more on GitHub at https://github.com/painebenjamin/taproot/
replied to Xenova's post 15 days ago

Yup! That stays one chunk.

chunker.push("Last week she said, “Hi there. How are you?”");
chunker.flush()
Emitting "Last week she said, “Hi there. How are you?”"

The only exception is with newlines - I wanted it to emit when a newline was encountered.

chunker.push("Last week she said,\n“Hi there. How are you?”");
chunker.flush()
Emitting "Last week she said,"
Emitting "“Hi there. How are you?”"

If you want to disable this behavior, pass in {emitParagraphs: false} to the constructor, i.e.:

const chunker = new SentenceChunker({emitParagraphs: false});

There's also chunkLength to set the maximum character length (128 by default), and emitTrimmed to control whether each emitted chunk has its leading/trailing whitespace trimmed (default true). One last thing: if your input is always growing - like if you're streaming one response and concatenating it into one big string - you can use GrowingSentenceChunker instead (in the same file). Example:

const chunker = new GrowingSentenceChunker();
chunker.onChunk((chunk) => { console.log(`Emitting "${chunk}"`); });
chunker.push("Last week");
chunker.push("Last week she said");
chunker.push("Last week she said, “Hi there. How are you?”");
chunker.flush()
Emitting "Last week she said, “Hi there. How are you?”"

And just in case it's not obvious, the .flush() call will just emit anything left in the buffer, even if it's shorter than the maximum length. If you don't call .flush(), it will wait for another input that pushes it over the chunk limit before emitting again.
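
In case a concrete example helps, here's a minimal sketch of that flush behavior, assuming the default options:

const chunker = new SentenceChunker();
chunker.onChunk((chunk) => { console.log(`Emitting "${chunk}"`); });
chunker.push("A short sentence well under the chunk limit.");
// Nothing has been emitted yet - without flush(), the chunker would keep
// waiting for more input to push the buffer over the chunk limit.
chunker.flush()
Emitting "A short sentence well under the chunk limit."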

replied to Xenova's post 16 days ago

I spent a bit of time working on a JavaScript sentence splitter - it might work right out of the box for this purpose! It tries to split on punctuation when possible for smooth flow, but has a max length option to ensure run-on sentences still get split, too. It also maintains a buffer so you can just keep pushing streaming text into it and it will emit when it has a full chunk.

https://raw.githubusercontent.com/painebenjamin/anachrovox/refs/heads/main/www/sentence.js

Example:

const chunker = new SentenceChunker();
chunker.onChunk((sentenceChunk) => { console.log(`Emitting "${sentenceChunk}"`); });
chunker.push("The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train.");
chunker.flush()

Output:

Emitting "The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration."
Emitting "The best performing models also connect the encoder and decoder through an attention mechanism."
Emitting "We propose a new simple network architecture, the Transformer, based solely on attention mechanisms,"
Emitting "dispensing with recurrence and convolutions entirely."
Emitting "Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train."
reacted to Xenova's post with 🔥 16 days ago
We did it. Kokoro TTS (v1.0) can now run 100% locally in your browser w/ WebGPU acceleration. Real-time text-to-speech without a server. ⚡️

Generate 10 seconds of speech in ~1 second for $0.

What will you build? 🔥
webml-community/kokoro-webgpu

The most difficult part was getting the model running in the first place, but the next steps are simple:
✂️ Implement sentence splitting, allowing for streamed responses
🌍 Multilingual support (only phonemization left)

Who wants to help?
reacted to odellus's post with 🧠 24 days ago
Tired: shitposting on bsky
Wired: shitposting on hf
reacted to hexgrad's post with 🚀 25 days ago
reacted to clem's post with 🤗 27 days ago
AI is not a zero-sum game. Open-source AI is the tide that lifts all boats!
reacted to sayakpaul's post with 🤗 27 days ago
We have authored a post to go over the state of video generation in the Diffusers ecosystem 🧨

We cover the supported models, the optimization knobs users can turn, fine-tuning, and more 🔥

5-6 GB for HunyuanVideo - the sky is the limit 🌌 🤗
https://huggingface.co/blog/video_gen
replied to mitkox's post about 1 month ago

Thanks for doing this! I've been all-in on llama.cpp for a while now, but I'd be lying if I said I didn't wonder if I was missing out on anything with other engines.

reacted to sequelbox's post with 👍 about 1 month ago
A general FYI that Valiant Labs no longer has an X account. This is a business decision. Many other businesses seem to be making the same decision right now.

You can follow my account on Bluesky for updates on Shining Valiant 3, other Valiant Labs models, my open-source datasets, etc: https://bsky.app/profile/sequelbox.bsky.social

back to building :)
reacted to merve's post with ❤️ about 1 month ago
Everything that happened this week in open AI, a recap 🤠 merve/jan-17-releases-678a673a9de4a4675f215bf5

👀 Multimodal
- MiniCPM-o 2.6 is a new sota any-to-any model by OpenBMB (vision, speech and text!)
- VideoChat-Flash-Qwen2.5 is a new set of video multimodal models by OpenGVLab that come in 2B & 7B sizes at 224 & 448 resolutions
- ByteDance released a larger SA2VA that comes in at 26B parameters
- Dataset: VRC-Bench is a new diverse benchmark for multimodal LLM reasoning performance

💬 LLMs
- MiniMax-Text-01 is a new huge language model (456B total, 45.9B active params) by MiniMaxAI with a context length of 4M tokens 🤯
- Dataset: Sky-T1-data-17k is a diverse dataset used to train Sky-T1-32B
- kyutai released Helium-1-Preview-2B, a new small multilingual LM
- Wayfarer-12B is a new LLM able to write D&D 🧙🏻‍♂️
- ReaderLM-v2 is a new HTML parsing model by Jina AI

- Dria released Dria-Agent-a-3B, a new agentic coding model (Pythonic function calling) based on Qwen2.5 Coder
- Unsloth released Phi-4, plus faster and more memory-efficient Llama 3.3

🖼️ Vision
- MatchAnything is a new foundation model for matching
- FitDiT is a high-fidelity VTON model based on the DiT architecture

🗣️ Audio
- OuteTTS-0.3-1B is a new multilingual text-to-speech model with voice cloning and emotion control capabilities

📖 Retrieval
- lightblue released LB-reranker-0.5B-v1.0, a new reranker based on Qwen2.5 that can handle 95+ languages
- cde-small-v2 is a new sota small retrieval model by @jxm
replied to their post about 2 months ago

Hello again @JLouisBiz !

I've updated the spaces; they now use Kokoro instead of XTTS. It's drastically faster. Additionally, since the TTS is so much faster, I felt comfortable extending the output to 1024 tokens.

replied to their post about 2 months ago

Hello! It's currently capped at 512 tokens of output, so yes, it won't be suitable for very long generations. It's also a very tiny model - Llama 3.2 3B - so it's definitely more for conversation and less for completing tasks.

I'm going to try and swap in Kokoro TTS which should be faster on these small machines. Thanks for taking the time to test.

replied to their post about 2 months ago

I'm sorry that it's not working for you - can you make sure you've given it permission to use your microphone and that you're using the correct one (if you have multiple)? There should be an icon in the corner (in Chrome) that you can click to select microphones and check levels. Whenever I've had trouble activating it, I've found I was using the wrong microphone or my input volume was turned way down.


If you're using a browser other than Chrome please let me know, I've tested it in others but there could always be something I'm missing.

replied to their post about 2 months ago

Regarding the indicators in the bottom right,

  • If the "recording" light doesn't turn on (the top one), then it did not hear you utter a wake phrase.
  • If the "listening" light does turn on, it is detecting voice activity, but unless you utter a wake phrase it will not send the recording for transcription and completion.

So in short, if you say "Hex Vox, what's the news?" and you don't see the recording light turn on, then it didn't catch the wake phrase and you have to try again.

If instead you just want to speak your command without relying on wake phrase recognition, you can just click the "Call" button - that will start recording immediately and always send the audio for transcription.

This project was the one that set me off on making the wake phrase model in the first place. At first I didn't have it and relied instead on voice activity detection and transcription; however, this performed extremely poorly in noisy environments or with any kind of muted speech, with near-constant accidental activation. The only efficient way to be always-on AND hands-free was to use a front-end wake-word model to gate the rest of the audio workflow.
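
In case it helps to picture the flow, here's a rough sketch of that gating idea - the function names are hypothetical placeholders for illustration, not the actual Anachrovox API:

// Hypothetical sketch: the wake-word check sits in front of transcription,
// so voice activity alone never reaches the transcription/completion stage.
async function onSpeechSegment(audio) {
    // detectWakePhrase, transcribe, and runCompletion are hypothetical placeholders.
    const wakePhraseHeard = await detectWakePhrase(audio);
    if (!wakePhraseHeard) {
        return; // no "Hey Vox" / "Vox" detected: drop the segment entirely
    }
    const text = await transcribe(audio); // only gated audio gets transcribed
    await runCompletion(text);            // ...and only transcribed text reaches the LLM
}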

replied to their post about 2 months ago
posted an update about 2 months ago
Hello HuggingFace 🤗, and happy new year! 🎆

I'm thrilled to be releasing the first iteration of a project I've been working on for quite a while now. It's called Taproot, and it's a seamlessly scalable open-source AI/ML inference engine designed to let developers build real-time experiences on a small-to-mid-sized cluster, without the burden of hyperscale infrastructure.

Along with the server and task framework is a client library for Node and the browser. And what good is a server and client without an app to go alongside them? To that end, I'm also releasing Anachrovox, a fun, real-time, hands-free voice assistant that can run on mid-level devices in <12GB VRAM, with web search, weather, and other tools. It uses my real-time browser wake-word library to detect utterances of the phrases 'Hey Vox', 'Hi Vox', 'Okay Vox', 'Anachrovox', or just 'Vox' (alongside some others).

Releasing this many things at once will definitely result in bugs, so please report them when sighted! Thank you all!

Taproot: https://github.com/painebenjamin/taproot
Taproot JS Client: https://github.com/painebenjamin/taproot.js
Anachrovox: https://github.com/painebenjamin/anachrovox

The Anachrovox Spaces are networked together, balancing load across them to keep all front-ends responsive. You only have to choose what color you like the most!

https://huggingface.co/spaces/benjamin-paine/anachrovox
https://huggingface.co/spaces/benjamin-paine/anachrovox-amber
reacted to nicolay-r's post with ❤️ about 2 months ago
📢 Delighted to share the most recent milestone on quick deployment of Named Entity Recognition (NER) in GenAI-powered systems.

Releasing bulk-ner 0.25.0, a tiny framework that saves you time when deploying NER with any model.

💎 Why is this important? In the era of GenAI, handling raw textual output can be challenging. Instead, recognizing named entities via a domain-oriented system for your downstream LLM can be the preferable option.

📦: https://pypi.org/project/bulk-ner/0.25.0/
🌟: https://github.com/nicolay-r/bulk-ner

I noticed that directly adapting an LM for NER results in spending a significant amount of time formatting your texts according to the NER model's needs.
In particular:
1. Processing CONLL format with B-I-O tags from model outputs
2. Input trimming: long input content might not fit completely

To cope with these problems, version 0.25.0 makes big steps forward by providing:
✅ 🐍 Python API support for quick deployment (see screenshot below 📸)
✅ 🪶 No strings: dependencies are now cleaned up, so it is a purely Python implementation for API calls.
✅ 👌 Simplified output formatting: we use lists to represent texts, with inner lists that refer to annotated objects (see screenshot below 📸)

📒 We have a Colab for a quick start here (or see the screenshots for the bash / Python API 📸):
https://colab.research.google.com/github/nicolay-r/ner-service/blob/main/NER_annotation_service.ipynb

👏 The code for pipeline deployment is taken from the AREkit project:
https://github.com/nicolay-r/AREkit