Krinal Joshi

krinal

AI & ML interests

NLP, Speech


Organizations

Blog-explorers, Hugging Face Discord Community

krinal's activity

reacted to nyuuzyou's post with 👍 2 days ago
🌐 Fandom.com Community Dataset - nyuuzyou/fandom

A comprehensive collection of 7.04M wiki pages from Fandom.com communities featuring:
- Full article content and metadata from current pages
- Rich structural data including templates, categories, and links
- Multilingual content across 40+ languages
- Complete metadata including titles and section structure

Content is available under CC-BY-SA 3.0 license, allowing reuse with attribution and share-alike requirements.

Key contents:
- 7.04M wiki articles with full text
- Metadata including templates, categories, sections
- Internal and external link information
- Multi-language support including major world languages

The dataset provides a valuable resource for:
- Text generation and classification tasks
- Topic modeling and categorization
- Cross-language information retrieval
- Wiki structure analysis

All content comes from public Fandom.com community wikis as of February 2025 and maintains original CC-BY-SA 3.0 licensing.
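
For anyone who wants to poke at the data, here's a minimal loading sketch using the datasets library (streaming mode, since the dump is large; the "train" split and default config are assumptions, so check the dataset card first):

from datasets import load_dataset

# Stream the ~7M pages instead of downloading everything up front.
ds = load_dataset("nyuuzyou/fandom", split="train", streaming=True)
print(next(iter(ds)))  # inspect one record's fields (title, text, metadata, ...)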
reacted to ychen's post with 👍 2 days ago
Here are some annoying keywords that 4o tends to use when responding to personal experiences with negative sentiments. This list will be updated over time.

rough, tough, sound like, sounds like, frustrating, overwhelming
reacted to lysandre's post with 👍 2 days ago
SmolVLM-2 and SigLIP-2 are now part of transformers in dedicated releases!

They're added on top of the v4.49.0 release, and can be installed from the following tags: v4.49.0-SmolVLM-2 and v4.49.0-SigLIP-2.
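
For example, installing from one of these tags uses pip's standard git syntax (shown here for the SmolVLM-2 tag):

pip install git+https://github.com/huggingface/transformers@v4.49.0-SmolVLM-2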

This marks a new beginning for the release process of transformers. For the past five years, we've been doing monthly releases featuring many models (v4.49.0, the latest release, features 9 new architectures).

Starting with SmolVLM-2 & SigLIP-2, we'll now additionally release tags supporting new models on a stable branch. These models are therefore directly available for use by installing from the tag itself. These tags will continue to be updated with fixes applied to these models.

Going forward, continue expecting software releases following semantic versioning: v4.50.0 will have ~10 new architectures compared to v4.49.0, as well as a myriad of new features, improvements and bug fixes. Accompanying these software releases, we'll release tags offering brand new models as fast as possible, to make them accessible to all immediately.
reacted to merve's post with 👍 3 days ago
Google just released PaliGemma 2 Mix: new versatile instruction vision language models 🔥

> Three new models: 3B, 10B, 28B with res 224, 448 💙
> Can do vision language tasks with open-ended prompts, understand documents, and segment or detect anything 🤯

Read more https://huggingface.co/blog/paligemma2mix
Try the demo google/paligemma2-10b-mix
All models are here google/paligemma-2-mix-67ac6a251aaf3ee73679dcc4
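
As a rough idea of usage, here is a minimal sketch with transformers; the exact checkpoint name is an assumption (check the collection linked above), and the placeholder image should be replaced with a real one:

from PIL import Image
from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

model_id = "google/paligemma2-10b-mix-448"  # assumed checkpoint name
model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.new("RGB", (448, 448))  # placeholder; use a real photo
inputs = processor(text="caption en", images=image, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=50)
input_len = inputs["input_ids"].shape[-1]
print(processor.decode(output[0][input_len:], skip_special_tokens=True))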
upvoted an article 3 days ago

PaliGemma 2 Mix - New Instruction Vision Language Models by Google

reacted to clem's post with 👍 4 days ago
What are the best organizations to follow on @huggingface?

Off the top of my head:
- Deepseek (35,000 followers): https://huggingface.co/deepseek-ai
- Meta Llama (27,000 followers): https://huggingface.co/meta-llama
- Black Forest Labs (11,000 followers): https://huggingface.co/black-forest-labs
- OpenAI (5,000 followers): https://huggingface.co/openai
- Nvidia (16,000 followers): https://huggingface.co/nvidia
- Microsoft (9,000 followers): https://huggingface.co/microsoft
- AllenAI (2,000 followers): https://huggingface.co/allenai
- Mistral (5,000 followers): https://huggingface.co/mistralai
- XAI (600 followers): https://huggingface.co/xai-org
- Stability AI (16,000 followers): https://huggingface.co/stabilityai
- Qwen (16,000 followers): https://huggingface.co/Qwen
- GoogleAI (8,000 followers): https://huggingface.co/google
- Unsloth (3,000 followers): https://huggingface.co/unsloth
- Bria AI (4,000 followers): https://huggingface.co/briaai
- NousResearch (1,300 followers): https://huggingface.co/NousResearch

Bonus: the agents-course org (17,000 followers): https://huggingface.co/agents-course
reacted to AdinaY's post with 👍 4 days ago
🚀 StepFun阶跃星辰 is making BIG open moves!

Last year, their GOT-OCR 2.0 took the community by storm 🔥 but many didn’t know they were also building some amazing models. Now, they’ve just dropped something huge on the Hub!

📺 Step-Video-T2V: a 30B bilingual open video model that generates 204 frames (8-10s) at 540p resolution with high information density & consistency.
stepfun-ai/stepvideo-t2v

🔊 Step-Audio-TTS-3B: a TTS model trained with the LLM-Chat paradigm on a large synthetic dataset, capable of generating rap & humming
stepfun-ai/step-audio-67b33accf45735bb21131b0b
reacted to Pendrokar's post with 👍 5 days ago
upvoted an article 6 days ago

How to generate text: using different decoding methods for language generation with Transformers

reacted to louisbrulenaudet's post with 👍 6 days ago
I am pleased to introduce my first project built upon Hugging Face’s smolagents framework, integrated with Alpaca for financial market analysis automation 🦙🤗

The project implements technical indicators such as the Relative Strength Index (RSI) and Bollinger Bands to provide momentum and volatility analysis. Market data is retrieved through the Alpaca API, enabling access to historical price information across various timeframes.

AI-powered insights are generated using Hugging Face’s inference API, facilitating the analysis of market trends through natural language processing with DuckDuckGo search integration for real-time sentiment analysis based on financial news 🦆

Link to the GitHub project: https://github.com/louisbrulenaudet/agentic-market-tool
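
For reference, the two indicators named above reduce to a few lines of pandas; this is a generic sketch (the standard 14/20-period defaults and the "close" column name are assumptions, not code from the repository):

import pandas as pd

def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    # RSI = 100 - 100 / (1 + average gain / average loss)
    delta = close.diff()
    gain = delta.clip(lower=0).rolling(period).mean()
    loss = (-delta.clip(upper=0)).rolling(period).mean()
    return 100 - 100 / (1 + gain / loss)

def bollinger_bands(close: pd.Series, period: int = 20, k: float = 2.0):
    # Middle band is an SMA; upper/lower bands sit k standard deviations away.
    mid = close.rolling(period).mean()
    std = close.rolling(period).std()
    return mid - k * std, mid, mid + k * std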

reacted to Jaward's post with 👍 6 days ago
Finally, here it is: a faster, custom, scalable GRPO trainer for smaller models with < 500M params. It can train on 8 GB of CPU RAM, and it also supports GPU for sanity's sake (includes support for vLLM + flash attention). It uses SmolLM2-135M/360M-Instruct as ref & base models. Experience your own “aha” moment 🐳 on 8 GB of RAM.
Code: https://github.com/Jaykef/ai-algorithms/blob/main/smollm2_360M_135M_grpo_gsm8k.ipynb
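
For context, the step that gives GRPO its name is the group-relative advantage: each completion's reward is normalized against the mean and std of the other completions sampled for the same prompt. A minimal sketch of that step (not the notebook's code; shapes and epsilon are assumptions):

import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-4) -> torch.Tensor:
    # rewards: (num_prompts, group_size), one reward per sampled completion
    mean = rewards.mean(dim=-1, keepdim=True)
    std = rewards.std(dim=-1, keepdim=True)
    return (rewards - mean) / (std + eps)

# e.g. 2 prompts with 4 sampled completions each
print(grpo_advantages(torch.tensor([[1.0, 0.0, 0.0, 1.0],
                                    [0.5, 0.5, 1.0, 0.0]])))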
reacted to schuler's post with 👍 6 days ago
🔮 GPT-3 implemented in pure Free Pascal!
https://github.com/joaopauloschuler/gpt-3-for-pascal

This implementation follows the GPT-3 Small architecture from the landmark paper "Language Models are Few-Shot Learners":
┌─────────────────────────┐
│       Input Layer       │
├─────────────────────────┤
│   Token & Positional    │
│        Embedding        │
├─────────────────────────┤
│    12x Transformer      │
│        Blocks           │
│   - 12 heads            │
│   - 768 hidden dims     │
│   - 3072 intermediate   │
├─────────────────────────┤
│      Output Layer       │
└─────────────────────────┘

Clean Pascal Implementation
for CntLayer := 1 to {Layers=}12 do
begin
  Result.AddTransformerBlockCAI(
    {Heads=}12, 
    {intermediate dimensions=}4*768, 
    {NoForward=}true, 
    {HasNorm=}true, 
    false
  );
end;

reacted to burtenshaw's post with 👍 6 days ago
NEW COURSE! We’re cooking hard on Hugging Face courses, and it’s not just agents. The NLP course is getting the same treatment with a new chapter on Supervised Fine-Tuning!

👉 Follow to get more updates https://huggingface.co/nlp-course

The new SFT chapter will guide you through these topics:

1️⃣ Chat Templates: Master the art of structuring AI conversations for consistent and helpful responses (a minimal sketch follows below).

2️⃣ Supervised Fine-Tuning (SFT): Learn the core techniques to adapt pre-trained models to your specific outputs.

3️⃣ Low Rank Adaptation (LoRA): Discover efficient fine-tuning methods that save memory and resources.

4️⃣ Evaluation: Measure your model's performance and ensure top-notch results.

This is the first update in a series, so follow along if you’re upskilling in AI.
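
As a taste of topic 1️⃣, here is a minimal chat-template sketch using transformers' apply_chat_template (the model id is just an example instruct checkpoint):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-135M-Instruct")
messages = [{"role": "user", "content": "What does SFT stand for?"}]
# Render the conversation into the exact prompt format the model expects.
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)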
upvoted an article 9 days ago

From Chunks to Blocks: Accelerating Uploads and Downloads on the Hub

reacted to AdinaY's post with 👍 9 days ago
InspireMusic 🎵🔥 an open music generation framework by Alibaba FunAudio Lab
Model: FunAudioLLM/InspireMusic-1.5B-Long
Demo: FunAudioLLM/InspireMusic
✨ Music, songs, audio - ALL IN ONE
✨ High quality audio: 24kHz & 48kHz sampling rates
✨ Long-Form Generation: enables extended audio creation
✨ Efficient Fine-Tuning: supports multiple precisions (BF16, FP16, FP32) with user-friendly scripts
reacted to lewtun's post with 👍 12 days ago
Introducing OpenR1-Math-220k!

open-r1/OpenR1-Math-220k

The community has been busy distilling DeepSeek-R1 from inference providers, but we decided to have a go at doing it ourselves from scratch 💪

What’s new compared to existing reasoning datasets?

♾ Based on AI-MO/NuminaMath-1.5: we focus on math reasoning traces and generate answers for problems in NuminaMath 1.5, an improved version of the popular NuminaMath-CoT dataset.

🐳 800k R1 reasoning traces: We generate two answers for 400k problems using DeepSeek R1. The filtered dataset contains 220k problems with correct reasoning traces.

📀 512 H100s running locally: Instead of relying on an API, we leverage vLLM and SGLang to run generations locally on our science cluster, generating 180k reasoning traces per day.

⏳ Automated filtering: We apply Math Verify to only retain problems with at least one correct answer (a minimal sketch follows below). We also leverage Llama3.3-70B-Instruct as a judge to retrieve more correct examples (e.g. for cases with malformed answers that can’t be verified with a rules-based parser).

📊 We match the performance of DeepSeek-Distill-Qwen-7B by finetuning Qwen-7B-Math-Instruct on our dataset.

🔎 Read our blog post for all the nitty gritty details: https://huggingface.co/blog/open-r1/update-2
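
For reference, the rules-based verification mentioned above can be reproduced with the Math-Verify library; a minimal sketch (the example answers are invented):

from math_verify import parse, verify

gold = parse("$\\frac{1}{2}$")  # gold answer in LaTeX
pred = parse("0.5")             # model answer in plain text
print(verify(gold, pred))       # True: the two are mathematically equivalent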
reacted to mmhamdy's post with 👍 12 days ago
⛓ Evaluating Long Context #2: SCROLLS and ZeroSCROLLS

In this series of posts tracing the history of long-context evaluation, we started with the Long Range Arena (LRA). Introduced in 2020, LRA is one of the earliest benchmarks designed to tackle the challenge of long-context evaluation. It wasn't introduced to evaluate LLMs, though, but rather the transformer architecture in general.

📜 The SCROLLS benchmark, introduced in 2022, addresses this gap in NLP/LLM research. SCROLLS challenges models with tasks that require reasoning over extended sequences (according to 2022 standards). So, what does it offer?

1️⃣ Long Text Focus: SCROLLS (unlike LRA) focuses mainly on text and contains inputs with thousands of words, testing models' ability to synthesize information across lengthy documents.
2️⃣ Diverse Tasks: Includes summarization, question answering, and natural language inference across domains like literature, science, and business.
3️⃣ Unified Format: All datasets are available in a text-to-text format, facilitating easy evaluation and comparison of models.

Building on SCROLLS, ZeroSCROLLS takes long text evaluation to the next level by focusing on zero-shot learning. Other features include:

1️⃣ New Tasks: Introduces tasks like sentiment aggregation and sorting book chapter summaries.
2️⃣ Leaderboard: A live leaderboard encourages continuous improvement and competition among researchers.

💡 What are some other landmark benchmarks in the history of long context evaluation? Feel free to share your thoughts and suggestions in the comments.

- SCROLLS Paper: SCROLLS: Standardized CompaRison Over Long Language Sequences (2201.03533)
- ZeroSCROLLS Paper: ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding (2305.14196)
reacted to burtenshaw's post with 👍 12 days ago
The Hugging Face agents course is finally out!

👉 https://huggingface.co/agents-course

This first unit of the course sets you up with all the fundamentals to become a pro in agents.

- What's an AI Agent?
- What are LLMs?
- Messages and Special Tokens
- Understanding AI Agents through the Thought-Action-Observation Cycle
- Thought, Internal Reasoning and the Re-Act Approach
- Actions, Enabling the Agent to Engage with Its Environment
- Observe, Integrating Feedback to Reflect and Adapt