Victor Mustar (victor) PRO
AI & ML interests: Building the UX of this website

Recent Activity
- updated a Space about 12 hours ago: victor/orcs-in-the-forest
- liked a model about 13 hours ago: ByteDance-Seed/Seed-OSS-36B-Instruct

reacted to AdinaY's post with 🔥 about 10 hours ago

reacted to neph1's post with 🔥 15 days ago
I'm building an MMO-ish RPG with LLM agents that can (hopefully) complete player tasks, as an experiment. I've started documenting my progress here: https://huggingface.co/blog/neph1/rpg-llm-agents
Let me know if you want to see more of it.

reacted to fdaudens's post with ❤️ 15 days ago
Well, it took just 2 hours for openai/gpt-oss-120b to hit #1 on Hugging Face. Don't remember seeing anything rise that fast!

reacted to JingzeShi's post with 🤗 16 days ago
Trainable selective sampling and sparse attention kernels are indispensable in the era of context engineering. We hope our work will be helpful to everyone! 🤗
Trainable Dynamic Mask Sparse Attention (2508.02124)
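To make the idea concrete, here is a minimal PyTorch sketch of dynamic-mask sparse attention, assuming a simple learned top-k key selection; the paper's actual trainable kernels are more involved, and `scorer` and `top_k` are illustrative names.

```python
# Minimal sketch (not the paper's kernels): a learned scorer ranks keys,
# only the top-k keys stay visible to every query, and adding the scores
# to the attention logits keeps the scorer trainable for surviving keys.
import torch
import torch.nn.functional as F

def dynamic_mask_attention(q, k, v, scorer, top_k=64):
    # q, k, v: (batch, heads, seq, dim); scorer: e.g. nn.Linear(dim, 1)
    logits = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5       # (B, H, S, S)
    importance = scorer(k).squeeze(-1)                           # (B, H, S)
    keep = importance.topk(min(top_k, k.shape[-2]), dim=-1).indices
    mask = torch.full_like(importance, float("-inf")).scatter(-1, keep, 0.0)
    # Broadcast the per-key mask and importance scores over all queries.
    attn = F.softmax(logits + (importance + mask).unsqueeze(-2), dim=-1)
    return attn @ v

# 2 sequences, 4 heads, 128 tokens, 64-dim heads, keep 16 keys per head.
scorer = torch.nn.Linear(64, 1)
q = k = v = torch.randn(2, 4, 128, 64)
print(dynamic_mask_attention(q, k, v, scorer, top_k=16).shape)  # (2, 4, 128, 64)
```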

reacted to mrs83's post with 🔥 28 days ago
Hello Hugging Face Community! I'm excited to share a project I've been working on: SkinCancerViT, a multimodal Vision Transformer model for skin lesion analysis.
ethicalabs/SkinCancerViT
I've wrapped it in a Gradio app to make it easy to explore: ethicalabs/SkinCancerViTPredictor
This app is a research demonstration that combines dermatoscopic images with patient age and lesion localization to assist in classifying skin lesions.
You can either upload your own image and patient data for a prediction, or explore how the model performs on random samples from the marmal88/skin_cancer dataset.
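If you would rather poke at the demo programmatically, a hedged gradio_client call looks roughly like this; the argument order, values, and api_name are assumptions, so check the Space's "Use via API" panel for the real signature.

```python
# Hypothetical call into the demo Space; argument order and api_name are
# assumptions - the Space's "Use via API" panel shows the real endpoint.
from gradio_client import Client

client = Client("ethicalabs/SkinCancerViTPredictor")
result = client.predict(
    "lesion.jpg",         # dermatoscopic image (local path or URL)
    55,                   # patient age
    "back",               # lesion localization
    api_name="/predict",  # assumed endpoint name
)
print(result)
```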
I firmly believe that the only final, trustworthy diagnosis comes from medical professionals, and I am actively seeking medical institutions and researchers who might be interested in partnering with me to further explore this methodology: conducting further training with diverse datasets (ethically sourced and anonymized), performing extensive validation tests, and exploring the possibility of running a federated fine-tuning simulation with https://flower.ai/
As a software engineer, I do not possess medical expertise, and I am seeking collaboration with medical professionals and AI/ML researchers. You can find the project source code, which includes data preprocessing, model training, and testing, at the following URL: https://github.com/ethicalabs-ai/SkinCancerViT/tree/main
Thank you for your time and consideration!!!

reacted to MohamedRashad's post with 🚀 about 1 month ago
For anyone who wants to try the new Voxtral models, you can do this from here:
MohamedRashad/Voxtral
You can also find the Transformers versions of them here:
MohamedRashad/Voxtral-Mini-3B-2507-transformers
MohamedRashad/Voxtral-Small-24B-2507-transformers
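For the Transformers ports, loading should look roughly like the sketch below; this assumes a recent transformers release with Voxtral support, and the audio chat-template shape is a guess, so treat the model card as the source of truth.

```python
# Rough sketch, assumptions flagged inline: load the Voxtral-Mini port and
# ask it to transcribe one audio file via the chat template.
import torch
from transformers import AutoProcessor, VoxtralForConditionalGeneration

repo = "MohamedRashad/Voxtral-Mini-3B-2507-transformers"
processor = AutoProcessor.from_pretrained(repo)
model = VoxtralForConditionalGeneration.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="auto"
)

# Audio-in-chat content shape is an assumption; check the model card.
conversation = [{"role": "user", "content": [
    {"type": "audio", "path": "sample.wav"},
    {"type": "text", "text": "Transcribe this audio."},
]}]
inputs = processor.apply_chat_template(conversation, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```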

reacted to DualityAI-RebekahBogdanoff's post with 🚀 about 1 month ago
📢 Generate your own data in simulation using two new free and customizable data-generating Scenarios on Duality's FalconCloud service.
🙌 These multi-class Scenarios are designed to target model weaknesses for our recent Kaggle competition, but they are free to anyone for non-commercial use! Just create a free account.
📸 Control object and camera posing
👉 Select random variable ranges
🖼️ Set post-processing effects
➕ and more to create a robust dataset for strong model training.
Access the 2 Scenarios here:
💠 https://falcon.duality.ai/secure/scenarios/edit/9e90e036-8af9-41e4-8af0-1343b8e8f467?utm_source=Kaggle&utm_medium=post&utm_campaign=competition_4
💠 https://falcon.duality.ai/secure/scenarios/edit/e3294c19-49d4-4f64-9ca8-8373876c2c94?utm_source=Kaggle&utm_medium=post&utm_campaign=competition_4

reacted to azettl's post with 🔥 about 1 month ago
𝗚𝗿𝗮𝗱𝗶𝗼 𝗔𝗴𝗲𝗻𝘁𝘀 & 𝗠𝗖𝗣 𝗛𝗮𝗰𝗸𝗮𝘁𝗵𝗼𝗻 - 𝗙𝗶𝗻𝗮𝗹 𝗗𝗮𝘆
Submission deadline is in 10 minutes, so here's where Consilium ended up after a week of building.
What started as a simple idea, "𝘞𝘩𝘢𝘵 𝘪𝘧 𝘮𝘶𝘭𝘵𝘪𝘱𝘭𝘦 𝘈𝘐 𝘮𝘰𝘥𝘦𝘭𝘴 𝘤𝘰𝘶𝘭𝘥 𝘥𝘪𝘴𝘤𝘶𝘴𝘴 𝘢𝘯𝘥 𝘳𝘦𝘢𝘤𝘩 𝘤𝘰𝘯𝘴𝘦𝘯𝘴𝘶𝘴?" turned into a full multi-AI expert platform with live research integration.
𝗙𝗶𝗻𝗮𝗹 𝗳𝗲𝗮𝘁𝘂𝗿𝗲𝘀:
- Custom Gradio roundtable component with real-time speech bubbles
- MCP server mode
- Multiple AI models: Mistral Large, DeepSeek-R1, Meta-Llama-3.3-70B, QwQ-32B
- Research Agent with 5 sources: Web Search, Wikipedia, arXiv, GitHub, SEC EDGAR
- Different decision protocols and role assignments
𝗖𝘂𝗿𝗿𝗲𝗻𝘁 𝘀𝘁𝗮𝘁𝘂𝘀: 25 likes 👍 and some really good user feedback in the Discord channel. People are actually testing it on real decisions, which feels great. Also met some really awesome people during this week 🙌.
➡️ 𝗧𝗿𝘆 𝗶𝘁: Agents-MCP-Hackathon/consilium_mcp
Thanks to everyone who tested and gave feedback during the week ❤️. Win or lose, this was a fun deep dive into Gradio, smolagents, Hugging Face in general, SambaNova Systems and the Mistral AI API.
Also huge thanks to @victor 👏 who tweeted about the project and let me steal the video.

reacted to Nymbo's post with 👀 about 2 months ago
Anyone know how to reset Claude web's MCP config? I connected mine when the HF MCP first released with just the default example spaces added. I added lots of other MCP spaces but Claude.ai doesn't update the available tools... "Disconnecting" the HF integration does nothing, deleting it and adding it again does nothing.
Refreshing tools works fine in VS Code because I can manually restart it in mcp.json, but claude.ai has no such option. Anyone got any ideas?
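For reference, a VS Code mcp.json entry for the HF MCP server looks roughly like this; the "hf-mcp" label is arbitrary and the auth header is an assumption, and claude.ai exposes no equivalent file to edit, which is exactly the problem.

```json
{
  "servers": {
    "hf-mcp": {
      "type": "http",
      "url": "https://huggingface.co/mcp",
      "headers": { "Authorization": "Bearer <HF_TOKEN>" }
    }
  }
}
```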

reacted to sequelbox's post with 🚀 about 2 months ago
The full Celestia 3 science-reasoning dataset is here!
- 91k high-quality synthetic science prompts answered by DeepSeek-R1-0528
- subjects include physics, biology, chemistry, computer science, Earth science, astronomy, and information theory
- one of the reasoning datasets powering the upcoming Shining Valiant 3 :) coming soon!
GET IT NOW, FOR EVERYONE: sequelbox/Celestia3-DeepSeek-R1-0528
SUPPORT OUR RELEASES: sequelbox/SupportOpenSource
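Grabbing it should be a one-liner with datasets; the "train" split and printed fields are assumptions, so see the dataset card for the actual schema.

```python
# Pull Celestia 3 from the Hub; the "train" split name is an assumption.
from datasets import load_dataset

ds = load_dataset("sequelbox/Celestia3-DeepSeek-R1-0528", split="train")
print(ds)     # features and row count
print(ds[0])  # one synthetic science prompt with its R1-0528 answer
```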
with love,
allegra

reacted to burtenshaw's post with ❤️ about 2 months ago
Inference for generative AI models looks like a minefield, but there's a simple protocol for picking the best inference setup:
🌍 95% of users >> If you're using open (large) models and need fast online inference, then use Inference Providers on auto mode, and let it choose the best provider for the model (see the sketch at the end of this post). https://huggingface.co/docs/inference-providers/index
👷 fine-tuners/ bespoke >> If you’ve got custom setups, use Inference Endpoints to define a configuration from AWS, Azure, GCP. https://endpoints.huggingface.co/
🦫 Locals >> If you're trying to stretch everything you can out of a server or local machine, use llama.cpp, Jan, LM Studio or vLLM. https://huggingface.co/settings/local-apps#local-apps
🪟 Browsers >> If you need open models running right here in the browser, use transformers.js. https://github.com/huggingface/transformers.js
Let me know what you’re using, and if you think it’s more complex than this.

reacted to Jaward's post with 👍 about 2 months ago
I played around with the new RXTX paper (XX^T) and was able to train nanoGPT with 4x4 RXTX matmuls in both the attention layer and the optimizer 🤕
It just works (well I had to add some guardrails) but still saves 5% of memory usage:
The Patch:
- Computes attention scores with 4x4 blockwise RXTX matmuls (no PyTorch dot product)
- Handles arbitrary sequence lengths by padding to the nearest multiple of 4.
- An RXTX variant of Shampoo with params reshaped into 4x4 blocks during each optimizer step.
- Uses 5% fewer ops
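For intuition, here is a minimal sketch of the padding-plus-blocking framing; the naive triple loop below does 64 block products, and RXTX's scheme computes the same XX^T with 26, which is where the savings come from.

```python
# Pad to a multiple of 4, view X as a 4x4 grid of blocks, and assemble
# X @ X.T from block products. RXTX replaces the 64 naive block matmuls
# below with 26, hence the ~5% op savings.
import torch
import torch.nn.functional as F

def xxt_4x4_blockwise(x):
    n, d = x.shape
    x = F.pad(x, (0, (-d) % 4, 0, (-n) % 4))               # pad to multiples of 4
    bn, bd = x.shape[0] // 4, x.shape[1] // 4
    blocks = x.reshape(4, bn, 4, bd).permute(0, 2, 1, 3)   # blocks[i, k]: row-group i, col-group k
    out = torch.zeros(4, 4, bn, bn, dtype=x.dtype)
    for i in range(4):
        for j in range(4):
            for k in range(4):
                out[i, j] += blocks[i, k] @ blocks[j, k].transpose(-2, -1)
    return out.permute(0, 2, 1, 3).reshape(4 * bn, 4 * bn)[:n, :n]

x = torch.randn(130, 62)  # deliberately not multiples of 4
assert torch.allclose(xxt_4x4_blockwise(x), x @ x.T, atol=1e-4)
```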
Code: https://github.com/Jaykef/ai-algorithms/blob/main/nanogpt-rxtx.ipynb
Paper: https://arxiv.org/pdf/2505.09814

reacted to arthurbresnu's post with 🚀 about 2 months ago
‼️Sentence Transformers v5.0 is out! The biggest update yet introduces Sparse Embedding models, encode methods improvements, Router module & much more. Sparse + Dense = 🔥 hybrid search performance!
1️⃣ Sparse Encoder Models - New support for sparse embeddings (30k+ dims, <1% non-zero)
* Full SPLADE, Inference-free SPLADE, CSR support
* 4 new modules, 12 losses, 9 evaluators
* Integration with elastic, opensearch-project, Qdrant, ibm-granite
* Decode interpretable embeddings
* Hybrid search integration
2️⃣ Enhanced Encode Methods
* encode_query & encode_document with auto prompts
* Direct device list passing to encode()
* Cleaner multi-processing
3️⃣ Router Module & Training
* Different paths for queries vs documents
* Custom learning rates per parameter group
* Composite loss logging
* Perfect for two-tower architectures
4️⃣ Documentation & Training
* New Training/Loss Overview docs
* 6 training example pages
* Search engine integration examples
Read the comprehensive blogpost about training sparse embedding models: https://huggingface.co/blog/train-sparse-encoder
See the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/v5.0.0
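A minimal sketch of the new SparseEncoder API is below; the SPLADE checkpoint is just an example, and decode's exact signature may vary by release.

```python
# v5 SparseEncoder sketch: encode a query and a document, score them, and
# decode the query embedding into the vocabulary tokens it activates.
from sentence_transformers import SparseEncoder

model = SparseEncoder("naver/splade-cocondenser-ensembledistil")  # example SPLADE model
q = model.encode_query("what is sparse retrieval?")
d = model.encode_document(["SPLADE keeps under 1% of 30k+ dimensions non-zero."])
print(model.similarity(q, d))     # dot-product relevance score
print(model.decode(q, top_k=10))  # (token, weight) pairs - the interpretable part
```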
What's next? We would love to hear from the community! What sparse encoder models would you like to see? And what new capabilities should Sentence Transformers handle - multimodal embeddings, late interaction models, or something else? Your feedback shapes our roadmap!
I'm incredibly excited to see the community explore sparse embeddings and hybrid search! The interpretability alone makes this a game-changer for understanding what your models are actually doing.
🙏 Thanks to @tomaarsen for this incredible opportunity!

reacted to asigalov61's post with 👍 about 2 months ago
Check out the new symbolic music AI front end and CLI training app:
https://webchatappai.github.io/midi-gen/
https://github.com/WebChatAppAi/Orpheus-Midi-Model-Maker
@Timzoid @Csplk @not-lain @victor @bartowski @John6666

reacted to blaise-tk's post with ❤️ about 2 months ago
Today we launch Dione.
A few months ago it was just a wild idea I shared with @bygimenez , now it's real.
Dione (Beta) is here, the easiest way to discover and install open-source apps, especially AI ones.
Think of it as the Steam of open source. Installing open-source tools is often a mess. Dione fixes that.
Beautiful UI and workflow. Soon multi-platform, multilingual & fully open-source.
Users can even write and share their own installation scripts. This is just the beginning.
🚀 Join our exclusive Beta
→ https://getdione.app/beta/join

reacted to blaise-tk's post with 🚀 about 2 months ago
A few months ago, I shared that I was building with @deeivihh something like "the Steam for open source apps"...
🚀 Today, I’m excited to announce that Dione is now open source and live in public beta!
Our mission is simple: make it easier to discover, use, and contribute to open source applications.
🔗 GitHub: https://github.com/dioneapp/dioneapp
💬 Join the community: https://discord.gg/JDFJp33vrM
Want to give it a try? I’d love your feedback! 👀

reacted to jsulz's post with 🚀 about 2 months ago
It's been a bit since I took a step back and looked at xet-team's progress migrating Hugging Face from Git LFS to Xet, but every time I do it boggles the mind.
A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today?
🤗 700,000 users/orgs
📈 350,000 repos
🚀 15PB
Meanwhile, our migrations have pushed throughput to numbers that are bonkers. In June, we hit upload speeds of 577Gb/s (crossing 500Gb/s for the first time).
These are hard numbers to put into context, but let's try:
The latest run of the Common Crawl from commoncrawl was 471 TB.
We now have ~32 crawls stored in Xet. At peak upload speed we could move the latest crawl into Xet in about two hours.
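The two-hour figure checks out on the back of an envelope:

```python
# Sanity-check the "about two hours" claim using the numbers from the post.
crawl_bytes = 471e12         # latest Common Crawl run: 471 TB
peak_bits_per_s = 577e9      # peak upload throughput: 577 Gb/s
hours = crawl_bytes * 8 / peak_bits_per_s / 3600
print(f"{hours:.1f} hours")  # -> 1.8 hours
```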
We're moving to a new phase in the process, so stay tuned.
This shift in gears means it's also time to roll up our sleeves and look at all the bytes we have and the value we're adding to the community.
I already have some homework from @RichardErkhov to look at the dedupe across their uploads, and I'll be doing the same for other early adopters, big models/datasets, and frequent uploaders (looking at you @bartowski 👀)
Let me know if there's anything you're interested in; happy to dig in!


reacted to yeonseok-zeticai's post with 🚀 about 2 months ago
💫 Next-Level On-Device AI Showdown
🪽 See it to believe it: how does QWEN 4B run in an on-device environment without an expensive GPU cloud server?
We've crafted a side-by-side demo video showcasing both Jan-Nano and QWEN 4B in action. No more wondering which model reigns supreme: click play, compare their speeds, accuracy, and memory footprints, and decide which one fits your needs best!
👋 Why You Can't Miss This
We are actively creating runnable sLLM environments for on-device AI. You can build on-device AI apps within a few hours.
Several sLLM models, including Jan-Nano and QWEN 4B, are ready to be used in your AI application!
🤑 Feel free to use them, they're free!
Ready to Compare?
Watch now, draw your own conclusions, and let us know which model you’d deploy in your next edge-AI project! 🌍💡
#OnDeviceAI #EdgeAI #AIShowdown #MLOptimization #DemoVideo #AIComparison

reacted to merve's post with 🔥 2 months ago
stop using VLMs blindly ✋🏻
compare different VLM outputs on a huge variety of inputs (from reasoning to OCR!) 🔥 visionLMsftw/comparevlms
> has support for multiple VLMs: google/gemma-3-27b-it, Qwen/Qwen2.5-VL-7B-Instruct, Qwen/Qwen2.5-VL-32B-Instruct, meta-llama/Llama-4-Maverick-17B-128E-Instruct, HuggingFaceTB/SmolVLM2-2.2B-Instruct
> recommend us new models or inputs, we'll add 🫡
so far I figured out:
> for fact-checks, you need a relatively bigger model (7B is ok!)
> Gemma 3 gets a downgrade without pan-and-scan (especially for 📑)
> Qwen2.5VL-32B is very talkative, great for reasoning but not good for simple tasks 🗣️

reacted to prithivMLmods's post with 🔥 2 months ago
The demos for the MonkeyOCR Recognition model, which adopts a Structure-Recognition-Relation (SRR) triplet paradigm, and Nanonets-OCR-s, a powerful state-of-the-art image-to-markdown OCR model that goes far beyond traditional text extraction, along with other experimental document OCR models, are combined into a single Space.
✦ Try the demo here : prithivMLmods/core-OCR
✦ Try Nanonets-OCR-s demo here : prithivMLmods/Multimodal-OCR
⤷ MonkeyOCR Recognition : echo840/MonkeyOCR
⤷ docscopeOCR-7B-050425-exp : prithivMLmods/docscopeOCR-7B-050425-exp
⤷ coreOCR-7B-050325-preview : prithivMLmods/coreOCR-7B-050325-preview
⤷ Nanonets-OCR-s : nanonets/Nanonets-OCR-s
⤷ Multimodal Implementations : prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0
The Space also includes a sample OCR test using the VisionOCR-3B-061125 model and the Qwen2-VL-OCR-2B-Instruct model.
⤷ Blog : https://huggingface.co/blog/prithivMLmods/visionocr-3b-061125-vs-qwen2-vl-ocr-2b-instruct
To know more, visit the model card of the respective model!
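As a starting point, the generic transformers image-text-to-text pipeline should drive Nanonets-OCR-s along these lines; the prompt wording and image URL are placeholders, and some of the listed models need their own chat templates, per their cards.

```python
# Hedged sketch: run Nanonets-OCR-s through the generic image-text-to-text
# pipeline. The prompt and image URL are placeholders.
from transformers import pipeline

ocr = pipeline("image-text-to-text", model="nanonets/Nanonets-OCR-s")
messages = [{"role": "user", "content": [
    {"type": "image", "url": "https://example.com/scanned_page.png"},
    {"type": "text", "text": "Extract the text of this document as markdown."},
]}]
out = ocr(text=messages, max_new_tokens=512)
print(out[0]["generated_text"])
```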