I've been awarded 128 NVIDIA Blackwell GPUs through NIPA (Korea's National IT Industry Promotion Agency). Sharing this here first — because Hugging Face is where it all started.
I design LLM architectures from scratch. HF was my lab — dissecting Transformers internals, analyzing thousands of checkpoints, iterating on Spaces with global feedback.
Our FINAL Bench reached #5 globally in HF dataset popularity, and this research is exactly what earned the GPU grant. 👉 FINAL-Bench/Leaderboard
These 128 Blackwells will scale AETHER-Net — our Proto-AGI architecture (Emergence Engine · Meta-Cognition · SLAI · Multi-Intelligence · Synergy & Critique) — validated at 0.8B with MoE expansion to 2.1B params. Next stop: 166B.
People I must thank:
@John6666 — Guardian of this ecosystem. Never misses a forum question, interested in every project, active 24/7. I've genuinely wondered if you're a machine. Remarkable.
@bartowski — Master of quantization. The hidden infrastructure of open-source LLMs. Countless experiments were possible thanks to you.
@SaylorTwift — You see what others miss. Insight that cuts to the essence. Deep respect.
My promise: AETHER-Net design docs, training recipes, checkpoints, and failure logs — all shared here openly.
🤗 Thank you, Hugging Face. Let's turn the next page together. 🚀
🏟️ Smol AI WorldCup: A 4B Model Just Beat 8B — Here's the Data
We evaluated 18 small language models from 12 makers on 125 questions across 7 languages. The results challenge the assumption that bigger is always better.
→ A 1.3B model fabricates confident fake content 80% of the time when prompted with nonexistent entities. Qwen3 family hits 100% trap detection across all sizes.
→ Qwen3-1.7B (1.2GB) outscores Mistral-7B, Llama-3.1-8B, and DeepSeek-R1-14B. A newer architecture at 1.7B beats an older architecture at 14B.
What makes this benchmark different?
Most benchmarks ask "how smart?" — we measure five axes simultaneously: Size, Honesty, Intelligence, Fast, Thrift (SHIFT). Our ranking metric WCS = sqrt(SHIFT x PIR_norm) rewards models that are both high-quality AND efficient. Smart but massive? Low rank. Tiny but poor? Also low.
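For anyone who wants to play with the ranking math, here is a minimal sketch of how WCS could be computed. Only the outer formula, WCS = sqrt(SHIFT x PIR_norm), comes from the description above; the normalization of each axis to [0, 1], the simple mean used to aggregate the five SHIFT axes, the `pir_norm` argument, and the example scores are assumptions for illustration, not the benchmark's actual implementation.

```python
import math

def wcs(size, honesty, intelligence, fast, thrift, pir_norm):
    """Illustrative World Cup Score (WCS).

    Assumed: every SHIFT axis and pir_norm is pre-normalized to [0, 1],
    and SHIFT is the plain mean of the five axis scores. Only the outer
    formula, sqrt(SHIFT * PIR_norm), is taken from the benchmark post.
    """
    shift = (size + honesty + intelligence + fast + thrift) / 5
    return math.sqrt(shift * pir_norm)

# Hypothetical scores: the geometric mean pulls down any model that is weak
# on either the efficiency side (SHIFT) or the quality side (PIR_norm),
# so the balanced model ranks above both extremes.
print(round(wcs(0.9, 0.9, 0.4, 0.9, 0.9, 0.35), 3))   # tiny but weak
print(round(wcs(0.2, 0.9, 0.95, 0.4, 0.2, 0.95), 3))  # smart but massive
print(round(wcs(0.8, 0.9, 0.85, 0.8, 0.8, 0.85), 3))  # balanced, ranks highest
```

Under these assumed inputs, the balanced model scores highest, which matches the stated intent: high quality alone or small size alone is not enough.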