Alara Dirik

adirik

alaradirik

AI & ML interests

None yet

Recent Activity

upvoted an article 6 days ago

Introduction to 3D Gaussian Splatting

liked a dataset 7 days ago

gvecchio/MatSynth

liked a Space 8 days ago

Xenova/whisper-webgpu

Organizations

upvoted an article 6 days ago

Article

Introduction to 3D Gaussian Splatting

Sep 18, 2023 • 122

upvoted an article 21 days ago

Article

We’re open-sourcing our text-to-image model and the process behind it

Nov 12

upvoted a collection 24 days ago

CoVT: Chain-of-Visual-Thought

Collection

Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought! • 7 items • Updated 29 days ago • 6

upvoted a paper about 1 month ago

Φeat: Physically-Grounded Feature Representation

Paper • 2511.11270 • Published Nov 14 • 10

upvoted an article 4 months ago

Article

DeepSeek-R1 Dissection: Understanding PPO & GRPO Without Any Prior Reinforcement Learning Knowledge

Feb 7 • 260

upvoted 4 articles 5 months ago

Article

FineVideo: behind the scenes

Sep 23, 2024

Article

CinePile 2.0 - making stronger datasets with adversarial refinement

Oct 23, 2024

Article

TimeScope: How Long Can Your Video Large Multimodal Model Go?

Jul 23

Article

PaliGemma – Google's Cutting-Edge Open Vision Language Model

May 14, 2024 • 278

upvoted a collection 6 months ago

V-JEPA 2

Collection

A frontier video understanding model developed by FAIR, Meta, which extends the pretraining objectives of https://ai.meta.com/blog/v-jepa-yann • 8 items • Updated Jun 13 • 174

upvoted 3 collections 7 months ago

upvoted a paper 7 months ago

UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation

Paper • 2506.03147 • Published Jun 3 • 58

upvoted an article 7 months ago

Article

Vision Language Models (Better, faster, stronger)

May 12 • 572

upvoted a collection 8 months ago

D-FINE

Collection

State-of-the-art real-time object detection model with Apache 2.0 licence • 15 items • Updated May 5 • 56

upvoted 2 articles 10 months ago

Article

SigLIP 2: A better multilingual vision language encoder

Feb 21 • 194

Article

FastRTC: The Real-Time Communication Library for Python

Feb 25 • 172

upvoted 2 articles 11 months ago

Article

Build awesome datasets for video generation

Feb 12

Article

Open-source DeepResearch – Freeing our search agents

Feb 4 • 1.31k
