Ayaan Sharif

Ayaan-Sharif

https://shariif.tech

AI & ML interests

NLP, LLM, TEXT, Languages

Ayaan-Sharif's activity

liked a model 6 days ago

ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q4

Reinforcement Learning • Updated about 17 hours ago • 34k • 190

New activity in sanchit-gandhi/whisper-jax 12 days ago

The whisper jax demo is not working. Error messages

#18 opened almost 2 years ago by ray608

liked a model about 1 month ago

MiniMaxAI/MiniMax-VL-01

Image-Text-to-Text • Updated 1 day ago • 637 • 241

liked a dataset about 1 month ago

DAMO-NLP-SG/multimodal_textbook

Updated Jan 11 • 6.33k • 132

liked 2 models about 2 months ago

cognitivecomputations/dolphin-2.9-llama3-8b

Text Generation • Updated May 20, 2024 • 2.15k • 439

cognitivecomputations/Dolphin3.0-Llama3.2-1B

Updated Jan 6 • 1.23k • 21

replied to sanchit-gandhi's post about 2 months ago

What if we segment the audio first and then transcribe? It takes some extra compute, but IMO it would give a better result!

liked 3 Spaces about 2 months ago

Added improvements, 1107+ languages supported

liked a model about 2 months ago

huggyllama/llama-7b

Text Generation • Updated Jul 2, 2024 • 184k • 318

commented a paper 2 months ago

DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding

Paper • 2412.10302 • Published Dec 13, 2024 • 17

liked a model 2 months ago

deepseek-ai/DeepSeek-V3-Base

Updated about 2 hours ago • 383k • 1.57k

upvoted a collection 2 months ago

IndicConformer

Collection

A collection of ASR models for 22 scheduled languages of India • 22 items • Updated Oct 15, 2024 • 7

liked 2 Spaces 2 months ago

549

QVQ 72B Preview

🌍

Upload images and ask questions to get answers

100

Llmlingua 2

💻

Compress lengthy prompts into shorter versions while preserving key information

liked 2 models 2 months ago

THUDM/cogvlm2-llama3-caption

Video-Text-to-Text • Updated Jan 22 • 8.12k • 86

Neurazum/Xbai-Epilepsy-1.0

Video-Text-to-Text • Updated Nov 11, 2024 • 2

reacted to vladbogo's post with 👍 2 months ago

Post

Panda-70M is a new large-scale video dataset comprising 70 million high-quality video clips, each paired with a textual caption, designed for pre-training on video understanding tasks.

Key Points:
* Automatic Caption Generation: Utilizes an automatic pipeline with multiple cross-modality teacher models to generate captions for video clips.
* Fine-tuned Caption Selection: Employs a fine-tuned retrieval model to select the most appropriate caption from multiple candidates for each video clip.
* Improved Performance: Pre-training on Panda-70M shows significant performance gains in video captioning, text-video retrieval, and text-driven video generation.

Paper: Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers (2402.19479)
Project page: https://snap-research.github.io/Panda-70M/
Code: https://github.com/snap-research/Panda-70M

Congrats to the authors @tschen, @aliaksandr-siarohin et al. for their work!

liked a Space 3 months ago

138

VideoLLaMA2

🎥

Media understanding