Zeming Wei

ZemingWei

https://weizeming.github.io

AI & ML interests

Trustworthy AI

Organizations

None yet

Recent Activity

authored a paper about 17 hours ago

False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize

Paper • 2509.03888 • Published 3 days ago • 1

commented on a paper 2 days ago

False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize

Paper • 2509.03888 • Published 3 days ago • 1

authored a paper over 1 year ago

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Paper • 2310.06387 • Published Oct 10, 2023

liked a dataset over 1 year ago

lmsys/toxic-chat

Viewer • Updated May 14, 2024 • 20.3k • 7.49k • 165

liked a model almost 3 years ago

CompVis/stable-diffusion-v1-4

Text-to-Image • Updated Aug 23, 2023 • 660k • 6.91k