AI & ML interests
interpretability
Who are we?
We are a group of hackers from Stanford's NLP group, and we are interested in LLM interpretability.
We started with pyvene, whose name stands for PyTorch model intervention.
Resources
Supervised dictionary learning models (SDLs) and dataset releases for Gemma 2 2B and 9B: AxBench Collection.
Library for benchmarking interpretability methods at scale: AxBench.
Representation finetuning (ReFT) library: pyreft.
PyTorch model intervention library: pyvene.
Spaces
SDL-ReFT-cr1
Guide chatbot with specific topics
SDL-ReFT-r1
Guide conversations with specific topics
ReFT-Golden-Gate-Bridge
Converse with an AI assistant that mimics the Golden Gate Bridge
ReFT-Chat7B
Generate responses to chat messages using ReFT-Chat
ReFT-Emoji
Chat with an emoji-enhanced assistant
ReFT-Ethos
Converse with a helpful assistant in text form