SAELens

Llama-3-8B SAEs (layer 25, Post-MLP Residual Stream)

Introduction

We train a Gated SAE on the post-MLP residual stream of layer 25 of the Llama-3-8B-Instruct model. The SAE hidden dimension is 65,536 (a 16x expansion factor).

The SAE is trained on 500M tokens from the OpenWebText corpus.

Feature visualizations are hosted at https://www.neuronpedia.org/llama3-8b-it. The wandb run is recorded here.

Load the SAEs

This repository contains the following SAEs:

  • blocks.25.hook_resid_post

Load these SAEs using SAELens as shown below:

from sae_lens import SAE

# <sae_id> is one of the SAE ids listed above, e.g. "blocks.25.hook_resid_post"
sae, cfg_dict, sparsity = SAE.from_pretrained("Juliushanhanhan/llama-3-8b-it-res", "<sae_id>")
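
For a fuller workflow, the sketch below (not part of the original card) shows one way to run the SAE on cached residual-stream activations. It assumes TransformerLens is installed, that you have access to the meta-llama/Meta-Llama-3-8B-Instruct weights, and that the example prompt and variable names are purely illustrative.

# Minimal usage sketch, assuming TransformerLens and access to Llama-3-8B-Instruct.
import torch
from transformer_lens import HookedTransformer
from sae_lens import SAE

device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the SAE trained on the layer-25 post-MLP residual stream.
sae, cfg_dict, sparsity = SAE.from_pretrained(
    "Juliushanhanhan/llama-3-8b-it-res",
    "blocks.25.hook_resid_post",
    device=device,
)

# Load the base model and cache the activations at the hook point the SAE was trained on.
model = HookedTransformer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct", device=device)
_, cache = model.run_with_cache("The quick brown fox jumps over the lazy dog.")
resid = cache["blocks.25.hook_resid_post"]  # shape: [batch, seq, d_model]

# Encode into the 65,536-dimensional sparse feature space and reconstruct.
feature_acts = sae.encode(resid)
reconstruction = sae.decode(feature_acts)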

Citation

@misc{jiatong_han_2024,
    author    = {Jiatong Han},
    title     = {llama-3-8b-it-res (Revision 53425c3)},
    year      = 2024,
    url       = {https://huggingface.co/Juliushanhanhan/llama-3-8b-it-res},
    doi       = {10.57967/hf/2889},
    publisher = {Hugging Face}
}