https://github.com/jordansauce/sandbagging-research-sprint/ https://wandb.ai/jordantensor/gemma-sandbagging
Jordan Taylor
JordanTensor
AI & ML interests
Mechanistic interpretability, mechanistic anomaly detection, model internals techniques and AI safety techniques generally.
Recent Activity
Updated a collection 3 days ago: Sandbagging research sprint 1
Collections: 1
Models: 53

JordanTensor/gemma-sandbagging-ppvvz1jq-step7168
JordanTensor/gemma-sandbagging-ppvvz1jq-step6144
JordanTensor/gemma-sandbagging-ppvvz1jq-step4096
JordanTensor/gemma-sandbagging-ppvvz1jq-step2048
JordanTensor/gemma-sandbagging-ppvvz1jq-step1536
JordanTensor/gemma-sandbagging-ppvvz1jq-step1024
JordanTensor/gemma-sandbagging-ppvvz1jq-step512
JordanTensor/gemma-sandbagging-0w4j7rba-step1536
JordanTensor/gemma-sandbagging-0w4j7rba-step1024
JordanTensor/gemma-sandbagging-0w4j7rba-step512