Meta-UAT collection: Weight space learning experiments (interpreting behavior through activation signatures), 16 items
This model was trained to classify which patterns a subject model was trained on, based on neuron activation signatures.
The model predicts which of the following 14 patterns the subject model was trained to classify as positive:
`palindrome`, `sorted_ascending`, `sorted_descending`, `alternating`, `contains_abc`, `starts_with`, `ends_with`, `no_repeats`, `has_majority`, `increasing_pairs`, `decreasing_pairs`, `vowel_consonant`, `first_last_match`, `mountain_pattern`

| Pattern | Precision | Recall | F1 Score |
|---|---|---|---|
| palindrome | 11.1% | 89.8% | 19.8% |
| sorted_ascending | 59.7% | 56.6% | 58.1% |
| sorted_descending | 15.8% | 66.2% | 25.5% |
| alternating | 19.8% | 72.4% | 31.1% |
| contains_abc | 30.8% | 57.6% | 40.1% |
| starts_with | 9.1% | 59.4% | 15.8% |
| ends_with | 10.3% | 73.8% | 18.1% |
| no_repeats | 17.8% | 32.1% | 22.9% |
| has_majority | 33.3% | 60.5% | 43.0% |
| increasing_pairs | 23.3% | 35.3% | 28.1% |
| decreasing_pairs | 19.4% | 60.9% | 29.5% |
| vowel_consonant | 9.8% | 76.9% | 17.4% |
| first_last_match | 15.3% | 96.6% | 26.5% |
| mountain_pattern | 15.3% | 31.7% | 20.6% |
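The table reports precision, recall, and F1 separately for each pattern. Below is a minimal Python sketch of how such one-vs-rest metrics could be computed. It assumes that each subject model's true labels and predicted labels are stored as sets of pattern names. That representation and the function names `f1` and `per_pattern_metrics` are hypothetical and are not taken from the original evaluation code. F1 is the harmonic mean 2PR/(P+R). Recomputing it from the rounded precision and recall values in the table reproduces the reported F1 scores to within rounding. For example, 11.1% and 89.8% for `palindrome` give 19.8%.

```python
from typing import Dict, List, Set, Tuple

# The 14 pattern labels listed above.
PATTERNS = [
    "palindrome", "sorted_ascending", "sorted_descending", "alternating",
    "contains_abc", "starts_with", "ends_with", "no_repeats", "has_majority",
    "increasing_pairs", "decreasing_pairs", "vowel_consonant",
    "first_last_match", "mountain_pattern",
]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_pattern_metrics(
    y_true: List[Set[str]], y_pred: List[Set[str]]
) -> Dict[str, Tuple[float, float, float]]:
    """One-vs-rest (precision, recall, F1) for each pattern.

    Assumption (hypothetical representation): element i of y_true / y_pred is
    the set of pattern names for subject model i. A single-label setup is the
    special case where every set has exactly one element.
    """
    metrics: Dict[str, Tuple[float, float, float]] = {}
    for p in PATTERNS:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, q in pairs if p in t and p in q)      # correctly predicted
        fp = sum(1 for t, q in pairs if p not in t and p in q)  # predicted, not true
        fn = sum(1 for t, q in pairs if p in t and p not in q)  # true, not predicted
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics[p] = (precision, recall, f1(precision, recall))
    return metrics


if __name__ == "__main__":
    # Recompute two F1 entries from the table's precision and recall (in %).
    print(round(100 * f1(0.111, 0.898), 1))  # palindrome: 19.8
    print(round(100 * f1(0.597, 0.566), 1))  # sorted_ascending: 58.1
```

When most patterns combine high recall with low precision, as in the table, the meta-model predicts those patterns far more often than they actually occur.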