SAELens
English
sparse-autoencoder
SAE
interpretability
deception-detection
mechanistic-interpretability
neuronpedia
behavioral-sampling
phi
How to use Solshine/deception-saes-phi-2 with SAELens:

```python
# pip install sae-lens
from sae_lens import SAE

sae, cfg_dict, sparsity = SAE.from_pretrained(
    release="RELEASE_ID",  # e.g., "gpt2-small-res-jb". See other options in https://github.com/jbloomAus/SAELens/blob/main/sae_lens/pretrained_saes.yaml
    sae_id="SAE_ID",  # e.g., "blocks.8.hook_resid_pre". Won't always be a hook point
)
```
```json
{
  "architecture": "topk",
  "d_in": 2560,
  "d_sae": 10240,
  "dtype": "float32",
  "device": "cpu",
  "model_name": "microsoft/phi-2",
  "hook_name": "model.layers.16",
  "hook_layer": 16,
  "hook_head_index": null,
  "activation_fn_str": "topk",
  "activation_fn_kwargs": {},
  "apply_b_dec_to_input": false,
  "finetuning_scaling_factor": false,
  "sae_lens_training_version": "deception-behavioral-v1",
  "prepend_bos": false,
  "dataset_path": "Solshine/deception-behavioral-multimodel",
  "dataset_trust_remote_code": false,
  "context_size": null,
  "normalize_activations": "none",
  "training_condition": "mixed",
  "training_notes": "Deception behavioral SAE \u2014 same-prompt behavioral sampling. Model: microsoft/phi-2, Layer 16, topk. See https://github.com/SolshineCode/deception-nanochat-sae-research"
}
```
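
To make the configuration above more concrete, the following minimal sketch derives a few figures from it and illustrates what a top-k activation does. It rests on assumptions that the config does not confirm: the parameter count assumes a conventional SAE layout (one encoder matrix, one decoder matrix, and two bias vectors), and the value of `k` is hypothetical, since `activation_fn_kwargs` is empty. The helper names `summarize` and `topk_activation` are illustrative only and are not part of SAELens.

```python
import heapq
import json

# Subset of the config fields shown above, copied verbatim.
CONFIG_JSON = """
{
  "architecture": "topk",
  "d_in": 2560,
  "d_sae": 10240,
  "dtype": "float32",
  "hook_name": "model.layers.16"
}
"""

BYTES_PER_ELEMENT = {"float32": 4, "float16": 2, "bfloat16": 2}


def summarize(cfg):
    """Derive size figures, ASSUMING a conventional encoder/decoder SAE layout."""
    d_in, d_sae = cfg["d_in"], cfg["d_sae"]
    # Assumed parameters: W_enc (d_in x d_sae), W_dec (d_sae x d_in),
    # b_enc (d_sae), b_dec (d_in). The real checkpoint may differ.
    n_params = 2 * d_in * d_sae + d_sae + d_in
    return {
        "expansion_factor": d_sae / d_in,
        "n_params": n_params,
        "size_mib": n_params * BYTES_PER_ELEMENT[cfg["dtype"]] / 2**20,
    }


def topk_activation(pre_acts, k):
    """Keep the k largest pre-activations and set every other entry to zero."""
    keep = set(heapq.nlargest(k, range(len(pre_acts)), key=pre_acts.__getitem__))
    return [v if i in keep else 0.0 for i, v in enumerate(pre_acts)]


cfg = json.loads(CONFIG_JSON)
summary = summarize(cfg)
print(summary)

# Toy example with a hypothetical k=2; the released SAE's k is not stated in the config.
print(topk_activation([0.1, 2.0, -1.0, 0.7, 1.5], k=2))
```

Under these assumptions the dictionary is four times wider than the layer-16 activation it reads (10240 / 2560), and only `k` of the 10240 latents are nonzero for any given input. Some top-k implementations also apply a ReLU to the surviving values; that step is omitted here.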