---
# Metadata for Hugging Face repo card
library_name: transformers
pipeline_tag: feature-extraction
license: apache-2.0
tags:
- autoencoder
- pytorch
- reconstruction
- preprocessing
- normalizing-flow
- scaler
---

## Autoencoder for Hugging Face Transformers (Block-based)

A flexible, production-grade Autoencoder implementation built to fit naturally into the Transformers ecosystem. It supports a block-based architecture with ready-to-use templates for classic MLP, VAE/beta-VAE, Transformer, Recurrent, Convolutional, and mixed hybrid encoders, plus learnable preprocessing.

### Key features

- Block-based architecture: Linear, Attention, Recurrent (LSTM/GRU), Convolutional, Variational blocks
- Class-based configuration presets in template.py for quick starts
- Variational and beta-VAE variants (KL-controlled)
- Learnable preprocessing and inverse transforms
- Hugging Face-compatible config/model API with from_pretrained/save_pretrained

## Install and load from the Hub (code repo)

```python
from huggingface_hub import snapshot_download
import sys, torch

repo_dir = snapshot_download(
    repo_id="amaye15/autoencoder",
    repo_type="model",
    allow_patterns=["*.py", "config.json", "*.safetensors"],
)
sys.path.append(repo_dir)

from modeling_autoencoder import AutoencoderForReconstruction

model = AutoencoderForReconstruction.from_pretrained(repo_dir)
x = torch.randn(8, 20)
out = model(input_values=x)
print("latent:", out.last_hidden_state.shape, "reconstructed:", out.reconstructed.shape)
```

## Quickstart with class-based templates

```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import ClassicAutoencoderConfig

cfg = ClassicAutoencoderConfig(input_dim=784, latent_dim=64)
model = AutoencoderModel(cfg)

x = torch.randn(4, 784)
out = model(x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

### Available presets (template.py)

- ClassicAutoencoderConfig: Dense MLP AE
- VariationalAutoencoderConfig: VAE with KL regularization
- BetaVariationalAutoencoderConfig: beta-VAE (beta > 1)
- TransformerAutoencoderConfig: Attention-based encoder for sequences
- RecurrentAutoencoderConfig: LSTM/GRU encoder for sequences (see the sketch below)
- ConvolutionalAutoencoderConfig: 1D Conv encoder for sequences
- ConvAttentionAutoencoderConfig: Mixed Conv + Attention encoder
- LinearRecurrentAutoencoderConfig: Linear down-projection + RNN
- PreprocessedAutoencoderConfig: MLP AE with learnable preprocessing
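Any preset can be instantiated and wrapped in AutoencoderModel the same way as the quickstart above. As a minimal sketch, here is a sequence-oriented preset; it assumes RecurrentAutoencoderConfig accepts the same input_dim/latent_dim keywords as ClassicAutoencoderConfig.

```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import RecurrentAutoencoderConfig

# Assumption: the preset takes input_dim/latent_dim like ClassicAutoencoderConfig
cfg = RecurrentAutoencoderConfig(input_dim=32, latent_dim=16)
model = AutoencoderModel(cfg)

x = torch.randn(4, 50, 32)  # (B, T, D) sequence input
out = model(x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```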
## Block-based architecture

The autoencoder uses a modular block system: you define encoder_blocks and decoder_blocks as lists of dictionaries, and each block dict specifies its type and parameters.

### Available block types

#### LinearBlock

Dense layer with optional normalization, activation, dropout, and residual connections.

```python
{
    "type": "linear",
    "input_dim": 256,
    "output_dim": 128,
    "activation": "relu",       # relu, gelu, tanh, sigmoid, etc.
    "normalization": "batch",   # batch, layer, group, instance, none
    "dropout_rate": 0.1,
    "use_residual": False,      # adds skip connection if input_dim == output_dim
    "residual_scale": 1.0
}
```

#### AttentionBlock

Multi-head self-attention with a feed-forward network. Works with 2D (B, D) or 3D (B, T, D) inputs.

```python
{
    "type": "attention",
    "input_dim": 128,
    "num_heads": 8,
    "ffn_dim": 512,        # if None, defaults to 4 * input_dim
    "dropout_rate": 0.1
}
```

#### RecurrentBlock

LSTM, GRU, or vanilla RNN encoder. Outputs the final hidden state or all timesteps.

```python
{
    "type": "recurrent",
    "input_dim": 64,
    "hidden_size": 128,
    "num_layers": 2,
    "rnn_type": "lstm",     # lstm, gru, rnn
    "bidirectional": True,
    "dropout_rate": 0.1,
    "output_dim": 128       # final output dimension
}
```

#### ConvolutionalBlock

1D convolution for sequence data. Expects 3D input (B, T, D).

```python
{
    "type": "conv1d",
    "input_dim": 64,          # input channels
    "output_dim": 128,        # output channels
    "kernel_size": 3,
    "padding": "same",        # "same" or integer
    "activation": "relu",
    "normalization": "batch",
    "dropout_rate": 0.1
}
```

#### VariationalBlock

Produces mu and logvar for VAE reparameterization. Used internally by the model when autoencoder_type="variational".

```python
{
    "type": "variational",
    "input_dim": 128,
    "latent_dim": 64
}
```

### Custom configuration examples

#### Mixed architecture (Conv + Attention + Linear)

```python
from configuration_autoencoder import AutoencoderConfig

enc = [
    # 1D convolution for local patterns
    {"type": "conv1d", "input_dim": 64, "output_dim": 128, "kernel_size": 3, "padding": "same", "activation": "relu"},
    {"type": "conv1d", "input_dim": 128, "output_dim": 128, "kernel_size": 3, "padding": "same", "activation": "relu"},
    # Self-attention for global dependencies
    {"type": "attention", "input_dim": 128, "num_heads": 8, "ffn_dim": 512, "dropout_rate": 0.1},
    # Final linear projection
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu", "normalization": "batch"}
]

dec = [
    {"type": "linear", "input_dim": 32, "output_dim": 64, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 64, "output_dim": 128, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "identity", "normalization": "none"}
]

cfg = AutoencoderConfig(
    input_dim=64,
    latent_dim=32,
    autoencoder_type="classic",
    encoder_blocks=enc,
    decoder_blocks=dec
)
```

#### Hierarchical encoder (multiple scales)

```python
enc = [
    # Local features
    {"type": "linear", "input_dim": 784, "output_dim": 512, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 512, "output_dim": 256, "activation": "relu", "normalization": "batch"},
    # Mid-level features with residual
    {"type": "linear", "input_dim": 256, "output_dim": 256, "activation": "relu", "normalization": "batch", "use_residual": True},
    {"type": "linear", "input_dim": 256, "output_dim": 256, "activation": "relu", "normalization": "batch", "use_residual": True},
    # High-level features
    {"type": "linear", "input_dim": 256, "output_dim": 128, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu", "normalization": "batch"}
]
```

#### Sequence-to-sequence with recurrent encoder

```python
enc = [
    {"type": "recurrent", "input_dim": 100, "hidden_size": 128, "num_layers": 2, "rnn_type": "lstm", "bidirectional": True, "output_dim": 256},
    {"type": "linear", "input_dim": 256, "output_dim": 128, "activation": "tanh", "normalization": "layer"}
]

dec = [
    {"type": "linear", "input_dim": 64, "output_dim": 128, "activation": "tanh", "normalization": "layer"},
    {"type": "linear", "input_dim": 128, "output_dim": 100, "activation": "identity", "normalization": "none"}
]
```

### Input shape handling

- **2D inputs (B, D)**: Work with Linear blocks directly. Attention/Recurrent/Conv blocks treat them as (B, 1, D) (see the sketch below)
- **3D inputs (B, T, D)**: Work with all block types. Linear blocks operate per-timestep
- **Output shapes**: The decoder typically outputs the same shape as the input. For sequence models, the final shape depends on the decoder architecture
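A minimal sketch of the shape handling described above, using the ConvAttentionAutoencoderConfig preset that appears later in this card. The exact reconstructed shape depends on the decoder, so the prints only illustrate the coercion of 2D inputs.

```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import ConvAttentionAutoencoderConfig

cfg = ConvAttentionAutoencoderConfig(input_dim=64, latent_dim=64)
model = AutoencoderModel(cfg)

x2d = torch.randn(8, 64)      # (B, D): conv/attention blocks treat this as (B, 1, D)
x3d = torch.randn(8, 50, 64)  # (B, T, D): used as-is; linear blocks apply per timestep

out2d = model(x2d, return_dict=True)
out3d = model(x3d, return_dict=True)
print(out2d.reconstructed.shape, out3d.reconstructed.shape)
```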
## Configuration (configuration_autoencoder.py)

AutoencoderConfig is the core configuration class. Important fields:

- input_dim: feature dimension (D)
- latent_dim: latent size
- encoder_blocks, decoder_blocks: block lists (see block types above)
- activation, dropout_rate, use_batch_norm: defaults used by some presets
- autoencoder_type: classic | variational | beta_vae | denoising | sparse | contractive | recurrent
- reconstruction_loss: mse | bce | l1 | huber | smooth_l1 | kl_div | cosine | focal | dice | tversky | ssim | perceptual
- Preprocessing: use_learnable_preprocessing, preprocessing_type, learn_inverse_preprocessing

Example:

```python
from configuration_autoencoder import AutoencoderConfig

cfg = AutoencoderConfig(
    input_dim=128,
    latent_dim=32,
    autoencoder_type="variational",
    encoder_blocks=[{"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu"}],
    decoder_blocks=[{"type": "linear", "input_dim": 32, "output_dim": 128, "activation": "identity", "normalization": "none"}],
)
```

## Models (modeling_autoencoder.py)

Main classes:

- AutoencoderModel: core module whose forward returns last_hidden_state (the latent) and reconstructed
- AutoencoderForReconstruction: HF-compatible model wrapper with from_pretrained/save_pretrained

Forward usage:

```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import ClassicAutoencoderConfig

# Any config works here; a small classic preset is used for illustration
model = AutoencoderModel(ClassicAutoencoderConfig(input_dim=20, latent_dim=8))

x = torch.randn(8, 20)
out = model(x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

### Variational behavior

If cfg.autoencoder_type is "variational" or "beta_vae":

- The model uses an internal VariationalBlock to compute mu and logvar
- It samples z during training and uses mu during eval
- The KL term is available via model._mu/_logvar (exposed in hidden_states when requested)

```python
out = model(x, return_dict=True, output_hidden_states=True)
latent, mu, logvar = out.hidden_states
```

## Preprocessing (preprocessing.py)

- PreprocessingBlock wraps LearnablePreprocessor and can be placed before/after the core encoder/decoder
- When enabled via config.use_learnable_preprocessing, the model constructs two blocks: pre (forward) and post (inverse)
- The block tracks reg_loss, which is added to preprocessing_loss in the model output (see the sketch below)

```python
from template import PreprocessedAutoencoderConfig

cfg = PreprocessedAutoencoderConfig(input_dim=64, latent_dim=32, preprocessing_type="neural_scaler")
model = AutoencoderModel(cfg)
```
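A sketch of a training step that folds the preprocessing regularizer into the objective; it assumes the model output exposes a preprocessing_loss attribute when preprocessing is enabled, as described above.

```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import PreprocessedAutoencoderConfig

cfg = PreprocessedAutoencoderConfig(input_dim=64, latent_dim=32, preprocessing_type="neural_scaler")
model = AutoencoderModel(cfg)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(16, 64)
out = model(x, return_dict=True)
loss = torch.nn.functional.mse_loss(out.reconstructed, x)
# Assumption: preprocessing_loss is exposed on the output when preprocessing is enabled
if getattr(out, "preprocessing_loss", None) is not None:
    loss = loss + out.preprocessing_loss
loss.backward(); opt.step(); opt.zero_grad()
```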
## Utilities (utils.py)

Common helpers:

- _get_activation(name)
- _get_norm(name, num_groups=None)
- _flatten_3d_to_2d(x), _maybe_restore_3d(x, ref)

## Training examples

### Basic MSE reconstruction

```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import ClassicAutoencoderConfig

cfg = ClassicAutoencoderConfig(input_dim=784, latent_dim=64)
model = AutoencoderModel(cfg)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for x in dataloader:  # x: (B, 784)
    out = model(x, return_dict=True)
    loss = torch.nn.functional.mse_loss(out.reconstructed, x)
    loss.backward(); opt.step(); opt.zero_grad()
```

### VAE with KL term

```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import VariationalAutoencoderConfig

cfg = VariationalAutoencoderConfig(input_dim=784, latent_dim=32)
model = AutoencoderModel(cfg)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for x in dataloader:
    out = model(x, return_dict=True, output_hidden_states=True)
    recon = torch.nn.functional.mse_loss(out.reconstructed, x)
    _, mu, logvar = out.hidden_states
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    loss = recon + cfg.beta * kl
    loss.backward(); opt.step(); opt.zero_grad()
```

### Sequence reconstruction (Conv + Attention)

```python
import torch
from modeling_autoencoder import AutoencoderModel
from template import ConvAttentionAutoencoderConfig

cfg = ConvAttentionAutoencoderConfig(input_dim=64, latent_dim=64)
model = AutoencoderModel(cfg)

x = torch.randn(8, 50, 64)  # (B, T, D)
out = model(x, return_dict=True)
```

## End-to-end saving/loading

```python
from modeling_autoencoder import AutoencoderForReconstruction

model.save_pretrained("./my_ae")
reloaded = AutoencoderForReconstruction.from_pretrained("./my_ae")
```

## Troubleshooting

- Check that block input_dim/output_dim align across adjacent blocks (a quick check is sketched below)
- For attention/recurrent/conv blocks, prefer 3D inputs (B, T, D); 2D inputs are coerced to (B, 1, D)
- For variational/beta-VAE, ensure latent_dim is set; the KL term is available via hidden states
- When preprocessing is enabled, preprocessing_loss is included in the output for logging/regularization
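For the first point, a small stand-alone helper (not part of the library) can catch dimension mismatches before building the model:

```python
def check_block_dims(blocks):
    """Sketch: verify that adjacent block dims line up. Not part of the library."""
    for prev, nxt in zip(blocks, blocks[1:]):
        prev_out = prev.get("output_dim")  # attention blocks keep input_dim, so this may be absent
        nxt_in = nxt.get("input_dim")
        if prev_out is not None and nxt_in is not None and prev_out != nxt_in:
            raise ValueError(f"{prev['type']} outputs {prev_out} but {nxt['type']} expects {nxt_in}")

check_block_dims(enc)  # enc/dec are block lists such as the examples above
check_block_dims(dec)
```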
## Full AutoencoderConfig reference

Below is a comprehensive reference for all fields in configuration_autoencoder.AutoencoderConfig. Some fields are primarily used by presets or advanced features but are documented here for completeness.

- input_dim (int, default=784): Input feature dimension D. For sequences, D is the per-timestep feature size.
- hidden_dims (List[int], default=[512, 256, 128]): Legacy convenience list for simple MLPs. Prefer encoder_blocks.
- encoder_blocks (List[dict] | None): Block list for the encoder. See Block-based architecture for block schemas.
- decoder_blocks (List[dict] | None): Block list for the decoder. If omitted, the model may derive a simple decoder from hidden_dims.
- latent_dim (int, default=64): Latent space dimension.
- activation (str, default="relu"): Default activation for Linear blocks when using legacy paths or presets.
- dropout_rate (float, default=0.1): Default dropout used in presets and some layers.
- use_batch_norm (bool, default=True): Default normalization flag used in presets ("batch" if True, else "none").
- tie_weights (bool, default=False): If True, share/tie encoder and decoder weights (not always active, depending on architecture).
- reconstruction_loss (str, default="mse"): Loss used in AutoencoderForReconstruction. One of: "mse", "bce", "l1", "huber", "smooth_l1", "kl_div", "cosine", "focal", "dice", "tversky", "ssim", "perceptual".
- autoencoder_type (str, default="classic"): Architecture variant. One of: "classic", "variational", "beta_vae", "denoising", "sparse", "contractive", "recurrent".
- beta (float, default=1.0): KL weight for VAE/beta-VAE.
- temperature (float, default=1.0): Reserved for temperature-based operations.
- noise_factor (float, default=0.1): Denoising strength used by denoising variants.
- rnn_type (str, default="lstm"): For recurrent variants. One of: "lstm", "gru", "rnn".
- num_layers (int, default=2): Number of RNN layers for recurrent variants.
- bidirectional (bool, default=True): Whether the RNN is bidirectional in recurrent variants.
- sequence_length (int | None, default=None): Optional fixed sequence length; if None, variable length is supported.
- teacher_forcing_ratio (float, default=0.5): For recurrent decoders that use teacher forcing.
- use_learnable_preprocessing (bool, default=False): Enable learnable preprocessing.
- preprocessing_type (str, default="none"): One of: "none", "neural_scaler", "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson".
- preprocessing_hidden_dim (int, default=64): Hidden size for preprocessing networks.
- preprocessing_num_layers (int, default=2): Number of layers for preprocessing networks.
- learn_inverse_preprocessing (bool, default=True): Whether to learn the inverse transform for reconstruction.
- flow_coupling_layers (int, default=4): Number of coupling layers for normalizing flows.

Derived helpers and flags (see the example below):

- has_block_lists: True if either encoder_blocks or decoder_blocks is provided.
- is_variational: True if autoencoder_type is in {"variational", "beta_vae"}.
- is_denoising, is_sparse, is_contractive, is_recurrent: Variant flags.
- has_preprocessing: True if preprocessing is enabled and type != "none".

Validation notes:

- activation must be one of the supported values listed in configuration_autoencoder.py
- reconstruction_loss must be one of the supported values
- Many numeric parameters are validated to be positive or within [0, 1]
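A quick sketch of inspecting the derived flags on a config instance; it assumes they are exposed as attributes/properties on AutoencoderConfig, as listed above.

```python
from configuration_autoencoder import AutoencoderConfig

cfg = AutoencoderConfig(
    input_dim=128,
    latent_dim=32,
    autoencoder_type="beta_vae",
    beta=4.0,
    use_learnable_preprocessing=True,
    preprocessing_type="neural_scaler",
)

# Assumption: derived flags are readable attributes/properties
print(cfg.is_variational)     # True for "variational" and "beta_vae"
print(cfg.has_preprocessing)  # True: preprocessing enabled and type != "none"
print(cfg.has_block_lists)    # False: no encoder_blocks/decoder_blocks provided
```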
## Training with Hugging Face Trainer

The AutoencoderForReconstruction model computes the reconstruction loss internally using config.reconstruction_loss. For VAEs/beta-VAEs, it adds the KL term scaled by config.beta. You can plug it directly into transformers.Trainer.

```python
import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments
from modeling_autoencoder import AutoencoderForReconstruction
from template import ClassicAutoencoderConfig

# 1) Config and model
cfg = ClassicAutoencoderConfig(input_dim=64, latent_dim=16)
model = AutoencoderForReconstruction(cfg)

# 2) Dummy dataset (replace with your own)
class ToyAEDataset(Dataset):
    def __init__(self, n=1024, d=64):
        self.x = torch.randn(n, d)

    def __len__(self):
        return self.x.size(0)

    def __getitem__(self, idx):
        xi = self.x[idx]
        return {"input_values": xi, "labels": xi}

train_ds = ToyAEDataset()

# 3) TrainingArguments
args = TrainingArguments(
    output_dir="./ae-trainer",
    per_device_train_batch_size=64,
    learning_rate=1e-3,
    num_train_epochs=3,
    logging_steps=50,
    save_steps=200,
    report_to=[],  # disable wandb if not configured
)

# 4) Trainer
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
)

# 5) Train
trainer.train()

# 6) Use the model
x = torch.randn(4, 64)
out = model(input_values=x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

Notes:

- The dataset must yield dicts with "input_values" and optionally "labels"; if labels are missing, the model uses the input as the target.
- For sequence inputs, the shape is (B, T, D); for simple vectors, (B, D).
- Set cfg.reconstruction_loss to e.g. "bce" to switch the internal loss (the decoder head applies sigmoid when BCE is used).
- For VAE/beta-VAE, use VariationalAutoencoderConfig/BetaVariationalAutoencoderConfig.

### Example using AutoencoderConfig directly

The example below defines a configuration purely with block dicts using AutoencoderConfig, without the template classes.

```python
import torch
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderModel

# Encoder: Linear -> Attention -> Linear
enc = [
    {"type": "linear", "input_dim": 128, "output_dim": 128, "activation": "relu", "normalization": "batch", "dropout_rate": 0.1},
    {"type": "attention", "input_dim": 128, "num_heads": 4, "ffn_dim": 512, "dropout_rate": 0.1},
    {"type": "linear", "input_dim": 128, "output_dim": 64, "activation": "relu", "normalization": "batch"},
]

# Decoder: Linear -> Linear (final identity)
dec = [
    {"type": "linear", "input_dim": 32, "output_dim": 64, "activation": "relu", "normalization": "batch"},
    {"type": "linear", "input_dim": 64, "output_dim": 128, "activation": "identity", "normalization": "none"},
]

cfg = AutoencoderConfig(
    input_dim=128,
    latent_dim=32,
    encoder_blocks=enc,
    decoder_blocks=dec,
    autoencoder_type="classic",
)

model = AutoencoderModel(cfg)
x = torch.randn(4, 128)
out = model(x, return_dict=True)
print(out.last_hidden_state.shape, out.reconstructed.shape)
```

For a variational model, set autoencoder_type="variational" and the model will internally use a VariationalBlock for mu/logvar and sampling.

## Learnable preprocessing

Enable learnable preprocessing and its inverse with the PreprocessedAutoencoderConfig class or via the config flags.

```python
from template import PreprocessedAutoencoderConfig

cfg = PreprocessedAutoencoderConfig(input_dim=64, latent_dim=32, preprocessing_type="neural_scaler")
```

Supported preprocessing_type values include: "neural_scaler", "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson".

## Saving and loading

```python
from modeling_autoencoder import AutoencoderForReconstruction

# Save
model.save_pretrained("./my_ae")

# Load
reloaded = AutoencoderForReconstruction.from_pretrained("./my_ae")
```

## Reference

Core modules:

- configuration_autoencoder.AutoencoderConfig
- modeling_autoencoder.AutoencoderModel, AutoencoderForReconstruction
- blocks: BlockFactory, BlockSequence, Linear/Attention/Recurrent/Convolutional/Variational blocks
- preprocessing: PreprocessingBlock (learnable preprocessing wrapper)
- template: class-based presets listed above

## License

Apache-2.0 (see LICENSE)