---
library_name: transformers
pipeline_tag: feature-extraction
license: apache-2.0
tags:
- autoencoder
- pytorch
- reconstruction
- preprocessing
- normalizing-flow
- scaler
---
# Autoencoder Implementation for Hugging Face Transformers
A complete autoencoder implementation that integrates seamlessly with the Hugging Face Transformers ecosystem, providing all the standard functionality you expect from transformer models.
## Install-and-Use from the Hub (code repo)

If you want to use the implementation directly from the Hub code repository (without a packaged pip install), you can download the repo and add it to `sys.path`:
```python
from huggingface_hub import snapshot_download
import sys, torch

# 1) Download the code + weights for the repo "as is"
repo_dir = snapshot_download(
    repo_id="amaye15/autoencoder",
    repo_type="model",
    allow_patterns=["*.py", "config.json", "*.safetensors"],  # note the * wildcards
)

# 2) Add to the import path so plain imports work
sys.path.append(repo_dir)

# 3) Import the classes from the repo code
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderForReconstruction

# 4) Load the placeholder weights from the local folder (no internet, no code refresh)
model = AutoencoderForReconstruction.from_pretrained(repo_dir)

# 5) Quick smoke test
x = torch.randn(8, 20)
out = model(input_values=x)
print("latent:", out.last_hidden_state.shape, "reconstructed:", out.reconstructed.shape)
```
## 🚀 Features
- Full Hugging Face Integration: Compatible with `AutoModel`, `AutoConfig`, and `AutoTokenizer` patterns
- Standard Training Workflows: Works with `Trainer`, `TrainingArguments`, and all HF training utilities
- Model Hub Compatible: Save and share models on the Hugging Face Hub with `push_to_hub()` (see the sketch after this list)
- Flexible Architecture: Configurable encoder-decoder architecture with various activation functions
- Multiple Loss Functions: Support for MSE, BCE, L1, Huber, Smooth L1, KL Divergence, Cosine, Focal, Dice, Tversky, SSIM, and Perceptual loss
- Multiple Autoencoder Types (7): Classic, Variational (VAE), Beta-VAE, Denoising, Sparse, Contractive, and Recurrent autoencoders
- Extended Activation Functions: 18+ activation functions including ReLU, GELU, Swish, Mish, ELU, and more
- Learnable Preprocessing: Neural Scaler, Normalizing Flow, MinMax Scaler (learnable), Robust Scaler (learnable), and Yeo-Johnson preprocessors (2D and 3D tensors)
- Extensible Design: Easy to extend with new autoencoder variants and custom loss functions
- Production Ready: Proper serialization, checkpointing, and inference support
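A minimal sketch of the Hub workflow mentioned above, assuming a trained `model`, that you are logged in, and that `your-username/my-autoencoder` is a repo you can write to (the repo name is a placeholder):

```python
# Save locally, then push the weights and config to the Hub
model.save_pretrained("./my_autoencoder")
model.push_to_hub("your-username/my-autoencoder")

# Because the modeling code lives in the repo itself, the *.py files
# (configuration_autoencoder.py, modeling_autoencoder.py) also need to be
# uploaded alongside the weights, e.g. with huggingface_hub's upload_file.
```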
## 🏗️ Architecture

The implementation consists of three main components:
### 1. AutoencoderConfig

Configuration class that inherits from `PretrainedConfig`:
- Defines model architecture parameters
- Handles validation and serialization
- Enables `AutoConfig.from_pretrained()` functionality
### 2. AutoencoderModel

Base model class that inherits from `PreTrainedModel`:
- Implements the encoder-decoder architecture
- Provides the latent space representation
- Returns structured outputs with `AutoencoderOutput`
### 3. AutoencoderForReconstruction

Task-specific model for reconstruction:
- Adds reconstruction loss calculation
- Compatible with `Trainer` for easy training
- Returns `AutoencoderForReconstructionOutput` with loss
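To make the split between the base model and the reconstruction head concrete, here is a minimal sketch (assuming `AutoencoderModel` is exported from `modeling_autoencoder.py` alongside `AutoencoderForReconstruction`; the dimensions are illustrative):

```python
import torch
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderModel, AutoencoderForReconstruction

config = AutoencoderConfig(input_dim=20, hidden_dims=[16], latent_dim=8)

base = AutoencoderModel(config)              # returns AutoencoderOutput (no loss)
head = AutoencoderForReconstruction(config)  # returns AutoencoderForReconstructionOutput (with loss)

x = torch.randn(4, 20)
print(base(input_values=x).last_hidden_state.shape)  # latent representation
print(head(input_values=x).loss)                     # reconstruction loss
```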
## 🔧 Quick Start

### Basic Usage
```python
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderForReconstruction
import torch

# Create configuration
config = AutoencoderConfig(
    input_dim=784,               # Input dimensionality (e.g., 28x28 images flattened)
    hidden_dims=[512, 256],      # Encoder hidden layers
    latent_dim=64,               # Latent space dimension
    activation="gelu",           # Activation function (18+ options available)
    reconstruction_loss="mse",   # Loss function (12+ options available)
    autoencoder_type="classic",  # Autoencoder type (7 types available)
    # Optional learnable preprocessing
    use_learnable_preprocessing=True,
    preprocessing_type="neural_scaler",  # or "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson"
)

# Create model
model = AutoencoderForReconstruction(config)

# Forward pass
input_data = torch.randn(32, 784)  # Batch of 32 samples
outputs = model(input_values=input_data)

print(f"Reconstruction loss: {outputs.loss}")
print(f"Latent shape: {outputs.last_hidden_state.shape}")
print(f"Reconstructed shape: {outputs.reconstructed.shape}")
```
### Training with Hugging Face Trainer
```python
import torch
from transformers import Trainer, TrainingArguments
from torch.utils.data import Dataset

class AutoencoderDataset(Dataset):
    def __init__(self, data):
        self.data = torch.FloatTensor(data)

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return {
            "input_values": self.data[idx],
            "labels": self.data[idx],  # For an autoencoder, input = target
        }

# Prepare data
train_dataset = AutoencoderDataset(your_training_data)
val_dataset = AutoencoderDataset(your_validation_data)

# Training arguments
training_args = TrainingArguments(
    output_dir="./autoencoder_output",
    num_train_epochs=10,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    warmup_steps=500,
    weight_decay=0.01,
    logging_dir="./logs",
    evaluation_strategy="steps",
    eval_steps=500,
    save_steps=1000,
    load_best_model_at_end=True,
)

# Create trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
)

# Train
trainer.train()

# Save model
model.save_pretrained("./my_autoencoder")
config.save_pretrained("./my_autoencoder")
```
### Using the AutoModel Framework
```python
from register_autoencoder import register_autoencoder_models
from transformers import AutoConfig, AutoModel

# Register models with the AutoModel framework
register_autoencoder_models()

# Now you can use standard HF patterns
config = AutoConfig.from_pretrained("./my_autoencoder")
model = AutoModel.from_pretrained("./my_autoencoder")

# Use the model
outputs = model(input_values=your_data)
```
## ⚙️ Configuration Options

The `AutoencoderConfig` class supports extensive customization:
```python
config = AutoencoderConfig(
    input_dim=784,                   # Input dimension
    hidden_dims=[512, 256, 128],     # Encoder hidden layers
    latent_dim=64,                   # Latent space dimension
    activation="gelu",               # Activation function (see full list below)
    dropout_rate=0.1,                # Dropout rate (0.0 to 1.0)
    use_batch_norm=True,             # Use batch normalization
    tie_weights=False,               # Tie encoder/decoder weights
    reconstruction_loss="mse",       # Loss function (see full list below)
    autoencoder_type="variational",  # Autoencoder type (see types below)
    beta=0.5,                        # Beta parameter for β-VAE
    temperature=1.0,                 # Temperature for Gumbel softmax
    noise_factor=0.1,                # Noise factor for denoising AE

    # Recurrent autoencoder parameters
    rnn_type="lstm",                 # RNN type: "lstm", "gru", "rnn"
    num_layers=2,                    # Number of RNN layers
    bidirectional=True,              # Bidirectional encoding
    sequence_length=None,            # Fixed sequence length (None for variable)
    teacher_forcing_ratio=0.5,       # Teacher forcing ratio during training

    # Learnable preprocessing parameters
    use_learnable_preprocessing=False,  # Enable learnable preprocessing
    preprocessing_type="none",          # "none", "neural_scaler", "normalizing_flow"
    preprocessing_hidden_dim=64,        # Hidden dimension for preprocessing networks
    preprocessing_num_layers=2,         # Number of layers in preprocessing networks
    learn_inverse_preprocessing=True,   # Learn the inverse transformation
    flow_coupling_layers=4,             # Number of coupling layers for flows
)
```
## 🎛️ Available Activation Functions

Standard Activations:
- `relu`, `leaky_relu`, `relu6`, `elu`, `prelu`
- `tanh`, `sigmoid`, `hardsigmoid`, `hardtanh`
- `gelu`, `swish`, `silu`, `hardswish`
- `mish`, `softplus`, `softsign`, `tanhshrink`, `threshold`
## 📊 Available Loss Functions

Regression Losses:
- `mse` - Mean Squared Error
- `l1` - L1/MAE Loss
- `huber` - Huber Loss
- `smooth_l1` - Smooth L1 Loss

Classification/Probability Losses:
- `bce` - Binary Cross Entropy
- `kl_div` - KL Divergence
- `focal` - Focal Loss

Similarity Losses:
- `cosine` - Cosine Similarity Loss
- `ssim` - Structural Similarity Loss
- `perceptual` - Perceptual Loss

Segmentation Losses:
- `dice` - Dice Loss
- `tversky` - Tversky Loss
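As an illustrative sketch, switching loss functions is just a matter of changing the config string (reusing `AutoencoderConfig` and `AutoencoderForReconstruction` from the Quick Start; the dimensions are placeholders):

```python
# Hypothetical example: a Huber reconstruction loss for data with occasional outliers
config = AutoencoderConfig(
    input_dim=784,
    hidden_dims=[256],
    latent_dim=32,
    reconstruction_loss="huber",  # any key from the lists above
)
model = AutoencoderForReconstruction(config)
```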
## 🏗️ Available Autoencoder Types

### Classic Autoencoder (`classic`)
- Standard encoder-decoder architecture
- Direct reconstruction loss minimization

### Variational Autoencoder (`variational`)
- Probabilistic latent space with mean and variance
- KL divergence regularization
- Reparameterization trick for sampling

### Beta-VAE (`beta_vae`)
- Variational autoencoder with adjustable β parameter
- Better disentanglement of latent factors

### Denoising Autoencoder (`denoising`)
- Adds noise to the input during training
- Learns robust representations
- Configurable noise factor

### Sparse Autoencoder (`sparse`)
- Encourages sparse latent representations
- L1 regularization on latent activations
- Useful for feature selection

### Contractive Autoencoder (`contractive`)
- Penalizes large gradients of the latent w.r.t. the input
- Learns smooth manifold representations
- Robust to small input perturbations

### Recurrent Autoencoder (`recurrent`)
- LSTM/GRU/RNN encoder-decoder architecture
- Bidirectional encoding for better sequence representations
- Variable-length sequence support with padding
- Teacher forcing during training for stable learning
- Sequence-to-sequence reconstruction
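Each type is selected through the config. For instance, here is a hedged sketch of configuring a denoising autoencoder (the dimensions and noise level are illustrative; `noise_factor` is the parameter listed in the configuration options above):

```python
config = AutoencoderConfig(
    input_dim=784,
    hidden_dims=[512, 256],
    latent_dim=64,
    autoencoder_type="denoising",
    noise_factor=0.2,  # strength of the noise injected during training
)
model = AutoencoderForReconstruction(config)
```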
## 📊 Model Outputs
### AutoencoderOutput
The base model `AutoencoderModel` returns the following output:
```python
@dataclass
class AutoencoderOutput(ModelOutput):
    last_hidden_state: torch.FloatTensor = None     # Latent representation
    reconstructed: torch.FloatTensor = None          # Reconstructed input
    hidden_states: Tuple[torch.FloatTensor] = None   # Intermediate states
    attentions: Tuple[torch.FloatTensor] = None      # Not used
```
### AutoencoderForReconstructionOutput
```python
@dataclass
class AutoencoderForReconstructionOutput(ModelOutput):
    loss: torch.FloatTensor = None                    # Reconstruction loss
    reconstructed: torch.FloatTensor = None           # Reconstructed input
    last_hidden_state: torch.FloatTensor = None       # Latent representation
    hidden_states: Tuple[torch.FloatTensor] = None    # Intermediate states
```
## 🔬 Advanced Usage

### Custom Loss Functions

You can easily extend the model with custom loss functions:
```python
class CustomAutoencoder(AutoencoderForReconstruction):
    def _compute_reconstruction_loss(self, reconstructed, target):
        # Custom loss implementation
        return your_custom_loss(reconstructed, target)
```
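For example, a hypothetical subclass that blends MSE and L1 using standard PyTorch losses (the class name and weights are made up for illustration; only the `_compute_reconstruction_loss` hook above comes from the implementation):

```python
import torch.nn.functional as F

class HybridLossAutoencoder(AutoencoderForReconstruction):
    def _compute_reconstruction_loss(self, reconstructed, target):
        # Weighted blend of MSE and L1 (illustrative weights)
        return 0.7 * F.mse_loss(reconstructed, target) + 0.3 * F.l1_loss(reconstructed, target)
```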
### Recurrent Autoencoder for Sequences

Well suited for time series, text, and sequential data:
```python
config = AutoencoderConfig(
    input_dim=50,               # Feature dimension per timestep
    latent_dim=32,              # Compressed representation size
    autoencoder_type="recurrent",
    rnn_type="lstm",            # or "gru", "rnn"
    num_layers=2,               # Number of RNN layers
    bidirectional=True,         # Bidirectional encoding
    teacher_forcing_ratio=0.7,  # Teacher forcing during training
    sequence_length=None,       # Variable-length sequences
)

# Usage with sequence data
model = AutoencoderForReconstruction(config)
sequence_data = torch.randn(batch_size, seq_len, input_dim)
outputs = model(input_values=sequence_data)
```
### Learnable Preprocessing

Deep learning-based data normalization that adapts to your data:
```python
# Neural Scaler - learnable alternative to StandardScaler
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="neural_scaler",
    preprocessing_hidden_dim=64,
)

# Normalizing Flow - invertible transformations
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="normalizing_flow",
    flow_coupling_layers=4,
)

# Works with all autoencoder types and sequence data
model = AutoencoderForReconstruction(config)
outputs = model(input_values=data)
print(f"Preprocessing loss: {outputs.preprocessing_loss}")

# Learnable MinMax Scaler - scales to [0, 1] with learnable bounds
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="minmax_scaler",
)

# Learnable Robust Scaler - robust to outliers using median/IQR
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="robust_scaler",
)

# Learnable Yeo-Johnson - power transform for skewed distributions
config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="yeo_johnson",
)
```
### Variational Autoencoder Extension

The configuration supports variational autoencoders:
```python
config = AutoencoderConfig(
    autoencoder_type="variational",
    beta=0.5,  # β-VAE parameter
    # ... other parameters
)
```
### Integration with the Datasets Library
```python
from datasets import Dataset

# Convert your data to an HF Dataset
dataset = Dataset.from_dict({
    "input_values": your_data_list,
})

# Use with Trainer
trainer = Trainer(
    model=model,
    train_dataset=dataset,
    # ... other arguments
)
```
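Depending on how the data was created, it can help to have the dataset return torch tensors directly. This is an optional, hedged suggestion using the standard `datasets` API; the Trainer's default collator can often handle plain Python lists as well:

```python
# Have __getitem__ return torch tensors instead of Python lists
dataset = dataset.with_format("torch")
```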
## 📁 Project Structure

```
autoencoder/
├── __init__.py                     # Package initialization
├── configuration_autoencoder.py    # Configuration class
├── modeling_autoencoder.py         # Model implementations
├── register_autoencoder.py         # AutoModel registration
├── pyproject.toml                  # Project metadata and dependencies
└── README.md                       # This file
```
## 🤝 Contributing

This implementation follows Hugging Face conventions and can be easily extended:
- Adding new architectures: Extend `AutoencoderModel` or create new model classes
- Custom configurations: Add parameters to `AutoencoderConfig`
- Task-specific heads: Create new classes like `AutoencoderForReconstruction`
- Integration: Register new models with the AutoModel framework
## 🎯 Use Cases
This autoencoder implementation is perfect for:
- Dimensionality Reduction: Compress high-dimensional data to lower dimensions
- Anomaly Detection: Identify outliers based on reconstruction error (see the sketch after this list)
- Data Denoising: Remove noise from corrupted data
- Feature Learning: Learn meaningful representations for downstream tasks
- Data Generation: Generate new samples similar to training data
- Pretraining: Initialize encoders for other tasks
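For the anomaly detection use case, a minimal sketch (assuming a trained `model` and a 2D batch `x`; the threshold heuristic is illustrative and should be calibrated on held-out data):

```python
import torch

model.eval()
with torch.no_grad():
    out = model(input_values=x)
    # Per-sample mean squared reconstruction error
    errors = torch.mean((out.reconstructed - x) ** 2, dim=-1)

threshold = errors.mean() + 3 * errors.std()  # simple mean + 3*std cut-off
anomalies = errors > threshold                # boolean mask of flagged samples
```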
## 🔍 Model Comparison

| Feature | Standard PyTorch | This Implementation |
|---|---|---|
| HF Integration | ❌ | ✅ |
| AutoModel Support | ❌ | ✅ |
| Trainer Compatible | ❌ | ✅ |
| Hub Integration | ❌ | ✅ |
| Config Management | Manual | ✅ Automatic |
| Serialization | Manual | ✅ Built-in |
| Checkpointing | Manual | ✅ Built-in |
## 🚀 Performance Tips
- Batch Size: Use larger batch sizes for better GPU utilization
- Learning Rate: Start with 1e-3 and adjust based on convergence (see the example after this list)
- Architecture: Gradually decrease hidden dimensions for better compression
- Regularization: Use dropout and batch normalization for better generalization
- Loss Function: Choose appropriate loss based on your data type
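As a hedged example of the learning-rate and batch-size tips above (the values are starting points, not recommendations for every dataset):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./autoencoder_output",
    learning_rate=1e-3,               # starting point; adjust based on convergence
    per_device_train_batch_size=128,  # larger batches improve GPU utilization
    num_train_epochs=10,
)
```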
## 📄 License
This implementation is provided as an example and follows the same license terms as Hugging Face Transformers.