AndrewMayesPrezzee committed
Commit · 5b68b61
Parent(s): f87cb91
Feat - Meta Data Added

Files changed:
- README.md +49 -23
- configuration_autoencoder.py +25 -3
- modeling_autoencoder.py +359 -21
README.md
CHANGED
@@ -1,3 +1,17 @@
# Autoencoder Implementation for Hugging Face Transformers

A complete autoencoder implementation that integrates seamlessly with the Hugging Face Transformers ecosystem, providing all the standard functionality you expect from transformer models.
@@ -11,28 +25,13 @@ A complete autoencoder implementation that integrates seamlessly with the Huggin
- **Multiple Loss Functions**: Support for MSE, BCE, L1, Huber, Smooth L1, KL Divergence, Cosine, Focal, Dice, Tversky, SSIM, and Perceptual loss
- **Multiple Autoencoder Types (7)**: Classic, Variational (VAE), Beta-VAE, Denoising, Sparse, Contractive, and Recurrent autoencoders
- **Extended Activation Functions**: 18+ activation functions including ReLU, GELU, Swish, Mish, ELU, and more
-- **Learnable Preprocessing**: Neural Scaler
- **Extensible Design**: Easy to extend for new autoencoder variants and custom loss functions
- **Production Ready**: Proper serialization, checkpointing, and inference support

-## 📦 Installation
-
-```bash
-uv sync # or: pip install -e .
-```
-
-Dependencies (see pyproject.toml):
-- `torch>=2.8.0`
-- `transformers>=4.55.2`
-- `numpy>=2.3.2`
-- `scikit-learn>=1.7.1`
-- `datasets>=4.0.0`
-- `accelerate>=1.10.0`

## 🏗️ Architecture

-Note: This repository has been trimmed to essentials for easy reuse and distribution. Example scripts and tests were removed by request.
-
The implementation consists of three main components:

### 1. AutoencoderConfig
@@ -72,7 +71,7 @@ config = AutoencoderConfig(
    autoencoder_type="classic", # Autoencoder type (7 types available)
    # Optional learnable preprocessing
    use_learnable_preprocessing=True,
-    preprocessing_type="neural_scaler", # or "normalizing_flow"
)

# Create model
@@ -96,10 +95,10 @@ from torch.utils.data import Dataset
class AutoencoderDataset(Dataset):
    def __init__(self, data):
        self.data = torch.FloatTensor(data)
-
    def __len__(self):
        return len(self.data)
-
    def __getitem__(self, idx):
        return {
            "input_values": self.data[idx],
@@ -263,7 +262,11 @@ config = AutoencoderConfig(
## 📊 Model Outputs

### AutoencoderOutput
```python
@dataclass
class AutoencoderOutput(ModelOutput):
    last_hidden_state: torch.FloatTensor = None # Latent representation
@@ -346,6 +349,33 @@ outputs = model(input_values=data)
print(f"Preprocessing loss: {outputs.preprocessing_loss}")
```

### Variational Autoencoder Extension

The configuration supports variational autoencoders:
@@ -376,10 +406,6 @@ trainer = Trainer(
)
```

-## 🧪 Testing
-
-This repository has been trimmed to essential files. Example scripts and test files were removed by request. You can create your own quick checks using the Quick Start snippet above.
-
## 📁 Project Structure

```
+---
+# Metadata for Hugging Face repo card
+library_name: transformers
+pipeline_tag: feature-extraction
+license: apache-2.0
+tags:
+- autoencoder
+- pytorch
+- reconstruction
+- preprocessing
+- normalizing-flow
+- scaler
+---
+
# Autoencoder Implementation for Hugging Face Transformers

A complete autoencoder implementation that integrates seamlessly with the Hugging Face Transformers ecosystem, providing all the standard functionality you expect from transformer models.
- **Multiple Loss Functions**: Support for MSE, BCE, L1, Huber, Smooth L1, KL Divergence, Cosine, Focal, Dice, Tversky, SSIM, and Perceptual loss
- **Multiple Autoencoder Types (7)**: Classic, Variational (VAE), Beta-VAE, Denoising, Sparse, Contractive, and Recurrent autoencoders
- **Extended Activation Functions**: 18+ activation functions including ReLU, GELU, Swish, Mish, ELU, and more
+- **Learnable Preprocessing**: Neural Scaler, Normalizing Flow, MinMax Scaler (learnable), Robust Scaler (learnable), and Yeo-Johnson preprocessors (2D and 3D tensors)
- **Extensible Design**: Easy to extend for new autoencoder variants and custom loss functions
- **Production Ready**: Proper serialization, checkpointing, and inference support

## 🏗️ Architecture

The implementation consists of three main components:

### 1. AutoencoderConfig
    autoencoder_type="classic", # Autoencoder type (7 types available)
    # Optional learnable preprocessing
    use_learnable_preprocessing=True,
+    preprocessing_type="neural_scaler", # or "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson"
)

# Create model
class AutoencoderDataset(Dataset):
    def __init__(self, data):
        self.data = torch.FloatTensor(data)
+
    def __len__(self):
        return len(self.data)
+
    def __getitem__(self, idx):
        return {
            "input_values": self.data[idx],
## 📊 Model Outputs

### AutoencoderOutput
+
+The base model `AutoencoderModel` returns the following output:
+```
```python
+
@dataclass
class AutoencoderOutput(ModelOutput):
    last_hidden_state: torch.FloatTensor = None # Latent representation
print(f"Preprocessing loss: {outputs.preprocessing_loss}")
```

+```python
+# Learnable MinMax Scaler - scales to [0, 1] with learnable bounds
+config = AutoencoderConfig(
+    input_dim=20,
+    latent_dim=10,
+    use_learnable_preprocessing=True,
+    preprocessing_type="minmax_scaler",
+)
+
+# Learnable Robust Scaler - robust to outliers using median/IQR
+config = AutoencoderConfig(
+    input_dim=20,
+    latent_dim=10,
+    use_learnable_preprocessing=True,
+    preprocessing_type="robust_scaler",
+)
+
+# Learnable Yeo-Johnson - power transform for skewed distributions
+config = AutoencoderConfig(
+    input_dim=20,
+    latent_dim=10,
+    use_learnable_preprocessing=True,
+    preprocessing_type="yeo_johnson",
+)
+```
+
+
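To tie the options above together, a brief, hedged sketch follows (an illustration added for this write-up, not file content from the commit). It assumes `configuration_autoencoder.py` and `modeling_autoencoder.py` are importable from the working directory and that the model output exposes `last_hidden_state` and `preprocessing_loss` as in the snippets above.

```python
# Hedged sketch: module paths and output attribute names are assumptions
# based on the README snippets above, not a verified API reference.
import torch

from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import AutoencoderModel

config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="minmax_scaler",
)
model = AutoencoderModel(config)

data = torch.randn(32, 20)                # batch of 32 vectors with 20 features
outputs = model(input_values=data)

print(outputs.last_hidden_state.shape)    # latent representation, (32, 10) expected
print(outputs.preprocessing_loss)         # regularization term from the learnable scaler
```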
### Variational Autoencoder Extension

The configuration supports variational autoencoders:
)
```

## 📁 Project Structure

```
configuration_autoencoder.py
CHANGED
@@ -43,7 +43,7 @@ class AutoencoderConfig(PretrainedConfig):
            Defaults to 0.5.
        use_learnable_preprocessing (bool, optional): Whether to use learnable preprocessing. Defaults to False.
        preprocessing_type (str, optional): Type of learnable preprocessing. Options: "none", "neural_scaler",
-            "normalizing_flow". Defaults to "none".
        preprocessing_hidden_dim (int, optional): Hidden dimension for preprocessing networks. Defaults to 64.
        preprocessing_num_layers (int, optional): Number of layers in preprocessing networks. Defaults to 2.
        learn_inverse_preprocessing (bool, optional): Whether to learn inverse preprocessing for reconstruction.
@@ -147,7 +147,14 @@ class AutoencoderConfig(PretrainedConfig):
            raise ValueError(f"`sequence_length` must be positive when specified, got {sequence_length}.")

        # Preprocessing validation
-        valid_preprocessing = ["none", "neural_scaler", "normalizing_flow"]
        if preprocessing_type not in valid_preprocessing:
            raise ValueError(
                f"`preprocessing_type` must be one of {valid_preprocessing}, got {preprocessing_type}."
@@ -244,7 +251,22 @@ class AutoencoderConfig(PretrainedConfig):
    def is_normalizing_flow(self) -> bool:
        """Check if using normalizing flow preprocessing."""
        return self.preprocessing_type == "normalizing_flow"
-
    def to_dict(self):
        """
        Serializes this instance to a Python dictionary.
            Defaults to 0.5.
        use_learnable_preprocessing (bool, optional): Whether to use learnable preprocessing. Defaults to False.
        preprocessing_type (str, optional): Type of learnable preprocessing. Options: "none", "neural_scaler",
+            "normalizing_flow", "minmax_scaler", "robust_scaler", "yeo_johnson". Defaults to "none".
        preprocessing_hidden_dim (int, optional): Hidden dimension for preprocessing networks. Defaults to 64.
        preprocessing_num_layers (int, optional): Number of layers in preprocessing networks. Defaults to 2.
        learn_inverse_preprocessing (bool, optional): Whether to learn inverse preprocessing for reconstruction.
            raise ValueError(f"`sequence_length` must be positive when specified, got {sequence_length}.")

        # Preprocessing validation
+        valid_preprocessing = [
+            "none",
+            "neural_scaler",
+            "normalizing_flow",
+            "minmax_scaler",
+            "robust_scaler",
+            "yeo_johnson",
+        ]
        if preprocessing_type not in valid_preprocessing:
            raise ValueError(
                f"`preprocessing_type` must be one of {valid_preprocessing}, got {preprocessing_type}."
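As a hedged illustration of the validation above (assuming the check runs when the config is constructed, and the same local module layout as the rest of this repository):

```python
# Hedged sketch: assumes the validation shown above executes in __init__.
from configuration_autoencoder import AutoencoderConfig

AutoencoderConfig(input_dim=8, latent_dim=4,
                  use_learnable_preprocessing=True,
                  preprocessing_type="robust_scaler")        # accepted

try:
    AutoencoderConfig(input_dim=8, latent_dim=4,
                      use_learnable_preprocessing=True,
                      preprocessing_type="quantile_scaler")   # not in valid_preprocessing
except ValueError as err:
    print(err)
```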
|
251 |
def is_normalizing_flow(self) -> bool:
|
252 |
"""Check if using normalizing flow preprocessing."""
|
253 |
return self.preprocessing_type == "normalizing_flow"
|
254 |
+
|
255 |
+
@property
|
256 |
+
def is_minmax_scaler(self) -> bool:
|
257 |
+
"""Check if using learnable MinMax scaler preprocessing."""
|
258 |
+
return self.preprocessing_type == "minmax_scaler"
|
259 |
+
|
260 |
+
@property
|
261 |
+
def is_robust_scaler(self) -> bool:
|
262 |
+
"""Check if using learnable Robust scaler preprocessing."""
|
263 |
+
return self.preprocessing_type == "robust_scaler"
|
264 |
+
|
265 |
+
@property
|
266 |
+
def is_yeo_johnson(self) -> bool:
|
267 |
+
"""Check if using learnable Yeo-Johnson power transform preprocessing."""
|
268 |
+
return self.preprocessing_type == "yeo_johnson"
|
269 |
+
|
270 |
def to_dict(self):
|
271 |
"""
|
272 |
Serializes this instance to a Python dictionary.
|
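A hedged sketch of the new convenience properties together with the standard `PretrainedConfig` save/load round trip (the module path is an assumption, the properties are those added above):

```python
# Hedged sketch: exercises the convenience properties added above and the
# usual PretrainedConfig serialization round trip.
from configuration_autoencoder import AutoencoderConfig

config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="yeo_johnson",
)
print(config.is_yeo_johnson)      # True
print(config.is_minmax_scaler)    # False

config.save_pretrained("./autoencoder-config")
reloaded = AutoencoderConfig.from_pretrained("./autoencoder-config")
print(reloaded.preprocessing_type)  # "yeo_johnson"
```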
modeling_autoencoder.py
CHANGED
@@ -143,6 +143,338 @@ class NeuralScaler(nn.Module):
        return x, torch.tensor(0.0, device=x.device)


class CouplingLayer(nn.Module):
    """Coupling layer for normalizing flows."""

@@ -306,6 +638,12 @@ class LearnablePreprocessor(nn.Module):
            self.preprocessor = NeuralScaler(config)
        elif config.is_normalizing_flow:
            self.preprocessor = NormalizingFlowPreprocessor(config)
        else:
            raise ValueError(f"Unknown preprocessing type: {config.preprocessing_type}")

@@ -399,7 +737,7 @@ class AutoencoderEncoder(nn.Module):
        else:
            # Standard encoder output
            self.fc_out = nn.Linear(input_dim, config.latent_dim)
-
    def _get_activation(self, activation: str) -> nn.Module:
        """Get activation function by name."""
        activations = {
@@ -423,7 +761,7 @@ class AutoencoderEncoder(nn.Module):
            "threshold": nn.Threshold(threshold=0.1, value=0),
        }
        return activations[activation]
-
    def forward(self, x: torch.Tensor) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]:
        """Forward pass through encoder."""
        # Add noise for denoising autoencoders
@@ -461,37 +799,37 @@ class AutoencoderEncoder(nn.Module):

class AutoencoderDecoder(nn.Module):
    """Decoder part of the autoencoder."""
-
    def __init__(self, config: AutoencoderConfig):
        super().__init__()
        self.config = config
-
        # Build decoder layers (reverse of encoder)
        layers = []
        input_dim = config.latent_dim
        decoder_dims = config.decoder_dims + [config.input_dim]
-
        for i, hidden_dim in enumerate(decoder_dims):
            layers.append(nn.Linear(input_dim, hidden_dim))
-
            # Don't add batch norm, activation, or dropout to the final layer
            if i < len(decoder_dims) - 1:
                if config.use_batch_norm:
                    layers.append(nn.BatchNorm1d(hidden_dim))
-
                layers.append(self._get_activation(config.activation))
-
                if config.dropout_rate > 0:
                    layers.append(nn.Dropout(config.dropout_rate))
            else:
                # Final layer - add appropriate activation based on reconstruction loss
                if config.reconstruction_loss == "bce":
                    layers.append(nn.Sigmoid())
-
            input_dim = hidden_dim
-
        self.decoder = nn.Sequential(*layers)
-
    def _get_activation(self, activation: str) -> nn.Module:
        """Get activation function by name."""
        activations = {
|
            "threshold": nn.Threshold(threshold=0.1, value=0),
        }
        return activations[activation]
-
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass through decoder."""
        return self.decoder(x)
@@ -753,19 +1091,19 @@ class RecurrentDecoder(nn.Module):
class AutoencoderModel(PreTrainedModel):
    """
    The bare Autoencoder Model transformer outputting raw hidden-states without any specific head on top.
-
    This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the
    library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
    etc.)
-
    This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the
    PyTorch documentation for all matter related to general usage and behavior.
    """
-
    config_class = AutoencoderConfig
    base_model_prefix = "autoencoder"
    supports_gradient_checkpointing = False
-
    def __init__(self, config: AutoencoderConfig):
        super().__init__(config)
        self.config = config
@@ -787,23 +1125,23 @@ class AutoencoderModel(PreTrainedModel):
        # Tie weights if specified
        if config.tie_weights:
            self._tie_weights()
-
        # Initialize weights
        self.post_init()
-
    def _tie_weights(self):
        """Tie encoder and decoder weights (transpose relationship)."""
        # This is a simplified weight tying - in practice, you might want more sophisticated tying
        pass
-
    def get_input_embeddings(self):
        """Get input embeddings (not applicable for basic autoencoder)."""
        return None
-
    def set_input_embeddings(self, value):
        """Set input embeddings (not applicable for basic autoencoder)."""
        pass
-
    def forward(
        self,
        input_values: torch.Tensor,
        return x, torch.tensor(0.0, device=x.device)


+
+class LearnableMinMaxScaler(nn.Module):
+    """Learnable MinMax scaler that adapts bounds during training.
+
+    Scales features to [0, 1] using batch min/range with learnable adjustments and
+    a learnable affine transform. Supports 2D (B, F) and 3D (B, T, F) inputs.
+    """
+
+    def __init__(self, config: AutoencoderConfig):
+        super().__init__()
+        self.config = config
+        input_dim = config.input_dim
+        hidden_dim = config.preprocessing_hidden_dim
+
+        # Networks to learn adjustments to batch min and range
+        self.min_estimator = nn.Sequential(
+            nn.Linear(input_dim, hidden_dim),
+            nn.ReLU(),
+            nn.Linear(hidden_dim, hidden_dim),
+            nn.ReLU(),
+            nn.Linear(hidden_dim, input_dim),
+        )
+        self.range_estimator = nn.Sequential(
+            nn.Linear(input_dim, hidden_dim),
+            nn.ReLU(),
+            nn.Linear(hidden_dim, hidden_dim),
+            nn.ReLU(),
+            nn.Linear(hidden_dim, input_dim),
+            nn.Softplus(), # Ensure positive adjustment to range
+        )
+
+        # Learnable affine transformation parameters
+        self.weight = nn.Parameter(torch.ones(input_dim))
+        self.bias = nn.Parameter(torch.zeros(input_dim))
+
+        # Running statistics for inference
+        self.register_buffer("running_min", torch.zeros(input_dim))
+        self.register_buffer("running_range", torch.ones(input_dim))
+        self.register_buffer("num_batches_tracked", torch.tensor(0, dtype=torch.long))
+
+        self.momentum = 0.1
+
+    def forward(self, x: torch.Tensor, inverse: bool = False) -> Tuple[torch.Tensor, torch.Tensor]:
+        if inverse:
+            return self._inverse_transform(x)
+
+        original_shape = x.shape
+        if x.dim() == 3:
+            x = x.view(-1, x.size(-1))
+
+        eps = 1e-8
+        if self.training:
+            batch_min = x.min(dim=0, keepdim=True).values
+            batch_max = x.max(dim=0, keepdim=True).values
+            batch_range = (batch_max - batch_min).clamp_min(eps)
+
+            # Learn adjustments
+            learned_min_adj = self.min_estimator(batch_min)
+            learned_range_adj = self.range_estimator(batch_range)
+
+            effective_min = batch_min + learned_min_adj
+            effective_range = batch_range + learned_range_adj + eps
+
+            # Update running stats with raw batch min/range for stable inversion
+            with torch.no_grad():
+                self.num_batches_tracked += 1
+                if self.num_batches_tracked == 1:
+                    self.running_min.copy_(batch_min.squeeze())
+                    self.running_range.copy_(batch_range.squeeze())
+                else:
+                    self.running_min.mul_(1 - self.momentum).add_(batch_min.squeeze(), alpha=self.momentum)
+                    self.running_range.mul_(1 - self.momentum).add_(batch_range.squeeze(), alpha=self.momentum)
+        else:
+            effective_min = self.running_min.unsqueeze(0)
+            effective_range = self.running_range.unsqueeze(0)
+
+        # Scale to [0, 1]
+        scaled = (x - effective_min) / effective_range
+
+        # Learnable affine transform
+        transformed = scaled * self.weight + self.bias
+
+        if len(original_shape) == 3:
+            transformed = transformed.view(original_shape)
+
+        # Regularization: encourage non-degenerate range and modest affine params
+        reg_loss = 0.01 * (self.weight.var() + self.bias.var())
+        if self.training:
+            reg_loss = reg_loss + 0.001 * (1.0 / effective_range.clamp_min(1e-3)).mean()
+
+        return transformed, reg_loss
+
+    def _inverse_transform(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
+        if not self.config.learn_inverse_preprocessing:
+            return x, torch.tensor(0.0, device=x.device)
+
+        original_shape = x.shape
+        if x.dim() == 3:
+            x = x.view(-1, x.size(-1))
+
+        # Reverse affine
+        x = (x - self.bias) / (self.weight + 1e-8)
+        # Reverse MinMax using running stats
+        x = x * self.running_range.unsqueeze(0) + self.running_min.unsqueeze(0)
+
+        if len(original_shape) == 3:
+            x = x.view(original_shape)
+
+        return x, torch.tensor(0.0, device=x.device)
+
+
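For orientation, a hedged usage sketch of `LearnableMinMaxScaler` on its own (illustration only, not part of the commit; imports assume this repository's module layout):

```python
# Hedged sketch (not file content from the commit): exercising the scaler directly.
import torch

from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import LearnableMinMaxScaler

config = AutoencoderConfig(
    input_dim=20,
    latent_dim=10,
    use_learnable_preprocessing=True,
    preprocessing_type="minmax_scaler",
    learn_inverse_preprocessing=True,
)
scaler = LearnableMinMaxScaler(config)

x = torch.randn(64, 20) * 5.0 + 3.0

scaler.train()
scaled, reg_loss = scaler(x)            # batch statistics plus learned adjustments
print(scaled.shape, float(reg_loss))

scaler.eval()
scaled_eval, _ = scaler(x)              # running statistics are used instead
recovered, _ = scaler(scaled_eval, inverse=True)
print(torch.allclose(recovered, x, atol=1e-4))  # inverse rebuilds the input from running stats
```

The design choice worth noting is that the running min/range buffers store the raw batch statistics, so the inverse path stays stable even though the forward path uses the learned adjustments.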
+class LearnableRobustScaler(nn.Module):
+    """Learnable Robust scaler using median and IQR with learnable adjustments.
+
+    Normalizes as (x - median) / IQR with learnable adjustments and an affine head.
+    Supports 2D (B, F) and 3D (B, T, F) inputs.
+    """
+
+    def __init__(self, config: AutoencoderConfig):
+        super().__init__()
+        self.config = config
+        input_dim = config.input_dim
+        hidden_dim = config.preprocessing_hidden_dim
+
+        self.median_estimator = nn.Sequential(
+            nn.Linear(input_dim, hidden_dim),
+            nn.ReLU(),
+            nn.Linear(hidden_dim, hidden_dim),
+            nn.ReLU(),
+            nn.Linear(hidden_dim, input_dim),
+        )
+        self.iqr_estimator = nn.Sequential(
+            nn.Linear(input_dim, hidden_dim),
+            nn.ReLU(),
+            nn.Linear(hidden_dim, hidden_dim),
+            nn.ReLU(),
+            nn.Linear(hidden_dim, input_dim),
+            nn.Softplus(), # Ensure positive IQR adjustment
+        )
+
+        self.weight = nn.Parameter(torch.ones(input_dim))
+        self.bias = nn.Parameter(torch.zeros(input_dim))
+
+        self.register_buffer("running_median", torch.zeros(input_dim))
+        self.register_buffer("running_iqr", torch.ones(input_dim))
+        self.register_buffer("num_batches_tracked", torch.tensor(0, dtype=torch.long))
+
+        self.momentum = 0.1
+
+    def forward(self, x: torch.Tensor, inverse: bool = False) -> Tuple[torch.Tensor, torch.Tensor]:
+        if inverse:
+            return self._inverse_transform(x)
+
+        original_shape = x.shape
+        if x.dim() == 3:
+            x = x.view(-1, x.size(-1))
+
+        eps = 1e-8
+        if self.training:
+            qs = torch.quantile(x, torch.tensor([0.25, 0.5, 0.75], device=x.device), dim=0)
+            q25, med, q75 = qs[0:1, :], qs[1:2, :], qs[2:3, :]
+            iqr = (q75 - q25).clamp_min(eps)
+
+            learned_med_adj = self.median_estimator(med)
+            learned_iqr_adj = self.iqr_estimator(iqr)
+
+            effective_median = med + learned_med_adj
+            effective_iqr = iqr + learned_iqr_adj + eps
+
+            with torch.no_grad():
+                self.num_batches_tracked += 1
+                if self.num_batches_tracked == 1:
+                    self.running_median.copy_(med.squeeze())
+                    self.running_iqr.copy_(iqr.squeeze())
+                else:
+                    self.running_median.mul_(1 - self.momentum).add_(med.squeeze(), alpha=self.momentum)
+                    self.running_iqr.mul_(1 - self.momentum).add_(iqr.squeeze(), alpha=self.momentum)
+        else:
+            effective_median = self.running_median.unsqueeze(0)
+            effective_iqr = self.running_iqr.unsqueeze(0)
+
+        normalized = (x - effective_median) / effective_iqr
+        transformed = normalized * self.weight + self.bias
+
+        if len(original_shape) == 3:
+            transformed = transformed.view(original_shape)
+
+        reg_loss = 0.01 * (self.weight.var() + self.bias.var())
+        if self.training:
+            reg_loss = reg_loss + 0.001 * (1.0 / effective_iqr.clamp_min(1e-3)).mean()
+
+        return transformed, reg_loss
+
+    def _inverse_transform(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
+        if not self.config.learn_inverse_preprocessing:
+            return x, torch.tensor(0.0, device=x.device)
+
+        original_shape = x.shape
+        if x.dim() == 3:
+            x = x.view(-1, x.size(-1))
+
+        x = (x - self.bias) / (self.weight + 1e-8)
+        x = x * self.running_iqr.unsqueeze(0) + self.running_median.unsqueeze(0)
+
+        if len(original_shape) == 3:
+            x = x.view(original_shape)
+
+        return x, torch.tensor(0.0, device=x.device)
+
+
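Similarly, a hedged sketch of `LearnableRobustScaler` on data containing outliers (illustration only; the same module-layout assumption applies):

```python
# Hedged sketch (not file content from the commit): median/IQR scaling is
# largely insensitive to a few extreme values injected into the batch.
import torch

from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import LearnableRobustScaler

config = AutoencoderConfig(
    input_dim=8,
    latent_dim=4,
    use_learnable_preprocessing=True,
    preprocessing_type="robust_scaler",
)
scaler = LearnableRobustScaler(config)

x = torch.randn(256, 8)
x[:4] += 100.0                      # inject outliers; median and IQR barely move

scaler.train()
scaled, reg_loss = scaler(x)
print(scaled.median(dim=0).values)  # roughly centred per feature despite the outliers
print(float(reg_loss))
```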
+class LearnableYeoJohnsonPreprocessor(nn.Module):
+    """Learnable Yeo-Johnson power transform with per-feature λ and affine head.
+
+    Applies Yeo-Johnson transform elementwise with learnable lambda per feature,
+    followed by standardization and a learnable affine transform. Supports 2D and 3D inputs.
+    """
+
+    def __init__(self, config: AutoencoderConfig):
+        super().__init__()
+        self.config = config
+        input_dim = config.input_dim
+
+        # Learnable lambda per feature (unconstrained). Initialize around 1.0
+        self.lmbda = nn.Parameter(torch.ones(input_dim))
+
+        # Learnable affine parameters after standardization
+        self.weight = nn.Parameter(torch.ones(input_dim))
+        self.bias = nn.Parameter(torch.zeros(input_dim))
+
+        # Running stats for transformed data
+        self.register_buffer("running_mean", torch.zeros(input_dim))
+        self.register_buffer("running_std", torch.ones(input_dim))
+        self.register_buffer("num_batches_tracked", torch.tensor(0, dtype=torch.long))
+        self.momentum = 0.1
+
+    def _yeo_johnson(self, x: torch.Tensor, lmbda: torch.Tensor) -> torch.Tensor:
+        eps = 1e-6
+        lmbda = lmbda.unsqueeze(0) # broadcast over batch
+        pos = x >= 0
+        # For x >= 0
+        if_part = torch.where(
+            torch.abs(lmbda) > eps,
+            ((x + 1.0).clamp_min(eps) ** lmbda - 1.0) / lmbda,
+            torch.log((x + 1.0).clamp_min(eps)),
+        )
+        # For x < 0
+        two_minus_lambda = 2.0 - lmbda
+        else_part = torch.where(
+            torch.abs(two_minus_lambda) > eps,
+            -(((1.0 - x).clamp_min(eps)) ** two_minus_lambda - 1.0) / two_minus_lambda,
+            -torch.log((1.0 - x).clamp_min(eps)),
+        )
+        return torch.where(pos, if_part, else_part)
+
+    def _yeo_johnson_inverse(self, y: torch.Tensor, lmbda: torch.Tensor) -> torch.Tensor:
+        eps = 1e-6
+        lmbda = lmbda.unsqueeze(0)
+        pos = y >= 0
+        # Inverse for y >= 0
+        x_pos = torch.where(
+            torch.abs(lmbda) > eps,
+            (y * lmbda + 1.0).clamp_min(eps) ** (1.0 / lmbda) - 1.0,
+            torch.exp(y) - 1.0,
+        )
+        # Inverse for y < 0
+        two_minus_lambda = 2.0 - lmbda
+        x_neg = torch.where(
+            torch.abs(two_minus_lambda) > eps,
+            1.0 - (1.0 - y * two_minus_lambda).clamp_min(eps) ** (1.0 / two_minus_lambda),
+            1.0 - torch.exp(-y),
+        )
+        return torch.where(pos, x_pos, x_neg)
+
+    def forward(self, x: torch.Tensor, inverse: bool = False) -> Tuple[torch.Tensor, torch.Tensor]:
+        if inverse:
+            return self._inverse_transform(x)
+
+        orig_shape = x.shape
+        if x.dim() == 3:
+            x = x.view(-1, x.size(-1))
+
+        # Apply Yeo-Johnson
+        y = self._yeo_johnson(x, self.lmbda)
+
+        # Batch stats and running stats on transformed data
+        if self.training:
+            batch_mean = y.mean(dim=0, keepdim=True)
+            batch_std = y.std(dim=0, keepdim=True).clamp_min(1e-6)
+            with torch.no_grad():
+                self.num_batches_tracked += 1
+                if self.num_batches_tracked == 1:
+                    self.running_mean.copy_(batch_mean.squeeze())
+                    self.running_std.copy_(batch_std.squeeze())
+                else:
+                    self.running_mean.mul_(1 - self.momentum).add_(batch_mean.squeeze(), alpha=self.momentum)
+                    self.running_std.mul_(1 - self.momentum).add_(batch_std.squeeze(), alpha=self.momentum)
+            mean = batch_mean
+            std = batch_std
+        else:
+            mean = self.running_mean.unsqueeze(0)
+            std = self.running_std.unsqueeze(0)
+
+        y_norm = (y - mean) / std
+        out = y_norm * self.weight + self.bias
+
+        if len(orig_shape) == 3:
+            out = out.view(orig_shape)
+
+        # Regularize lambda to avoid extreme values; encourage identity around 1
+        reg = 0.001 * (self.lmbda - 1.0).pow(2).mean() + 0.01 * (self.weight.var() + self.bias.var())
+        return out, reg
+
+    def _inverse_transform(self, x: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
+        if not self.config.learn_inverse_preprocessing:
+            return x, torch.tensor(0.0, device=x.device)
+
+        orig_shape = x.shape
+        if x.dim() == 3:
+            x = x.view(-1, x.size(-1))
+
+        # Reverse affine and normalization with running stats
+        y = (x - self.bias) / (self.weight + 1e-8)
+        y = y * self.running_std.unsqueeze(0) + self.running_mean.unsqueeze(0)
+
+        # Inverse Yeo-Johnson
+        out = self._yeo_johnson_inverse(y, self.lmbda)
+
+        if len(orig_shape) == 3:
+            out = out.view(orig_shape)
+
+        return out, torch.tensor(0.0, device=x.device)
+
class CouplingLayer(nn.Module):
    """Coupling layer for normalizing flows."""

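A hedged round-trip sketch for the Yeo-Johnson preprocessor (illustration only; the module layout is an assumption, and `learn_inverse_preprocessing=True` is needed for the inverse path to do anything):

```python
# Hedged sketch (not file content from the commit): forward then inverse
# should approximately reconstruct the input after running stats are populated.
import torch

from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import LearnableYeoJohnsonPreprocessor

config = AutoencoderConfig(
    input_dim=8,
    latent_dim=4,
    use_learnable_preprocessing=True,
    preprocessing_type="yeo_johnson",
    learn_inverse_preprocessing=True,
)
prep = LearnableYeoJohnsonPreprocessor(config)

x = torch.randn(128, 8).exp()       # right-skewed, strictly positive data

prep.train()
_ = prep(x)                         # one pass to populate the running statistics

prep.eval()
y, _ = prep(x)                      # transform with running stats
x_back, _ = prep(y, inverse=True)   # invert affine, normalization, and Yeo-Johnson
print(torch.allclose(x_back, x, atol=1e-3))
```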
            self.preprocessor = NeuralScaler(config)
        elif config.is_normalizing_flow:
            self.preprocessor = NormalizingFlowPreprocessor(config)
+        elif getattr(config, "is_minmax_scaler", False):
+            self.preprocessor = LearnableMinMaxScaler(config)
+        elif getattr(config, "is_robust_scaler", False):
+            self.preprocessor = LearnableRobustScaler(config)
+        elif getattr(config, "is_yeo_johnson", False):
+            self.preprocessor = LearnableYeoJohnsonPreprocessor(config)
        else:
            raise ValueError(f"Unknown preprocessing type: {config.preprocessing_type}")

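A hedged sketch of the dispatch above (illustration only; it assumes `LearnablePreprocessor` is constructed directly from a config, per the `__init__` shown here, and the usual module layout):

```python
# Hedged sketch (not file content from the commit): the config's preprocessing_type
# determines which submodule LearnablePreprocessor wraps.
from configuration_autoencoder import AutoencoderConfig
from modeling_autoencoder import (
    LearnablePreprocessor,
    LearnableMinMaxScaler,
    LearnableRobustScaler,
    LearnableYeoJohnsonPreprocessor,
)

for name, cls in [
    ("minmax_scaler", LearnableMinMaxScaler),
    ("robust_scaler", LearnableRobustScaler),
    ("yeo_johnson", LearnableYeoJohnsonPreprocessor),
]:
    config = AutoencoderConfig(
        input_dim=16,
        latent_dim=8,
        use_learnable_preprocessing=True,
        preprocessing_type=name,
    )
    wrapper = LearnablePreprocessor(config)
    assert isinstance(wrapper.preprocessor, cls)
```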
        else:
            # Standard encoder output
            self.fc_out = nn.Linear(input_dim, config.latent_dim)
+
    def _get_activation(self, activation: str) -> nn.Module:
        """Get activation function by name."""
        activations = {
            "threshold": nn.Threshold(threshold=0.1, value=0),
        }
        return activations[activation]
+
    def forward(self, x: torch.Tensor) -> Union[torch.Tensor, Tuple[torch.Tensor, torch.Tensor, torch.Tensor]]:
        """Forward pass through encoder."""
        # Add noise for denoising autoencoders

class AutoencoderDecoder(nn.Module):
    """Decoder part of the autoencoder."""
+
    def __init__(self, config: AutoencoderConfig):
        super().__init__()
        self.config = config
+
        # Build decoder layers (reverse of encoder)
        layers = []
        input_dim = config.latent_dim
        decoder_dims = config.decoder_dims + [config.input_dim]
+
        for i, hidden_dim in enumerate(decoder_dims):
            layers.append(nn.Linear(input_dim, hidden_dim))
+
            # Don't add batch norm, activation, or dropout to the final layer
            if i < len(decoder_dims) - 1:
                if config.use_batch_norm:
                    layers.append(nn.BatchNorm1d(hidden_dim))
+
                layers.append(self._get_activation(config.activation))
+
                if config.dropout_rate > 0:
                    layers.append(nn.Dropout(config.dropout_rate))
            else:
                # Final layer - add appropriate activation based on reconstruction loss
                if config.reconstruction_loss == "bce":
                    layers.append(nn.Sigmoid())
+
            input_dim = hidden_dim
+
        self.decoder = nn.Sequential(*layers)
+
    def _get_activation(self, activation: str) -> nn.Module:
        """Get activation function by name."""
        activations = {
            "threshold": nn.Threshold(threshold=0.1, value=0),
        }
        return activations[activation]
+
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass through decoder."""
        return self.decoder(x)
class AutoencoderModel(PreTrainedModel):
    """
    The bare Autoencoder Model transformer outputting raw hidden-states without any specific head on top.
+
    This model inherits from PreTrainedModel. Check the superclass documentation for the generic methods the
    library implements for all its model (such as downloading or saving, resizing the input embeddings, pruning heads
    etc.)
+
    This model is also a PyTorch torch.nn.Module subclass. Use it as a regular PyTorch Module and refer to the
    PyTorch documentation for all matter related to general usage and behavior.
    """
+
    config_class = AutoencoderConfig
    base_model_prefix = "autoencoder"
    supports_gradient_checkpointing = False
+
    def __init__(self, config: AutoencoderConfig):
        super().__init__(config)
        self.config = config
        # Tie weights if specified
        if config.tie_weights:
            self._tie_weights()
+
        # Initialize weights
        self.post_init()
+
    def _tie_weights(self):
        """Tie encoder and decoder weights (transpose relationship)."""
        # This is a simplified weight tying - in practice, you might want more sophisticated tying
        pass
+
    def get_input_embeddings(self):
        """Get input embeddings (not applicable for basic autoencoder)."""
        return None
+
    def set_input_embeddings(self, value):
        """Set input embeddings (not applicable for basic autoencoder)."""
        pass
+
    def forward(
        self,
        input_values: torch.Tensor,