CosineGate: Semantic Dynamic Routing via Cosine Incompatibility in Residual Networks
Abstract
Modern deep residual networks perform substantial redundant computation by evaluating all residual blocks for every input, even when identity mappings suffice. We introduce CosineGate, an end-to-end differentiable architecture for dynamic routing in residual networks that uses cosine incompatibility between identity and residual feature representations as a self-supervised skip signal. CosineGate measures semantic redundancy through the Cosine Incompatibility Ratio (CIR), defined as 1 - cos(x, F(x)), and uses Gumbel-Softmax relaxation to enable per-sample, per-block gating during training. A progressive FLOPs regularization term controls average compute usage without destabilizing optimization. On CIFAR-10, CosineGate spans the accuracy-efficiency Pareto frontier: an aggressive configuration achieves 89.9 percent accuracy with 24.1 percent FLOPs savings, a balanced configuration achieves 91.3 percent accuracy with 28.5 percent savings at epoch 160, and a conservative configuration reaches a peak of 93.2 percent accuracy with minimal compute reduction. These results match or exceed ResNet-20 (91.3 percent) while reducing computation, without auxiliary supervision, distillation, or task-specific heuristics. Our results demonstrate that simple geometric measures of feature incompatibility provide a principled and effective signal for dynamic residual routing.
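To make the gating signal concrete, here is a minimal NumPy sketch of the two ingredients the abstract names: the per-sample CIR, 1 - cos(x, F(x)), and a Gumbel-Softmax relaxation over the two choices {skip, execute}. The abstract does not say how CIR is turned into gate logits, so `gate_logits`, its `scale` and `bias` parameters, and the form of `gated_residual` are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def cosine_incompatibility(x, fx, eps=1e-8):
    """CIR = 1 - cos(x, F(x)), computed per sample on flattened features."""
    x = x.reshape(x.shape[0], -1)
    fx = fx.reshape(fx.shape[0], -1)
    dot = np.sum(x * fx, axis=1)
    norm = np.linalg.norm(x, axis=1) * np.linalg.norm(fx, axis=1)
    return 1.0 - dot / (norm + eps)


def gate_logits(cir, scale=5.0, bias=0.5):
    """Hypothetical mapping from CIR to [skip, execute] logits (an assumption).

    A higher CIR means the residual branch disagrees more with the identity
    path, so the 'execute' logit grows with CIR.
    """
    z = scale * (cir - bias)
    return np.stack([-z, z], axis=1)


def gumbel_softmax(logits, tau=1.0, rng=None):
    """Relaxed categorical sample: softmax((logits + Gumbel noise) / tau)."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(low=1e-10, high=1.0, size=logits.shape)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / tau
    y = y - y.max(axis=1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(y)
    return e / e.sum(axis=1, keepdims=True)


def gated_residual(x, fx, tau=1.0, rng=None):
    """Soft-gated block output x + g * F(x), with one gate value g per sample."""
    cir = cosine_incompatibility(x, fx)
    g = gumbel_softmax(gate_logits(cir), tau=tau, rng=rng)[:, 1]
    g = g.reshape((-1,) + (1,) * (x.ndim - 1))  # broadcast over feature dims
    return x + g * fx, g.reshape(-1)
```

For parallel, orthogonal, and opposite feature vectors, `cosine_incompatibility` returns approximately 0, 1, and 2. At inference, a hard decision (for example, executing the block only when the 'execute' logit is the larger one) would replace the relaxed sample; that choice is also an assumption here.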
Community
"I introduce CosineGate, a SOTA dynamic routing mechanism for ResNets that uses the Cosine Incompatibility Ratio (CIR) as a self-supervised signal.
🚀 Why it matters: It matches ResNet-20 accuracy on CIFAR-10 while slashing computation by 28.5%—without needing extra 'predictor' sub-networks or distillation.
🛠️ Key Features:
- Fully differentiable (via Gumbel-Softmax).
- Bio-inspired (Predictive Coding).
- Plug-and-play for efficient computer vision.
Check out our GitHub Repo linked in the sidebar for the implementation!"