Beyond LLMs
Equivariant Deep Learning
Networks that respect symmetry: if the input transforms under a group action, the output transforms predictably. Equivariance generalizes translation equivariance in CNNs to rotations, permutations, and gauge symmetries, reducing sample complexity and improving generalization on structured data.
Why This Matters
A CNN detects a cat whether it appears on the left or right of the image. This is translation equivariance: shifting the input shifts the feature maps by the same amount. The CNN does not need to learn the cat pattern separately for each position because weight sharing enforces the symmetry.
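The shift property can be checked numerically. Below is a minimal sketch in which a 1D circular cross-correlation stands in for a CNN layer and `np.roll` plays the role of the translation group action; the signal length, filter, and shift amount are arbitrary choices for illustration.

```python
# Sketch: a circular 1D "convolution" layer is translation-equivariant.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(32)   # input signal
w = rng.standard_normal(5)    # shared filter weights

def conv(signal, kernel):
    """Circular cross-correlation: a linear, translation-equivariant layer."""
    n, k = len(signal), len(kernel)
    return np.array([sum(kernel[j] * signal[(i + j) % n] for j in range(k))
                     for i in range(n)])

shift = 7
a = conv(np.roll(x, shift), w)   # shift the input, then convolve ...
b = np.roll(conv(x, w), shift)   # ... equals convolve, then shift the output
assert np.allclose(a, b)         # f(g.x) == g.f(x)
```

The same check fails for a generic dense matrix in place of `conv`, which is exactly why weight sharing is the structural ingredient.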
Equivariant deep learning generalizes this idea to arbitrary symmetries. If your data has rotational symmetry (molecular structures, satellite imagery), permutation symmetry (sets, graphs, point clouds), or gauge symmetry (physical fields), you can build networks that respect these symmetries by construction. The payoff: fewer parameters, less training data, better generalization.
This is the core idea of geometric deep learning (Bronstein et al., 2021): most successful architectures can be understood as equivariant networks for specific symmetry groups.
Core Definitions
Group Action
A group $G$ acts on a space $X$ through a map $\cdot : G \times X \to X$ satisfying $e \cdot x = x$ (identity) and $g \cdot (h \cdot x) = (gh) \cdot x$ (composition). Examples: the translation group acts on images by shifting. The rotation group $SO(3)$ acts on 3D point clouds by rotating. The permutation group $S_n$ acts on sets by reordering.
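These three actions can be made concrete in a few lines. A hedged sketch, with assumed encodings: shifts as integers acting by `np.roll`, rotations as $3 \times 3$ matrices, permutations as index arrays; each check verifies the composition law, acting by $h$ then $g$ equals acting by $gh$.

```python
# Sketch: three group actions, each checked against the composition law.
import numpy as np

rng = np.random.default_rng(1)

# Translations acting on a 1D "image" by circular shift
x = rng.standard_normal(8)
g, h = 3, 5
assert np.allclose(np.roll(np.roll(x, h), g), np.roll(x, g + h))

# Rotations SO(3) acting on a point cloud (one point per row)
def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])
pts = rng.standard_normal((10, 3))
R1, R2 = rot_z(0.3), rot_z(1.1)
assert np.allclose((pts @ R2.T) @ R1.T, pts @ (R1 @ R2).T)

# Permutations S_n acting on a set by reordering (action: s -> s[p])
s = rng.standard_normal(5)
p1 = np.array([2, 0, 3, 1, 4])
p2 = np.array([4, 1, 0, 2, 3])
assert np.allclose(s[p2][p1], s[p2[p1]])   # acting by p2 then p1
```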
Equivariance
Statement
A function $f : X \to Y$ is equivariant with respect to a group $G$ acting on both $X$ and $Y$ if and only if:

$$f(g \cdot x) = g \cdot f(x) \quad \text{for all } g \in G,\ x \in X.$$

Transforming the input, then applying $f$, gives the same result as applying $f$, then transforming the output. The function "commutes" with the group action.
Invariance is the special case where the action on the output space is trivial: $f(g \cdot x) = f(x)$ for all $g \in G$. The output does not change at all.
Intuition
An equivariant function preserves the structure of transformations. If you rotate a molecule 90 degrees and then predict its energy, you should get the same energy as if you first predict and then (conceptually) rotate. If you rotate it and predict its dipole moment, the dipole should rotate by the same 90 degrees.
Invariance (energy does not change under rotation) and equivariance (dipole rotates with the molecule) are both useful, and which one you want depends on what you are predicting.
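A toy numerical analog of the molecule example, with stated assumptions: "energy" is the squared norm of a 2D point (a rotation-invariant scalar), and "dipole" is a fixed linear map chosen to commute with planar rotations (any matrix $aI + bJ$, with $J$ the 90-degree rotation, commutes with all of $SO(2)$).

```python
# Sketch: an invariant scalar vs. an equivariant vector, under 2D rotation.
import numpy as np

def R(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

energy = lambda v: float(v @ v)            # invariant: f(g.x) == f(x)
A = 2.0 * np.eye(2) + 0.5 * R(np.pi / 2)   # aI + bJ commutes with rotations
dipole = lambda v: A @ v                   # equivariant: f(g.x) == g.f(x)

v = np.array([1.3, -0.7])
g = R(np.deg2rad(90))

assert np.isclose(energy(g @ v), energy(v))
assert np.allclose(dipole(g @ v), g @ dipole(v))
```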
Why It Matters
Equivariance is a hard constraint, not a soft regularizer. A network that is equivariant by construction will respect the symmetry perfectly on all inputs, not just approximately on training data. This is a strict generalization guarantee: the network cannot learn to violate the symmetry, even with adversarial data. This is why equivariant networks need dramatically less data than unconstrained networks for tasks with known symmetries.
Failure Mode
The symmetry must be exact. If your data has approximate symmetry (e.g., images are roughly but not exactly rotation-invariant because of gravity), enforcing exact equivariance can hurt. The network cannot learn that "up" and "down" are different if you force rotational invariance. In such cases, data augmentation (soft symmetry) may outperform equivariant architectures (hard symmetry).
Why Equivariance Reduces Parameters
Equivariance Implies Weight Sharing
Statement
A linear map $W$ that is equivariant with respect to representations $\rho_X$ and $\rho_Y$ of $G$ satisfies:

$$W \rho_X(g) = \rho_Y(g)\, W \quad \text{for all } g \in G.$$

The set of such intertwiners is a linear subspace of the space of all linear maps $X \to Y$. Its dimension is determined by Schur's lemma applied to the irreducible decompositions of $\rho_X$ and $\rho_Y$: it equals $\sum_i m_i n_i$, summed over the irreps $i$ shared by the two representations, where $m_i$ and $n_i$ are the multiplicities. The dimension depends on the representations, not on $|G|$ alone — equivariance reduces the parameter count when irreps are mismatched, and the reduction can be much larger or much smaller than the naive heuristic of "divide by the group size."
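The intertwiner subspace can be computed directly for a small example. A sketch under these assumptions: $G = C_4$ acts on $\mathbb{R}^4$ by cyclic shift (the regular representation), and the condition $WP = PW$ is vectorized via $\mathrm{vec}(WP - PW) = (P^\top \otimes I - I \otimes P)\,\mathrm{vec}(W)$, so equivariant maps are the null space of that matrix.

```python
# Sketch: dimension of C_4-equivariant linear maps on R^4 (regular rep).
import numpy as np

n = 4
P = np.roll(np.eye(n), 1, axis=0)   # generator: cyclic shift matrix
# vec(W P - P W) = (P^T kron I - I kron P) vec(W); find its null space
M = np.kron(P.T, np.eye(n)) - np.kron(np.eye(n), P)
_, sing, _ = np.linalg.svd(M)
null_dim = int(np.sum(sing < 1e-10))

# Each of the 4 one-dimensional irreps of C_4 appears once in the regular
# representation, so Schur's lemma predicts dimension 4 — not 16. The
# solutions are exactly the circulant matrices, i.e. convolution filters.
assert null_dim == 4
```

This is the weight-sharing story in miniature: the equivariance constraint collapses 16 free parameters to 4, and the surviving parameters are a shared filter.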
Intuition
The equivariance constraint forces parameter sharing. In a CNN, translation equivariance forces the same filter weights at every position, reducing parameters from $O(n^2)$ for a dense linear map on $n$ pixels to $O(k^2)$ for a single shared $k \times k$ filter. For rotation equivariance, the constraint forces the filter to be "steerable" (a linear combination of a fixed set of basis filters), further reducing parameters.
Fewer free parameters means the function class is smaller, which improves generalization via the bias-variance tradeoff. The bias is increased (you cannot represent symmetry-breaking functions), but the variance decreases (less overfitting) by exactly the right amount when the symmetry holds.
Why It Matters
This is why equivariant networks work with less data: the parameter sharing from equivariance is not arbitrary compression, it is compression that matches the data symmetry. The amount of compression depends on how the input and output representations decompose into irreps; it is not a clean factor. Common cases give large reductions in practice — a discrete rotation group $C_N$ acting via the regular representation reduces parameters by roughly a factor of $N$, and the continuous rotation group $SO(2)$ restricts a planar filter to its radial profile — but the trivial representation gives no reduction at all, so the right way to think about equivariance is "matching the irrep structure," not "dividing by group size."
Failure Mode
Computing the equivariant subspace requires solving the intertwiner condition $W \rho_X(g) = \rho_Y(g) W$ for all $g \in G$, which requires knowledge of the group representations. For simple groups (translations, rotations, permutations), the representations are well-known. For complex or non-standard symmetries, finding the representations is a research problem in itself.
Architectures as Equivariant Networks
| Architecture | Symmetry group | Equivariance type | Domain |
|---|---|---|---|
| CNN | Translation | Feature maps shift with input | Images |
| GNN | Permutation | Output permutes with node reordering | Graphs |
| Transformer | Permutation (on tokens) | Equivariant; positional encoding deliberately breaks the symmetry | Sequences |
| Steerable CNN | Rotation ($C_N$ or $SO(2)$) | Feature maps rotate with input | Oriented images |
| SE(3)-Transformer | $SE(3)$ (rotation + translation) | Equivariant on 3D coordinates | Molecules, proteins |
| SchNet / DimeNet | $E(3)$ (Euclidean group) | Invariant predictions, equivariant internal features | Molecular dynamics |
| DeepSets | Permutation | Invariant to set element ordering | Point clouds, sets |
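The last row of the table can be sketched in a few lines. A minimal DeepSets-style model under assumed simplifications: a one-layer per-element map $\phi$, sum pooling, and a linear readout $\rho$; sum pooling is what makes the output independent of element order.

```python
# Sketch: permutation-invariant set function in the DeepSets pattern
# rho(sum_i phi(x_i)).
import numpy as np

rng = np.random.default_rng(2)
W1 = rng.standard_normal((3, 8))   # phi weights, shared across elements
W2 = rng.standard_normal((8, 1))   # rho weights, applied after pooling

def deepset(points):
    h = np.tanh(points @ W1)       # phi applied identically to every element
    pooled = h.sum(axis=0)         # permutation-invariant aggregation
    return float(pooled @ W2)      # rho on the pooled feature

X = rng.standard_normal((6, 3))    # a set of 6 points in R^3
perm = rng.permutation(6)
assert np.isclose(deepset(X), deepset(X[perm]))  # invariant to reordering
```

Replacing `sum` with `max` or `mean` preserves the invariance; replacing it with concatenation destroys it.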
Common Confusions
Equivariance and invariance are different
Invariance means the output does not change under the group action ($f(g \cdot x) = f(x)$). Equivariance means the output transforms predictably ($f(g \cdot x) = g \cdot f(x)$). Predicting molecular energy should be invariant to rotation. Predicting molecular forces should be equivariant (forces rotate with the molecule). Using the wrong one is a modeling error, not just a terminology issue.
Data augmentation is not the same as equivariance
Data augmentation (training on rotated/flipped copies of the data) encourages the network to learn approximate equivariance from data. An equivariant architecture enforces exact equivariance by construction. Augmentation needs more data and may not generalize to unseen transformations. Equivariance guarantees the symmetry holds everywhere. The tradeoff: augmentation is more flexible (works with approximate symmetries), equivariance is more efficient (works with exact symmetries).
Exercises
Problem
A function $f : \mathbb{R}^n \to \mathbb{R}$ is invariant to the permutation group $S_n$ (any reordering of the input coordinates gives the same output). Give three examples of such functions and one example of a function that is not permutation-invariant.
Problem
Explain why a standard MLP (fully connected network) is not equivariant to any non-trivial group action on its inputs, while a CNN is equivariant to translations. What structural property of the CNN enforces this?
References
Canonical:
- Cohen & Welling, "Group Equivariant Convolutional Networks" (ICML 2016). The foundational paper. arXiv:1602.07576
- Bronstein, Bruna, Cohen, Velickovic, "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges" (2021). The unifying survey: every later equivariant-architecture paper cites this framing. arXiv:2104.13478
- Cohen & Welling, "Steerable CNNs" (ICLR 2017). The irrep-decomposition framework that justifies the "matching irrep structure" view of weight sharing. arXiv:1612.08498
- Thomas, Smidt, Kearnes, Yang, Li, Kohlhoff, Riley, "Tensor Field Networks: Rotation- and Translation-Equivariant Neural Networks for 3D Point Clouds" (2018). The first $SE(3)$-equivariant point-cloud network, built from spherical harmonics. arXiv:1802.08219
- Fuchs, Worrall, Fischer, Welling, "SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks" (NeurIPS 2020). Equivariant self-attention via Clebsch-Gordan tensor products. arXiv:2006.10503
- Satorras, Hoogeboom, Welling, "E(n) Equivariant Graph Neural Networks" (ICML 2021). The simplest scalar-only $E(n)$-equivariant message passing; used widely as a strong baseline. arXiv:2102.09844
- Cohen, Geiger, Köhler, Welling, "Spherical CNNs" (ICLR 2018). $SO(3)$-equivariant convolutions on the sphere via generalized Fourier analysis. arXiv:1801.10130
- Maron, Ben-Hamu, Shamir, Lipman, "Invariant and Equivariant Graph Networks" (ICLR 2019). Universal approximation results for permutation-equivariant networks on graphs. arXiv:1812.09902
- Kondor, Trivedi, "On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups" (ICML 2018). Proves convolution is equivalent to equivariance for compact groups. arXiv:1802.03690
Current:
- Weiler & Cesa, "General E(2)-Equivariant Steerable CNNs" (NeurIPS 2019). arXiv:1911.08251
- Batzner et al., "E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials" (Nat. Commun. 2022). NequIP. arXiv:2101.03164
- Zaheer et al., "Deep Sets" (NeurIPS 2017). Permutation invariance and the universal architecture. arXiv:1703.06114
- Liao, Smidt, "EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations" (ICLR 2024). Current-best equivariant transformer for OC20-style catalysis benchmarks. arXiv:2306.12059
Next Topics
- Riemannian optimization: optimization on manifolds where equivariance constraints define the geometry
- Representation learning: how learned representations encode (or fail to encode) data symmetries
Last reviewed: April 26, 2026
Canonical graph
Required before and derived from this topic
These links come from prerequisite edges in the curriculum graph. Editorial suggestions are shown here only when the target page also cites this page as a prerequisite.
Required prerequisites
- Convolutional Neural Networks — layer 3 · tier 2
- Graph Neural Networks — layer 3 · tier 2
- Attention for Protein Structure: AlphaFold and Successors — layer 4 · tier 3
Derived topics
- Representation Learning Theory — layer 3 · tier 2
- Riemannian Optimization and Manifold Constraints — layer 3 · tier 2
- Graph Neural Networks for Molecules — layer 4 · tier 3