
Beyond LLMS

Equivariant Deep Learning

Networks that respect symmetry: if the input transforms under a group action, the output transforms predictably. Equivariance generalizes translation equivariance in CNNs to rotations, permutations, and gauge symmetries, reducing sample complexity and improving generalization on structured data.

Advanced · Tier 2 · Current · Supporting · ~50 min

Why This Matters

A CNN detects a cat whether it appears on the left or right of the image. This is translation equivariance: shifting the input shifts the feature maps by the same amount. The CNN does not need to learn the cat pattern separately for each position because weight sharing enforces the symmetry.

Equivariant deep learning generalizes this idea to arbitrary symmetries. If your data has rotational symmetry (molecular structures, satellite imagery), permutation symmetry (sets, graphs, point clouds), or gauge symmetry (physical fields), you can build networks that respect these symmetries by construction. The payoff: fewer parameters, less training data, better generalization.

This is the core idea of geometric deep learning (Bronstein et al., 2021): most successful architectures can be understood as equivariant networks for specific symmetry groups.
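The shift-equivariance of convolution can be checked numerically. The sketch below (a toy example assuming NumPy; `circular_conv2d` is an illustrative helper, not a library function) uses circular padding so that the identity holds exactly, with no boundary effects:

```python
import numpy as np

def circular_conv2d(image, kernel):
    # Cross-correlation with circular (wrap-around) padding, so translation
    # equivariance holds exactly rather than approximately at the borders.
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            for di in range(kh):
                for dj in range(kw):
                    out[i, j] += kernel[di, dj] * image[(i + di) % H, (j + dj) % W]
    return out

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8))
k = rng.normal(size=(3, 3))
shift = (2, 3)

# f(g.x): shift the input, then convolve
a = circular_conv2d(np.roll(x, shift, axis=(0, 1)), k)
# g.f(x): convolve, then shift the output
b = np.roll(circular_conv2d(x, k), shift, axis=(0, 1))
assert np.allclose(a, b)  # the two orders agree: convolution is shift-equivariant
```

The same check fails for a generic dense layer in place of `circular_conv2d`, which is the point: weight sharing, not training, is what makes the symmetry hold.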

Core Definitions

Definition

Group Action

A group $G$ acts on a space $\mathcal{X}$ through a map $\rho: G \times \mathcal{X} \to \mathcal{X}$ satisfying $\rho(e, x) = x$ (identity) and $\rho(g_1, \rho(g_2, x)) = \rho(g_1 g_2, x)$ (composition). Examples: the translation group $(\mathbb{R}^2, +)$ acts on images by shifting, the rotation group $SO(3)$ acts on 3D point clouds by rotating, and the permutation group $S_n$ acts on sets by reordering.

Proposition

Equivariance

Statement

A function $f: \mathcal{X} \to \mathcal{Y}$ is equivariant with respect to group $G$ if and only if:

$$f(\rho_X(g, x)) = \rho_Y(g, f(x)) \quad \forall g \in G,\ x \in \mathcal{X}$$

Transforming the input, then applying ff, gives the same result as applying ff, then transforming the output. The function "commutes" with the group action.

Invariance is the special case where $\rho_Y$ is trivial: $f(\rho_X(g, x)) = f(x)$ for all $g$. The output does not change at all.

Intuition

An equivariant function preserves the structure of transformations. If you rotate a molecule 90 degrees and then predict its energy, you should get the same energy as if you first predict and then (conceptually) rotate. If you rotate it and predict its dipole moment, the dipole should rotate by the same 90 degrees.

Invariance (energy does not change under rotation) and equivariance (dipole rotates with the molecule) are both useful, and which one you want depends on what you are predicting.
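The energy/dipole distinction can be made concrete with a toy point cloud. In this NumPy sketch, `energy` and `dipole` are illustrative stand-ins (sum of pairwise distances and the centroid), not physical models:

```python
import numpy as np

def rotation_matrix_z(theta):
    # Rotation about the z-axis, an element of SO(3)
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def energy(points):
    # Toy invariant "energy": sum of pairwise distances, unchanged by rotation
    diffs = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diffs, axis=-1).sum()

def dipole(points):
    # Toy equivariant "dipole": the centroid rotates with the point cloud
    return points.mean(axis=0)

rng = np.random.default_rng(1)
x = rng.normal(size=(5, 3))       # 5 points in 3D
R = rotation_matrix_z(np.pi / 2)  # rotate by 90 degrees
xr = x @ R.T                      # apply R to every point

assert np.isclose(energy(xr), energy(x))      # invariance: scalar unchanged
assert np.allclose(dipole(xr), R @ dipole(x)) # equivariance: vector co-rotates
```

Both assertions pass for any rotation `R`, which mirrors the prose: scalar targets want invariance, vector targets want equivariance.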

Why It Matters

Equivariance is a hard constraint, not a soft regularizer. A network that is equivariant by construction will respect the symmetry perfectly on all inputs, not just approximately on training data. This is a strict generalization guarantee: the network cannot learn to violate the symmetry, even with adversarial data. This is why equivariant networks need dramatically less data than unconstrained networks for tasks with known symmetries.

Failure Mode

The symmetry must be exact. If your data has approximate symmetry (e.g., images are roughly but not exactly rotation-invariant because of gravity), enforcing exact equivariance can hurt. The network cannot learn that "up" and "down" are different if you force rotational invariance. In such cases, data augmentation (soft symmetry) may outperform equivariant architectures (hard symmetry).

Why Equivariance Reduces Parameters

Theorem

Equivariance Implies Weight Sharing

Statement

A linear map $W: \mathbb{R}^{d_{\text{in}}} \to \mathbb{R}^{d_{\text{out}}}$ that is equivariant with respect to representations $\rho_{\text{in}}$ and $\rho_{\text{out}}$ of $G$ satisfies:

$$W \rho_{\text{in}}(g) = \rho_{\text{out}}(g) W \quad \forall g \in G$$

The set of such intertwiners is a linear subspace of $\mathbb{R}^{d_{\text{out}} \times d_{\text{in}}}$. Its dimension is determined by Schur's lemma applied to the irreducible decompositions of $\rho_{\text{in}}$ and $\rho_{\text{out}}$: it equals $\sum_\rho m^{\text{out}}_\rho \, m^{\text{in}}_\rho$, summed over irreps $\rho$ shared by the two representations, where the $m_\rho$ are multiplicities. The dimension depends on the representations, not on $|G|$ alone: equivariance reduces the parameter count when irreps are mismatched, and the reduction can be much larger or much smaller than the naive $d_{\text{in}} d_{\text{out}} / |G|$ heuristic.

Intuition

The equivariance constraint forces parameter sharing. In a CNN, translation equivariance forces the same filter weights at every position, reducing parameters from $O(\text{image size} \times \text{filter size})$ to $O(\text{filter size})$. For rotation equivariance, the constraint forces the filter to be "steerable" (a linear combination of a fixed set of basis filters), further reducing parameters.

Fewer free parameters means a smaller function class, which improves generalization via the bias-variance tradeoff. The bias increases (the network cannot represent symmetry-breaking functions), but when the symmetry actually holds in the data, those excluded functions were never needed, so the reduced variance (less overfitting) comes at no cost.

Why It Matters

This is why equivariant networks work with less data: the parameter sharing from equivariance is not arbitrary compression, it is compression that matches the data symmetry. The amount of compression depends on how the input and output representations decompose into irreps; it is not a clean $1/|G|$ factor. Common cases give large reductions in practice — a $|G| = 8$ rotation group acting via the regular representation reduces parameters by roughly $8\times$, and a continuous rotation group restricts a planar filter to its radial profile — but the trivial representation gives no reduction at all, so the right way to think about equivariance is "matching the irrep structure," not "dividing by group size."
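The intertwiner condition can be solved numerically for a small group. The sketch below (assuming NumPy; the helper names are illustrative) builds the regular representation of the cyclic group $C_4$ and computes a basis of the space of equivariant linear maps by stacking the vectorized constraints and taking the null space:

```python
import numpy as np

def regular_rep_c4():
    # Regular representation of the cyclic group C4: 4x4 cyclic-shift matrices
    reps = []
    for g in range(4):
        P = np.zeros((4, 4))
        for i in range(4):
            P[(i + g) % 4, i] = 1.0
        reps.append(P)
    return reps

def intertwiner_basis(rho_in, rho_out):
    # Solve W @ rho_in(g) = rho_out(g) @ W for all g. Using column-stacking vec,
    # vec(W R_in) = (R_in^T kron I) vec(W) and vec(R_out W) = (I kron R_out) vec(W),
    # so the equivariant maps are the null space of the stacked constraint matrix.
    d_in, d_out = rho_in[0].shape[0], rho_out[0].shape[0]
    rows = [np.kron(R_in.T, np.eye(d_out)) - np.kron(np.eye(d_in), R_out)
            for R_in, R_out in zip(rho_in, rho_out)]
    A = np.vstack(rows)
    _, s, Vt = np.linalg.svd(A)
    return Vt[s < 1e-10]  # rows spanning the null space = equivariant maps

reps = regular_rep_c4()
basis = intertwiner_basis(reps, reps)
# An unconstrained 4x4 map has 16 free parameters; equivariance leaves |G| = 4,
# because the commutant of the regular representation is the group algebra.
print(basis.shape[0])  # -> 4
```

For regular representations this recovers exactly the $1/|G|$ reduction ($16 \to 4$ here); swapping in other representations (e.g. the trivial one) changes the dimension, illustrating that the reduction is set by the irrep structure, not by $|G|$ alone.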

Failure Mode

Computing the equivariant subspace requires solving the intertwiner condition $W\rho_{\text{in}}(g) = \rho_{\text{out}}(g)W$ for all $g$, which requires knowledge of the group representations. For simple groups (translations, rotations, permutations), the representations are well-known. For complex or non-standard symmetries, finding the representations is a research problem in itself.

Architectures as Equivariant Networks

| Architecture | Symmetry group | Equivariance type | Domain |
|---|---|---|---|
| CNN | Translation $(\mathbb{Z}^2, +)$ | Feature maps shift with input | Images |
| GNN | Permutation $S_n$ | Output permutes with node reordering | Graphs |
| Transformer | Permutation $S_n$ (on tokens) | Equivariant (positional encoding breaks the symmetry) | Sequences |
| Steerable CNN | Rotation $SO(2)$ or $O(2)$ | Feature maps rotate with input | Oriented images |
| SE(3)-Transformer | Rotation + translation $SE(3)$ | Equivariant on 3D coordinates | Molecules, proteins |
| SchNet / DimeNet | Euclidean group $E(3)$ | Invariant predictions, equivariant internal features | Molecular dynamics |
| DeepSets | Permutation $S_n$ | Invariant to set element ordering | Point clouds, sets |
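The DeepSets row in the table can be sketched in a few lines. Any function of the form $\rho(\sum_i \phi(x_i))$ is permutation-invariant because summation commutes with reordering; in this NumPy sketch, `phi` and the readout are arbitrary toy choices:

```python
import numpy as np

def phi(x):
    # Per-element embedding; any pointwise function works
    return np.stack([x, x**2, np.tanh(x)], axis=-1)

def deepsets(xs):
    # rho(sum_i phi(x_i)): sum-pool over elements, then a readout.
    # The sum is order-independent, so the output is permutation-invariant.
    pooled = phi(xs).sum(axis=0)
    return np.tanh(pooled).sum()  # toy readout rho

rng = np.random.default_rng(2)
xs = rng.normal(size=(6,))
perm = rng.permutation(6)
assert np.isclose(deepsets(xs), deepsets(xs[perm]))  # invariance to reordering
```

Replacing the sum with mean or max preserves the invariance; replacing it with, say, a weighted sum over positions would break it, which is exactly what positional encodings do for Transformers.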

Common Confusions

Watch Out

Equivariance and invariance are different

Invariance means the output does not change under the group action ($f(gx) = f(x)$). Equivariance means the output transforms predictably ($f(gx) = g f(x)$). Predicting molecular energy should be invariant to rotation. Predicting molecular forces should be equivariant (forces rotate with the molecule). Using the wrong one is a modeling error, not just a terminology issue.

Watch Out

Data augmentation is not the same as equivariance

Data augmentation (training on rotated/flipped copies of the data) encourages the network to learn approximate equivariance from data. An equivariant architecture enforces exact equivariance by construction. Augmentation needs more data and may not generalize to unseen transformations. Equivariance guarantees the symmetry holds everywhere. The tradeoff: augmentation is more flexible (works with approximate symmetries), equivariance is more efficient (works with exact symmetries).

Exercises

ExerciseCore

Problem

A function $f: \mathbb{R}^n \to \mathbb{R}$ is invariant to the permutation group $S_n$ (any reordering of the input coordinates gives the same output). Give three examples of such functions and one example of a function that is not permutation-invariant.

ExerciseAdvanced

Problem

Explain why a standard MLP (fully connected network) is not equivariant to any non-trivial group action on its inputs, while a CNN is equivariant to translations. What structural property of the CNN enforces this?

References

Canonical:

  • Cohen & Welling, "Group Equivariant Convolutional Networks" (ICML 2016). The foundational paper. arXiv:1602.07576
  • Bronstein, Bruna, Cohen, Velickovic, "Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges" (2021). The unifying survey: every later equivariant-architecture paper cites this framing. arXiv:2104.13478
  • Cohen & Welling, "Steerable CNNs" (ICLR 2017). The irrep-decomposition framework that justifies the "matching irrep structure" view of weight sharing. arXiv:1612.08498
  • Thomas, Smidt, Kearnes, Yang, Li, Kohlhoff, Riley, "Tensor Field Networks: Rotation- and Translation-Equivariant Neural Networks for 3D Point Clouds" (2018). The first $SE(3)$-equivariant point-cloud network, built from spherical harmonics. arXiv:1802.08219
  • Fuchs, Worrall, Fischer, Welling, "SE(3)-Transformers: 3D Roto-Translation Equivariant Attention Networks" (NeurIPS 2020). Equivariant self-attention via Clebsch-Gordan tensor products. arXiv:2006.10503
  • Satorras, Hoogeboom, Welling, "E(n) Equivariant Graph Neural Networks" (ICML 2021). The simplest scalar-only $E(n)$-equivariant message passing; used widely as a strong baseline. arXiv:2102.09844
  • Cohen, Geiger, Köhler, Welling, "Spherical CNNs" (ICLR 2018). $SO(3)$-equivariant convolutions on the sphere via generalized Fourier analysis. arXiv:1801.10130
  • Maron, Ben-Hamu, Shamir, Lipman, "Invariant and Equivariant Graph Networks" (ICLR 2019). Universal approximation results for permutation-equivariant networks on graphs. arXiv:1812.09902
  • Kondor, Trivedi, "On the Generalization of Equivariance and Convolution in Neural Networks to the Action of Compact Groups" (ICML 2018). Proves convolution is equivalent to equivariance for compact groups. arXiv:1802.03690

Current:

  • Weiler & Cesa, "General E(2)-Equivariant Steerable CNNs" (NeurIPS 2019). arXiv:1911.08251
  • Batzner et al., "E(3)-Equivariant Graph Neural Networks for Data-Efficient and Accurate Interatomic Potentials" (Nat. Commun. 2022). NequIP. arXiv:2101.03164
  • Zaheer et al., "Deep Sets" (NeurIPS 2017). Permutation invariance and the $\rho(\sum_i \phi(x_i))$ universal architecture. arXiv:1703.06114
  • Liao, Smidt, "EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations" (ICLR 2024). Current-best equivariant transformer for OC20-style catalysis benchmarks. arXiv:2306.12059


Last reviewed: April 26, 2026
