NAND Universality: One Gate Builds Every Function

Why This Matters

A NAND2 cell in a 28nm CMOS standard-cell library uses 4 transistors. An AND2 cell uses 6, because AND is literally NAND followed by an inverter (2 more transistors). Pick any combinational function, route it through synthesis, and the netlist that comes out the other side is dominated by NAND2 and NOR2 cells. The reason is structural, not stylistic: NAND alone is functionally complete, and in CMOS it is the cheapest universal gate.

This is the smallest non-trivial universality theorem in computing. The proof fits on a napkin, but its consequence is that one mask pattern, repeated, suffices to build an adder, a multiplier, or a CPU control unit.

Core Definitions

Definition

Functional completeness

A set $S$ of Boolean operators is functionally complete if every Boolean function $f: \{0,1\}^n \to \{0,1\}$ can be written as a composition of operators in $S$ . Equivalently, $S$ generates the full clone of Boolean operations.

Definition

NAND

The binary operator $\text{NAND}(a, b) = \neg(a \wedge b)$ . Truth table:

$a$	$b$	$a \uparrow b$
0	0	1
0	1	1
1	0	1
1	1	0

We write $a \uparrow b$ for $\text{NAND}(a, b)$ (Sheffer stroke).

Definition

NOR

The binary operator $\text{NOR}(a, b) = \neg(a \vee b)$ , written $a \downarrow b$ (Peirce arrow). Dual to NAND in the obvious sense.

Definition

Sum-of-products (SOP) form

A Boolean expression written as an OR of ANDs of literals (variables or their negations). Every Boolean function admits an SOP form, derived directly from the rows of its truth table where the output is $1$ .

Building Everything From NAND

The plan: show that $\{\text{NAND}\}$ generates $\{\neg, \wedge, \vee\}$ . Since every truth table has an SOP form, and SOP only needs $\{\neg, \wedge, \vee\}$ , this closes the loop.

NOT from NAND. Tie both inputs together:

$\neg a = a \uparrow a$

Check: if $a = 0$ , then $0 \uparrow 0 = 1$ . If $a = 1$ , then $1 \uparrow 1 = 0$ . Done with one gate.

AND from NAND. NAND is NOT-of-AND, so invert it:

$a \wedge b = \neg(a \uparrow b) = (a \uparrow b) \uparrow (a \uparrow b)$

Two NAND gates: one to compute $a \uparrow b$ , one to invert.

OR from NAND. Use De Morgan: $a \vee b = \neg(\neg a \wedge \neg b)$ . In NAND terms, $\neg x \wedge \neg y$ is exactly $\neg x \uparrow \neg y$ inverted; but a cleaner derivation uses the identity directly:

$a \vee b = (\neg a) \uparrow (\neg b) = (a \uparrow a) \uparrow (b \uparrow b)$

Verification at $a=0, b=0$ : $(0 \uparrow 0) \uparrow (0 \uparrow 0) = 1 \uparrow 1 = 0$ . At $a=1, b=0$ : $(1 \uparrow 1) \uparrow (0 \uparrow 0) = 0 \uparrow 1 = 1$ . Three NAND gates total.

XOR worked example. Take $f(a,b) = a \oplus b$ . SOP form: $a \oplus b = (\neg a \wedge b) \vee (a \wedge \neg b)$ . Translating to NAND, one canonical 4-gate circuit is:

g1 = a NAND b
g2 = a NAND g1
g3 = b NAND g1
g4 = g2 NAND g3   // output = a XOR b

Truth check at $a=1, b=0$ : $g_1 = 1 \uparrow 0 = 1$ , $g_2 = 1 \uparrow 1 = 0$ , $g_3 = 0 \uparrow 1 = 1$ , $g_4 = 0 \uparrow 1 = 1$ . Correct.

In C, simulating with bitwise operations on 64-bit lanes:

// NAND on a single bit, packed into the LSB
static inline unsigned nand1(unsigned a, unsigned b) {
    return (~(a & b)) & 1u;
}

// XOR built only from NAND
unsigned xor_from_nand(unsigned a, unsigned b) {
    unsigned g1 = nand1(a, b);
    unsigned g2 = nand1(a, g1);
    unsigned g3 = nand1(b, g1);
    return nand1(g2, g3);
}

Why CMOS Prefers NAND

In static CMOS, every gate has a pull-up network of pMOS transistors and a pull-down network of nMOS transistors. The two networks are duals: when one conducts, the other is off.

A NAND2 cell:

Pull-down: two nMOS transistors in series between the output and ground. Both inputs must be high for the output to be pulled low.
Pull-up: two pMOS transistors in parallel between the output and $V_{DD}$ . If either input is low, the output is pulled high.

Total: 4 transistors.

        VDD
         |
   ----+---+----        pMOS in parallel
   |        |
   A--[pM]  B--[pM]
   |        |
   +---+----+
       |
      OUT
       |
   A--[nM]
       |
   B--[nM]              nMOS in series
       |
      GND

An AND2 cell does not exist as a single CMOS stage. Static CMOS is naturally inverting, so AND2 is implemented as NAND2 followed by an inverter (INV is 2 transistors: one pMOS, one nMOS). Total: 6 transistors.

NOR2 is the dual: nMOS in parallel, pMOS in series, 4 transistors. NOR2 is also universal by symmetric arguments ( $\neg a = a \downarrow a$ , $a \vee b = (a \downarrow b) \downarrow (a \downarrow b)$ , etc.).

Why prefer NAND2 over NOR2 in practice? nMOS transistors have roughly $2\times$ to $3\times$ higher mobility than pMOS of the same width. In NAND2, the slow series stack is pMOS-free (it's nMOS); the pMOS only appears in the parallel pull-up. In NOR2, the series stack is pMOS, which is the slow path. So NAND2 has a better delay-area product. Standard-cell libraries (e.g. TSMC, GlobalFoundries 28nm) ship NAND2X1 as one of the smallest and fastest cells, and synthesis tools (Synopsys Design Compiler, Yosys + ABC) map heavily onto it.

Non-Universal Sets

Not every set works. $\{\wedge, \vee\}$ alone is not functionally complete: you cannot express NOT.

The reason is monotonicity. Both AND and OR are monotone Boolean functions: if you flip an input from 0 to 1, the output never flips from 1 to 0. Any composition of monotone functions is monotone. But NOT is not monotone ( $\neg 0 = 1$ , $\neg 1 = 0$ ). So no expression in $\{\wedge, \vee\}$ can compute $\neg x$ .

This is one of the five preservation classes in Post's theorem (1941), which fully classifies functionally complete sets: a set is complete iff it escapes all five Post classes (monotone, self-dual, $0$ -preserving, $1$ -preserving, affine). NAND escapes all five by itself; that is exactly why $\{\uparrow\}$ alone is universal.

Main Theorem

Theorem

NAND is functionally complete

Statement

Every Boolean function $f: \{0,1\}^n \to \{0,1\}$ can be expressed as a finite composition of NAND operations.

Intuition

Truth tables give SOP. SOP needs only NOT, AND, OR. All three are NAND expressions of length at most 3. Substitute and you are done.

Proof Sketch

Let $f$ be arbitrary. Construct its SOP form from the truth table: for each input $\mathbf{x}$ with $f(\mathbf{x}) = 1$ , write the minterm (AND of literals matching $\mathbf{x}$ ), then OR all minterms. This expresses $f$ in $\{\neg, \wedge, \vee\}$ .

Now substitute:

$\neg a \mapsto a \uparrow a$
$a \wedge b \mapsto (a \uparrow b) \uparrow (a \uparrow b)$
$a \vee b \mapsto (a \uparrow a) \uparrow (b \uparrow b)$

The result is a NAND-only expression for $f$ . Size grows by at most a constant factor per node.

Why It Matters

This is what lets a fab tape out a chip using a heavily reused NAND2 cell. It also justifies why logic synthesis tools target NAND/NOR forms as an intermediate representation before technology mapping.

Failure Mode

The proof gives an SOP-based construction, which is exponential in the worst case (consider parity on $n$ bits: $2^{n-1}$ minterms). Universality is a statement about expressibility, not about size. Real synthesis spends most of its time minimizing the NAND netlist via algorithms like ESPRESSO and AIG rewriting (ABC).

report a correction →

Practical Takeaway: Standard-Cell Libraries

A typical 28nm standard-cell library ships hundreds of cells: INV, BUF, NAND2/3/4, NOR2/3, AOI21, OAI21, MUX2, XOR2, flip-flops, latches. But the foundation is NAND2 and NOR2. Synthesis flows typically:

Read RTL (Verilog/VHDL).
Elaborate to a generic gate-level netlist.
Convert to an And-Inverter Graph (AIG): only AND2 and INV nodes. AIG is universal because $\{\wedge, \neg\}$ is complete.
Optimize the AIG (rewriting, refactoring; see Berkeley's ABC tool).
Technology map the AIG onto the target cell library, where NAND2 is usually the cheapest match for an AND-INV pair.

The same chip, retargeted to a library where NOR2 is cheaper (rare but possible in some process corners), would emerge dominated by NOR2 instead. The universality is what makes the retargeting mechanical.

Forward Connection: Neural Network Universality

NAND universality has a structural cousin in neural networks: a feedforward network with one hidden layer of sigmoid (or ReLU) units and enough width can approximate any continuous function on a compact set (Cybenko 1989, Hornik 1991). The flavor is different, since approximation is over reals, not exact composition over $\{0,1\}$ , but the moral is shared: a single nonlinear primitive, replicated enough times, spans the space. NAND is the discrete analogue.

Common Confusions

Watch Out

Universal does not mean efficient

NAND-only realizability says you can always build $f$ ; it does not say the resulting circuit has minimal depth, area, or delay. Parity on 64 bits has a 6-deep XOR tree, but the naive SOP-to-NAND translation balloons to thousands of gates. Synthesis tools exist precisely to bridge "expressible" and "small".

Watch Out

AND2 is not a primitive CMOS gate

Beginners often draw AND2 as if it were a single CMOS stage. It is not. Static CMOS stages always invert (the pull-down conducts when inputs make the pull-up disconnect). So AND2 = NAND2 + INV physically. The NAND-first convention in textbooks reflects the silicon, not pedagogy.

Watch Out

{AND, OR} fails for a monotonicity reason, not a counting reason

Some sources say "you need an odd number of operations" or similar. The real reason $\{\wedge, \vee\}$ cannot express NOT is that both operators are monotone, and compositions of monotone functions are monotone. NOT is not monotone. No amount of nesting fixes this.

Exercises

ExerciseCore

Problem

Build a 2-to-1 multiplexer $\mathrm{mux}(s, a, b) = (\neg s \wedge a) \vee (s \wedge b)$ using NAND gates only. Count the gates.

ExerciseCore

Problem

Prove that $\{\wedge, \vee\}$ alone (AND and OR without NOT) cannot express the NOT function. Make the argument formal using the monotonicity property.

ExerciseAdvanced

Problem

A standard-cell library on a given process node prices NAND2 at 1.0 unit-area, NOR2 at 1.2, INV at 0.6, AOI21 (AND-OR-INVERT, computing $\overline{a \cdot b + c}$ ) at 1.4. You need to synthesize the function $f(a, b, c) = a \cdot b + c$ . Compare two implementations: (i) the naive AND-OR cascade using AOI21 + INV, (ii) a NAND-only translation. Which is smaller, and by how much?

References

Canonical:

Petzold, Code: The Hidden Language of Computer Hardware and Software (2nd ed, 2022), ch. 10–14 — builds logic gates from relays and traces NAND/NOR to full adders and memory; the foundational narrative for this page
Patterson & Hennessy, Computer Organization and Design: The Hardware/Software Interface (5th ed, 2014), Appendix B — formal treatment of Boolean algebra, gate-level minimization, and CMOS implementation of universal gates
Scott, But How Do It Know? The Basic Principles of Computers for Everyone (2009) — traces NAND gates through to a working CPU with unusual clarity; excellent companion for readers who want the physical picture
Roth & Kinney, Fundamentals of Logic Design (7th ed, 2013), ch. 2–3 — rigorous coverage of functional completeness proofs, Sheffer stroke, and SOP-to-NAND/NOR transformations
Mano & Ciletti, Digital Design (5th ed, 2013), ch. 2–3 — standard undergraduate treatment of NAND/NOR universality with worked gate-minimization examples

Accessible:

Nisan & Schocken, The Elements of Computing Systems (2nd ed, 2021) — project-based: students implement every chip from NAND up, making universality viscerally concrete
Horowitz & Hill, The Art of Electronics (3rd ed, 2015), ch. 10 — situates logic families and CMOS inversion in the physical context of real circuits

Next Topics

Boolean Algebra and Normal Forms — canonical sum-of-products and product-of-sums representations that underlie the NAND translation procedure
Combinational Circuit Synthesis and Minimization — Karnaugh maps, Quine–McCluskey, and how synthesis tools move from "expressible" to "small"
CMOS Gate Implementation — why static CMOS stages always invert, how NAND2 and NOR2 are laid out in silicon, and the pass-transistor alternatives
Neural Network Universality — the continuous analogue: Cybenko's theorem, width vs. depth trade-offs, and the comparison with discrete Boolean universality