Skip to main content

Logic Gates · 30 min

NAND Universality

Every Boolean function reduces to NAND gates alone. One gate type, one mask pattern, and in CMOS it costs fewer transistors than AND or OR.

Why This Matters

A NAND2 cell in a 28nm CMOS standard-cell library uses 4 transistors. An AND2 cell uses 6, because AND is literally NAND followed by an inverter (2 more transistors). Pick any combinational function, route it through synthesis, and the netlist that comes out the other side is dominated by NAND2 and NOR2 cells. The reason is structural, not stylistic: NAND alone is functionally complete, and in CMOS it is the cheapest universal gate.

This is the smallest non-trivial universality theorem in computing. The proof fits on a napkin, but its consequence is that one mask pattern, repeated, suffices to build an adder, a multiplier, or a CPU control unit.

Core Definitions

Definition

Functional completeness

A set SS of Boolean operators is functionally complete if every Boolean function f:{0,1}n{0,1}f: \{0,1\}^n \to \{0,1\} can be written as a composition of operators in SS. Equivalently, SS generates the full clone of Boolean operations.

Definition

NAND

The binary operator NAND(a,b)=¬(ab)\text{NAND}(a, b) = \neg(a \wedge b). Truth table:

aabbaba \uparrow b
001
011
101
110

We write aba \uparrow b for NAND(a,b)\text{NAND}(a, b) (Sheffer stroke).

Definition

NOR

The binary operator NOR(a,b)=¬(ab)\text{NOR}(a, b) = \neg(a \vee b), written aba \downarrow b (Peirce arrow). Dual to NAND in the obvious sense.

Definition

Sum-of-products (SOP) form

A Boolean expression written as an OR of ANDs of literals (variables or their negations). Every Boolean function admits an SOP form, derived directly from the rows of its truth table where the output is 11.

Building Everything From NAND

The plan: show that {NAND}\{\text{NAND}\} generates {¬,,}\{\neg, \wedge, \vee\}. Since every truth table has an SOP form, and SOP only needs {¬,,}\{\neg, \wedge, \vee\}, this closes the loop.

NOT from NAND. Tie both inputs together:

¬a=aa\neg a = a \uparrow a

Check: if a=0a = 0, then 00=10 \uparrow 0 = 1. If a=1a = 1, then 11=01 \uparrow 1 = 0. Done with one gate.

AND from NAND. NAND is NOT-of-AND, so invert it:

ab=¬(ab)=(ab)(ab)a \wedge b = \neg(a \uparrow b) = (a \uparrow b) \uparrow (a \uparrow b)

Two NAND gates: one to compute aba \uparrow b, one to invert.

OR from NAND. Use De Morgan: ab=¬(¬a¬b)a \vee b = \neg(\neg a \wedge \neg b). In NAND terms, ¬x¬y\neg x \wedge \neg y is exactly ¬x¬y\neg x \uparrow \neg y inverted; but a cleaner derivation uses the identity directly:

ab=(¬a)(¬b)=(aa)(bb)a \vee b = (\neg a) \uparrow (\neg b) = (a \uparrow a) \uparrow (b \uparrow b)

Verification at a=0,b=0a=0, b=0: (00)(00)=11=0(0 \uparrow 0) \uparrow (0 \uparrow 0) = 1 \uparrow 1 = 0. At a=1,b=0a=1, b=0: (11)(00)=01=1(1 \uparrow 1) \uparrow (0 \uparrow 0) = 0 \uparrow 1 = 1. Three NAND gates total.

XOR worked example. Take f(a,b)=abf(a,b) = a \oplus b. SOP form: ab=(¬ab)(a¬b)a \oplus b = (\neg a \wedge b) \vee (a \wedge \neg b). Translating to NAND, one canonical 4-gate circuit is:

g1 = a NAND b
g2 = a NAND g1
g3 = b NAND g1
g4 = g2 NAND g3   // output = a XOR b

Truth check at a=1,b=0a=1, b=0: g1=10=1g_1 = 1 \uparrow 0 = 1, g2=11=0g_2 = 1 \uparrow 1 = 0, g3=01=1g_3 = 0 \uparrow 1 = 1, g4=01=1g_4 = 0 \uparrow 1 = 1. Correct.

In C, simulating with bitwise operations on 64-bit lanes:

// NAND on a single bit, packed into the LSB
static inline unsigned nand1(unsigned a, unsigned b) {
    return (~(a & b)) & 1u;
}

// XOR built only from NAND
unsigned xor_from_nand(unsigned a, unsigned b) {
    unsigned g1 = nand1(a, b);
    unsigned g2 = nand1(a, g1);
    unsigned g3 = nand1(b, g1);
    return nand1(g2, g3);
}

Why CMOS Prefers NAND

In static CMOS, every gate has a pull-up network of pMOS transistors and a pull-down network of nMOS transistors. The two networks are duals: when one conducts, the other is off.

A NAND2 cell:

  • Pull-down: two nMOS transistors in series between the output and ground. Both inputs must be high for the output to be pulled low.
  • Pull-up: two pMOS transistors in parallel between the output and VDDV_{DD}. If either input is low, the output is pulled high.

Total: 4 transistors.

        VDD
         |
   ----+---+----        pMOS in parallel
   |        |
   A--[pM]  B--[pM]
   |        |
   +---+----+
       |
      OUT
       |
   A--[nM]
       |
   B--[nM]              nMOS in series
       |
      GND

An AND2 cell does not exist as a single CMOS stage. Static CMOS is naturally inverting, so AND2 is implemented as NAND2 followed by an inverter (INV is 2 transistors: one pMOS, one nMOS). Total: 6 transistors.

NOR2 is the dual: nMOS in parallel, pMOS in series, 4 transistors. NOR2 is also universal by symmetric arguments (¬a=aa\neg a = a \downarrow a, ab=(ab)(ab)a \vee b = (a \downarrow b) \downarrow (a \downarrow b), etc.).

Why prefer NAND2 over NOR2 in practice? nMOS transistors have roughly 2×2\times to 3×3\times higher mobility than pMOS of the same width. In NAND2, the slow series stack is pMOS-free (it's nMOS); the pMOS only appears in the parallel pull-up. In NOR2, the series stack is pMOS, which is the slow path. So NAND2 has a better delay-area product. Standard-cell libraries (e.g. TSMC, GlobalFoundries 28nm) ship NAND2X1 as one of the smallest and fastest cells, and synthesis tools (Synopsys Design Compiler, Yosys + ABC) map heavily onto it.

Non-Universal Sets

Not every set works. {,}\{\wedge, \vee\} alone is not functionally complete: you cannot express NOT.

The reason is monotonicity. Both AND and OR are monotone Boolean functions: if you flip an input from 0 to 1, the output never flips from 1 to 0. Any composition of monotone functions is monotone. But NOT is not monotone (¬0=1\neg 0 = 1, ¬1=0\neg 1 = 0). So no expression in {,}\{\wedge, \vee\} can compute ¬x\neg x.

This is one of the five preservation classes in Post's theorem (1941), which fully classifies functionally complete sets: a set is complete iff it escapes all five Post classes (monotone, self-dual, 00-preserving, 11-preserving, affine). NAND escapes all five by itself; that is exactly why {}\{\uparrow\} alone is universal.

Main Theorem

Theorem

NAND is functionally complete

Statement

Every Boolean function f:{0,1}n{0,1}f: \{0,1\}^n \to \{0,1\} can be expressed as a finite composition of NAND operations.

Intuition

Truth tables give SOP. SOP needs only NOT, AND, OR. All three are NAND expressions of length at most 3. Substitute and you are done.

Proof Sketch

Let ff be arbitrary. Construct its SOP form from the truth table: for each input x\mathbf{x} with f(x)=1f(\mathbf{x}) = 1, write the minterm (AND of literals matching x\mathbf{x}), then OR all minterms. This expresses ff in {¬,,}\{\neg, \wedge, \vee\}.

Now substitute:

  • ¬aaa\neg a \mapsto a \uparrow a
  • ab(ab)(ab)a \wedge b \mapsto (a \uparrow b) \uparrow (a \uparrow b)
  • ab(aa)(bb)a \vee b \mapsto (a \uparrow a) \uparrow (b \uparrow b)

The result is a NAND-only expression for ff. Size grows by at most a constant factor per node.

Why It Matters

This is what lets a fab tape out a chip using a heavily reused NAND2 cell. It also justifies why logic synthesis tools target NAND/NOR forms as an intermediate representation before technology mapping.

Failure Mode

The proof gives an SOP-based construction, which is exponential in the worst case (consider parity on nn bits: 2n12^{n-1} minterms). Universality is a statement about expressibility, not about size. Real synthesis spends most of its time minimizing the NAND netlist via algorithms like ESPRESSO and AIG rewriting (ABC).

Practical Takeaway: Standard-Cell Libraries

A typical 28nm standard-cell library ships hundreds of cells: INV, BUF, NAND2/3/4, NOR2/3, AOI21, OAI21, MUX2, XOR2, flip-flops, latches. But the foundation is NAND2 and NOR2. Synthesis flows typically:

  1. Read RTL (Verilog/VHDL).
  2. Elaborate to a generic gate-level netlist.
  3. Convert to an And-Inverter Graph (AIG): only AND2 and INV nodes. AIG is universal because {,¬}\{\wedge, \neg\} is complete.
  4. Optimize the AIG (rewriting, refactoring; see Berkeley's ABC tool).
  5. Technology map the AIG onto the target cell library, where NAND2 is usually the cheapest match for an AND-INV pair.

The same chip, retargeted to a library where NOR2 is cheaper (rare but possible in some process corners), would emerge dominated by NOR2 instead. The universality is what makes the retargeting mechanical.

Forward Connection: Neural Network Universality

NAND universality has a structural cousin in neural networks: a feedforward network with one hidden layer of sigmoid (or ReLU) units and enough width can approximate any continuous function on a compact set (Cybenko 1989, Hornik 1991). The flavor is different, since approximation is over reals, not exact composition over {0,1}\{0,1\}, but the moral is shared: a single nonlinear primitive, replicated enough times, spans the space. NAND is the discrete analogue.

Common Confusions

Watch Out

Universal does not mean efficient

NAND-only realizability says you can always build ff; it does not say the resulting circuit has minimal depth, area, or delay. Parity on 64 bits has a 6-deep XOR tree, but the naive SOP-to-NAND translation balloons to thousands of gates. Synthesis tools exist precisely to bridge "expressible" and "small".

Watch Out

AND2 is not a primitive CMOS gate

Beginners often draw AND2 as if it were a single CMOS stage. It is not. Static CMOS stages always invert (the pull-down conducts when inputs make the pull-up disconnect). So AND2 = NAND2 + INV physically. The NAND-first convention in textbooks reflects the silicon, not pedagogy.

Watch Out

{AND, OR} fails for a monotonicity reason, not a counting reason

Some sources say "you need an odd number of operations" or similar. The real reason {,}\{\wedge, \vee\} cannot express NOT is that both operators are monotone, and compositions of monotone functions are monotone. NOT is not monotone. No amount of nesting fixes this.

Exercises

ExerciseCore

Problem

Build a 2-to-1 multiplexer mux(s,a,b)=(¬sa)(sb)\mathrm{mux}(s, a, b) = (\neg s \wedge a) \vee (s \wedge b) using NAND gates only. Count the gates.

ExerciseCore

Problem

Prove that {,}\{\wedge, \vee\} alone (AND and OR without NOT) cannot express the NOT function. Make the argument formal using the monotonicity property.

ExerciseAdvanced

Problem

A standard-cell library on a given process node prices NAND2 at 1.0 unit-area, NOR2 at 1.2, INV at 0.6, AOI21 (AND-OR-INVERT, computing ab+c\overline{a \cdot b + c}) at 1.4. You need to synthesize the function f(a,b,c)=ab+cf(a, b, c) = a \cdot b + c. Compare two implementations: (i) the naive AND-OR cascade using AOI21 + INV, (ii) a NAND-only translation. Which is smaller, and by how much?

References

Canonical:

  • Petzold, Code: The Hidden Language of Computer Hardware and Software (2nd ed, 2022), ch. 10–14 — builds logic gates from relays and traces NAND/NOR to full adders and memory; the foundational narrative for this page
  • Patterson & Hennessy, Computer Organization and Design: The Hardware/Software Interface (5th ed, 2014), Appendix B — formal treatment of Boolean algebra, gate-level minimization, and CMOS implementation of universal gates
  • Scott, But How Do It Know? The Basic Principles of Computers for Everyone (2009) — traces NAND gates through to a working CPU with unusual clarity; excellent companion for readers who want the physical picture
  • Roth & Kinney, Fundamentals of Logic Design (7th ed, 2013), ch. 2–3 — rigorous coverage of functional completeness proofs, Sheffer stroke, and SOP-to-NAND/NOR transformations
  • Mano & Ciletti, Digital Design (5th ed, 2013), ch. 2–3 — standard undergraduate treatment of NAND/NOR universality with worked gate-minimization examples

Accessible:

  • Nisan & Schocken, The Elements of Computing Systems (2nd ed, 2021) — project-based: students implement every chip from NAND up, making universality viscerally concrete
  • Horowitz & Hill, The Art of Electronics (3rd ed, 2015), ch. 10 — situates logic families and CMOS inversion in the physical context of real circuits

Next Topics