Foundations
Skewness, Kurtosis, and Higher Moments
Distribution shape beyond mean and variance: skewness measures tail asymmetry, kurtosis measures tail extremeness, cumulants are the cleaner language, and heavy-tailed distributions break all of these.
Why This Matters
Mean and variance tell you the center and spread of a distribution. They say nothing about shape. Two distributions can have identical mean and variance but completely different tail behavior, asymmetry, and outlier frequency. Skewness and kurtosis capture this shape information.
Most textbooks get the interpretation wrong. Kurtosis is not peakedness. It measures how much extreme values dominate the distribution. This page gives the correct interpretations and shows exactly when these statistics fail.
Correct Interpretations
Skewness
The third standardized moment:

$$\gamma_1 = \mathbb{E}\!\left[\left(\frac{X-\mu}{\sigma}\right)^{3}\right] = \frac{\mu_3}{\sigma^3}$$

Positive skew means the right tail contributes more cubed-deviation mass; this is the source of the heuristic "longer right tail" reading. Negative skew is the mirror image. Zero skewness only means the third standardized moment is zero. It does not imply the distribution is symmetric and it does not imply symmetric tails: there are asymmetric distributions whose positive and negative cubed deviations happen to cancel. Symmetric distributions have zero skewness whenever the third moment exists, but the converse fails.
Skewness is not about which way the distribution leans
People say "right-skewed means the distribution leans right." Wrong. Right-skewed (positive skew) means the RIGHT TAIL is longer. The bulk of the mass is actually on the LEFT. Income distributions are right-skewed: most people earn moderate amounts, but a long right tail extends to very high incomes. The mean is pulled right of the median.
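A quick numerical check makes the "long right tail, bulk on the left" picture concrete. Below is a minimal sketch, assuming NumPy and SciPy are installed; the lognormal is just one convenient right-skewed example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # right-skewed sample

print("mean   :", x.mean())       # pulled right by the long right tail
print("median :", np.median(x))   # smaller than the mean
print("skew   :", stats.skew(x))  # positive: right tail dominates the cubed deviations
```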
Kurtosis
The fourth standardized moment:

$$\mathrm{Kurt}[X] = \mathbb{E}\!\left[\left(\frac{X-\mu}{\sigma}\right)^{4}\right] = \frac{\mu_4}{\sigma^4}$$
The normal distribution has kurtosis exactly 3.
Correct interpretation: how much extreme values (outliers) dominate the distribution, compared to moderate deviations. High kurtosis means rare but extreme values contribute disproportionately. It is about TAIL WEIGHT, not peak shape.
Why Kurtosis Measures Tails, Not Peaks
The $z^4$ weighting makes extreme values dominate the kurtosis expectation: an observation at $z = 3$ contributes $81$ times as much as one at $z = 1$, so the tails, not the center, control the value.
Kurtosis is NOT peakedness
This is one of the most persistent wrong claims in statistics education. Kurtosis does not measure how peaked or flat the distribution is. A distribution can be flat-topped with high kurtosis (if it has heavy tails) or peaked with low kurtosis (if it has light tails). The fourth power amplifies extreme z-scores. Kurtosis measures the contribution of the tails, period.
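To see the tail dominance directly, split the fourth-power sum by $\lvert z\rvert$ region. A sketch, assuming NumPy and SciPy; the Student $t$ sample and the $\lvert z\rvert > 2$ cutoff are illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.standard_t(df=5, size=200_000)  # heavy-tailed sample
z = (x - x.mean()) / x.std()

z4 = z**4
tail = np.abs(z) > 2                     # the "extreme" observations
print("fraction of points with |z| > 2 :", tail.mean())
print("share of the z^4 sum they carry :", z4[tail].sum() / z4.sum())
print("sample kurtosis (non-excess)    :", z4.mean())
print("scipy excess kurtosis           :", stats.kurtosis(x))  # Fisher convention: normal -> 0
```

A few percent of the points typically carry well over half of the fourth-power sum, which is exactly the "tails, not peak" point.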
Excess Kurtosis
Subtracts the normal distribution's kurtosis as a baseline. The normal has excess kurtosis 0. Positive excess kurtosis means heavier tails than normal. Negative excess kurtosis (possible; the minimum is $-2$, attained by a symmetric two-point distribution) means lighter tails than normal.
This is not a different concept. It is just kurtosis recentered so the Gaussian baseline is zero.
Coefficient of Variation
The ratio of standard deviation to mean, $\mathrm{CV} = \sigma / \mu$. It measures relative variability and is dimensionless, which makes it useful for comparing spread across different scales (e.g., the variability of heights gives the same CV whether measured in centimeters or inches). The CV is meaningful mainly for positive-scale quantities where zero has a real interpretation; for sign-changing data, the absolute-value form $\sigma / \lvert\mu\rvert$ is the standard convention, but interpretation is still fragile.
When CV is useful: positive-scale data where zero has meaning (waiting times, concentrations, demand).
When CV is garbage: mean near zero (CV explodes), data crosses zero (interpretation breaks), or variance is infinite.
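A minimal sketch of the failure mode, assuming NumPy (the `cv` helper is only for illustration): the same absolute spread yields a wildly different CV when the mean sits near zero.

```python
import numpy as np

rng = np.random.default_rng(2)

def cv(x):
    """Coefficient of variation: sample std / sample mean (unitless)."""
    return x.std(ddof=1) / x.mean()

waiting_times = rng.exponential(scale=10.0, size=10_000)      # positive-scale data
near_zero     = rng.normal(loc=0.05, scale=1.0, size=10_000)  # mean near zero

print("CV, exponential(scale=10):", cv(waiting_times))  # ~1, stable and interpretable
print("CV, normal(0.05, 1)      :", cv(near_zero))       # huge and unstable: CV breaks down
```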
When Moments Exist and When They Do Not
Higher moments require heavier integrability conditions. They fail in a strict order: the fourth moment fails before the third, the third before the second.
| Distribution | Mean? | Variance? | Skewness? | Kurtosis? |
|---|---|---|---|---|
| Normal | Yes | Yes | Yes | Yes |
| Uniform | Yes | Yes | Yes | Yes |
| Laplace | Yes | Yes | Yes | Yes |
| Student $t$, $\nu > 4$ | Yes | Yes | Yes | Yes |
| Student $t$, $3 < \nu \le 4$ | Yes | Yes | Yes | No |
| Student $t$, $2 < \nu \le 3$ | Yes | Yes | No | No |
| Student $t$, $1 < \nu \le 2$ | Yes | No | No | No |
| Student $t$, $\nu = 1$ (Cauchy) | No | No | No | No |
| Pareto, $\alpha > 4$ | Yes | Yes | Yes | Yes |
| Pareto, $3 < \alpha \le 4$ | Yes | Yes | Yes | No |
| Pareto, $2 < \alpha \le 3$ | Yes | Yes | No | No |
| Pareto, $1 < \alpha \le 2$ | Yes | No | No | No |
| Pareto, $\alpha \le 1$ | No | No | No | No |
The rule for Student $t$ with $\nu$ degrees of freedom: the $n$-th moment exists only if $n < \nu$.
The rule for Pareto with tail index $\alpha$: the $n$-th raw moment exists only if $n < \alpha$.
These two rules alone cover most cases you will encounter.
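The practical consequence: when a population moment does not exist, the corresponding sample statistic never settles down as the sample grows. A sketch, assuming NumPy and SciPy, using Student $t$ with $\nu = 3$ (for which skewness and kurtosis do not exist):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Sample kurtosis of t(3) keeps jumping around as n grows,
# because the population fourth moment is infinite.
for n in [1_000, 10_000, 100_000, 1_000_000]:
    x = rng.standard_t(df=3, size=n)
    print(f"n = {n:>9,d}   sample excess kurtosis = {stats.kurtosis(x):8.2f}")
```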
Cumulants: The Better Language
Moments mix information from lower orders into higher orders. Cumulants isolate the genuinely new information at each order.
Cumulants
Cumulants are defined through the cumulant generating function (the log of the moment generating function):

$$K_X(t) = \log \mathbb{E}\!\left[e^{tX}\right] = \sum_{n=1}^{\infty} \kappa_n \frac{t^n}{n!}.$$

The $n$-th cumulant is the $n$-th Taylor coefficient, $\kappa_n = K_X^{(n)}(0)$.
The first four cumulants are:
| Cumulant | Value | Meaning |
|---|---|---|
| $\kappa_1$ | Mean $\mu$ | Center |
| $\kappa_2$ | Variance $\sigma^2$ | Spread |
| $\kappa_3$ | Third central moment $\mu_3$ | Asymmetry |
| $\kappa_4$ | $\mu_4 - 3\sigma^4$ (not $\mu_4$ itself) | Tail departure from Gaussian |
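A small numerical check of these relations, assuming NumPy: estimate $\kappa_1,\dots,\kappa_4$ from sample central moments, confirm that $\kappa_3$ and $\kappa_4$ are near zero for a Gaussian sample, and see a clearly positive $\kappa_4$ for a Laplace sample.

```python
import numpy as np

def first_four_cumulants(x):
    """Plug-in sample cumulants from the moment-cumulant relations above."""
    mu = x.mean()
    c = x - mu
    m2, m3, m4 = (c**2).mean(), (c**3).mean(), (c**4).mean()
    return mu, m2, m3, m4 - 3 * m2**2   # kappa_1 .. kappa_4

rng = np.random.default_rng(4)
for name, sample in [("normal",  rng.normal(size=500_000)),
                     ("laplace", rng.laplace(size=500_000))]:
    k1, k2, k3, k4 = first_four_cumulants(sample)
    print(f"{name:7s}  k1={k1:+.3f}  k2={k2:.3f}  k3={k3:+.3f}  k4={k4:+.3f}")
```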
Gaussian Characterization via Cumulants
Statement
Suppose the moment generating function of $X$ is finite in a neighborhood of zero, so the cumulant generating function is analytic at $0$ and all cumulants are well-defined. Under this regularity assumption, $X$ is Gaussian if and only if all cumulants of order three and higher vanish: $\kappa_n = 0$ for all $n \ge 3$.
Without the MGF-finiteness assumption, "$\kappa_n = 0$ for all $n \ge 3$" can be ill-defined (cumulants beyond a finite order may not exist) or insufficient on its own to pin down the distribution.
Intuition
The Gaussian is the only distribution that is "pure location and scale." Every other distribution carries additional shape information in its higher cumulants. If you measure any departure from Gaussianity, it shows up as a nonzero cumulant somewhere.
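The computation behind this is one line: for $X \sim \mathcal{N}(\mu, \sigma^2)$ the cumulant generating function is a quadratic polynomial,

$$K_X(t) = \log \mathbb{E}\!\left[e^{tX}\right] = \mu t + \tfrac{1}{2}\sigma^2 t^2,$$

so $\kappa_1 = \mu$, $\kappa_2 = \sigma^2$, and $\kappa_n = 0$ for every $n \ge 3$. Any other distribution's cumulant generating function has nonzero higher-order Taylor coefficients.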
Why It Matters
This theorem is why cumulants are the natural language for measuring non-Gaussianity. The third cumulant measures asymmetry departure from Gaussian. The fourth cumulant measures tail departure. Each higher cumulant captures a new independent direction of non-Gaussianity. This is the theoretical basis for tests of normality and for independent component analysis (ICA).
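As a concrete instance, the Jarque-Bera normality test combines sample skewness and excess kurtosis into one statistic. A sketch, assuming SciPy's `scipy.stats.jarque_bera` is available; the sample sizes and the $\nu = 4$ alternative are illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
samples = {
    "gaussian": rng.normal(size=5_000),
    "t(4)":     rng.standard_t(df=4, size=5_000),  # heavy-tailed alternative
}

for name, x in samples.items():
    result = stats.jarque_bera(x)   # large statistic / small p-value flags non-Gaussianity
    print(f"{name:9s}  JB statistic = {result.statistic:8.1f}   p-value = {result.pvalue:.3g}")
```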
Failure Mode
The theorem requires existence of the MGF in a neighborhood of zero, which excludes heavy-tailed distributions. For distributions where the MGF does not exist (e.g., Student with low degrees of freedom), cumulants beyond a certain order do not exist, and the characterization cannot be applied.
| Property | Moments | Cumulants |
|---|---|---|
| Easy to define | Yes | Slightly less |
| Easy to interpret at low order | Yes | Yes |
| Clean under sums of independent variables | No | Yes ($\kappa_n$ is additive) |
| Redundant across orders | More | Less |
| Better for serious theory | Not really | Yes |
The additivity property is the main reason cumulants matter: if $X$ and $Y$ are independent, then $\kappa_n(X+Y) = \kappa_n(X) + \kappa_n(Y)$ for all $n$. Moments do not share this property: raw moments are additive only at order 1, and central moments only up to order 3.
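A quick empirical sanity check of the additivity claim, assuming NumPy and using the plug-in fourth-cumulant estimate from above (the exponential/gamma pair is an arbitrary choice of independent variables):

```python
import numpy as np

def kappa4(x):
    """Plug-in estimate of the fourth cumulant: mu4 - 3*sigma^4."""
    c = x - x.mean()
    return (c**4).mean() - 3 * (c**2).mean()**2

rng = np.random.default_rng(6)
x = rng.exponential(scale=1.0, size=2_000_000)
y = rng.gamma(shape=3.0, scale=0.5, size=2_000_000)  # independent of x

print("kappa4(x) + kappa4(y):", kappa4(x) + kappa4(y))
print("kappa4(x + y)        :", kappa4(x + y))  # approximately equal, up to sampling noise
```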
Tail Probability: What Practitioners Actually Care About
Rather than memorizing kurtosis values, look at tail probabilities directly.
| Distribution | $P(\lvert X-\mu\rvert > 2\sigma)$ | $P(\lvert X-\mu\rvert > 3\sigma)$ | Interpretation |
|---|---|---|---|
| Normal | ~4.6% | ~0.27% | Baseline |
| Student $t$ | ~6.5% | ~1.0% | Heavier tails |
| Laplace | ~6.7% | ~1.2% | Heavier tails |
| Uniform | 0% | 0% | Bounded, no tail events |
This table is more useful than raw kurtosis values because it shows what actually happens in practice: how often do extreme events occur?
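These numbers are cheap to recompute, so verifying or extending the table takes one line per distribution. A sketch assuming SciPy; $\nu = 5$ is chosen here only for illustration (the table's degrees of freedom are not specified), and exact values depend on how each distribution is standardized, so minor differences from the table are expected.

```python
from scipy import stats

def two_sided_tail(dist, k):
    """Probability of landing more than k population standard deviations from the mean."""
    mu, sigma = dist.mean(), dist.std()
    return dist.sf(mu + k * sigma) + dist.cdf(mu - k * sigma)

distributions = {
    "normal":  stats.norm(),
    "t(5)":    stats.t(df=5),
    "laplace": stats.laplace(),
    "uniform": stats.uniform(loc=-1, scale=2),
}

for name, dist in distributions.items():
    print(f"{name:8s}  P(>2 sd) = {two_sided_tail(dist, 2):.4f}   "
          f"P(>3 sd) = {two_sided_tail(dist, 3):.4f}")
```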
Exercises
Problem
Compute the skewness and kurtosis of the exponential distribution with rate $\lambda$. Is it right-skewed or left-skewed?
Problem
The Cauchy distribution has no finite mean. What happens if you compute the sample mean of 1000 Cauchy observations and repeat this experiment 100 times? What do you observe about the sample means?
Problem
Prove that for independent random variables $X$ and $Y$, the $n$-th cumulant of the sum equals the sum of the $n$-th cumulants: $\kappa_n(X+Y) = \kappa_n(X) + \kappa_n(Y)$.
References
Canonical:
- Casella & Berger, Statistical Inference (2002), Chapter 2
- DeCarlo, "On the Meaning and Use of Kurtosis" (Psychological Methods, 1997). The definitive correction to the "peakedness" myth.
Current:
- Westfall, "Kurtosis as Peakedness, 1905-2014. R.I.P." (The American Statistician, 2014). Comprehensive debunking.
- Stuart and Ord, Kendall's Advanced Theory of Statistics, Vol. 1: Distribution Theory (1994). Standard moments-and-cumulants reference.
- McCullagh, Tensor Methods in Statistics (1987). Modern cumulant/tensor-moment treatment with Bell-polynomial moment-cumulant conversion.
Next Topics
- Sub-Gaussian random variables: the tail class defined by MGF bounds, where kurtosis-like behavior is controlled globally
- Concentration inequalities: when moments exist, they give tail bounds
- Robust statistics: what to use when moments do not exist or are unreliable
Last reviewed: April 26, 2026
Canonical graph
Required prerequisites
- Common Probability Distributions (layer 0A, tier 1)
- Expectation, Variance, Covariance, and Moments (layer 0A, tier 1)
Derived topics
- Concentration Inequalities (layer 1, tier 1)
- Sub-Gaussian Random Variables (layer 2, tier 1)
- Robust Statistics and M-Estimators (layer 3, tier 2)