Index: The Book of Statistical Proofs ▷ Probability Distributions ▷ Univariate continuous distributions ▷ Normal distribution ▷ Probability of being within standard deviations from mean

Theorem: (also called “68-95-99.7 rule”) Let $X$ be a random variable following a normal distribution with mean $\mu$ and variance $\sigma^2$. Then, about $68\%$, $95\%$ and $99.7\%$ of the values of $X$ will fall within 1, 2 and 3 standard deviations of the mean, respectively:

\[\label{eq:norm-probstd} \begin{split} \mathrm{Pr}(\mu-1\sigma \leq X \leq \mu+1\sigma) &\approx 68 \% \\ \mathrm{Pr}(\mu-2\sigma \leq X \leq \mu+2\sigma) &\approx 95 \% \\ \mathrm{Pr}(\mu-3\sigma \leq X \leq \mu+3\sigma) &\approx 99.7 \% \; . \end{split}\]

Proof: The cumulative distribution function of a normally distributed random variable $X$ is

\[\label{eq:norm-cdf} F_X(x) = \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{x-\mu}{\sqrt{2} \sigma} \right) \right]\]

where $\mathrm{erf}(x)$ is the error function defined as

\[\label{eq:erf} \mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} \exp(-t^2) \, \mathrm{d}t\]

which is an odd function, i.e. point-symmetric about the origin:

\[\label{eq:erf-symm} \mathrm{erf}(-x) = -\mathrm{erf}(x) \; .\]
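The closed form of the cumulative distribution function and the symmetry of the error function can be checked numerically. The following is a minimal sketch, assuming Python's standard library (`math.erf` and `statistics.NormalDist`); the function name `normal_cdf` and the parameter values $\mu = 10$, $\sigma = 2$ are hypothetical choices made only for illustration.

```python
import math
from statistics import NormalDist


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """CDF of N(mu, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (math.sqrt(2.0) * sigma)))


# Hypothetical parameters, chosen only for this illustration.
mu, sigma = 10.0, 2.0

# Compare the erf-based formula with the standard-library reference CDF.
for x in (7.0, 10.0, 13.5):
    reference = NormalDist(mu, sigma).cdf(x)
    assert math.isclose(normal_cdf(x, mu, sigma), reference, rel_tol=1e-12)

# Point symmetry of the error function: erf(-x) = -erf(x).
for x in (0.25, 1.0, 2.5):
    assert math.isclose(math.erf(-x), -math.erf(x), rel_tol=1e-12)
```

Both checks pass silently if the formula and the symmetry property hold to floating-point precision.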

Thus, the probability that $X$ falls between $\mu - a \cdot \sigma$ and $\mu + a \cdot \sigma$ is equal to:

\[\label{eq:prob-std} \begin{split} p(a) &= \mathrm{Pr}(\mu-a\sigma \leq X \leq \mu+a\sigma) \\ &= F_X(\mu+a\sigma) - F_X(\mu-a\sigma) \\ &\overset{\eqref{eq:norm-cdf}}{=} \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{\mu+a\sigma-\mu}{\sqrt{2} \sigma} \right) \right] - \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{\mu-a\sigma-\mu}{\sqrt{2} \sigma} \right) \right] \\ &= \frac{1}{2} \left[ \mathrm{erf}\left( \frac{\mu+a\sigma-\mu}{\sqrt{2} \sigma} \right) - \mathrm{erf}\left( \frac{\mu-a\sigma-\mu}{\sqrt{2} \sigma} \right) \right] \\ &= \frac{1}{2} \left[ \mathrm{erf}\left( \frac{a}{\sqrt{2}} \right) - \mathrm{erf}\left( -\frac{a}{\sqrt{2}} \right) \right] \\ &\overset{\eqref{eq:erf-symm}}{=} \frac{1}{2} \left[ \mathrm{erf}\left( \frac{a}{\sqrt{2}} \right) + \mathrm{erf}\left( \frac{a}{\sqrt{2}} \right) \right] \\ &= \mathrm{erf}\left( \frac{a}{\sqrt{2}} \right) \; . \end{split}\]
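As a cross-check that is not part of the original proof, the same quantity can be expressed through the standard normal cumulative distribution function. The symbol $\Phi$ is an assumption introduced here, with $\Phi(z) = \frac{1}{2} \left[ 1 + \mathrm{erf}\left( z / \sqrt{2} \right) \right]$, i.e. the function $F_X$ above with $\mu = 0$ and $\sigma = 1$:

\[\begin{split} p(a) &= \Phi(a) - \Phi(-a) \\ &= \frac{1}{2} \left[ 1 + \mathrm{erf}\left( \frac{a}{\sqrt{2}} \right) \right] - \frac{1}{2} \left[ 1 - \mathrm{erf}\left( \frac{a}{\sqrt{2}} \right) \right] \\ &= 2 \, \Phi(a) - 1 = \mathrm{erf}\left( \frac{a}{\sqrt{2}} \right) \; . \end{split}\]

In particular, $p(a)$ depends only on $a$ and not on $\mu$ or $\sigma$.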

With that, we can use numerical implementations of the error function to calculate:

\[\label{eq:norm-probstd-qed} \begin{split} \mathrm{Pr}(\mu-1\sigma \leq X \leq \mu+1\sigma) &= p(1) \approx 68.27 \% \\ \mathrm{Pr}(\mu-2\sigma \leq X \leq \mu+2\sigma) &= p(2) \approx 95.45 \% \\ \mathrm{Pr}(\mu-3\sigma \leq X \leq \mu+3\sigma) &= p(3) \approx 99.73 \% \; . \end{split}\]
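The three values can be reproduced with any numerical implementation of the error function. The following is a minimal sketch, assuming Python's standard-library `math.erf`; the function name `prob_within` is a hypothetical helper that evaluates $p(a) = \mathrm{erf}(a / \sqrt{2})$.

```python
import math


def prob_within(a: float) -> float:
    """Probability that a normal variable lies within a standard deviations of its mean."""
    return math.erf(a / math.sqrt(2.0))


# Evaluate p(a) for 1, 2 and 3 standard deviations.
for a in (1, 2, 3):
    print(f"p({a}) = {100 * prob_within(a):.2f} %")

# p(1) = 68.27 %
# p(2) = 95.45 %
# p(3) = 99.73 %
```

Rounding these values to fewer digits gives the $68\%$, $95\%$ and $99.7\%$ stated in the theorem.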

Metadata: ID: P321 | shortcut: norm-probstd | author: JoramSoch | date: 2022-05-08, 18:56.