Proofs are printed in bold. Definitions are set in italics.


Chapter I: General Theorems

  1. Probability theory

    1.1. Random experiments
       1.1.1. Random experiment
       1.1.2. Sample space
       1.1.3. Event space
       1.1.4. Probability space

    1.2. Random variables
       1.2.1. Random event
       1.2.2. Random variable
       1.2.3. Random vector
       1.2.4. Random matrix
       1.2.5. Constant
       1.2.6. Discrete vs. continuous
       1.2.7. Univariate vs. multivariate

    1.3. Probability
       1.3.1. Probability
       1.3.2. Joint probability
       1.3.3. Marginal probability
       1.3.4. Conditional probability
       1.3.5. Exceedance probability
       1.3.6. Statistical independence
       1.3.7. Conditional independence
       1.3.8. Probability under independence
       1.3.9. Mutual exclusivity
       1.3.10. Probability under exclusivity

    1.4. Probability axioms
       1.4.1. Axioms of probability
       1.4.2. Monotonicity of probability
       1.4.3. Probability of the empty set
       1.4.4. Probability of the complement
       1.4.5. Range of probability
       1.4.6. Addition law of probability
       1.4.7. Law of total probability
       1.4.8. Probability of exhaustive events (1)
       1.4.9. Probability of exhaustive events (2)

    1.5. Probability distributions
       1.5.1. Probability distribution
       1.5.2. Joint distribution
       1.5.3. Marginal distribution
       1.5.4. Conditional distribution
       1.5.5. Sampling distribution

    1.6. Probability functions
       1.6.1. Probability mass function
       1.6.2. Probability mass function of sum of independents
       1.6.3. Probability mass function of strictly increasing function
       1.6.4. Probability mass function of strictly decreasing function
       1.6.5. Probability mass function of invertible function
       1.6.6. Probability density function
       1.6.7. Probability density function of sum of independents
       1.6.8. Probability density function of strictly increasing function
       1.6.9. Probability density function of strictly decreasing function
       1.6.10. Probability density function of invertible function
       1.6.11. Probability density function of linear transformation
       1.6.12. Probability density function in terms of cumulative distribution function
       1.6.13. Cumulative distribution function
       1.6.14. Cumulative distribution function of sum of independents
       1.6.15. Cumulative distribution function of strictly increasing function
       1.6.16. Cumulative distribution function of strictly decreasing function
       1.6.17. Cumulative distribution function of discrete random variable
       1.6.18. Cumulative distribution function of continuous random variable
       1.6.19. Probability integral transform
       1.6.20. Inverse transformation method
       1.6.21. Distributional transformation
       1.6.22. Joint cumulative distribution function
       1.6.23. Quantile function
       1.6.24. Quantile function in terms of cumulative distribution function
       1.6.25. Characteristic function
       1.6.26. Characteristic function of arbitrary function
       1.6.27. Moment-generating function
       1.6.28. Moment-generating function of arbitrary function
       1.6.29. Moment-generating function of linear transformation
       1.6.30. Moment-generating function of linear combination
       1.6.31. Cumulant-generating function
       1.6.32. Probability-generating function

    1.7. Expected value
       1.7.1. Definition
       1.7.2. Sample mean
       1.7.3. Non-negative random variable
       1.7.4. Non-negativity
       1.7.5. Linearity
       1.7.6. Monotonicity
       1.7.7. (Non-)Multiplicativity
       1.7.8. Expectation of a trace
       1.7.9. Expectation of a quadratic form
       1.7.10. Law of total expectation
       1.7.11. Law of the unconscious statistician
       1.7.12. Expected value of a random vector
       1.7.13. Expected value of a random matrix

    1.8. Variance
       1.8.1. Definition
       1.8.2. Sample variance
       1.8.3. Partition into expected values
       1.8.4. Non-negativity
       1.8.5. Variance of a constant
       1.8.6. Invariance under addition
       1.8.7. Scaling upon multiplication
       1.8.8. Variance of a sum
       1.8.9. Variance of linear combination
       1.8.10. Additivity under independence
       1.8.11. Law of total variance
       1.8.12. Precision

    1.9. Covariance
       1.9.1. Definition
       1.9.2. Sample covariance
       1.9.3. Partition into expected values
       1.9.4. Symmetry
       1.9.5. Self-covariance
       1.9.6. Covariance under independence
       1.9.7. Relationship to correlation
       1.9.8. Law of total covariance
       1.9.9. Covariance matrix
       1.9.10. Sample covariance matrix
       1.9.11. Covariance matrix and expected values
       1.9.12. Symmetry
       1.9.13. Positive semi-definiteness
       1.9.14. Invariance under addition of vector
       1.9.15. Scaling upon multiplication with matrix
       1.9.16. Cross-covariance matrix
       1.9.17. Covariance matrix of a sum
       1.9.18. Covariance matrix and correlation matrix
       1.9.19. Precision matrix
       1.9.20. Precision matrix and correlation matrix

    1.10. Correlation
       1.10.1. Definition
       1.10.2. Range
       1.10.3. Sample correlation coefficient
       1.10.4. Relationship to standard scores
       1.10.5. Correlation matrix
       1.10.6. Sample correlation matrix

    1.11. Measures of central tendency
       1.11.1. Median
       1.11.2. Mode

    1.12. Measures of statistical dispersion
       1.12.1. Standard deviation
       1.12.2. Full width at half maximum

    1.13. Further summary statistics
       1.13.1. Minimum
       1.13.2. Maximum

    1.14. Further moments
       1.14.1. Moment
       1.14.2. Moment in terms of moment-generating function
       1.14.3. Raw moment
       1.14.4. First raw moment is mean
       1.14.5. Second raw moment and variance
       1.14.6. Central moment
       1.14.7. First central moment is zero
       1.14.8. Second central moment is variance
       1.14.9. Standardized moment

  2. Information theory

    2.1. Shannon entropy
       2.1.1. Definition
       2.1.2. Non-negativity
       2.1.3. Concavity
       2.1.4. Conditional entropy
       2.1.5. Joint entropy
       2.1.6. Cross-entropy
       2.1.7. Convexity of cross-entropy
       2.1.8. Gibbs’ inequality
       2.1.9. Log sum inequality

    2.2. Differential entropy
       2.2.1. Definition
       2.2.2. Negativity
       2.2.3. Invariance under addition
       2.2.4. Addition upon multiplication
       2.2.5. Addition upon matrix multiplication
       2.2.6. Non-invariance and transformation
       2.2.7. Conditional differential entropy
       2.2.8. Joint differential entropy
       2.2.9. Differential cross-entropy

    2.3. Discrete mutual information
       2.3.1. Definition
       2.3.2. Relation to marginal and conditional entropy
       2.3.3. Relation to marginal and joint entropy
       2.3.4. Relation to joint and conditional entropy

    2.4. Continuous mutual information
       2.4.1. Definition
       2.4.2. Relation to marginal and conditional differential entropy
       2.4.3. Relation to marginal and joint differential entropy
       2.4.4. Relation to joint and conditional differential entropy

    2.5. Kullback-Leibler divergence
       2.5.1. Definition
       2.5.2. Non-negativity (1)
       2.5.3. Non-negativity (2)
       2.5.4. Non-symmetry
       2.5.5. Convexity
       2.5.6. Additivity for independent distributions
       2.5.7. Invariance under parameter transformation
       2.5.8. Relation to discrete entropy
       2.5.9. Relation to differential entropy

  3. Estimation theory

    3.1. Point estimates
       3.1.1. Mean squared error
       3.1.2. Partition of the mean squared error into bias and variance

    3.2. Interval estimates
       3.2.1. Confidence interval
       3.2.2. Construction of confidence intervals using Wilks’ theorem

  4. Frequentist statistics

    4.1. Likelihood theory
       4.1.1. Likelihood function
       4.1.2. Log-likelihood function
       4.1.3. Maximum likelihood estimation
       4.1.4. MLE can be biased
       4.1.5. Maximum log-likelihood
       4.1.6. Method of moments

    4.2. Statistical hypotheses
       4.2.1. Statistical hypothesis
       4.2.2. Simple vs. composite
       4.2.3. Point/exact vs. set/inexact
       4.2.4. One-tailed vs. two-tailed

    4.3. Hypothesis testing
       4.3.1. Statistical test
       4.3.2. Null hypothesis
       4.3.3. Alternative hypothesis
       4.3.4. One-tailed vs. two-tailed
       4.3.5. Test statistic
       4.3.6. Size of a test
       4.3.7. Power of a test
       4.3.8. Significance level
       4.3.9. Critical value
       4.3.10. p-value
       4.3.11. Distribution of p-value under null hypothesis

  5. Bayesian statistics

    5.1. Probabilistic modeling
       5.1.1. Generative model
       5.1.2. Likelihood function
       5.1.3. Prior distribution
       5.1.4. Full probability model
       5.1.5. Joint likelihood
       5.1.6. Joint likelihood is product of likelihood and prior
       5.1.7. Posterior distribution
       5.1.8. Posterior density is proportional to joint likelihood
       5.1.9. Marginal likelihood
       5.1.10. Marginal likelihood is integral of joint likelihood

    5.2. Prior distributions
       5.2.1. Flat vs. hard vs. soft
       5.2.2. Uniform vs. non-uniform
       5.2.3. Informative vs. non-informative
       5.2.4. Empirical vs. non-empirical
       5.2.5. Conjugate vs. non-conjugate
       5.2.6. Maximum entropy priors
       5.2.7. Empirical Bayes priors
       5.2.8. Reference priors

    5.3. Bayesian inference
       5.3.1. Bayes’ theorem
       5.3.2. Bayes’ rule
       5.3.3. Empirical Bayes
       5.3.4. Variational Bayes


Chapter II: Probability Distributions

  1. Univariate discrete distributions

    1.1. Discrete uniform distribution
       1.1.1. Definition
       1.1.2. Probability mass function
       1.1.3. Cumulative distribution function
       1.1.4. Quantile function

    1.2. Bernoulli distribution
       1.2.1. Definition
       1.2.2. Probability mass function
       1.2.3. Mean
       1.2.4. Variance
       1.2.5. Range of variance
       1.2.6. Shannon entropy

    1.3. Binomial distribution
       1.3.1. Definition
       1.3.2. Probability mass function
       1.3.3. Mean
       1.3.4. Variance
       1.3.5. Range of variance
       1.3.6. Shannon entropy

    1.4. Poisson distribution
       1.4.1. Definition
       1.4.2. Probability mass function
       1.4.3. Mean
       1.4.4. Variance

  2. Multivariate discrete distributions

    2.1. Categorical distribution
       2.1.1. Definition
       2.1.2. Probability mass function
       2.1.3. Mean
       2.1.4. Covariance
       2.1.5. Shannon entropy

    2.2. Multinomial distribution
       2.2.1. Definition
       2.2.2. Probability mass function
       2.2.3. Mean
       2.2.4. Covariance
       2.2.5. Shannon entropy

  3. Univariate continuous distributions

    3.1. Continuous uniform distribution
       3.1.1. Definition
       3.1.2. Standard uniform distribution
       3.1.3. Probability density function
       3.1.4. Cumulative distribution function
       3.1.5. Quantile function
       3.1.6. Mean
       3.1.7. Median
       3.1.8. Mode

    3.2. Normal distribution
       3.2.1. Definition
       3.2.2. Standard normal distribution
       3.2.3. Relationship to standard normal distribution (1)
       3.2.4. Relationship to standard normal distribution (2)
       3.2.5. Relationship to standard normal distribution (3)
       3.2.6. Relationship to chi-squared distribution
       3.2.7. Relationship to t-distribution
       3.2.8. Special case of multivariate normal distribution
       3.2.9. Gaussian integral
       3.2.10. Probability density function
       3.2.11. Moment-generating function
       3.2.12. Cumulative distribution function
       3.2.13. Cumulative distribution function without error function
       3.2.14. Probability of being within standard deviations from mean
       3.2.15. Quantile function
       3.2.16. Mean
       3.2.17. Median
       3.2.18. Mode
       3.2.19. Variance
       3.2.20. Full width at half maximum
       3.2.21. Extreme points
       3.2.22. Inflection points
       3.2.23. Differential entropy
       3.2.24. Kullback-Leibler divergence
       3.2.25. Maximum entropy distribution
       3.2.26. Linear combination

    3.3. t-distribution
       3.3.1. Definition
       3.3.2. Non-standardized t-distribution
       3.3.3. Relationship to non-standardized t-distribution
       3.3.4. Special case of multivariate t-distribution
       3.3.5. Probability density function

    3.4. Gamma distribution
       3.4.1. Definition
       3.4.2. Standard gamma distribution
       3.4.3. Relationship to standard gamma distribution (1)
       3.4.4. Relationship to standard gamma distribution (2)
       3.4.5. Special case of Wishart distribution
       3.4.6. Probability density function
       3.4.7. Cumulative distribution function
       3.4.8. Quantile function
       3.4.9. Mean
       3.4.10. Variance
       3.4.11. Logarithmic expectation
       3.4.12. Expectation of x ln x
       3.4.13. Differential entropy
       3.4.14. Kullback-Leibler divergence

    3.5. Exponential distribution
       3.5.1. Definition
       3.5.2. Special case of gamma distribution
       3.5.3. Probability density function
       3.5.4. Cumulative distribution function
       3.5.5. Quantile function
       3.5.6. Mean
       3.5.7. Median
       3.5.8. Mode

    3.6. Log-normal distribution
       3.6.1. Definition
       3.6.2. Probability density function
       3.6.3. Cumulative distribution function
       3.6.4. Median
       3.6.5. Mode
       3.6.6. Quantile function

    3.7. Chi-squared distribution
       3.7.1. Definition
       3.7.2. Special case of gamma distribution
       3.7.3. Probability density function
       3.7.4. Moments

    3.8. F-distribution
       3.8.1. Definition
       3.8.2. Probability density function

    3.9. Beta distribution
       3.9.1. Definition
       3.9.2. Probability density function
       3.9.3. Moment-generating function
       3.9.4. Cumulative distribution function
       3.9.5. Mean
       3.9.6. Variance

    3.10. Wald distribution
       3.10.1. Definition
       3.10.2. Probability density function
       3.10.3. Moment-generating function
       3.10.4. Mean
       3.10.5. Variance

  4. Multivariate continuous distributions

    4.1. Multivariate normal distribution
       4.1.1. Definition
       4.1.2. Special case of matrix-normal distribution
       4.1.3. Probability density function
       4.1.4. Mean
       4.1.5. Covariance
       4.1.6. Differential entropy
       4.1.7. Kullback-Leibler divergence
       4.1.8. Linear transformation
       4.1.9. Marginal distributions
       4.1.10. Conditional distributions
       4.1.11. Conditions for independence

    4.2. Multivariate t-distribution
       4.2.1. Definition
       4.2.2. Probability density function
       4.2.3. Relationship to F-distribution

    4.3. Normal-gamma distribution
       4.3.1. Definition
       4.3.2. Special case of normal-Wishart distribution
       4.3.3. Probability density function
       4.3.4. Mean
       4.3.5. Covariance
       4.3.6. Differential entropy
       4.3.7. Kullback-Leibler divergence
       4.3.8. Marginal distributions
       4.3.9. Conditional distributions
       4.3.10. Drawing samples

    4.4. Dirichlet distribution
       4.4.1. Definition
       4.4.2. Probability density function
       4.4.3. Kullback-Leibler divergence
       4.4.4. Exceedance probabilities

  5. Matrix-variate continuous distributions

    5.1. Matrix-normal distribution
       5.1.1. Definition
       5.1.2. Equivalence to multivariate normal distribution
       5.1.3. Probability density function
       5.1.4. Mean
       5.1.5. Covariance
       5.1.6. Differential entropy
       5.1.7. Kullback-Leibler divergence
       5.1.8. Transposition
       5.1.9. Linear transformation
       5.1.10. Marginal distributions
       5.1.11. Drawing samples

    5.2. Wishart distribution
       5.2.1. Definition
       5.2.2. Kullback-Leibler divergence

    5.3. Normal-Wishart distribution
       5.3.1. Definition
       5.3.2. Probability density function
       5.3.3. Mean


Chapter III: Statistical Models

  1. Univariate normal data

    1.1. Univariate Gaussian
       1.1.1. Definition
       1.1.2. Maximum likelihood estimation
       1.1.3. One-sample t-test
       1.1.4. Two-sample t-test
       1.1.5. Paired t-test
       1.1.6. Conjugate prior distribution
       1.1.7. Posterior distribution
       1.1.8. Log model evidence
       1.1.9. Accuracy and complexity

    1.2. Univariate Gaussian with known variance
       1.2.1. Definition
       1.2.2. Maximum likelihood estimation
       1.2.3. One-sample z-test
       1.2.4. Two-sample z-test
       1.2.5. Paired z-test
       1.2.6. Conjugate prior distribution
       1.2.7. Posterior distribution
       1.2.8. Log model evidence
       1.2.9. Accuracy and complexity
       1.2.10. Log Bayes factor
       1.2.11. Expectation of log Bayes factor
       1.2.12. Cross-validated log model evidence
       1.2.13. Cross-validated log Bayes factor
       1.2.14. Expectation of cross-validated log Bayes factor

    1.3. Simple linear regression
       1.3.1. Definition
       1.3.2. Special case of multiple linear regression
       1.3.3. Ordinary least squares (1)
       1.3.4. Ordinary least squares (2)
       1.3.5. Expectation of estimates
       1.3.6. Variance of estimates
       1.3.7. Distribution of estimates
       1.3.8. Correlation of estimates
       1.3.9. Effects of mean-centering
       1.3.10. Regression line
       1.3.11. Regression line includes center of mass
       1.3.12. Projection of data point to regression line
       1.3.13. Sums of squares
       1.3.14. Transformation matrices
       1.3.15. Weighted least squares (1)
       1.3.16. Weighted least squares (2)
       1.3.17. Maximum likelihood estimation (1)
       1.3.18. Maximum likelihood estimation (2)
       1.3.19. Sum of residuals is zero
       1.3.20. Correlation with covariate is zero
       1.3.21. Residual variance in terms of sample variance
       1.3.22. Correlation coefficient in terms of slope estimate
       1.3.23. Coefficient of determination in terms of correlation coefficient

    1.4. Multiple linear regression
       1.4.1. Definition
       1.4.2. Special case of general linear model
       1.4.3. Ordinary least squares (1)
       1.4.4. Ordinary least squares (2)
       1.4.5. Total sum of squares
       1.4.6. Explained sum of squares
       1.4.7. Residual sum of squares
       1.4.8. Total, explained and residual sum of squares
       1.4.9. Estimation matrix
       1.4.10. Projection matrix
       1.4.11. Residual-forming matrix
       1.4.12. Estimation, projection and residual-forming matrix
       1.4.13. Idempotence of projection and residual-forming matrix
       1.4.14. Weighted least squares (1)
       1.4.15. Weighted least squares (2)
       1.4.16. Maximum likelihood estimation
       1.4.17. Maximum log-likelihood
       1.4.18. Deviance function
       1.4.19. Akaike information criterion
       1.4.20. Bayesian information criterion
       1.4.21. Corrected Akaike information criterion

    1.5. Bayesian linear regression
       1.5.1. Conjugate prior distribution
       1.5.2. Posterior distribution
       1.5.3. Log model evidence
       1.5.4. Deviance information criterion
       1.5.5. Posterior probability of alternative hypothesis
       1.5.6. Posterior credibility region excluding null hypothesis

  2. Multivariate normal data

    2.1. General linear model
       2.1.1. Definition
       2.1.2. Ordinary least squares
       2.1.3. Weighted least squares
       2.1.4. Maximum likelihood estimation

    2.2. Transformed general linear model
       2.2.1. Definition
       2.2.2. Derivation of the distribution
       2.2.3. Equivalence of parameter estimates

    2.3. Inverse general linear model
       2.3.1. Definition
       2.3.2. Derivation of the distribution
       2.3.3. Best linear unbiased estimator
       2.3.4. Corresponding forward model
       2.3.5. Derivation of parameters
       2.3.6. Proof of existence

    2.4. Multivariate Bayesian linear regression
       2.4.1. Conjugate prior distribution
       2.4.2. Posterior distribution
       2.4.3. Log model evidence

  3. Poisson data

    3.1. Poisson-distributed data
       3.1.1. Definition
       3.1.2. Maximum likelihood estimation
       3.1.3. Conjugate prior distribution
       3.1.4. Posterior distribution
       3.1.5. Log model evidence

    3.2. Poisson distribution with exposure values
       3.2.1. Definition
       3.2.2. Maximum likelihood estimation
       3.2.3. Conjugate prior distribution
       3.2.4. Posterior distribution
       3.2.5. Log model evidence

  4. Probability data

    4.1. Beta-distributed data
       4.1.1. Definition
       4.1.2. Method of moments

    4.2. Dirichlet-distributed data
       4.2.1. Definition
       4.2.2. Maximum likelihood estimation

  5. Categorical data

    5.1. Binomial observations
       5.1.1. Definition
       5.1.2. Conjugate prior distribution
       5.1.3. Posterior distribution
       5.1.4. Log model evidence

    5.2. Multinomial observations
       5.2.1. Definition
       5.2.2. Conjugate prior distribution
       5.2.3. Posterior distribution
       5.2.4. Log model evidence

    5.3. Logistic regression
       5.3.1. Definition
       5.3.2. Probability and log-odds
       5.3.3. Log-odds and probability


Chapter IV: Model Selection

  1. Goodness-of-fit measures

    1.1. Residual variance
       1.1.1. Definition
       1.1.2. Maximum likelihood estimator is biased
       1.1.3. Construction of unbiased estimator

    1.2. R-squared
       1.2.1. Definition
       1.2.2. Derivation of R² and adjusted R²
       1.2.3. Relationship to maximum log-likelihood

    1.3. Signal-to-noise ratio
       1.3.1. Definition
       1.3.2. Relationship with R²

  2. Classical information criteria

    2.1. Akaike information criterion
       2.1.1. Definition
       2.1.2. Corrected AIC
       2.1.3. Corrected AIC and uncorrected AIC
       2.1.4. Corrected AIC and maximum log-likelihood

    2.2. Bayesian information criterion
       2.2.1. Definition
       2.2.2. Derivation

    2.3. Deviance information criterion
       2.3.1. Definition
       2.3.2. Deviance

  3. Bayesian model selection

    3.1. Log model evidence
       3.1.1. Definition
       3.1.2. Derivation
       3.1.3. Expression using prior and posterior
       3.1.4. Partition into accuracy and complexity
       3.1.5. Uniform-prior log model evidence
       3.1.6. Cross-validated log model evidence
       3.1.7. Empirical Bayesian log model evidence
       3.1.8. Variational Bayesian log model evidence

    3.2. Log family evidence
       3.2.1. Definition
       3.2.2. Derivation
       3.2.3. Calculation from log model evidences

    3.3. Log Bayes factor
       3.3.1. Definition
       3.3.2. Derivation
       3.3.3. Calculation from log model evidences

    3.4. Bayes factor
       3.4.1. Definition
       3.4.2. Transitivity
       3.4.3. Computation using Savage-Dickey density ratio
       3.4.4. Computation using encompassing prior method
       3.4.5. Encompassing model

    3.5. Posterior model probability
       3.5.1. Definition
       3.5.2. Derivation
       3.5.3. Calculation from Bayes factors
       3.5.4. Calculation from log Bayes factor
       3.5.5. Calculation from log model evidences

    3.6. Bayesian model averaging
       3.6.1. Definition
       3.6.2. Derivation
       3.6.3. Calculation from log model evidences