Table of Contents
Templates: Proof – Definition
Chapter I: General Theorems
Probability theory
Random experiments
- Random experiment
- Sample space
- Event space
- Probability space
- Measured data
- Statistical sample
- Sample size
- Sample statistic
- Descriptive vs. inferential
Random variables
- Random event
- Random variable
- Random vector
- Random matrix
- Constant
- Discrete vs. continuous
- Univariate vs. multivariate
- Independent and identically distributed
Probability
- Probability
- Joint probability
- Marginal probability
- Conditional probability
- Exceedance probability
- Statistical independence
- Conditional independence
- Self-independence
- Probability under independence
- Mutual exclusivity
- Probability under exclusivity
Probability axioms
- Axioms of probability
- Monotonicity of probability (1)
- Monotonicity of probability (2)
- Probability of the empty set (1)
- Probability of the empty set (2)
- Probability of the complement
- Range of probability
- Addition law of probability
- Bonferroni’s inequality
- Boole’s inequality
- Law of total probability
- Probability of exhaustive events (1)
- Probability of exhaustive events (2)
Probability distributions
- Probability distribution
- Joint distribution
- Marginal distribution
- Conditional distribution
- Unimodal vs. multimodal distribution
- Sampling distribution
- Statistical parameter
- Location parameter
- Scale parameter
- Rate parameter
- Shape parameter
- Degrees of freedom
Probability mass function
- Definition
- Probability mass function of sum of independents
- Probability mass function of strictly increasing function
- Probability mass function of strictly decreasing function
- Probability mass function of invertible function
- Joint probability mass function
Probability density function
- Definition
- Probability density function of sum of independents
- Probability density function of strictly increasing function
- Probability density function of strictly decreasing function
- Probability density function of invertible function
- Probability density function of linear transformation
- Probability density function in terms of cumulative distribution function
- Joint probability density function
Cumulative distribution function
- Definition
- Cumulative distribution function of sum of independents
- Cumulative distribution function of strictly increasing function
- Cumulative distribution function of strictly decreasing function
- Cumulative distribution function of discrete random variable
- Cumulative distribution function of continuous random variable
- Exceedance probability based on cumulative distribution function
- Probability integral transform
- Inverse transformation method
- Distributional transformation
- Joint cumulative distribution function
Moment-generating function
- Definition
- Moment-generating function of sum of independents
- Moment-generating function of arbitrary function
- Moment-generating function of linear transformation
- Moment-generating function of linear combination
Probability-generating function
- Definition
- Probability-generating function in terms of expected value
- Probability-generating function of zero
- Probability-generating function of one
Other probability functions
- Quantile function
- Quantile function in terms of cumulative distribution function
- Characteristic function
- Characteristic function of arbitrary function
- Cumulant-generating function
Expected value
- Definition
- Sample mean
- Non-negative random variable
- Non-negativity
- Linearity
- Monotonicity
- (Non-)Multiplicativity
- Law of total expectation
- Law of the unconscious statistician
- Squared expectation of a product
- Jensen’s inequality
- Markov’s inequality
- Chebyshev’s inequality
- Weak law of large numbers
- Expected value minimizes squared error
- Expected value of a random vector
- Expectation of a quadratic form
- Expectation of a bilinear form
- Expected value of a random matrix
- Expectation of a trace
Variance
- Definition
- Sample variance
- Pooled sample variance
- Partition into expected values
- Non-negativity
- Variance of a constant
- Invariance under addition
- Scaling upon multiplication
- Variance of a sum
- Variance of linear combination
- Additivity under independence
- Law of total variance
- Precision
Skewness
- Definition
- Sample skewness
- Partition into expected values
Covariance
- Definition
- Sample covariance
- Partition into expected values
- Symmetry
- Self-covariance
- Covariance under independence
- Law of total covariance
- Relationship to correlation
Covariance matrix
- Definition
- Sample covariance matrix
- Partition into expected values
- Symmetry
- Positive semi-definiteness
- Invariance under addition of vector
- Scaling upon multiplication with matrix
- Covariance matrix and correlation matrix
- Cross-covariance matrix
- Cross-covariance and expected values
- Covariance matrix of a sum
- Precision matrix
- Precision matrix and correlation matrix
Correlation
- Definition
- Sample correlation coefficient
- Range
- Correlation under independence
- Relationship to standard scores
- Conditional correlation
- Partial correlation
- Correlation matrix
- Sample correlation matrix
Measures of central tendency
- Median
- Median minimizes mean absolute error
- Mode
Measures of statistical dispersion
- Standard deviation
- Sample standard deviation
- Pooled sample standard deviation
- Full width at half maximum
Further summary statistics
- Minimum
- Maximum
Further moments
- Moment
- Moment in terms of moment-generating function
- Raw moment
- First raw moment is mean
- Second raw moment and variance
- Central moment
- First central moment is zero
- Second central moment is variance
- Standardized moment
- First standardized moment is zero
- Second standardized moment is one
- Third standardized moment is skewness
Information theory
Shannon entropy
- Definition
- Non-negativity
- Concavity
- Conditional entropy
- Joint entropy
- Cross-entropy
- Convexity of cross-entropy
- Gibbs’ inequality
- Log sum inequality
Differential entropy
- Definition
- Negativity
- Invariance under addition
- Addition upon multiplication
- Addition upon matrix multiplication
- Non-invariance and transformation
- Conditional differential entropy
- Joint differential entropy
- Differential cross-entropy
Discrete mutual information
- Definition
- Relation to marginal and conditional entropy
- Relation to marginal and joint entropy
- Relation to joint and conditional entropy
Continuous mutual information
- Definition
- Relation to marginal and conditional differential entropy
- Relation to marginal and joint differential entropy
- Relation to joint and conditional differential entropy
Kullback-Leibler divergence
- Definition
- Non-negativity (1)
- Non-negativity (2)
- Non-negativity (3)
- Non-symmetry
- Convexity
- Additivity for independent distributions
- Invariance under parameter transformation
- Relation to discrete entropy
- Relation to differential entropy
Estimation theory
Basic concepts of estimation
- Estimator
- Biased vs. unbiased
Point estimates
- Mean squared error
- Partition of the mean squared error into bias and variance
Interval estimates
- Confidence interval
- Construction of confidence intervals using Wilks’ theorem
Frequentist statistics
Likelihood theory
- Likelihood function
- Log-likelihood function
- Maximum likelihood estimation
- Maximum log-likelihood
- MLE can be biased
- Likelihood ratio
- Log-likelihood ratio
- Method of moments
Statistical hypotheses
- Statistical hypothesis
- Simple vs. composite
- Point/exact vs. set/inexact
- One-tailed vs. two-tailed
Hypothesis testing
- Statistical test
- Null hypothesis
- Alternative hypothesis
- One-tailed vs. two-tailed
- Test statistic
- Size of a test
- Power of a test
- Significance level
- Critical value
- p-value
- Distribution of p-value under null hypothesis
- Minimum detectable effect
- Minimum required sample size
Bayesian statistics
Probabilistic modeling
- Generative model
- Likelihood function
- Prior distribution
- Prior predictive distribution
- Prior predictive distribution is marginal of joint likelihood
- Full probability model
- Joint likelihood
- Joint likelihood is product of likelihood and prior
- Posterior distribution
- Posterior density is proportional to joint likelihood
- Combined posterior distribution from independent data
- Posterior predictive distribution
- Posterior predictive distribution is marginal of joint likelihood
- Maximum-a-posteriori estimation
- Marginal likelihood
- Marginal likelihood is integral of joint likelihood
Prior distributions
- Flat vs. hard vs. soft
- Uniform vs. non-uniform
- Informative vs. non-informative
- Empirical vs. non-empirical
- Conjugate vs. non-conjugate
- Maximum entropy priors
- Empirical Bayes priors
- Reference priors
Bayesian inference
- Odds ratios
- Bayes’ theorem
- Bayes’ rule
- Empirical Bayes
- Variational Bayes
- Decomposition of the free energy
- Free energy is lower bound on log model evidence
Machine learning
Scoring rules
- Scoring rule
- Proper scoring rule
- Strictly proper scoring rule
- Log probability scoring rule
- Log probability is strictly proper scoring rule
- Brier scoring rule
- Brier scoring rule is strictly proper scoring rule
Chapter II: Probability Distributions
Univariate discrete distributions
Discrete uniform distribution
- Definition
- Probability mass function
- Cumulative distribution function
- Quantile function
- Shannon entropy
- Kullback-Leibler divergence
- Maximum entropy distribution
Bernoulli distribution
- Definition
- Special case of categorical distribution
- Probability mass function
- Mean
- Variance
- Range of variance
- Shannon entropy
- Kullback-Leibler divergence
Binomial distribution
- Definition
- Special case of multinomial distribution
- Probability mass function
- Cumulative distribution function
- Probability-generating function
- Mean
- Variance
- Range of variance
- Shannon entropy
- Kullback-Leibler divergence
- Conditional binomial
Beta-binomial distribution
- Definition
- Probability mass function
- Probability mass function in terms of gamma function
- Cumulative distribution function
Poisson distribution
- Definition
- Probability mass function
- Mean
- Variance
- Shannon entropy
Multivariate discrete distributions
Categorical distribution
- Definition
- Probability mass function
- Mean
- Covariance
- Shannon entropy
Multinomial distribution
- Definition
- Probability mass function
- Cumulative distribution function
- Mean
- Covariance
- Shannon entropy
- Marginal distributions
Univariate continuous distributions
Continuous uniform distribution
- Definition
- Standard uniform distribution
- Probability density function
- Cumulative distribution function
- Quantile function
- Mean
- Median
- Mode
- Variance
- Differential entropy
- Kullback-Leibler divergence
- Maximum entropy distribution
Normal distribution
- Definition
- Special case of multivariate normal distribution
- Standard normal distribution
- Relationship to standard normal distribution (1)
- Relationship to standard normal distribution (2)
- Relationship to standard normal distribution (3)
- Relationship to chi-squared distribution
- Relationship to t-distribution
- Gaussian integral
- Probability density function
- Moment-generating function
- Cumulative distribution function
- Cumulative distribution function without error function
- Probability of being within standard deviations from mean
- Quantile function
- Mean
- Median
- Mode
- Variance
- Full width at half maximum
- Extreme points
- Inflection points
- Differential entropy
- Kullback-Leibler divergence
- Maximum entropy distribution
- Linear combination of independent normals
- Normal and uncorrelated does not imply independent
- Marginally normal does not imply jointly normal
t-distribution
- Definition
- Special case of multivariate t-distribution
- Non-standardized t-distribution
- Relationship to non-standardized t-distribution
- Probability density function
Gamma distribution
- Definition
- Special case of Wishart distribution
- Standard gamma distribution
- Relationship to standard gamma distribution (1)
- Relationship to standard gamma distribution (2)
- Scaling of a gamma random variable
- Probability density function
- Moment-generating function
- Cumulative distribution function
- Quantile function
- Mean
- Median
- Mode
- Variance
- Logarithmic expectation
- Expectation of x ln x
- Differential entropy
- Kullback-Leibler divergence
Exponential distribution
- Definition
- Special case of gamma distribution
- Probability density function
- Moment-generating function
- Cumulative distribution function
- Quantile function
- Mean
- Median
- Mode
- Variance
- Skewness
Log-normal distribution
- Definition
- Probability density function
- Cumulative distribution function
- Quantile function
- Mean
- Median
- Mode
- Variance
- Product of independent log-normals
- Geometric mean of independent log-normals
Chi-squared distribution
- Definition
- Special case of gamma distribution (1)
- Special case of gamma distribution (2)
- Probability density function
- Raw moments
F-distribution
- Definition
- Probability density function
Beta distribution
- Definition
- Special case of Dirichlet distribution
- Relationship to chi-squared distribution
- Relationship to F-distribution
- Probability density function
- Moment-generating function
- Cumulative distribution function
- Mean
- Median
- Mode
- Variance
Wald distribution
- Definition
- Probability density function
- Moment-generating function
- Mean
- Variance
- Skewness
- Method of moments
ex-Gaussian distribution
- Definition
- Probability density function
- Moment-generating function
- Mean
- Variance
- Skewness
- Method of moments
Multivariate continuous distributions
Multivariate normal distribution
- Definition
- Special case of matrix-normal distribution
- Relationship to chi-squared distribution
- Probability density function
- Moment-generating function
- Mean
- Mode
- Covariance
- Expectation of a quadratic form
- Conditional correlation
- Partial correlation
- Differential entropy
- Mutual information
- Kullback-Leibler divergence
- Maximum entropy distribution
- Linear transformation
- Marginal distributions
- Conditional distributions
- Conditions for independence
- Independence of products
- Drawing samples
Bivariate normal distribution
- Definition
- Probability density function
- Probability density function in terms of correlation coefficient
- Construction from standard normal distributions
- Mutual information
- Linear combination
Multivariate t-distribution
- Definition
- Probability density function
- Relationship to F-distribution
- Marginal distributions
Normal-gamma distribution
- Definition
- Special case of normal-Wishart distribution
- Probability density function
- Mean
- Covariance
- Differential entropy
- Kullback-Leibler divergence
- Marginal distributions
- Conditional distributions
- Drawing samples
Dirichlet distribution
- Definition
- Probability density function
- Kullback-Leibler divergence
- Exceedance probabilities
Matrix-variate continuous distributions
Matrix-normal distribution
- Definition
- Equivalence to multivariate normal distribution
- Probability density function
- Mean
- Covariance
- Cross-covariances
- Expectation of quadratic forms
- Second-order expectations
- Differential entropy
- Kullback-Leibler divergence
- Transposition
- Linear transformation
- Marginal distributions
- Redundancy of parameters (1)
- Redundancy of parameters (2)
- Drawing samples
Wishart distribution
- Definition
- Kullback-Leibler divergence
Normal-Wishart distribution
- Definition
- Probability density function
- Mean
Chapter III: Statistical Models
Univariate normal data
Univariate Gaussian
- Definition
- Maximum likelihood estimation
- One-sample t-test
- Two-sample t-test
- Paired t-test
- F-test for equality of variances
- Power analysis for one-sample t-test
- Conjugate prior distribution
- Posterior distribution
- Log model evidence
- Accuracy and complexity
- Cross-validated log model evidence
- Cross-validated log Bayes factor
Univariate Gaussian with known variance
- Definition
- Maximum likelihood estimation
- One-sample z-test
- Two-sample z-test
- Paired z-test
- Conjugate prior distribution
- Posterior distribution
- Log model evidence
- Accuracy and complexity
- Log Bayes factor
- Expectation of log Bayes factor
- Cross-validated log model evidence
- Cross-validated log Bayes factor
- Expectation of cross-validated log Bayes factor
Analysis of variance
- One-way ANOVA
- Treatment sum of squares
- Ordinary least squares for one-way ANOVA
- Sums of squares in one-way ANOVA
- F-test for main effect in one-way ANOVA
- F-statistic in terms of OLS estimates
- Reparametrization of one-way ANOVA
- Two-way ANOVA
- Interaction sum of squares
- Ordinary least squares for two-way ANOVA
- Sums of squares in two-way ANOVA
- Cochran’s theorem for two-way ANOVA
- F-test for main effect in two-way ANOVA
- F-test for interaction in two-way ANOVA
- F-test for grand mean in two-way ANOVA
- F-statistics in terms of OLS estimates
Simple linear regression
- Definition
- Special case of multiple linear regression
- Ordinary least squares (1)
- Ordinary least squares (2)
- Expectation of estimates
- Variance of estimates
- Distribution of estimates
- Correlation of estimates
- Effects of mean-centering
- Regression line
- Regression line includes center of mass
- Projection of data point to regression line
- Sums of squares
- Partition of sums of squares
- Transformation matrices
- Weighted least squares (1)
- Weighted least squares (2)
- Maximum likelihood estimation (1)
- Maximum likelihood estimation (2)
- t-test for intercept parameter
- t-test for slope parameter
- F-test for model comparison
- Sum of residuals is zero
- Correlation with covariate is zero
- Residual variance in terms of sample variance
- Correlation coefficient in terms of slope estimate
- Coefficient of determination in terms of correlation coefficient
Multiple linear regression
- Definition
- Special case of general linear model
- Ordinary least squares (1)
- Ordinary least squares (2)
- Ordinary least squares (3)
- Ordinary least squares for two regressors
- Total sum of squares
- Explained sum of squares
- Residual sum of squares
- Total, explained and residual sum of squares
- Estimation matrix
- Projection matrix
- Residual-forming matrix
- Estimation, projection and residual-forming matrix
- Symmetry of projection and residual-forming matrix
- Idempotence of projection and residual-forming matrix
- Independence of estimated parameters and residuals
- Distribution of OLS estimates, signal and residuals
- Distribution of WLS estimates, signal and residuals
- Distribution of residual sum of squares
- Weighted least squares (1)
- Weighted least squares (2)
- Maximum likelihood estimation
- Maximum log-likelihood
- Log-likelihood ratio
- t-contrast
- F-contrast
- Contrast-based t-test
- Contrast-based F-test
- t-test for single regressor
- F-test for multiple regressors
- Deviance function
- Akaike information criterion
- Bayesian information criterion
- Corrected Akaike information criterion
Bayesian linear regression
- Conjugate prior distribution
- Posterior distribution
- Log model evidence
- Accuracy and complexity
- Deviance information criterion
- Maximum-a-posteriori estimation
- Expression of posterior parameters using error terms
- Posterior probability of alternative hypothesis
- Posterior credibility region excluding null hypothesis
- Combined posterior distribution from independent data sets
- Log Bayes factor for comparison of two regression models
Bayesian linear regression with known covariance
- Conjugate prior distribution
- Posterior distribution
- Log model evidence
- Accuracy and complexity
Multivariate normal data
Multivariate Gaussian
- Bivariate normally distributed data
- Multivariate normally distributed data
- Maximum likelihood estimation (p = 2)
- Maximum likelihood estimation (p > 2)
General linear model
- Definition
- Ordinary least squares
- Weighted least squares
- Maximum likelihood estimation
- Maximum log-likelihood
- Log-likelihood ratio
- Mutual information
- Log-likelihood ratio and estimated mutual information
Transformed general linear model
- Definition
- Derivation of the distribution
- Equivalence of parameter estimates
Inverse general linear model
- Definition
- Derivation of the distribution
- Best linear unbiased estimator
- Equivalence of log-likelihood ratios
- Corresponding forward model
- Derivation of parameters
- Proof of existence
Multivariate Bayesian linear regression
- Conjugate prior distribution
- Posterior distribution
- Log model evidence
Count data
Binomial observations
- Definition
- Binomial test
- Maximum likelihood estimation
- Maximum log-likelihood
- Maximum-a-posteriori estimation
- Conjugate prior distribution
- Posterior distribution
- Log model evidence
- Log Bayes factor
- Posterior probability
- Cross-validated log model evidence
- Cross-validated log Bayes factor
Multinomial observations
- Definition
- Multinomial test
- Maximum likelihood estimation
- Maximum log-likelihood
- Maximum-a-posteriori estimation
- Conjugate prior distribution
- Posterior distribution
- Log model evidence
- Log Bayes factor
- Posterior probability
- Cross-validated log model evidence
- Cross-validated log Bayes factor
Poisson-distributed data
- Definition
- Maximum likelihood estimation
- Conjugate prior distribution
- Posterior distribution
- Log model evidence
Poisson distribution with exposure values
- Definition
- Maximum likelihood estimation
- Conjugate prior distribution
- Posterior distribution
- Log model evidence
Frequency data
Beta-distributed data
- Definition
- Method of moments
Dirichlet-distributed data
- Definition
- Maximum likelihood estimation
Beta-binomial data
- Definition
- Method of moments
Categorical data
Logistic regression
- Definition
- Probability and log-odds
- Log-odds and probability
Chapter IV: Model Selection
Goodness-of-fit measures
Residual variance
- Definition
- Maximum likelihood estimator is biased (p = 1)
- Maximum likelihood estimator is biased (p > 1)
- Construction of unbiased estimator (p = 1)
- Construction of unbiased estimator (p > 1)
R-squared
- Definition
- Derivation of R² and adjusted R²
- Relationship to residual variance
- Relationship to maximum log-likelihood
- Statistical significance test for R²
- Distribution under null hypothesis
- Mean/mode/median under null hypothesis
- Variance under null hypothesis
F-statistic
- Definition
- Relationship to coefficient of determination
- Relationship to maximum log-likelihood
Signal-to-noise ratio
- Definition
- Relationship to coefficient of determination
- Relationship to maximum log-likelihood
Classical information criteria
Akaike information criterion
- Definition
- Corrected AIC
- Corrected AIC and uncorrected AIC
- Corrected AIC and maximum log-likelihood
Bayesian information criterion
- Definition
- Derivation
Deviance information criterion
- Definition
- Deviance
Bayesian model selection
Model evidence
- Definition
- Derivation
- Log model evidence
- Derivation of the log model evidence
- Expression using prior and posterior
- Partition into accuracy and complexity
- Subtraction of mean from LMEs
- Uniform-prior log model evidence
- Cross-validated log model evidence
- Empirical Bayesian log model evidence
- Variational Bayesian log model evidence
Family evidence
- Definition
- Derivation
- Log family evidence
- Derivation of the log family evidence
- Calculation from log model evidences
- Approximation of log family evidences
Bayes factor
- Definition
- Transitivity
- Computation using Savage-Dickey density ratio
- Computation using encompassing prior method
- Encompassing model
- Log Bayes factor
- Derivation of the log Bayes factor
- Calculation from log model evidences
Posterior model probability
- Definition
- Derivation
- Calculation from Bayes factors
- Calculation from log Bayes factor
- Calculation from log model evidences
Bayesian model averaging
- Definition
- Derivation
- Calculation from log model evidences