
Definition: Let $y$ be an $n \times 1$ vector and let $X$ be an $n \times p$ matrix.

Then, a statement asserting a linear mapping from $X$ to $y$

\[\label{eq:mlr-model} y = X\beta + \varepsilon \; ,\]

together with a statement asserting a multivariate normal distribution for $\varepsilon$

\[\label{eq:mlr-noise} \varepsilon \sim \mathcal{N}(0, \sigma^2 V)\]

is called a univariate linear regression model or simply “multiple linear regression”. In this model,

  • $y$ is called “measured data”, “dependent variable” or “measurements”;

  • $X$ is called “design matrix”, “set of independent variables” or “predictors”;

  • $V$ is called “covariance matrix” or “covariance structure” (an $n \times n$ matrix);

  • $\beta$ are called “regression coefficients” or “weights” (a $p \times 1$ vector);

  • $\varepsilon$ is called “noise”, “errors” or “error terms”;

  • $\sigma^2$ is called “noise variance” or “error variance”;

  • $n$ is the number of observations;

  • $p$ is the number of predictors.
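To make the definition concrete, here is a minimal simulation sketch in Python with NumPy; all concrete values ($n$, $p$, `beta`, `sigma2` and the choice $V = I_n$) are illustrative assumptions, not part of the definition:

```python
import numpy as np

rng = np.random.default_rng(1)

n, p = 100, 3                        # number of observations / predictors
X = rng.standard_normal((n, p))      # design matrix (n x p), illustrative
beta = np.array([2.0, -1.0, 0.5])    # regression coefficients (p x 1)
sigma2 = 0.25                        # noise variance
V = np.eye(n)                        # covariance structure (n x n)

# noise: eps ~ N(0, sigma^2 V)
eps = rng.multivariate_normal(np.zeros(n), sigma2 * V)

# measured data: y = X beta + eps
y = X @ beta + eps
```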

Alternatively, with $x_i$ denoting the $i$-th column of $X$, the linear combination may also be written as

\[\label{eq:mlr-model-sum} y = \sum_{i=1}^{p} \beta_i x_i + \varepsilon\]

or, when the model includes an intercept term, as

\[\label{eq:mlr-model-sum-base} y = \beta_0 + \sum_{i=1}^{p} \beta_i x_i + \varepsilon\]

which is equivalent to adding a constant regressor $x_0 = 1_n$ to the design matrix $X$.
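As a sketch of this equivalence (reusing the illustrative variables from the simulation above), the intercept can be absorbed into the design matrix by prepending a column of ones:

```python
# prepend the constant regressor x_0 = 1_n to the design matrix
X0 = np.column_stack([np.ones(n), X])     # now n x (p + 1)
beta0 = np.concatenate(([1.5], beta))     # (beta_0, beta_1, ..., beta_p); beta_0 illustrative
y = X0 @ beta0 + eps                      # same model, intercept absorbed into X
```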

When the covariance structure $V$ is equal to the $n \times n$ identity matrix, this is called multiple linear regression with independent and identically distributed (i.i.d.) observations:

\[\label{eq:mlr-noise-iid} V = I_n \quad \Rightarrow \quad \varepsilon \sim \mathcal{N}(0, \sigma^2 I_n) \quad \Rightarrow \quad \varepsilon_i \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2) \; .\]

Otherwise, it is called multiple linear regression with correlated observations.
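To illustrate the two cases, a hedged sketch (again with illustrative values): in the i.i.d. case the noise can be drawn element-wise, while a correlated $V$, here an AR(1)-style example chosen for illustration, can be sampled via its Cholesky factor:

```python
sigma = np.sqrt(sigma2)

# i.i.d. case: V = I_n, so eps_i ~ N(0, sigma^2) independently
eps_iid = rng.normal(0.0, sigma, size=n)

# correlated case: an AR(1)-style covariance, V_ij = rho^|i-j| (illustrative choice)
rho = 0.8
idx = np.arange(n)
V_corr = rho ** np.abs(idx[:, None] - idx[None, :])

# draw eps ~ N(0, sigma^2 V) via the Cholesky factor L with V = L L^T
L = np.linalg.cholesky(V_corr)
eps_corr = sigma * (L @ rng.standard_normal(n))
```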

 
Sources:

Metadata: ID: D36 | shortcut: mlr | author: JoramSoch | date: 2020-03-21, 20:09.