Proof: Effects of mean-centering on parameter estimates for simple linear regression
Theorem: In simple linear regression, when the dependent variable $y$ and/or the independent variable $x$ are mean-centered, the ordinary least squares estimate for the intercept changes, but that of the slope does not.
Proof:
1) Under unaltered $y$ and $x$, ordinary least squares estimates for simple linear regression are
\[\label{eq:slr-ols} \begin{split} \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \\ \hat{\beta}_1 &= \frac{\sum_{i=1}^n (x_i - \bar{x}) (y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{s_{xy}}{s_x^2} \end{split}\]with sample means $\bar{x}$ and $\bar{y}$, sample variance $s_x^2$ and sample covariance $s_{xy}$, such that $\beta_0$ estimates “the mean $y$ at $x = 0$”.
2) Let $\tilde{x}$ be the mean-centered covariate vector:
Under this condition, the parameter estimates become
\[\label{eq:slr-ols-meancent-x} \begin{split} \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{\tilde{x}} \\ &= \bar{y} \\ \hat{\beta}_1 &= \frac{\sum_{i=1}^n (\tilde{x}_i - \bar{\tilde{x}}) (y_i - \bar{y})}{\sum_{i=1}^n (\tilde{x}_i - \bar{\tilde{x}})^2} \\ &= \frac{\sum_{i=1}^n (x_i - \bar{x}) (y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{s_{xy}}{s_x^2} \end{split}\]and we can see that $\hat{\beta}_1(\tilde{x},y) = \hat{\beta}_1(x,y)$, but $\hat{\beta}_0(\tilde{x},y) \neq \hat{\beta}_0(x,y)$, specifically $\beta_0$ now estimates “the mean $y$ at the mean $x$”.
3) Let $\tilde{y}$ be the mean-centered data vector:
Under this condition, the parameter estimates become
\[\label{eq:slr-ols-meancent-y} \begin{split} \hat{\beta}_0 &= \bar{\tilde{y}} - \hat{\beta}_1 \bar{x} \\ &= - \hat{\beta}_1 \bar{x} \\ \hat{\beta}_1 &= \frac{\sum_{i=1}^n (x_i - \bar{x}) (\tilde{y}_i - \bar{\tilde{y}})}{\sum_{i=1}^n (x_i - \bar{x})^2} \\ &= \frac{\sum_{i=1}^n (x_i - \bar{x}) (y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{s_{xy}}{s_x^2} \end{split}\]and we can see that $\hat{\beta}_1(x,\tilde{y}) = \hat{\beta}_1(x,y)$, but $\hat{\beta}_0(x,\tilde{y}) \neq \hat{\beta}_0(x,y)$, specifically $\beta_0$ now estimates “the mean $x$, multiplied with the negative slope”.
4) Finally, consider mean-centering both $x$ and $y$::
Under this condition, the parameter estimates become
\[\label{eq:slr-ols-meancent-xy} \begin{split} \hat{\beta}_0 &= \bar{\tilde{y}} - \hat{\beta}_1 \bar{\tilde{x}} \\ &= 0 \\ \hat{\beta}_1 &= \frac{\sum_{i=1}^n (\tilde{x}_i - \bar{\tilde{x}}) (\tilde{y}_i - \bar{\tilde{y}})}{\sum_{i=1}^n (\tilde{x}_i - \bar{\tilde{x}})^2} \\ &= \frac{\sum_{i=1}^n (x_i - \bar{x}) (y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \frac{s_{xy}}{s_x^2} \end{split}\]and we can see that $\hat{\beta}_1(\tilde{x},\tilde{y}) = \hat{\beta}_1(x,y)$, but $\hat{\beta}_0(\tilde{x},\tilde{y}) \neq \hat{\beta}_0(x,y)$, specifically $\beta_0$ is now forced to become zero.
Metadata: ID: P274 | shortcut: slr-meancent | author: JoramSoch | date: 2021-10-27, 12:38.