The Book of Statistical Proofs ▷ Coefficient of determination in terms of correlation coefficient

Theorem: Assume a simple linear regression model with independent observations

\[\label{eq:slr} y = \beta_0 + \beta_1 x + \varepsilon, \; \varepsilon_i \sim \mathcal{N}(0, \sigma^2), \; i = 1,\ldots,n\]

and consider estimation using ordinary least squares. Then, the coefficient of determination is equal to the squared sample correlation coefficient between $x$ and $y$:

\[\label{eq:slr-R2} R^2 = r_{xy}^2 \; .\]

Proof: With sample means $\bar{x}$ and $\bar{y}$, sample covariance $s_{xy}$ and sample variance $s_x^2$, the ordinary least squares estimates for simple linear regression are

\[\label{eq:slr-ols} \begin{split} \hat{\beta}_0 &= \bar{y} - \hat{\beta}_1 \bar{x} \\ \hat{\beta}_1 &= \frac{s_{xy}}{s_x^2} \; . \end{split}\]
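As a minimal numerical sketch, assuming Python and a small made-up data set (the helper name `ols_simple` is hypothetical), the OLS estimates can be computed directly from the sample moments:

```python
from statistics import mean


def ols_simple(x, y):
    """Hypothetical helper: OLS estimates (beta0_hat, beta1_hat) for y = beta0 + beta1 * x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # sample covariance s_xy and sample variance s_x^2, both with the 1/(n-1) factor
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    s_x2 = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    beta1_hat = s_xy / s_x2               # slope: s_xy / s_x^2
    beta0_hat = y_bar - beta1_hat * x_bar  # intercept: y_bar - beta1_hat * x_bar
    return beta0_hat, beta1_hat


# illustrative toy data (not from the source)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = ols_simple(x, y)
print(b0, b1)
```

The $1/(n-1)$ factors cancel in the slope, so using $1/n$ for both moments would give the same $\hat{\beta}_1$; the sketch keeps $n-1$ to match the sample variances used later in the proof.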

The coefficient of determination $R^2$ is defined as the proportion of the total variance in the data that is explained by the independent variable(s). It can be quantified as the ratio of the explained sum of squares to the total sum of squares:

\[\label{eq:slr-R2-s1} R^2 = \frac{\mathrm{ESS}}{\mathrm{TSS}} \; .\]

Inserting the explained and total sums of squares for simple linear regression, with fitted values $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, we have:

\[\label{eq:slr-R2-s2} \begin{split} R^2 &= \frac{\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\ &= \frac{\sum_{i=1}^{n} (\hat{\beta}_0 + \hat{\beta}_1 x_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \; . \end{split}\]
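A short sketch of this definition, assuming Python and reusing the same illustrative toy data; the fitted values come from the OLS estimates $\hat{\beta}_0 = 2.2$, $\hat{\beta}_1 = 0.6$ for that data, and the function name `r_squared` is hypothetical:

```python
from statistics import mean


def r_squared(y, y_hat):
    """Hypothetical helper: R^2 = ESS / TSS computed from observed and fitted values."""
    y_bar = mean(y)
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)      # total sum of squares
    return ess / tss


# illustrative toy data and its OLS fitted values y_hat_i = beta0_hat + beta1_hat * x_i
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
y_hat = [2.2 + 0.6 * xi for xi in x]
print(r_squared(y, y_hat))
```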

Substituting $\hat{\beta}_0$ from \eqref{eq:slr-ols}, the $\bar{y}$ terms cancel; dividing numerator and denominator by $n-1$ then yields the sample variances $s_x^2$ and $s_y^2$:

\[\label{eq:slr-R2-s3} \begin{split} R^2 &= \frac{\sum_{i=1}^{n} (\bar{y} - \hat{\beta}_1 \bar{x} + \hat{\beta}_1 x_i - \bar{y})^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\ &= \frac{\sum_{i=1}^{n} \left( \hat{\beta}_1 (x_i - \bar{x}) \right)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2} \\ &= \hat{\beta}_1^2 \, \frac{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}{\frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2} \\ &= \hat{\beta}_1^2 \, \frac{s_x^2}{s_y^2} \\ &= \left( \frac{s_x}{s_y} \, \hat{\beta}_1 \right)^2 \; . \end{split}\]
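The two key steps of this derivation can be checked numerically. The sketch below assumes Python's `statistics.variance`, which uses the $1/(n-1)$ denominator, and the same illustrative toy data: it confirms that the intercept cancels, i.e. $\hat{y}_i - \bar{y} = \hat{\beta}_1 (x_i - \bar{x})$, and evaluates $\hat{\beta}_1^2 \, s_x^2 / s_y^2$.

```python
from statistics import mean, variance

# illustrative toy data (not from the source)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = mean(x), mean(y)

# OLS estimates from sample moments
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
beta1_hat = s_xy / variance(x)
beta0_hat = y_bar - beta1_hat * x_bar

# Step 1: the intercept cancels, so y_hat_i - y_bar equals beta1_hat * (x_i - x_bar)
for xi in x:
    assert abs((beta0_hat + beta1_hat * xi - y_bar) - beta1_hat * (xi - x_bar)) < 1e-12

# Step 2: R^2 expressed through the slope and the two sample variances
r2_from_slope = beta1_hat ** 2 * variance(x) / variance(y)
print(r2_from_slope)
```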

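Finally, the sample correlation coefficient is $r_{xy} = s_{xy} / (s_x s_y)$, so by \eqref{eq:slr-ols} the slope estimate satisfies $\frac{s_x}{s_y} \, \hat{\beta}_1 = \frac{s_x}{s_y} \cdot \frac{s_{xy}}{s_x^2} = \frac{s_{xy}}{s_x s_y} = r_{xy}$. Substituting this into \eqref{eq:slr-R2-s3}, we conclude: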

\[\label{eq:slr-R2-qed} R^2 = \left( \frac{s_x}{s_y} \, \hat{\beta}_1 \right)^2 = r_{xy}^2 \; .\]
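As an end-to-end sanity check, assuming Python and simulated data (the parameter values $\beta_0 = 1$, $\beta_1 = 0.5$, $\sigma = 2$ and the helper name `r2_and_squared_correlation` are illustrative assumptions, not from the source), one can fit OLS, compute $R^2 = \mathrm{ESS}/\mathrm{TSS}$, and compare it with $r_{xy}^2$:

```python
import math
import random
from statistics import mean


def r2_and_squared_correlation(x, y):
    """Hypothetical helper: return (ESS/TSS from an OLS fit of y on x, squared sample correlation)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n - 1)
    s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / (n - 1))
    s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / (n - 1))
    # OLS estimates
    beta1_hat = s_xy / s_x ** 2
    beta0_hat = y_bar - beta1_hat * x_bar
    y_hat = [beta0_hat + beta1_hat * a for a in x]
    # coefficient of determination as ESS / TSS
    r2 = sum((yh - y_bar) ** 2 for yh in y_hat) / sum((b - y_bar) ** 2 for b in y)
    r_xy = s_xy / (s_x * s_y)  # sample correlation coefficient
    return r2, r_xy ** 2


# simulate from y_i = beta0 + beta1 * x_i + eps_i, eps_i ~ N(0, sigma^2) (illustrative values)
random.seed(1)
x = [random.uniform(0, 10) for _ in range(100)]
y = [1.0 + 0.5 * xi + random.gauss(0, 2) for xi in x]
r2, r_sq = r2_and_squared_correlation(x, y)
print(r2, r_sq)
```

The two printed values agree up to floating-point rounding for any data with nonzero variance in both $x$ and $y$, which is what the theorem predicts.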

Metadata: ID: P280 | shortcut: slr-rsq | author: JoramSoch | date: 2021-10-27, 15:31.