Proof: Correlation coefficient in terms of standard scores
Metadata: ID: P299 | shortcut: corr-z | author: JoramSoch | date: 2021-12-14, 02:31.
Theorem: Let $x = \left\lbrace x_1, \ldots, x_n \right\rbrace$ and $y = \left\lbrace y_1, \ldots, y_n \right\rbrace$ be samples from random variables $X$ and $Y$. Then, the sample correlation coefficient $r_{xy}$ can be expressed in terms of the standard scores of $x$ and $y$:
\[\label{eq:corr-z} r_{xy} = \frac{1}{n-1} \sum_{i=1}^n z_i^{(x)} \cdot z_i^{(y)} = \frac{1}{n-1} \sum_{i=1}^n \left( \frac{x_i-\bar{x}}{s_x} \right) \left( \frac{y_i-\bar{y}}{s_y} \right)\]where $\bar{x}$ and $\bar{y}$ are the sample means and $s_x$ and $s_y$ are the sample standard deviations.
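As a quick numerical illustration of the theorem, the following Python sketch computes $r_{xy}$ from standard scores. The function name `corr_from_standard_scores` and the data are made up for this example and are not part of the source; NumPy is assumed, and `ddof=1` selects the $n-1$ normalization.

```python
import numpy as np

def corr_from_standard_scores(x, y):
    """Sample correlation as the scaled sum of products of standard scores."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # ddof=1 gives the sample standard deviation (division by n - 1)
    z_x = (x - x.mean()) / x.std(ddof=1)
    z_y = (y - y.mean()) / y.std(ddof=1)
    return np.sum(z_x * z_y) / (n - 1)

# Illustrative data (hypothetical, chosen only for easy hand calculation)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 1.0, 4.0, 3.0, 5.0]
r = corr_from_standard_scores(x, y)
print(r)
```

For these data, the deviations from the mean have cross-product sum 8 and both sums of squares equal 10, so the result should be 0.8 up to floating-point rounding, matching `np.corrcoef(x, y)[0, 1]`.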
Proof: The sample correlation coefficient is defined as
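\[\label{eq:corr-samp} r_{xy} = \frac{\sum_{i=1}^n (x_i-\bar{x}) (y_i-\bar{y})}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2} \sqrt{\sum_{i=1}^n (y_i-\bar{y})^2}} \; .\]Using the sample variances of $x$ and $y$, $s_x^2 = \frac{1}{n-1} \sum_{i=1}^n (x_i-\bar{x})^2$ and $s_y^2 = \frac{1}{n-1} \sum_{i=1}^n (y_i-\bar{y})^2$, we can write: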
\[\label{eq:corr-z-s1} r_{xy} = \frac{\sum_{i=1}^n (x_i-\bar{x}) (y_i-\bar{y})}{\sqrt{(n-1) s_x^2} \sqrt{(n-1) s_y^2}} \; .\]Rearranging the terms, we arrive at:
\[\label{eq:corr-z-s2} r_{xy} = \frac{1}{(n-1) \, s_x \, s_y} \sum_{i=1}^n (x_i-\bar{x}) (y_i-\bar{y}) \; .\]Further simplifying, the result is:
\[\label{eq:corr-z-s3} r_{xy} = \frac{1}{n-1} \sum_{i=1}^n \left( \frac{x_i-\bar{x}}{s_x} \right) \left( \frac{y_i-\bar{y}}{s_y} \right) \; .\]∎
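The steps of the proof can also be checked numerically. The sketch below is only a sanity check under stated assumptions: the simulated data, the seed and all variable names are arbitrary choices made for this example, and NumPy is assumed.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary choice
n = 50
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

# Deviations from the sample means and sample standard deviations (n - 1)
dx, dy = x - x.mean(), y - y.mean()
s_x, s_y = x.std(ddof=1), y.std(ddof=1)

# Definition of the sample correlation coefficient
r_def = np.sum(dx * dy) / (np.sqrt(np.sum(dx**2)) * np.sqrt(np.sum(dy**2)))

# Substitution used in the proof: sum of squares equals (n - 1) * s^2
ss_ok = np.isclose(np.sum(dx**2), (n - 1) * s_x**2)

# Final form in terms of standard scores
r_z = np.sum((dx / s_x) * (dy / s_y)) / (n - 1)

print(ss_ok, np.isclose(r_def, r_z))
```

Note that the identity relies on using the same $n-1$ normalization in the standard deviations and in the leading factor; mixing it with `ddof=0` would rescale the result by $(n-1)/n$ or its inverse.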
Sources: - Wikipedia (2021): "Pearson correlation coefficient"; in: Wikipedia, the free encyclopedia, retrieved on 2021-12-14; URL: https://en.wikipedia.org/wiki/Pearson_correlation_coefficient#For_a_sample.