Index: The Book of Statistical Proofs ▷ General Theorems ▷ Probability theory ▷ Correlation ▷ Invariance of the sample correlation

Theorem: Let $x = \left\lbrace x_1, \ldots, x_n \right\rbrace$ and $y = \left\lbrace y_1, \ldots, y_n \right\rbrace$ be samples from random variables $X$ and $Y$. Further, let $\tilde{x} = \left\lbrace \tilde{x}_1, \ldots, \tilde{x}_n \right\rbrace$ and $\tilde{y} = \left\lbrace \tilde{y}_1, \ldots, \tilde{y}_n \right\rbrace$ be linearly transformed versions of these samples, where $a, b, c, d$ are constants with $a \neq 0$ and $c \neq 0$ and

\[\label{eq:samp-lin} \begin{split} \tilde{x}_i &= a x_i + b \\ \tilde{y}_i &= c y_i + d \; . \end{split}\]

Then, the absolute sample correlation of $\tilde{x}$ and $\tilde{y}$ is equal to the absolute sample correlation of $x$ and $y$:

\[\label{eq:corr-samp-inv} |r_{\tilde{x}\tilde{y}}| = |r_{xy}| \; .\]

Proof: The sample correlation is defined in terms of the sample means $\bar{x}$ and $\bar{y}$ as

\[\label{eq:corr-samp} r_{xy} = \frac{\sum_{i=1}^n (x_i-\bar{x}) (y_i-\bar{y})}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2} \sqrt{\sum_{i=1}^n (y_i-\bar{y})^2}} \; .\]

The sample mean behaves linearly under linear transformation:

\[\label{eq:mean-samp-lin} \begin{split} \bar{\tilde{x}} &= a \bar{x} + b \\ \bar{\tilde{y}} &= c \bar{y} + d \; . \end{split}\]
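The linearity of the sample mean can be checked numerically. The following is a minimal sketch, not part of the proof: the helper `sample_mean` and the data and constants are hypothetical values chosen for illustration.

```python
import math


def sample_mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Hypothetical sample and transformation constants (illustration only).
x = [1.0, 2.0, 4.0, 7.0]
a, b = -3.0, 5.0

# Transformed sample: x~_i = a * x_i + b
x_tilde = [a * xi + b for xi in x]

mean_of_transformed = sample_mean(x_tilde)   # mean of the transformed sample
transformed_mean = a * sample_mean(x) + b    # a * mean(x) + b

# Both quantities agree up to floating-point error.
assert math.isclose(mean_of_transformed, transformed_mean)
```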

Taking this together, we obtain:

\[\label{eq:corr-samp-inv-s1} \begin{split} r_{\tilde{x}\tilde{y}} &\overset{\eqref{eq:corr-samp}}{=} \frac{\sum_{i=1}^n (\tilde{x}_i - \bar{\tilde{x}}) (\tilde{y}_i - \bar{\tilde{y}})}{\sqrt{\sum_{i=1}^n (\tilde{x}_i - \bar{\tilde{x}})^2} \sqrt{\sum_{i=1}^n (\tilde{y}_i - \bar{\tilde{y}})^2}} \\ &\overset{\eqref{eq:samp-lin}}{=} \frac{\sum_{i=1}^n \left( (a x_i + b) - \bar{\tilde{x}} \right) \left( (c y_i + d) - \bar{\tilde{y}} \right)}{\sqrt{\sum_{i=1}^n \left( (a x_i + b) - \bar{\tilde{x}} \right)^2} \sqrt{\sum_{i=1}^n \left( (c y_i + d) - \bar{\tilde{y}} \right)^2}} \\ &\overset{\eqref{eq:mean-samp-lin}}{=} \frac{\sum_{i=1}^n \left( (a x_i + b) - (a \bar{x} + b) \right) \left( (c y_i + d) - (c \bar{y} + d) \right)}{\sqrt{\sum_{i=1}^n \left( (a x_i + b) - (a \bar{x} + b) \right)^2} \sqrt{\sum_{i=1}^n \left( (c y_i + d) - (c \bar{y} + d) \right)^2}} \\ &= \frac{\sum_{i=1}^n \left( a x_i - a \bar{x} \right) \left( c y_i - c \bar{y} \right)}{\sqrt{\sum_{i=1}^n \left( a x_i - a \bar{x} \right)^2} \sqrt{\sum_{i=1}^n \left( c y_i - c \bar{y} \right)^2}} \\ &= \frac{\sum_{i=1}^n a (x_i-\bar{x}) c (y_i-\bar{y})}{\sqrt{\sum_{i=1}^n a^2 (x_i-\bar{x})^2} \sqrt{\sum_{i=1}^n c^2 (y_i-\bar{y})^2}} \\ &= \frac{a c}{\sqrt{a^2} \sqrt{c^2}} \frac{\sum_{i=1}^n (x_i-\bar{x}) (y_i-\bar{y})}{\sqrt{\sum_{i=1}^n (x_i-\bar{x})^2} \sqrt{\sum_{i=1}^n (y_i-\bar{y})^2}} \\ &\overset{\eqref{eq:corr-samp}}{=} \frac{a c}{|a| |c|} r_{xy} \; . \end{split}\]

For $a \neq 0$ and $c \neq 0$, the factor $(a c)/(\lvert a \rvert \lvert c \rvert)$ is either $+1$ or $-1$, and we have

\[\label{eq:corr-samp-inv-s2} |r_{\tilde{x}\tilde{y}}| = |r_{xy}| \; .\]
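The result can also be illustrated numerically. The sketch below is only a sanity check under stated assumptions: the function names `sample_correlation` and `linear_transform`, the data, and the constants are hypothetical, and the scale factors are assumed to be non-zero. It checks that the transformed correlation equals the original one times the sign factor $(a c)/(\lvert a \rvert \lvert c \rvert)$, so that the absolute values coincide.

```python
import math


def sample_correlation(x, y):
    """Sample correlation of two equally long lists, computed from the sample means."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    denominator = (math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
                   * math.sqrt(sum((yi - y_bar) ** 2 for yi in y)))
    return numerator / denominator


def linear_transform(values, scale, shift):
    """Apply v -> scale * v + shift to every element."""
    return [scale * v + shift for v in values]


# Hypothetical samples (illustration only).
x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 6.0]
r_xy = sample_correlation(x, y)

# Try all four sign combinations of the non-zero scale factors a and c.
for a, c in [(2.0, 3.0), (-2.0, 3.0), (2.0, -3.0), (-2.0, -3.0)]:
    x_tilde = linear_transform(x, a, 1.0)    # b = 1
    y_tilde = linear_transform(y, c, -4.0)   # d = -4
    r_tilde = sample_correlation(x_tilde, y_tilde)
    sign = (a * c) / (abs(a) * abs(c))       # either +1 or -1
    assert math.isclose(r_tilde, sign * r_xy)
    assert math.isclose(abs(r_tilde), abs(r_xy))
```

The shifts $b$ and $d$ cancel when the means are subtracted, so only the signs of $a$ and $c$ can affect the result.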

Metadata: ID: P542 | shortcut: corrsamp-inv | author: JoramSoch | date: 2026-05-29, 12:25.