Covariance vs Correlation Matrix

Sources:

Baffled by Covariance and Correlation??? Get the Math and the Application in Analytics for both the terms... (Srishti Saha)
Understanding the Covariance Matrix (Data Science Plus)

Overview

Covariance $\Rightarrow$ direction of the linear relationship between variables.
Correlation $\Rightarrow$ measure of the strength and direction of a linear relationship.

Correlation values are standardized (unitless and bounded), whereas covariance values are not: the magnitude of a covariance depends on the units of the variables.
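To make the "standardized" point concrete, here is a minimal sketch using NumPy that expresses the same variable in metres and in centimetres. The height and weight values are made-up illustration data, not measurements. The covariance grows by a factor of 100 with the unit change, while the correlation stays the same.

```python
import numpy as np

# Hypothetical toy data for illustration only: heights (m) and weights (kg)
height_m = np.array([1.60, 1.70, 1.75, 1.80, 1.90])
weight_kg = np.array([55.0, 65.0, 70.0, 72.0, 85.0])

height_cm = height_m * 100  # same data, expressed in a different unit

# np.cov returns the 2x2 covariance matrix; entry [0, 1] is the cross term
cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_cm, weight_kg)[0, 1]

# np.corrcoef returns the 2x2 correlation matrix
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print(f"covariance (m):   {cov_m:.4f}")
print(f"covariance (cm):  {cov_cm:.4f}")   # 100 times larger
print(f"correlation (m):  {corr_m:.4f}")
print(f"correlation (cm): {corr_cm:.4f}")  # unchanged
```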

Covariance Matrix

For the two-dimensional case (two variables $x$ and $y$), the covariance matrix is:

C = \begin{pmatrix} \sigma(x,x) & \sigma(x,y) \\ \sigma(y,x) & \sigma(y,y) \end{pmatrix}
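Each entry of $C$ is a sample covariance, $\sigma(x,y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})$, so the diagonal entries $\sigma(x,x)$ and $\sigma(y,y)$ are the variances, and $C$ is symmetric because $\sigma(x,y) = \sigma(y,x)$. The sketch below builds $C$ directly from that definition and checks it against NumPy's `np.cov`. The $n-1$ divisor is an assumption chosen to match the `np.cov` default, and the helper names `covariance` and `covariance_matrix` are hypothetical.

```python
import numpy as np

def covariance(a, b):
    """Hypothetical helper: sample covariance with the n - 1 divisor (np.cov's default)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum((a - a.mean()) * (b - b.mean())) / (len(a) - 1)

def covariance_matrix(x, y):
    """Hypothetical helper: build [[s(x,x), s(x,y)], [s(y,x), s(y,y)]]."""
    return np.array([
        [covariance(x, x), covariance(x, y)],
        [covariance(y, x), covariance(y, y)],
    ])

# Small toy data so the result can be checked by hand
x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 5.0, 9.0]

C = covariance_matrix(x, y)
print(C)
print(np.allclose(C, np.cov(x, y)))  # True: matches NumPy's built-in
```

For this toy data, $C = \begin{pmatrix} 5/3 & 11/3 \\ 11/3 & 26/3 \end{pmatrix}$: with $\bar{x} = 2.5$ and $\bar{y} = 5$, the cross-product of the deviations sums to 11, which is then divided by $n - 1 = 3$.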

import numpy as np
import matplotlib.pyplot as plt

plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (12, 8)

mean = 0
std = 1
num_samples = 500

x = np.random.normal(mean, std, num_samples)
y = np.random.normal(mean, std, num_samples)
X = np.vstack((x, y)).T  # Join both arrays and transpose
# X = np.stack(arrays=[x, y], axis=1) # Equivalent transformation

plt.scatter(X[:, 0], X[:, 1])
plt.title('Generated Data')
plt.axis('equal')
plt.show()
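The covariance matrix of the generated data can be estimated with `np.cov`. Because `X` stores one sample per row and one variable per column, `rowvar=False` is needed; without it, `np.cov` would treat each of the 500 rows as a variable and return a 500 x 500 matrix. A minimal sketch that re-creates the data with a seeded generator (the seed is an assumption added for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # seeded so the run is reproducible
num_samples = 500

# Same setup as above: two independent standard normal variables
x = rng.normal(loc=0.0, scale=1.0, size=num_samples)
y = rng.normal(loc=0.0, scale=1.0, size=num_samples)
X = np.column_stack((x, y))  # shape (500, 2), equivalent to np.vstack((x, y)).T

# rowvar=False: each column (not each row) is a variable
C = np.cov(X, rowvar=False)
print(C.round(3))
```

Since `x` and `y` were drawn independently with standard deviation 1, the estimate should be close to the identity matrix: diagonal entries near 1 and off-diagonal entries near 0, up to sampling noise.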

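import tensorflow as tf
import tensorflow_probability as tfp
import matplotlib.pyplot as plt

tfd = tfp.distributions

# Target covariance matrix: unit variances, covariance 0.7 between the two variables
cov = [[1., .7], [.7, 1.]]

# MultivariateNormalFullCovariance is deprecated; MultivariateNormalTriL
# takes the lower-triangular Cholesky factor of the covariance matrix instead
data = tfd.MultivariateNormalTriL(
    loc=[0., 5.],  # Mean of each variable: 0 for the first, 5 for the second
    scale_tril=tf.linalg.cholesky(cov),
).sample(1000).numpy()

plt.scatter(data[:, 0], data[:, 1], color='blue', alpha=0.4)
plt.axis([-5, 5, 0, 10])
plt.title('Data set')
plt.show()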
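To check that the samples really follow the requested covariance, one can estimate it back from the data. The sketch below swaps in NumPy's `Generator.multivariate_normal` for the TFP distribution (a substitution, not the original approach) so it runs without TensorFlow. The means and covariance matrix are the same as in the TFP example, and the seed is an assumption for reproducibility.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

mean = [0.0, 5.0]               # same means as the TFP example
cov = [[1.0, 0.7], [0.7, 1.0]]  # same target covariance matrix

# NumPy sampler used as a stand-in for the TFP distribution
samples = rng.multivariate_normal(mean, cov, size=1000)  # shape (1000, 2)

sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)
print(sample_mean.round(2))
print(sample_cov.round(2))  # should be close to [[1.0, 0.7], [0.7, 1.0]]
```

Because both variances are 1, the off-diagonal covariance of 0.7 is also the correlation between the two variables, which is why the point cloud is tilted along the diagonal.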

Correlation Matrix

Unlike covariance, correlation is bounded: it always lies in the range $[-1, 1]$.

The correlation coefficient of two variables is obtained by dividing their covariance by the product of their standard deviations:

\rho_{x,y} = corr(x,y) = \frac{\sigma_{x,y}}{\sigma_{x}\sigma_{y}}
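The formula translates directly into code. In this minimal sketch (the helper name `correlation` is hypothetical), the covariance and both standard deviations use the same $n-1$ divisor, so the divisor cancels and the result matches `np.corrcoef`. Two perfectly linear relationships show the bounds of the range being reached.

```python
import numpy as np

def correlation(x, y):
    """Hypothetical helper: covariance divided by the product of standard deviations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov_xy = np.cov(x, y)[0, 1]  # sigma_{x,y}, n - 1 divisor
    std_x = np.std(x, ddof=1)    # sigma_x, same n - 1 divisor
    std_y = np.std(y, ddof=1)    # sigma_y
    return cov_xy / (std_x * std_y)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.0, 5.0, 9.0])

r = correlation(x, y)
print(round(r, 4))
print(np.isclose(r, np.corrcoef(x, y)[0, 1]))  # True

# A perfect linear relationship reaches the bounds of [-1, 1]
print(correlation(x, 3 * x + 1))   # approximately 1
print(correlation(x, -2 * x + 5))  # approximately -1
```

For the toy data, $\sigma_{x,y} = 11/3$, $\sigma_x^2 = 5/3$ and $\sigma_y^2 = 26/3$, so $\rho_{x,y} = 11/\sqrt{130} \approx 0.965$.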

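import numpy as np
import pandas as pd

rng = np.random.RandomState(seed=0)  # seeded random number generator
correlation = pd.DataFrame(rng.rand(10, 10)).corr()  # 10x10 Pearson correlation matrix

# Fix the colour scale to the full [-1, 1] range so colours are comparable across columns
correlation.style.background_gradient(cmap='coolwarm', vmin=-1, vmax=1)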
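The same relationship holds for the whole matrix: dividing each covariance entry $\sigma_{i,j}$ by $\sigma_i \sigma_j$ turns the covariance matrix into the correlation matrix. A minimal sketch with pandas, reusing the seeded random data from the heatmap example:

```python
import numpy as np
import pandas as pd

rng = np.random.RandomState(seed=0)
df = pd.DataFrame(rng.rand(10, 10))  # same 10x10 random data as the heatmap example

cov = df.cov()  # 10x10 covariance matrix (n - 1 divisor)
std = df.std()  # per-column standard deviations (ddof=1 by default)

# Divide every entry cov[i, j] by std[i] * std[j]
corr_from_cov = cov / np.outer(std, std)

print(np.allclose(corr_from_cov, df.corr()))     # True
print(np.allclose(np.diag(corr_from_cov), 1.0))  # True: each variable vs itself
```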
