Covariance vs Correlation Matrix
Overview
Covariance indicates the direction of the linear relationship between variables.
Correlation measures both the strength and the direction of a linear relationship.
Correlation values are standardized, whereas covariance values are not.
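To make the standardization point concrete, here is a minimal sketch that measures the same relationship in two different units. The variables `height_m`, `height_cm`, and `weight_kg`, and all of the numbers used to generate them, are invented for illustration and are not taken from any real data set.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Hypothetical data: heights in meters and linearly related weights in kg
height_m = rng.normal(1.7, 0.1, 1000)
weight_kg = 50 * height_m + rng.normal(0, 5, 1000)

# The same heights expressed in centimeters
height_cm = height_m * 100

# Off-diagonal entry [0, 1] holds the value for the pair of variables
cov_m = np.cov(height_m, weight_kg)[0, 1]
cov_cm = np.cov(height_cm, weight_kg)[0, 1]
corr_m = np.corrcoef(height_m, weight_kg)[0, 1]
corr_cm = np.corrcoef(height_cm, weight_kg)[0, 1]

print(f"covariance:  {cov_m:.3f} (m) vs {cov_cm:.3f} (cm)")
print(f"correlation: {corr_m:.3f} (m) vs {corr_cm:.3f} (cm)")
```

Changing the unit multiplies the covariance by 100 but leaves the correlation unchanged, which is why correlation is easier to compare across variables.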
Covariance Matrix
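Focusing on the two-dimensional case, the covariance matrix for two variables $x$ and $y$ is given by:

$$
C = \begin{pmatrix} \sigma(x, x) & \sigma(x, y) \\ \sigma(y, x) & \sigma(y, y) \end{pmatrix}
$$

The diagonal entries $\sigma(x, x)$ and $\sigma(y, y)$ are the variances of $x$ and $y$, and the off-diagonal entries are their covariance, so the matrix is symmetric.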
import numpy as np
import matplotlib.pyplot as plt
plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (12, 8)
mean = 0
std = 1
num_samples = 500
x = np.random.normal(mean, std, num_samples)
y = np.random.normal(mean, std, num_samples)
X = np.vstack((x, y)).T # Join both arrays and transpose
# X = np.stack(arrays=[x, y], axis=1) # Equivalent transformation
plt.scatter(X[:, 0], X[:, 1])
plt.title('Generated Data')
plt.axis('equal');
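As a rough sketch, the sample covariance matrix of data like this could be computed by centering each column and averaging the products of the deviations. The helper `covariance_matrix` is a hypothetical name introduced here, the seed is arbitrary, and the data is regenerated with `np.random.default_rng` so that the block runs on its own.

```python
import numpy as np

rng = np.random.default_rng(42)  # arbitrary seed, only for reproducibility
x = rng.normal(0, 1, 500)
y = rng.normal(0, 1, 500)
X = np.vstack((x, y)).T  # shape (500, 2): one row per sample


def covariance_matrix(data):
    """Sample covariance matrix of data with shape (n_samples, n_variables)."""
    centered = data - data.mean(axis=0)  # subtract each column's mean
    # Sum of outer products of deviations, divided by n - 1 (unbiased estimate)
    return centered.T @ centered / (data.shape[0] - 1)


C = covariance_matrix(X)
print(C)

# np.cov expects variables in rows by default, hence rowvar=False
print(np.cov(X, rowvar=False))
```

Because `x` and `y` are drawn independently with unit variance, the diagonal entries should land near 1 and the off-diagonal entries near 0.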
import tensorflow_probability as tfp
import matplotlib.pyplot as plt
tfd = tfp.distributions
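# Note: MultivariateNormalFullCovariance is deprecated in recent TFP releases in favor of MultivariateNormalTriL
data = tfd.MultivariateNormalFullCovariance(
    loc=[0., 5.],  # Mean for each variable ==> mean(a) = 0, mean(b) = 5
    covariance_matrix=[[1., .7], [.7, 1.]]  # Covariance matrix
).sample(1000)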
plt.scatter(data[:, 0], data[:, 1], color='blue', alpha=0.4)
plt.axis([-5, 5, 0, 10])
plt.title('Data set')
plt.show();
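One way to check that samples drawn this way really carry the requested covariance is to estimate it back from the data. The sketch below swaps in NumPy's `multivariate_normal` instead of TensorFlow Probability so that it stays self-contained; the mean and covariance values mirror the ones above, and the seed is arbitrary.

```python
import numpy as np

mean = [0.0, 5.0]
cov = [[1.0, 0.7], [0.7, 1.0]]

rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility
samples = rng.multivariate_normal(mean, cov, size=1000)  # shape (1000, 2)

sample_mean = samples.mean(axis=0)
sample_cov = np.cov(samples, rowvar=False)  # columns are the variables

print("sample mean:", sample_mean)
print("sample covariance:")
print(sample_cov)
```

With 1000 samples the estimates should sit close to the requested mean and covariance, but they will not match exactly because of sampling noise.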
Correlation Matrix
Unlike covariance, correlation is bounded: its values always lie in the range [-1, 1].
The correlation coefficient of two variables is obtained by dividing their covariance by the product of their standard deviations.
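The relationship between the two matrices can be sketched directly: dividing each entry of the covariance matrix by the product of the corresponding standard deviations yields the correlation matrix. The data below is synthetic and the scaling factors are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed, only for reproducibility

# Three synthetic variables on deliberately different scales
data = rng.normal(size=(200, 3)) * [1.0, 10.0, 0.1]
data[:, 1] += 5 * data[:, 0]  # make the second variable depend on the first

cov = np.cov(data, rowvar=False)  # covariance matrix, shape (3, 3)
std = np.sqrt(np.diag(cov))       # standard deviation of each variable

# Entry (i, j) is cov[i, j] / (std[i] * std[j])
corr = cov / np.outer(std, std)

print(np.round(corr, 3))
print(np.allclose(corr, np.corrcoef(data, rowvar=False)))
```

The result has ones on the diagonal and agrees with `np.corrcoef`, regardless of the scale of each variable.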
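import numpy as np
import pandas as pd
rng = np.random.RandomState(seed=0)  # Seeded generator for reproducible values
# Pairwise correlation between 10 columns of uniform random values
correlation = pd.DataFrame(rng.rand(10, 10)).corr()
correlation.style.background_gradient(cmap='coolwarm')  # Renders as a colored table in a notebook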