Focusing on the two-dimensional case, the covariance matrix for two variables x and y is given by:
$$
C = \begin{pmatrix} \sigma(x,x) & \sigma(x,y) \\ \sigma(y,x) & \sigma(y,y) \end{pmatrix}
$$
import numpy as np
import matplotlib.pyplot as plt

plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (12, 8)

# Draw two independent, normally distributed variables
mean = 0
std = 1
num_samples = 500
x = np.random.normal(mean, std, num_samples)
y = np.random.normal(mean, std, num_samples)

# Join both arrays into shape (num_samples, 2): one row per sample
X = np.vstack((x, y)).T
# X = np.stack(arrays=[x, y], axis=1)  # Equivalent transformation

plt.scatter(X[:, 0], X[:, 1])
plt.title('Generated Data')
plt.axis('equal')
plt.show()
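As a quick check on the covariance matrix defined earlier, the sketch below computes each entry σ(a, b) directly from its definition and compares the result with np.cov. The helper names `covariance` and `covariance_matrix`, the fixed seed, and the choice of the unbiased (n - 1) normalization are assumptions made for illustration.

```python
import numpy as np

def covariance(a, b):
    """Sample covariance of two 1-D arrays (hypothetical helper, n - 1 normalization)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sum((a - a.mean()) * (b - b.mean())) / (len(a) - 1)

def covariance_matrix(x, y):
    """2x2 matrix [[s(x,x), s(x,y)], [s(y,x), s(y,y)]] (hypothetical helper)."""
    return np.array([
        [covariance(x, x), covariance(x, y)],
        [covariance(y, x), covariance(y, y)],
    ])

# Assumed setup: a fixed seed so the numbers are reproducible
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 500)
y = rng.normal(0, 1, 500)

C = covariance_matrix(x, y)
print(C)
# np.cov(x, y) treats x and y as two variables and divides by n - 1 by default
print(np.allclose(C, np.cov(x, y)))
```

For independent variables like these, the off-diagonal entries should come out close to zero and the diagonal entries close to the variance, std² = 1.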
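import tensorflow_probability as tfp
import matplotlib.pyplot as plt

tfd = tfp.distributions

# Draw 1000 samples from a bivariate normal with correlated components.
# Note: MultivariateNormalFullCovariance is deprecated in recent TFP releases
# in favor of MultivariateNormalTriL with scale_tril=tf.linalg.cholesky(cov).
data = tfd.MultivariateNormalFullCovariance(
    loc=[0., 5.],                           # Mean for each variable ==> mean(a) = 0, mean(b) = 5
    covariance_matrix=[[1., .7], [.7, 1.]]  # Variances of 1 on the diagonal, covariance of 0.7 off it
).sample(1000)

plt.scatter(data[:, 0], data[:, 1], color='blue', alpha=0.4)
plt.axis([-5, 5, 0, 10])
plt.title('Data set')
plt.show()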
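To see that samples drawn this way really carry the requested covariance, the following sketch repeats the experiment with NumPy only and estimates the mean and covariance matrix from the samples. It swaps TensorFlow Probability for `numpy.random.Generator.multivariate_normal`; the seed, the sample count, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Same parameters as the distribution above
mean = np.array([0.0, 5.0])
cov = np.array([[1.0, 0.7],
                [0.7, 1.0]])

# Assumed seed and sample size, only to make the run reproducible
rng = np.random.default_rng(42)
samples = rng.multivariate_normal(mean, cov, size=10_000)  # shape (10000, 2)

# rowvar=False: columns are variables, rows are observations
estimated_cov = np.cov(samples, rowvar=False)
estimated_mean = samples.mean(axis=0)

print(estimated_mean)  # should be close to [0, 5]
print(estimated_cov)   # should be close to the requested covariance matrix
```

The estimates only approach the true parameters as the number of samples grows, so small deviations from the requested values are expected.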
Correlation Matrix
The correlation coefficient of two variables is obtained by dividing their covariance by the product of their standard deviations.
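In the notation used for the covariance matrix, where σ(x, y) is the covariance and σ(x, x) is the variance of x, this can be written as follows (the symbol ρ for the correlation coefficient is introduced here for convenience):

$$
\rho(x, y) = \frac{\sigma(x, y)}{\sqrt{\sigma(x, x)}\,\sqrt{\sigma(y, y)}}
$$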
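import numpy as np
import pandas as pd
rng = np.random.RandomState(seed=0)  # Seeded generator for reproducibility
# Pairwise Pearson correlations between the columns of a random 10x10 matrix
correlation = pd.DataFrame(rng.rand(10, 10)).corr()
correlation.style.background_gradient(cmap='coolwarm')  # Color-coded view in a notebook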
Unlike covariance, correlation is bounded: it always lies in the range [−1, 1].
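The bound comes from normalizing by the standard deviations. As a rough illustration, the sketch below converts a covariance matrix into a correlation matrix by dividing each entry by the product of the corresponding standard deviations, then compares the result with np.corrcoef. The function name `cov_to_corr` and the sample data are hypothetical, chosen only for this example.

```python
import numpy as np

def cov_to_corr(cov):
    """Convert a covariance matrix into a correlation matrix (hypothetical helper)."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))      # standard deviation of each variable
    return cov / np.outer(std, std)  # entry (i, j) divided by std_i * std_j

# Assumed example data: y depends on x but lives on a much larger scale
rng = np.random.default_rng(0)
x = rng.normal(0, 1, 1000)
y = 50 * x + rng.normal(0, 20, 1000)

cov = np.cov(x, y)
corr = cov_to_corr(cov)

print(cov)   # entries grow with the scale of y
print(corr)  # entries stay within [-1, 1], with ones on the diagonal
print(np.allclose(corr, np.corrcoef(x, y)))
```

Rescaling y changes the covariance entries but leaves the correlation untouched, which is why correlation is the easier quantity to compare across variables with different units.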