Focusing on the two-dimensional case, the covariance matrix for two variables x and y is given by:
$$
C = \begin{pmatrix} \sigma(x,x) & \sigma(x,y) \\ \sigma(y,x) & \sigma(y,y) \end{pmatrix}
$$
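As a rough sketch of how the entries of C could be computed from data, the snippet below builds the 2×2 matrix by hand and compares it with NumPy's `np.cov`. The helper name `covariance_matrix_2d` is hypothetical, and the sample covariance with an n − 1 denominator is an assumption (it matches the default of `np.cov`).

```python
import numpy as np

def covariance_matrix_2d(x, y):
    """Hypothetical helper: 2x2 sample covariance matrix of x and y.

    Assumes the sample estimator with an (n - 1) denominator.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    dx = x - x.mean()  # deviations of x from its mean
    dy = y - y.mean()  # deviations of y from its mean
    s_xx = np.sum(dx * dx) / (n - 1)  # sigma(x, x): variance of x
    s_yy = np.sum(dy * dy) / (n - 1)  # sigma(y, y): variance of y
    s_xy = np.sum(dx * dy) / (n - 1)  # sigma(x, y) = sigma(y, x)
    return np.array([[s_xx, s_xy],
                     [s_xy, s_yy]])

rng = np.random.default_rng(0)  # seeded for reproducibility
x = rng.normal(0.0, 1.0, 500)
y = rng.normal(0.0, 1.0, 500)

C = covariance_matrix_2d(x, y)
print(C)
print(np.allclose(C, np.cov(x, y)))  # agreement with NumPy's estimator
```

The diagonal holds the variances of x and y, and the matrix is symmetric because σ(x,y) = σ(y,x).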
For example, the following draws two independent standard-normal variables and plots them:

    import numpy as np
    import matplotlib.pyplot as plt

    plt.style.use('ggplot')
    plt.rcParams['figure.figsize'] = (12, 8)

    mean = 0
    std = 1
    num_samples = 500

    x = np.random.normal(mean, std, num_samples)
    y = np.random.normal(mean, std, num_samples)

    X = np.vstack((x, y)).T  # Join both arrays and transpose
    # X = np.stack(arrays=[x, y], axis=1)  # Equivalent transformation

    plt.scatter(X[:, 0], X[:, 1])
    plt.title('Generated Data')
    plt.axis('equal');

Correlated data can be drawn from a multivariate normal distribution by specifying a full covariance matrix, here with TensorFlow Probability:

    import tensorflow_probability as tfp
    import matplotlib.pyplot as plt

    tfd = tfp.distributions

    data = tfd.MultivariateNormalFullCovariance(
        loc=[0., 5],  # Mean for each variable ==> mean(a) = 0, mean(b) = 5
        covariance_matrix=[[1., .7],
                           [.7, 1.]]  # Covariance matrix
    ).sample(1000)

    plt.scatter(data[:, 0], data[:, 1], color='blue', alpha=0.4)
    plt.axis([-5, 5, 0, 10])
    plt.title('Data set')
    plt.show();

Correlation Matrix
Unlike covariance, correlation is bounded: it always lies in the range [−1, 1].
The correlation coefficient of two variables is obtained by dividing their covariance by the product of their standard deviations.
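$$
\rho_{x,y} = \operatorname{corr}(x,y) = \frac{\sigma_{x,y}}{\sqrt{\sigma_x^2\,\sigma_y^2}} = \frac{\sigma_{x,y}}{\sigma_x\,\sigma_y}
$$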
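The formula can be applied to a whole covariance matrix at once by dividing each entry by the product of the corresponding standard deviations. The following is a minimal sketch under that reading; the helper name `correlation_from_covariance` and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def correlation_from_covariance(C):
    """Hypothetical helper: turn a covariance matrix into a correlation matrix."""
    C = np.asarray(C, dtype=float)
    std = np.sqrt(np.diag(C))         # standard deviations from the diagonal
    return C / np.outer(std, std)     # entry (i, j) divided by std_i * std_j

rng = np.random.default_rng(0)        # seeded for reproducibility
x = rng.normal(size=500)
y = 0.7 * x + rng.normal(size=500)    # y depends partly on x

R = correlation_from_covariance(np.cov(x, y))
print(R)
print(np.allclose(R, np.corrcoef(x, y)))  # agreement with NumPy's estimator

# Rescaling x changes the covariance but leaves the correlation unchanged.
R_scaled = correlation_from_covariance(np.cov(10 * x, y))
print(np.allclose(R, R_scaled))
```

The diagonal of the result is 1 because every variable is perfectly correlated with itself, and the rescaling check illustrates why correlation, unlike covariance, is comparable across variables measured in different units.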
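pandas computes the correlation matrix of a DataFrame's columns with DataFrame.corr(). The example below uses ten columns of uniform random data and colours the resulting matrix when displayed in a notebook:

    import numpy as np
    import pandas as pd

    data = np.random.RandomState(seed=0)  # Seeded random number generator
    correlation = pd.DataFrame(data.rand(10, 10)).corr()  # 10x10 correlation matrix
    correlation.style.background_gradient(cmap='coolwarm')  # Colour cells by value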