Covariance vs Correlation Matrix


Last updated 3 years ago


Sources: see the list at the end of this page.

Overview

  • Covariance ⇒ direction of the linear relationship between variables.

  • Correlation ⇒ measure of the strength and direction of a linear relationship.

Correlation values are standardized, whereas covariance values are not: covariance depends on the units of the variables, while correlation is unitless.
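
As a rough illustration of the point about standardization, the sketch below rescales one variable (metres to centimetres) and compares both measures. The data, the variable names (`height_m`, `weight_kg`) and the helper `cov_and_corr` are hypothetical and only serve the example.

```python
import numpy as np


def cov_and_corr(a, b):
    """Return (sample covariance, Pearson correlation) of two 1-D arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    cov = np.cov(a, b)[0, 1]        # off-diagonal entry of the 2x2 covariance matrix
    corr = np.corrcoef(a, b)[0, 1]  # off-diagonal entry of the 2x2 correlation matrix
    return cov, corr


# Hypothetical, synthetic data: heights in metres and loosely related weights
rng = np.random.default_rng(0)
height_m = rng.normal(1.70, 0.10, 200)
weight_kg = 60 * height_m + rng.normal(0, 5, 200)

cov_m, corr_m = cov_and_corr(height_m, weight_kg)
cov_cm, corr_cm = cov_and_corr(height_m * 100, weight_kg)  # same data, heights in cm

print("covariance  (m vs cm):", cov_m, cov_cm)    # scales with the unit change
print("correlation (m vs cm):", corr_m, corr_cm)  # unaffected by the unit change
```

Changing the unit multiplies the covariance by the same factor (100 here), while the correlation stays the same, which is why correlation is usually the easier quantity to compare across variable pairs.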

Covariance Matrix

Check the covariance definition (listed with the sources at the end of this page).

Focusing on the two-dimensional case, the covariance matrix for two variables $x$ and $y$ is given by:

$$
C = \begin{pmatrix} \sigma(x,x) & \sigma(x,y) \\ \sigma(y,x) & \sigma(y,y) \end{pmatrix}
$$

import numpy as np
import matplotlib.pyplot as plt

plt.style.use('ggplot')
plt.rcParams['figure.figsize'] = (12, 8)

mean = 0
std = 1
num_samples = 500

x = np.random.normal(mean, std, num_samples)
y = np.random.normal(mean, std, num_samples)
X = np.vstack((x, y)).T  # Join both arrays and transpose
# X = np.stack(arrays=[x, y], axis=1) # Equivalent transformation

plt.scatter(X[:, 0], X[:, 1])
plt.title('Generated Data')
plt.axis('equal')
plt.show()

import tensorflow_probability as tfp
import matplotlib.pyplot as plt

tfd = tfp.distributions
# Note: MultivariateNormalFullCovariance is deprecated in recent TFP releases
# in favor of MultivariateNormalTriL (parameterized by a Cholesky factor).
data = tfd.MultivariateNormalFullCovariance(
    loc=[0., 5.],  # Mean of each variable: mean(a) = 0, mean(b) = 5
    covariance_matrix=[[1., .7], [.7, 1.]],  # Unit variances, covariance 0.7
).sample(1000)

plt.scatter(data[:, 0], data[:, 1], color='blue', alpha=0.4)
plt.axis([-5, 5, 0, 10])
plt.title('Data set')
plt.show()
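
To connect samples like these back to the matrix $C$, here is a minimal NumPy sketch that estimates the covariance matrix by hand and compares it with `np.cov`. The helper name `covariance_matrix` is hypothetical, and NumPy's `multivariate_normal` is swapped in for the TensorFlow Probability sampler so the example has no extra dependency.

```python
import numpy as np


def covariance_matrix(X):
    """Sample covariance matrix of X, shaped (n_samples, n_features)."""
    X = np.asarray(X, dtype=float)
    centered = X - X.mean(axis=0)  # subtract each column's mean
    # Unbiased estimator: divide by n - 1, the same default np.cov uses
    return centered.T @ centered / (X.shape[0] - 1)


rng = np.random.default_rng(42)
true_cov = np.array([[1.0, 0.7], [0.7, 1.0]])  # same covariance as the sampled data above
X = rng.multivariate_normal(mean=[0.0, 5.0], cov=true_cov, size=1000)

C = covariance_matrix(X)
print(C)                                        # estimate of true_cov
print(np.allclose(C, np.cov(X, rowvar=False)))  # agrees with NumPy's estimator
```

The diagonal of `C` holds the variances $\sigma(x,x)$ and $\sigma(y,y)$, and the matrix is symmetric because $\sigma(x,y) = \sigma(y,x)$. The estimate only approximates `true_cov`; it gets closer as the sample size grows.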

Correlation Matrix

Unlike covariance, correlation is bounded: it always lies in the range $[-1, 1]$.

The correlation coefficient of two variables is obtained by dividing their covariance by the product of their standard deviations.

$$
\rho_{x,y} = \mathrm{corr}(x,y) = \frac{\sigma_{x,y}}{\sigma_{x}\sigma_{y}}
$$

import numpy as np
import pandas as pd

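rng = np.random.RandomState(seed=0)  # Seeded generator for reproducibility
# Pairwise Pearson correlation between the 10 columns of a random 10x10 table
correlation = pd.DataFrame(rng.rand(10, 10)).corr()

# In a notebook, this renders the matrix as a color-coded heatmap
correlation.style.background_gradient(cmap='coolwarm')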
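
The relationship between the two matrices can also be sketched directly: dividing each covariance entry by the product of the corresponding standard deviations should give the correlation matrix. The helper `cov_to_corr` and the synthetic data below are hypothetical and for illustration only.

```python
import numpy as np


def cov_to_corr(C):
    """Convert a covariance matrix into the corresponding correlation matrix."""
    C = np.asarray(C, dtype=float)
    std = np.sqrt(np.diag(C))      # standard deviations sit on the diagonal
    return C / np.outer(std, std)  # entry (i, j) becomes C[i, j] / (std[i] * std[j])


# Hypothetical data: three columns with very different scales
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))
X = np.column_stack([
    base[:, 0],
    10.0 * base[:, 0] + base[:, 1],  # positively related to the first column
    0.01 * base[:, 2],               # tiny scale, unrelated to the others
])

C = np.cov(X, rowvar=False)
R = cov_to_corr(C)

print(np.round(C, 3))  # entries depend heavily on each column's scale
print(np.round(R, 3))  # entries lie in [-1, 1], with ones on the diagonal
print(np.allclose(R, np.corrcoef(X, rowvar=False)))
```

This is the matrix form of the formula for $\rho_{x,y}$, applied to every pair of columns at once.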

  • Baffled by Covariance and Correlation??? Get the Math and the Application in Analytics for both the terms... (Srishti Saha)
  • Understanding the Covariance Matrix (Data Science Plus)
  • covariance definition