The iron ML notebook

The basics


Last updated 5 years ago


Credits & Sources:

Properties of distributions

Mean

For a set of n values, it's found by adding all of the numbers together and dividing by the number of items in the set:

$$\bar{x} = \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right) = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

For a continuous distribution with probability density f(x), the mean is:

$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx$$
  • ...
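A minimal Python sketch of both formulas. The helper names are illustrative, not from any library. The continuous mean is approximated with a midpoint-rule integral of x·f(x), assuming an exponential density with rate 2 (true mean 1/2) truncated at x = 20, where the remaining tail is negligible.

```python
import math

def sample_mean(xs):
    # x_bar = (x_1 + x_2 + ... + x_n) / n
    return sum(xs) / len(xs)

def continuous_mean(pdf, lo, hi, steps=100_000):
    # Midpoint-rule approximation of the integral of x * f(x) dx over [lo, hi]
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += x * pdf(x)
    return total * h

def exponential_pdf(x, rate=2.0):
    # f(x) = rate * exp(-rate * x) for x >= 0 (assumed example density, mean 1 / rate)
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

print(sample_mean([2, 4, 4, 4, 5, 5, 7, 9]))       # 5.0
print(continuous_mean(exponential_pdf, 0.0, 20.0))  # very close to 0.5
```

Truncating the integral at a finite upper bound is a simplification that only works because this density decays quickly.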

Median

Once the values are sorted, the median depends on whether the number of terms is odd or even.

  • If the number of terms is odd:

    • It's the value in the middle.

    • 1, 3, 3, 6, 7, 8, 9 ⇒ 6

  • If the number of terms is even:

    • It's the average of the two terms in the middle.

    • 1, 2, 3, 4, 5, 6, 8, 9 ⇒ (4 + 5) ÷ 2 = 4.5

For a continuous distribution, the median is the value m such that a randomly drawn value has probability 0.5 of being less than or equal to m, and likewise 0.5 of being greater than or equal to m:

$$P(X \geq m) = P(X \leq m) = \int_{-\infty}^{m} f(x)\,dx = \frac{1}{2}$$
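The odd/even procedure for a finite set of values can be sketched in Python as follows. `median` is an illustrative helper; the standard library's `statistics.median` follows the same rule.

```python
def median(values):
    xs = sorted(values)                    # the values must be sorted first
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:                         # odd number of terms: the middle value
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2     # even: average of the two middle terms

print(median([1, 3, 3, 6, 7, 8, 9]))     # 6
print(median([1, 2, 3, 4, 5, 6, 8, 9]))  # 4.5
```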

Mode

It's the most frequent value in the distribution. Example:

  • 1, 2, 2, 3, 4, 7, 9 ⇒ 2

It's quite common to find more than one mode, especially if there aren't many terms. A distribution with two modes is called bimodal; one with three is called trimodal.

For a continuous distribution, the mode is the value of x at which the density function f(x) reaches its maximum. As with discrete distributions, there may be more than one mode.
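A small sketch that returns every mode, so bimodal and trimodal sets are reported as such. `modes` is a hypothetical helper; note that if every value appears exactly once, all of them are returned. Python 3.8+ also ships `statistics.multimode`, which returns the modes in order of first appearance instead of sorted.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, sorted."""
    counts = Counter(values)          # value -> number of occurrences
    top = max(counts.values())        # highest frequency
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3, 4, 7, 9]))  # [2]
print(modes([1, 1, 2, 3, 3, 4]))     # [1, 3], a bimodal set
```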

Range

It's the difference between the maximum value and the minimum value. Example:

  • 1, 2, 2, 3, 4, 7, 9 ⇒ 9 - 1 = 8

For a continuous distribution, it's the difference between the two extreme points of the distribution curve. For any value outside the range of a distribution, the density function is equal to 0.
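For a finite data set the range is a one-liner. `value_range` is an illustrative name chosen so it doesn't shadow Python's built-in `range`.

```python
def value_range(values):
    # Difference between the maximum and the minimum value
    return max(values) - min(values)

print(value_range([1, 2, 2, 3, 4, 7, 9]))  # 8
```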

Variance

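For a population of N values with mean μ:

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$$

When the data is a sample, the following formula is used as a "correction" over the original one (Bessel's correction). Deviations are measured from the sample mean x̄ rather than the unknown μ, which tends to make them smaller; dividing by N − 1 instead of N compensates for this and gives an unbiased estimate of σ².

$$s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2$$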
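```python
import numpy as np

np.random.seed(333)
data = np.random.rand(100)

np.var(data)          # population variance: divides by N
np.var(data, ddof=1)  # sample variance: divides by N - 1
```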
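To make the N versus N − 1 distinction concrete, here is a from-scratch sketch of both variance formulas in plain Python (the helper names are illustrative). The data set is chosen so the numbers are easy to check by hand: its mean is 5 and the squared deviations sum to 32.

```python
def population_variance(xs):
    # sigma^2 = (1 / N) * sum((x_i - mu)^2)
    mu = sum(xs) / len(xs)
    return sum((x - mu) ** 2 for x in xs) / len(xs)

def sample_variance(xs):
    # s^2 = (1 / (N - 1)) * sum((x_i - x_bar)^2)
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))  # 4.0
print(sample_variance(data))      # 32 / 7, about 4.571
```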

Standard deviation

For a population of N values with mean μ:

$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2}$$

When the data is a sample, the same N − 1 "correction" is used:

$$s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})^2}$$
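```python
import numpy as np

np.random.seed(333)
data = np.random.rand(100)

np.std(data)          # population standard deviation: divides by N
np.std(data, ddof=1)  # sample standard deviation: divides by N - 1
```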
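Python's standard `statistics` module exposes both versions directly: `pstdev` divides by N and `stdev` by N − 1. This sketch uses a small hand-checkable data set (mean 5, squared deviations summing to 32) and checks that each standard deviation is the square root of the matching variance.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

sigma = statistics.pstdev(data)  # population standard deviation, 2.0 here
s = statistics.stdev(data)       # sample standard deviation, sqrt(32 / 7)

# The standard deviation is the square root of the corresponding variance
assert math.isclose(sigma, math.sqrt(statistics.pvariance(data)))
assert math.isclose(s, math.sqrt(statistics.variance(data)))
print(sigma, s)
```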

Covariance

For a population S of N pairs (x, y) with means μ_X and μ_Y:

$$\sigma_{XY} = \frac{\sum_{(x,y) \in S}(x - \mu_X)(y - \mu_Y)}{N}$$

For a sample of N pairs with sample means x̄ and ȳ:

$$\operatorname{cov}(x, y) = \frac{\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})}{N - 1}$$
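A direct translation of the two covariance formulas (population over N, sample over N − 1), with illustrative helper names. Because y is exactly 2x in this toy data, the covariance is positive, and the covariance of a variable with itself reduces to its variance.

```python
def _check(xs, ys):
    # Both formulas pair the i-th x with the i-th y
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")

def population_covariance(xs, ys):
    _check(xs, ys)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n

def sample_covariance(xs, ys):
    _check(xs, ys)
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)

x = [1, 2, 3, 4]
y = [2, 4, 6, 8]

print(population_covariance(x, y))  # 2.5
print(sample_covariance(x, y))      # 10 / 3, about 3.333
print(population_covariance(x, x))  # 1.25, the population variance of x
```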

Related questions

Mean vs Median

  • The median is much less sensitive to outliers than the mean.

  • However, almost all analytic calculations on sets of data are more natural in terms of the mean than the median.

  • The difference between the median and the mean gives a quick indication of how skewed the data is.

  • The real use of the median comes when the data set may contain extreme outliers. Then, describing the distribution in terms of quartiles can be more informative than quoting μ and σ.
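A quick illustration of the first bullet using Python's standard `statistics` module: appending a single extreme value to a small, made-up data set drags the mean far away, while the median barely moves.

```python
import statistics

clean = [1, 2, 3, 4, 5]
with_outlier = clean + [1000]  # the same data plus one extreme value

mean_clean, median_clean = statistics.mean(clean), statistics.median(clean)
mean_out, median_out = statistics.mean(with_outlier), statistics.median(with_outlier)

print(mean_clean, median_clean)  # 3 3
print(mean_out, median_out)      # the mean jumps to about 169.2, the median only moves to 3.5
```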

Difference between standard deviation of a sample and standard error of the population mean

Given the population standard deviation σ and the size n of a sample, the standard error of the sample mean is expressed as:

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

When σ is unknown, it's estimated with the sample standard deviation s:

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$
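The σ/√n relationship can be checked empirically by building a sampling distribution of the mean. This sketch assumes a normal population with a known σ = 2 (chosen only for illustration), draws 5,000 samples of size n = 25, and compares the standard deviation of the sample means with σ/√n = 0.4. It then estimates the standard error from a single sample with s/√n, as you would when σ is unknown.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible
SIGMA = 2.0              # assumed population standard deviation
N = 25                   # size of each sample
TRIALS = 5_000           # number of repeated samples

# Sampling distribution of the mean: one mean per repeated sample
sample_means = [
    statistics.fmean([rng.gauss(0.0, SIGMA) for _ in range(N)])
    for _ in range(TRIALS)
]

empirical_se = statistics.stdev(sample_means)  # spread of the sample means
theoretical_se = SIGMA / math.sqrt(N)          # sigma / sqrt(n)

# With sigma unknown, estimate the standard error from one sample: s / sqrt(n)
one_sample = [rng.gauss(0.0, SIGMA) for _ in range(N)]
estimated_se = statistics.stdev(one_sample) / math.sqrt(N)

print(f"empirical SE:   {empirical_se:.3f}")
print(f"theoretical SE: {theoretical_se:.3f}")
print(f"s / sqrt(n):    {estimated_se:.3f}")
```

The empirical value should land close to 0.4; the single-sample estimate is noisier because s itself varies from sample to sample.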

Mean (continuous case): it's obtained by integrating the product of the variable with its probability density, as defined by the distribution.

Variance: it measures how far a set of random numbers is spread out from their mean.

Standard deviation: it's the square root of the variance. A low standard deviation indicates that the data points tend to be close to the mean.

Covariance: it's a measure of the joint variability of two random variables, i.e. of the extent to which corresponding elements from two sets of ordered data move in the same direction. It's similar to variance, but where variance tells you how a single variable varies, covariance tells you how two variables vary together.

Mean vs Median: for skewed distributions, the mean is not necessarily the same as the median or the mode. For example, mean income is typically skewed upwards by a small number of people with very large incomes, so that the majority have an income lower than the mean. By contrast, the median income is the level at which half the population is below and half is above. The mode income is the most likely income and favors the larger number of people with lower incomes. Median and mode are often more intuitive measures for such skewed data, but many skewed distributions are in fact best described by their mean, including the exponential and Poisson distributions.

Standard deviation vs standard error: while the standard deviation expresses how dispersed the data is with respect to the mean, the standard error measures the standard deviation of a statistic's sampling distribution. The sampling distribution of the mean is generated by repeatedly sampling the population and recording the means obtained. This forms a distribution of different means, and this distribution has its own mean and variance. Since the population standard deviation σ is seldom known, the standard deviation s of a sample is used to approximate it, which is where the s/√n formula comes from.

  • What would be some examples of when the "mean" would be preferred over the "median"?
  • Mean, Median, Mode: What They Are, How to Find Them (Statistics How To)
  • Mean (Wikipedia)
  • Median (Wikipedia)
  • Range (Wikipedia)
  • Mode (Wikipedia)
  • Variance (Wikipedia)
  • Standard deviation (Wikipedia)
  • Standard error (Wikipedia)
  • Statistical mean, median, mode and range (Margaret Rouse, TechTarget)
  • Mean vs. Median: When to Use? (Stack Exchange)
  • Standard deviation and Variance (Math is Fun)
  • Understanding Principal Components Analysis (Rishav Kumar)
Source: Siyavula
Comparison of mean, median and mode of two log-normal distributions with different skewness.