The iron ML notebook

Distributions

Last updated 5 years ago

Continuous distributions

Uniform

A uniform distribution, also called a rectangular distribution, is a probability distribution in which every value in its range is equally likely, i.e. it has constant probability density.

Normal

  • The curve is symmetric about its center (i.e. around the mean, μ).

  • Exactly half of the values are to the left of center and exactly half the values are to the right.

  • The total area under the curve is 1.

How to find out if a sample has a normal distribution?

  • Plotting the data using a histogram may give an intuitive insight.

  • Measuring the squared error with respect to a normal distribution with the same mean and variance.

When is a distribution said to be skewed?

Lognormal

Bivariate Normal

A bivariate normal distribution describes two random variables jointly. The two variables need not be independent (they may be correlated): each one is normally distributed on its own, and any linear combination of them, for example their sum, is also normally distributed.
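The behaviour of the marginals and of the sum can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the means, standard deviations, correlation `rho` and the seed are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical parameters, chosen only for illustration
mu = np.array([1.0, -2.0])             # means of X and Y
sigma_x, sigma_y, rho = 2.0, 1.5, 0.6  # standard deviations and correlation
cov = np.array([
    [sigma_x**2, rho * sigma_x * sigma_y],
    [rho * sigma_x * sigma_y, sigma_y**2],
])

samples = rng.multivariate_normal(mu, cov, size=200_000)
x, y = samples[:, 0], samples[:, 1]
total = x + y

# Theory: X + Y is normal with mean mu_x + mu_y and
# variance sigma_x^2 + sigma_y^2 + 2 * rho * sigma_x * sigma_y
expected_mean = mu.sum()
expected_var = cov.sum()

print("mean of X + Y:", total.mean(), "expected:", expected_mean)
print("variance of X + Y:", total.var(), "expected:", expected_var)
print("sample correlation:", np.corrcoef(x, y)[0, 1])
```

Setting `rho = 0` gives the special case in which the two variables are independent.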

Exponential

The exponential distribution (also called the negative exponential distribution) is a probability distribution that describes time between events in a Poisson process.

The most common form of its probability density function is parametrized by a single rate, $\lambda > 0$.

Discrete distributions

Binomial

The expected value of a binomial parametrized by $N$ and $p$ is equal to $Np$.

Bernoulli

A Bernoulli distribution is a discrete probability distribution for a Bernoulli trial: a random experiment that has only two outcomes (usually called a “Success” or a “Failure”). For example, the probability of getting heads (a “success”) while flipping a fair coin is 0.5. The probability of “failure” is 1 - p (1 minus the probability of success, which also equals 0.5 for a coin toss). It is a special case of the binomial distribution for n = 1. In other words, it is a binomial distribution with a single trial (e.g. a single coin toss).

Poisson

A Poisson distribution is a tool that helps to predict the probability of certain events happening when you know how often the event has occurred. It gives us the probability of a given number of events happening in a fixed interval of time. It is parametrized by a single rate, $\lambda$, the average number of events per interval.

About skewed distributions

Skewness is the degree of distortion from the normal distribution or the symmetrical bell curve. It measures the lack of symmetry in the data distribution and differentiates extreme values in one tail versus the other.

Positive skew is when the long tail is on the positive side of the peak; such a distribution is also said to be skewed to the right. Negative skew is the opposite: the long tail is on the negative side of the peak (skewed to the left).

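import numpy as np
from scipy.stats import skew  # NumPy itself has no skewness function

np.random.seed(333)  # fixed seed so the result is reproducible
# Sample skewness of 100 draws from Uniform(0, 1); close to 0 because it is symmetric
skew(np.random.rand(100))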

About kurtosis

Kurtosis is all about the tails of the distribution. It describes how much of the data lies in extreme values in the tails, so it is effectively a measure of the outliers present in the distribution.

High kurtosis in a data set is an indicator that the data has heavy tails or outliers. Meanwhile, low kurtosis in a data set is an indicator that the data has light tails or a lack of outliers.

  • Mesokurtic: the kurtosis statistic is similar to the one of a normal distribution. A normal distribution has a kurtosis of 3.

  • Leptokurtic: longer distribution with fatter tails. The peak is higher and sharper than Mesokurtic, which means that the data are heavy-tailed or have a profusion of outliers. Kurtosis > 3

  • Platykurtic: shorter distribution with thinner tails. The peak is lower and broader than Mesokurtic, which means that the data are light-tailed or lack outliers. Kurtosis < 3
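The three kurtosis regimes can be illustrated with samples from well-known distributions. This is a minimal sketch, assuming SciPy's `scipy.stats.kurtosis`; the choice of the Laplace and uniform distributions as examples, the sample size and the seed are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)
n = 200_000

samples = {
    "normal (mesokurtic)": rng.normal(size=n),
    "laplace (leptokurtic)": rng.laplace(size=n),
    "uniform (platykurtic)": rng.uniform(size=n),
}

# fisher=False returns Pearson kurtosis, where a normal distribution scores 3.
# The SciPy default (fisher=True) returns excess kurtosis: Pearson kurtosis minus 3.
results = {name: kurtosis(x, fisher=False) for name, x in samples.items()}

for name, value in results.items():
    print(f"{name}: {value:.2f}")
```

Note that SciPy reports excess kurtosis by default, so a normal sample scores about 0 unless `fisher=False` is passed.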

The uniform distribution is defined by two parameters: minimum $a$ and maximum $b$. A random variable that follows it is written $Z \sim \text{Uniform}(a, b)$, and its expected value is $E[Z] = \frac{a+b}{2}$.
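As a quick numerical check of the uniform expected value, the sample mean of many draws should approach the midpoint of the interval. A minimal sketch, assuming NumPy; the bounds `a` and `b` and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
a, b = 2.0, 10.0  # hypothetical minimum and maximum

# Draws from the half-open interval [a, b)
z = rng.uniform(a, b, size=100_000)

theoretical_mean = (a + b) / 2
sample_mean = z.mean()

print("theoretical mean:", theoretical_mean)
print("sample mean:", sample_mean)
```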

A normal distribution, sometimes called the bell curve, is a distribution that occurs naturally in many situations. It is expressed as $Z \sim \text{Norm}(\mu, \sigma^2)$. It has some properties:

  • The mean, mode and median are all equal.

Also, a Normal Q-Q Plot provides a graphical way to determine the level of normality.

The Kolmogorov-Smirnov test and the Shapiro-Wilk test are designed to test normality by comparing your data to a normal distribution with the same mean and standard deviation as the sample. If the test is NOT significant (a p-value above .05), there is no evidence against normality, so the data can be treated as consistent with a normal distribution.

Measuring the kurtosis and skewness of the sample and comparing them with those of a normal distribution shows how symmetric and sharp the sample is by comparison.
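The two normality tests mentioned above can be sketched with SciPy's `scipy.stats.shapiro` and `scipy.stats.kstest`. The helper name `normality_report`, the sample sizes and the seed are hypothetical. Estimating the mean and standard deviation from the same sample makes the Kolmogorov-Smirnov p-value only approximate, so treat this as an illustration rather than a rigorous procedure.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=5.0, scale=2.0, size=300)
skewed_sample = rng.exponential(scale=2.0, size=300)


def normality_report(sample):
    """Return the p-values of two normality tests for a 1-D sample."""
    mean, std = sample.mean(), sample.std(ddof=1)
    _, shapiro_p = stats.shapiro(sample)
    # Kolmogorov-Smirnov against a normal with the sample's own mean and std
    _, ks_p = stats.kstest(sample, "norm", args=(mean, std))
    return {"shapiro_p": float(shapiro_p), "ks_p": float(ks_p)}


normal_report = normality_report(normal_sample)
skewed_report = normality_report(skewed_sample)

print("normal sample:", normal_report)
print("skewed sample:", skewed_report)
```

A p-value below .05 for the exponential sample signals a departure from normality; a larger p-value only means the test found no evidence against it.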

A distribution is skewed when one of the tails is longer. Thus, the shape of the distribution is asymmetrical. More information on this topic is in its own section, About skewed distributions.

A lognormal (or Galton) distribution is a probability distribution with a normally distributed logarithm. Skewed distributions with low mean values, large variance, and all-positive values often fit this type of distribution. Values must be positive, as $\log(x)$ exists only for positive values of $x$. The expected value is $E[X] = \exp\left(\mu + \frac{\sigma^2}{2}\right)$.
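The lognormal expected value can be checked by simulation. A minimal sketch, assuming NumPy's `Generator.lognormal`, whose `mean` and `sigma` arguments refer to the underlying normal distribution; the parameter values and the seed are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sigma = 0.5, 0.4  # hypothetical parameters of the underlying normal

x = rng.lognormal(mean=mu, sigma=sigma, size=500_000)

theoretical_mean = np.exp(mu + sigma**2 / 2)
sample_mean = x.mean()

# Taking logs should recover a normal sample with mean mu and std sigma
log_mean, log_std = np.log(x).mean(), np.log(x).std()

print("theoretical E[X]:", theoretical_mean, "sample mean:", sample_mean)
print("mean of log(X):", log_mean, "std of log(X):", log_std)
```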

The exponential distribution is mostly used for testing product reliability. It’s also an important distribution for building continuous-time Markov chains. The exponential often models waiting times and can help you to answer questions like: “How much time will go by before a major hurricane hits the Atlantic Seaboard?” Its probability density function is:

$$f_Z(z \mid \lambda) = \lambda e^{-\lambda z}, \quad z \ge 0$$

When a random variable $Z$ has an exponential distribution with parameter $\lambda$, we say $Z$ is exponential and write $Z \sim \text{Exp}(\lambda)$. Given a specific $\lambda$, the expected value of an exponential random variable is equal to the inverse of $\lambda$, that is, $E[Z \mid \lambda] = \frac{1}{\lambda}$.
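The relation between the rate and the mean waiting time can be sketched as follows, assuming NumPy. The helper `exponential_pdf`, the rate value and the seed are hypothetical. Note that NumPy parametrizes the exponential by its scale, which is the inverse of the rate.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 0.5  # hypothetical rate: 0.5 events per unit of time


def exponential_pdf(z, lam):
    """Density lam * exp(-lam * z) for z >= 0, and 0 otherwise."""
    z = np.asarray(z, dtype=float)
    return np.where(z >= 0, lam * np.exp(-lam * z), 0.0)


# NumPy uses scale = 1 / lambda
waits = rng.exponential(scale=1 / lam, size=200_000)
sample_mean = waits.mean()

print("theoretical mean 1/lambda:", 1 / lam)
print("sample mean waiting time:", sample_mean)
print("density at z = 0:", float(exponential_pdf(0.0, lam)))
```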

The binomial distribution gives the discrete probability distribution $P_p(k \mid N)$ of obtaining exactly $k$ successes out of $N$ Bernoulli trials, each with success probability $p$. The binomial distribution is therefore given by:

$$P(X = k) = \binom{N}{k} p^k (1 - p)^{N - k}$$

If $X$ is a binomial random variable with parameters $p$ and $N$, denoted $X \sim \text{Bin}(N, p)$, then $X$ is the number of events that occurred in the $N$ trials (so $0 \le X \le N$). The larger $p$ is (while still remaining between 0 and 1), the more events are likely to occur. Its expected value is:

$$E[X] = Np$$
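The binomial probability mass function and its expected value can be computed directly with the standard library. A minimal sketch; the function name `binomial_pmf` and the values of `n_trials` and `p` are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_trials, p = 10, 0.3  # hypothetical parameters
pmf = [binomial_pmf(k, n_trials, p) for k in range(n_trials + 1)]

# The probabilities sum to 1 and the mean equals N * p
total_probability = sum(pmf)
expected_value = sum(k * prob for k, prob in enumerate(pmf))

print("sum of probabilities:", total_probability)
print("expected value:", expected_value, "N * p:", n_trials * p)
```

With a single trial, `binomial_pmf(1, 1, p)` reduces to `p`, which is the Bernoulli case.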

If a random variable $X$ has a Bernoulli mass distribution, we denote this by writing $X \sim \text{Bernoulli}(p)$. The probability mass function for this distribution is $p^x (1 - p)^{1 - x}$ for $x \in \{0, 1\}$, which can also be written as:

$$P(x) = \begin{cases} 1-p, & \text{if}\ x=0 \\ p, & \text{if}\ x=1 \end{cases}$$

The expected value for a random variable, $X$, from a Bernoulli distribution is:

$$E[X] = p$$

The probability mass function of the Poisson distribution is:

$$P(Z = k) = \frac{\lambda^k e^{-\lambda}}{k!}, \quad k = 0, 1, 2, \dots$$

If a random variable $Z$ has a Poisson mass distribution, we denote this by writing $Z \sim \text{Poi}(\lambda)$. Its expected value is equal to its parameter: $E[Z \mid \lambda] = \lambda$.
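The Poisson probability mass function and its mean can be evaluated with the standard library. A minimal sketch; the function name `poisson_pmf`, the rate value and the truncation point of the sum are assumptions made for illustration.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """P(Z = k) for Z ~ Poi(lam)."""
    return lam**k * exp(-lam) / factorial(k)


lam = 4.0  # hypothetical rate: 4 events per interval on average

# Truncate the infinite support at k = 59; the remaining tail is negligible for lam = 4
probabilities = [poisson_pmf(k, lam) for k in range(60)]

total_probability = sum(probabilities)
expected_value = sum(k * prob for k, prob in enumerate(probabilities))

print("sum of probabilities:", total_probability)
print("expected value:", expected_value)
```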

A Normal Distribution is not skewed. It's symmetrical and the mean is exactly at the peak. Thus, the mean, median and mode coincide.

It's possible to compute the skewness using the SciPy function `scipy.stats.skew`. For positively skewed distributions the computed value will be > 0, and for negatively skewed ones it will be < 0.
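The sign convention can be seen on samples with a known tail direction. A minimal sketch, assuming SciPy's `scipy.stats.skew`; the use of an exponential sample and its mirror image, the sample size and the seed are assumptions made for illustration.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(333)

right_skewed = rng.exponential(size=10_000)  # long tail on the positive side
left_skewed = -right_skewed                  # mirrored: long tail on the negative side
symmetric = rng.normal(size=10_000)          # no long tail on either side

skew_right = skew(right_skewed)
skew_left = skew(left_skewed)
skew_symmetric = skew(symmetric)

print("right-skewed sample:", skew_right)
print("left-skewed sample:", skew_left)
print("symmetric sample:", skew_symmetric)
```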

Sources:

  • About probability distributions (Statistics How To)
  • Skewness (Math is Fun)
  • How do I determine whether my data are normal? (Psychwiki.com)
  • Skew and Kurtosis: 2 Important Statistics terms you need to know in Data Science (Diva Jain)