Distributions
A uniform distribution, also called a rectangular distribution, is a probability distribution in which every value in its range is equally likely, so its probability density is constant over that range.
This distribution is defined by two parameters: the minimum $a$ and the maximum $b$. So, for a random variable $X$, we write $X \sim U(a, b)$. The expected value is $E[X] = \frac{a + b}{2}$.
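As a short worked derivation, the expected value follows directly from the density. The symbols $a$, $b$, $X$ and $f$ are notation assumed here for the minimum, the maximum, the random variable and its density; the continuous case is assumed.

```latex
\[
f(x) =
\begin{cases}
\dfrac{1}{b - a} & a \le x \le b \\[4pt]
0 & \text{otherwise}
\end{cases}
\qquad
E[X] = \int_a^b \frac{x}{b - a}\,dx
     = \frac{b^2 - a^2}{2\,(b - a)}
     = \frac{a + b}{2}
\]
```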
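A normal distribution, sometimes called the bell curve, is a distribution that occurs naturally in many situations. It is expressed as $f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{2\sigma^2}}$, where $\mu$ is the mean and $\sigma$ is the standard deviation. It has some properties: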
The mean, mode and median are all equal.
The curve is symmetric about the center (i.e. around the mean, μ).
Exactly half of the values are to the left of center and exactly half the values are to the right.
The total area under the curve is 1.
There are several ways to check whether data are normally distributed. Plotting the data using a histogram may give an intuitive insight.
Also, a Normal Q-Q Plot provides a graphical way to determine the level of normality.
The Kolmogorov-Smirnov test and the Shapiro-Wilk test are designed to test normality by comparing your data to a normal distribution with the same mean and standard deviation as the sample. If the test is not significant (a p-value above 0.05 at the usual significance level), the data show no significant departure from normality. This is a failure to reject normality, not proof that the data are normal.
Another option is to measure the squared error of the data's distribution with respect to a normal distribution with the same mean and variance.
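The checks above can be sketched in Python. This is a minimal illustration, assuming NumPy and SciPy are installed; the function name `normality_report`, the sample sizes and the distribution parameters are hypothetical choices made for the example, not part of the original text.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
normal_sample = rng.normal(loc=5.0, scale=2.0, size=500)   # roughly bell-shaped data
skewed_sample = rng.exponential(scale=2.0, size=500)       # clearly non-normal data


def normality_report(sample):
    """Return p-values and a Q-Q correlation for a 1-D sample (illustrative helper)."""
    sample = np.asarray(sample, dtype=float)

    # Shapiro-Wilk: the null hypothesis is that the data come from a normal distribution.
    shapiro_p = stats.shapiro(sample).pvalue

    # Kolmogorov-Smirnov against a normal with the sample's own mean and std.
    # Because the parameters are estimated from the same data, this p-value
    # tends to be too large; treat it as a rough indication only.
    ks_p = stats.kstest(
        sample, "norm", args=(sample.mean(), sample.std(ddof=1))
    ).pvalue

    # Normal Q-Q plot data: r is the correlation between the ordered sample
    # and the theoretical normal quantiles (close to 1 for normal-looking data).
    (_, _), (_, _, qq_r) = stats.probplot(sample, dist="norm")

    return {"shapiro_p": shapiro_p, "ks_p": ks_p, "qq_r": qq_r}


normal_report = normality_report(normal_sample)
skewed_report = normality_report(skewed_sample)
print(normal_report)
print(skewed_report)
```

For the exponential sample both tests return very small p-values, so normality is rejected. For the normal sample a large p-value only means the tests found no evidence against normality.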
A distribution is skewed when one of its tails is longer than the other, so the shape of the distribution is asymmetrical. More info about this topic can be found in its own section.
A lognormal (or Galton) distribution is a probability distribution with a normally distributed logarithm. Skewed distributions with low mean values, large variance, and all-positive values often fit this type of distribution. Values must be positive as $\log(x)$ exists only for positive values of $x$. The expected value is $E[X] = e^{\mu + \sigma^2 / 2}$, where $\mu$ and $\sigma^2$ are the mean and variance of $\log(X)$.
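A small simulation can illustrate both claims: the logarithm of lognormal samples looks normal, and the sample mean approaches $e^{\mu + \sigma^2/2}$. This is a sketch assuming NumPy; the values of `mu` and `sigma` are arbitrary illustrative parameters of the underlying normal.

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 0.5, 0.75  # illustrative parameters of log(X), not from the text

samples = rng.lognormal(mean=mu, sigma=sigma, size=200_000)
log_samples = np.log(samples)  # defined because every sample is positive

theoretical_mean = np.exp(mu + sigma**2 / 2)
empirical_mean = samples.mean()

print("all positive:", bool((samples > 0).all()))
print("mean / std of log(X):", log_samples.mean(), log_samples.std())
print("E[X] theoretical vs empirical:", theoretical_mean, empirical_mean)
```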
A bivariate normal distribution is made up of two random variables, which may be correlated with each other. Each of the two variables is normally distributed, and any linear combination of them (for example, their sum) is also normally distributed.
The exponential distribution (also called the negative exponential distribution) is a probability distribution that describes time between events in a Poisson process.
The exponential distribution is mostly used for testing product reliability. It’s also an important distribution for building continuous-time Markov chains. The exponential often models waiting times and can help you to answer questions like: “How much time will go by before a major hurricane hits the Atlantic Seaboard?”
The most common form of its probability density function is: $f(x; \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$ (and $0$ otherwise).
When a random variable $X$ has an exponential distribution with parameter $\lambda$, we say $X$ is exponential and write $X \sim \text{Exp}(\lambda)$. Given a specific $\lambda$, the expected value of an exponential random variable is equal to the inverse of $\lambda$, that is $E[X \mid \lambda] = \frac{1}{\lambda}$.
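A waiting-time question like the one above can be worked through with a few lines of Python. This is a sketch using only the standard library; the rate `lam = 0.5` events per year is a made-up value, and the function names are hypothetical. It relies on the standard survival function $P(X > t) = e^{-\lambda t}$ of the exponential distribution.

```python
import math


def exponential_pdf(x, lam):
    """Density lam * exp(-lam * x) for x >= 0, and 0 for negative x."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def prob_wait_longer_than(t, lam):
    """P(X > t) = exp(-lam * t): probability that the wait exceeds t."""
    return math.exp(-lam * t)


lam = 0.5                      # hypothetical rate: 0.5 events per year
expected_wait = 1 / lam        # mean waiting time, in years
p_more_than_3 = prob_wait_longer_than(3.0, lam)

print("expected wait:", expected_wait)
print("P(wait > 3 years):", round(p_more_than_3, 4))
```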
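The binomial distribution gives the discrete probability distribution of obtaining exactly $n$ successes out of $N$ Bernoulli trials, where each trial succeeds with probability $p$. The binomial distribution is therefore given by: $P(X = n) = \binom{N}{n} p^n (1 - p)^{N - n}$.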
If $X$ is a binomial random variable with parameters $N$ and $p$, denoted $X \sim \text{Bin}(N, p)$, then $X$ is the number of events that occurred in the $N$ trials (obviously $0 \le X \le N$). The larger $p$ is (while still remaining between 0 and 1), the more events are likely to occur.
The expected value of a binomial parametrized by $N$ and $p$ is equal to: $E[X] = Np$.
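The formula and its expected value can be checked numerically. This is a minimal sketch using only the standard library; `binomial_pmf` is a hypothetical helper name, and `N = 10`, `p = 0.3` are arbitrary example values.

```python
from math import comb


def binomial_pmf(n, N, p):
    """Probability of exactly n successes in N independent trials."""
    return comb(N, n) * p**n * (1 - p) ** (N - n)


N, p = 10, 0.3  # example values chosen for illustration
pmf = [binomial_pmf(n, N, p) for n in range(N + 1)]

total = sum(pmf)                                 # probabilities add up to 1
mean = sum(n * pmf[n] for n in range(N + 1))     # matches N * p

print("P(X = 3):", round(pmf[3], 6))
print("sum of pmf:", total)
print("mean:", mean, "vs N * p =", N * p)
```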
A Bernoulli distribution is a discrete probability distribution for a Bernoulli trial: a random experiment that has only two outcomes (usually called a “Success” or a “Failure”). For example, the probability of getting heads (a “success”) when flipping a fair coin is 0.5. The probability of “failure” is $1 - p$ (1 minus the probability of success, which also equals 0.5 for a coin toss). It is a special case of the binomial distribution with $N = 1$. In other words, it is a binomial distribution with a single trial (e.g. a single coin toss).
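If a random variable $X$ has a Bernoulli mass distribution, we denote this by writing $X \sim \text{Ber}(p)$. The probability mass function (pmf) for this distribution is $P(X = x) = p^x (1 - p)^{1 - x}$ for $x \in \{0, 1\}$, which can also be written as: $P(X = 1) = p$ and $P(X = 0) = 1 - p$.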
The expected value for a random variable, $X$, from a Bernoulli distribution is: $E[X] = p$.
A Poisson distribution helps to predict the probability of a given number of events happening in a fixed interval of time when you know the average rate at which the event occurs. The Poisson distribution is given by: $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$ for $k = 0, 1, 2, \dots$, where $\lambda$ is the average number of events per interval.
If a random variable $X$ has a Poisson mass distribution, we denote this by writing $X \sim \text{Poi}(\lambda)$. Its expected value is equal to its parameter: $E[X \mid \lambda] = \lambda$.
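As an illustration, the mass function can be evaluated directly. This is a sketch using only the standard library; `poisson_pmf` is a hypothetical helper, and the rate `lam = 4.0` events per interval is an invented example value. The mean is approximated by truncating the infinite sum at 100 terms, which is more than enough for this rate.

```python
from math import exp, factorial


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count is lam."""
    return lam**k * exp(-lam) / factorial(k)


lam = 4.0  # hypothetical average number of events per interval
p_two = poisson_pmf(2, lam)

# Truncated sum over k; terms beyond k = 100 are negligible for lam = 4.
mean = sum(k * poisson_pmf(k, lam) for k in range(100))

print("P(X = 2):", round(p_two, 6))
print("mean:", mean, "vs lam =", lam)
```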
Skewness is the degree of distortion from the normal distribution or the symmetrical bell curve. It measures the lack of symmetry in data distribution. It differentiates extreme values in one versus the other tail.
A Normal Distribution is not skewed. It's symmetrical and the mean is exactly at the peak. Thus, the mean, median and mode concur.
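Negative skew is when the long tail is on the negative side of the peak, and some people say it is skewed to the left. Positive skew is when the long tail is on the positive side of the peak, and some people say it is skewed to the right.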
It's possible to compute the skewness using the scipy.stats.skew function. For positively skewed distributions the computed value will be greater than 0, and for negatively skewed distributions it will be less than 0.
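The sign convention can be illustrated with a minimal sketch, assuming SciPy is installed (`scipy.stats.skew` is the routine used here, since NumPy itself has no skewness function). The three samples are synthetic and chosen only for illustration.

```python
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)

right_skewed = rng.exponential(scale=1.0, size=10_000)  # long tail on the right
left_skewed = -right_skewed                             # mirrored: long tail on the left
symmetric = rng.normal(size=10_000)                     # no long tail on either side

skew_right = skew(right_skewed)
skew_left = skew(left_skewed)
skew_sym = skew(symmetric)

print("positive skew:", skew_right)
print("negative skew:", skew_left)
print("approximately zero:", skew_sym)
```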
Kurtosis is all about the tails of the distribution. It describes how much of the distribution's weight lies in the extreme values of the tails, so it is often read as a measure of how prone the distribution is to outliers.
High kurtosis in a data set is an indicator that data has heavy tails or outliers. Meanwhile, low kurtosis in a data set is an indicator that data has light tails or lack of outliers.
Mesokurtic: the kurtosis statistic is similar to the one of a normal distribution. It is usually said that a normal distribution has a kurtosis = 3.
Leptokurtic: longer distribution with fatter tails. The peak is typically higher and sharper than Mesokurtic, which means that the data are heavy-tailed or have a profusion of outliers. Kurtosis > 3
Platykurtic: shorter distribution with thinner tails. The peak is typically lower and broader than Mesokurtic, which means that the data are light-tailed or lack outliers. Kurtosis < 3
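The three cases can be illustrated with a minimal sketch, assuming SciPy is installed. The Laplace and uniform distributions are examples chosen here for the leptokurtic and platykurtic cases; they are not from the original text. Note that `scipy.stats.kurtosis` returns excess kurtosis by default (normal = 0), so `fisher=False` is passed to get the convention used above (normal = 3).

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(2)
n = 200_000

samples = {
    "normal": rng.normal(size=n),          # mesokurtic, kurtosis near 3
    "laplace": rng.laplace(size=n),        # leptokurtic, heavier tails
    "uniform": rng.uniform(-1, 1, size=n), # platykurtic, no tails beyond [-1, 1]
}

# fisher=False gives Pearson kurtosis, where a normal distribution scores 3.
pearson_kurtosis = {name: kurtosis(x, fisher=False) for name, x in samples.items()}

for name, value in pearson_kurtosis.items():
    print(f"{name}: {value:.2f}")
```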