Normal Distribution

A normal distribution in a variate X with mean mu and variance sigma^2 is a statistical distribution with probability density function

 P(x)=1/(sigmasqrt(2pi))e^(-(x-mu)^2/(2sigma^2))
(1)

on the domain x in (-infty,infty). While statisticians and mathematicians uniformly use the term "normal distribution" for this distribution, physicists sometimes call it a Gaussian distribution and, because of its curved flaring shape, social scientists refer to it as the "bell curve." Feller (1968) uses the symbol phi(x) for P(x) in the above equation, but then switches to n(x) in Feller (1971).

de Moivre developed the normal distribution as an approximation to the binomial distribution, and it was subsequently used by Laplace in 1783 to study measurement errors and by Gauss in 1809 in the analysis of astronomical data (Havil 2003, p. 157).

The normal distribution is implemented in Mathematica as NormalDistribution[mu, sigma].
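
Outside Mathematica, equation (1) is easy to check numerically. The following sketch assumes Python with NumPy and SciPy (an assumption about the reader's toolchain) and uses arbitrary illustrative values of mu and sigma; it compares the explicit density with SciPy's built-in normal density.

    import numpy as np
    from scipy.stats import norm

    mu, sigma = 1.5, 2.0                       # illustrative parameters
    x = np.linspace(-5, 8, 7)

    # Equation (1): P(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))
    p_explicit = np.exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    p_scipy = norm.pdf(x, loc=mu, scale=sigma)

    print(np.allclose(p_explicit, p_scipy))    # True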

The so-called "standard normal distribution" is given by taking mu=0 and sigma^2=1 in a general normal distribution. An arbitrary normal distribution can be converted to a standard normal distribution by changing variables to Z=(X-mu)/sigma, so dz=dx/sigma, yielding

 P(x)dx=1/(sqrt(2pi))e^(-z^2/2)dz.
(2)
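
A quick numerical sketch of this change of variables (again Python with NumPy and SciPy, with illustrative mu and sigma): since dz = dx/sigma, equation (2) says P(x) = phi(z)/sigma, where phi is the standard normal density.

    import numpy as np
    from scipy.stats import norm

    mu, sigma = 1.5, 2.0
    x = np.linspace(-5, 8, 7)
    z = (x - mu) / sigma                        # Z = (X - mu)/sigma

    # P(x) dx = phi(z) dz with dz = dx/sigma, so P(x) = phi(z)/sigma
    print(np.allclose(norm.pdf(x, mu, sigma), norm.pdf(z) / sigma))   # True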

The Fisher-Behrens problem is the determination of a test for the equality of means for two normal distributions with different variances.

The normal distribution function Phi(z) gives the probability that a standard normal variate assumes a value in the interval [0,z],

Phi(z)=1/(sqrt(2pi))int_0^ze^(-x^2/2)dx
(3)
=1/2erf(z/(sqrt(2))),
(4)

where erf is a function sometimes called the error function. Neither Phi(z) nor erf can be expressed in terms of finite additions, subtractions, multiplications, and root extractions, and so both must be either computed numerically or otherwise approximated.
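
Since Phi(z) must be computed numerically anyway, the identity in (4) can be checked by quadrature. The sketch below assumes SciPy's quad integrator and erf implementation; the test values of z are arbitrary.

    import numpy as np
    from scipy.integrate import quad
    from scipy.special import erf

    def Phi(z):
        # Equation (3): integral of the standard normal density from 0 to z
        val, _ = quad(lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi), 0, z)
        return val

    for z in (0.5, 1.0, 2.0):
        print(np.isclose(Phi(z), 0.5 * erf(z / np.sqrt(2))))   # True for each z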

The normal distribution is the limiting case of a discrete binomial distribution P_p(n|N) as the sample size N becomes large, in which case P_p(n|N) is normal with mean and variance

mu=Np
(5)
sigma^2=Npq,
(6)

with q=1-p.
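
The binomial-to-normal limit can also be seen numerically. The sketch below (Python with SciPy; the values of N and p are arbitrary) compares the binomial probabilities near the mean with the normal density having the mean and variance of equations (5) and (6).

    import numpy as np
    from scipy.stats import binom, norm

    N, p = 1000, 0.3                      # a large sample size
    q = 1 - p
    n = np.arange(250, 351)               # values near the mean N*p = 300

    pmf = binom.pmf(n, N, p)
    approx = norm.pdf(n, loc=N * p, scale=np.sqrt(N * p * q))

    print(np.max(np.abs(pmf - approx)))   # small compared with the pmf values themselves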

The distribution P(x) is properly normalized since

 int_(-infty)^inftyP(x)dx=1.
(7)

The cumulative distribution function, which gives the probability that a variate will assume a value <=x, is then the integral of the normal distribution,

D(x)=int_(-infty)^xP(x^')dx^'
(8)
=1/(sigmasqrt(2pi))int_(-infty)^xe^(-(x^'-mu)^2/(2sigma^2))dx^'
(9)
=1/2[1+erf((x-mu)/(sigmasqrt(2)))],
(10)

where erf is the so-called error function.
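
Equation (10) can be checked against a library implementation of the normal cumulative distribution function; the sketch assumes SciPy and illustrative parameter values.

    import numpy as np
    from scipy.special import erf
    from scipy.stats import norm

    mu, sigma = 1.5, 2.0
    x = np.linspace(-5, 8, 7)

    # Equation (10): D(x) = (1/2)[1 + erf((x - mu)/(sigma sqrt(2)))]
    D = 0.5 * (1 + erf((x - mu) / (sigma * np.sqrt(2))))
    print(np.allclose(D, norm.cdf(x, mu, sigma)))   # True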

Normal distributions have many convenient properties, so random variates with unknown distributions are often assumed to be normal, especially in physics and astronomy. Although this can be a dangerous assumption, it is often a good approximation due to a surprising result known as the central limit theorem. This theorem states that the mean of any set of variates from any distribution having a finite mean and variance tends to a normal distribution as the number of variates grows. Many common attributes such as test scores and height follow roughly normal distributions, with few members at the high and low ends and many in the middle.
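
The central limit theorem is easy to illustrate by simulation. The sketch below (Python with NumPy and SciPy; the sample sizes are arbitrary) standardizes means of uniform variates, which are far from normal individually, and compares the result with the standard normal.

    import numpy as np
    from scipy.stats import kstest

    rng = np.random.default_rng(0)
    n_per_mean, n_means = 200, 5000

    # Means of uniform(0,1) variates; the uniform has mean 1/2 and variance 1/12
    means = rng.random((n_means, n_per_mean)).mean(axis=1)
    z = (means - 0.5) / np.sqrt(1 / (12 * n_per_mean))   # standardize

    # Kolmogorov-Smirnov test against the standard normal; a p-value that is
    # not tiny is consistent with the standardized means being normal
    print(kstest(z, 'norm').pvalue)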

Because they occur so frequently, there is an unfortunate tendency to invoke normal distributions in situations where they may not be applicable. As Lippmann stated, "Everybody believes in the exponential law of errors: the experimenters, because they think it can be proved by mathematics; and the mathematicians, because they believe it has been established by observation" (Whittaker and Robinson 1967, p. 179).

Among the amazing properties of the normal distribution are that the normal sum distribution and normal difference distribution obtained by respectively adding and subtracting variates X and Y from two independent normal distributions with arbitrary means and variances are also normal! The normal ratio distribution obtained from X/Y has a Cauchy distribution.
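
These closure properties can also be checked by simulation. The sketch below (NumPy/SciPy, arbitrary parameters) compares the sum of two independent normals with the predicted normal, and the ratio of two independent standard normals with the Cauchy distribution.

    import numpy as np
    from scipy.stats import kstest, norm, cauchy

    rng = np.random.default_rng(1)
    n = 100_000
    x = rng.normal(1.0, 2.0, n)           # X ~ N(1, 2^2)
    y = rng.normal(-3.0, 0.5, n)          # Y ~ N(-3, 0.5^2), independent of X

    # X + Y is normal; the means add and the variances add
    target = norm(loc=1.0 - 3.0, scale=np.sqrt(2.0**2 + 0.5**2))
    print(kstest(x + y, target.cdf).pvalue)          # not tiny: consistent with normal

    # The ratio of two independent standard normals follows a Cauchy distribution
    u, v = rng.standard_normal(n), rng.standard_normal(n)
    print(kstest(u / v, cauchy.cdf).pvalue)          # not tiny: consistent with Cauchy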

Using the k-statistic formalism, the unbiased estimator for the variance of a normal distribution is given by

 sigma^2=N/(N-1)s^2,
(11)

where

 s^2=1/Nsum_(i=1)^N(x_i-x^_)^2,
(12)

so

 var(x^_)=(s^2)/(N-1).
(13)
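
In code, equation (11) is just the usual "divide by N-1" variance. A minimal sketch (NumPy, arbitrary sample):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.normal(5.0, 3.0, size=50)
    N = x.size

    s2 = np.mean((x - x.mean())**2)        # equation (12): biased sample variance
    sigma2_hat = N / (N - 1) * s2          # equation (11): unbiased estimator

    print(np.isclose(sigma2_hat, np.var(x, ddof=1)))   # True
    print(s2 / (N - 1))                    # equation (13): estimated variance of the mean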

The characteristic function for the normal distribution is

 phi(t)=e^(imut-sigma^2t^2/2),
(14)

and the moment-generating function is

M(t)=" border="0" height="20" width="27">
(15)
=int_(-infty)^infty(e^(tx))/(sigmasqrt(2pi))e^(-(x-mu)^2/(2sigma^2))dx
(16)
=e^(mut+sigma^2t^2/2),
(17)

so

M^'(t)=(mu+sigma^2t)e^(mut+sigma^2t^2/2)
(18)
M^('')(t)=sigma^2e^(mut+sigma^2t^2/2)+e^(mut+sigma^2t^2/2)(mu+tsigma^2)^2,
(19)

and

mu=M^'(0)=mu
(20)
sigma^2=M^('')(0)-[M^'(0)]^2=sigma^2.
(21)

These can also be computed using

R(t)=ln[M(t)]=mut+1/2sigma^2t^2
(22)
R^'(t)=mu+sigma^2t
(23)
R^('')(t)=sigma^2,
(24)

yielding, as before,

mu=R^'(0)=mu
(25)
sigma^2=R^('')(0)=sigma^2.
(26)
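
Both routes to the moments can be reproduced symbolically. The sketch below assumes SymPy (an assumption about the toolchain) and simply differentiates M(t) and R(t) as in equations (18)-(26).

    import sympy as sp

    t = sp.symbols('t')
    mu, sigma = sp.symbols('mu sigma', positive=True)

    M = sp.exp(mu * t + sigma**2 * t**2 / 2)        # moment-generating function (17)
    mean = sp.diff(M, t).subs(t, 0)
    var = sp.simplify(sp.diff(M, t, 2).subs(t, 0) - mean**2)
    print(mean, var)                                # mu, sigma**2 as in (20)-(21)

    R = mu * t + sigma**2 * t**2 / 2                # R(t) = ln M(t), equation (22)
    print(sp.diff(R, t).subs(t, 0))                 # mu, as in (25)
    print(sp.diff(R, t, 2).subs(t, 0))              # sigma**2, as in (26)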

The raw moments can also be computed directly from the expectation values <x^n>,

 mu_n^'=1/(sigmasqrt(2pi))int_(-infty)^inftyx^ne^(-(x-mu)^2/(2sigma^2))dx.
(27)

(Papoulis 1984, pp. 147-148). Now let

u=(x-mu)/(sqrt(2)sigma)
(28)
du=(dx)/(sqrt(2)sigma)
(29)
x=sigmausqrt(2)+mu,
(30)

giving the raw moments in terms of Gaussian integrals,

 mu_n^'=1/(sqrt(pi))int_(-infty)^infty(sqrt(2)sigmau+mu)^ne^(-u^2)du.
(31)

Evaluating these integrals gives

mu_0^'=1
(32)
mu_1^'=mu
(33)
mu_2^'=mu^2+sigma^2
(34)
mu_3^'=mu(mu^2+3sigma^2)
(35)
mu_4^'=mu^4+6mu^2sigma^2+3sigma^4.
(36)
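
The closed forms (33)-(36) can be compared with moments computed numerically. The sketch below assumes SciPy's norm.moment, which returns the raw (non-central) moment <x^n>, and uses illustrative parameters.

    import numpy as np
    from scipy.stats import norm

    mu, sigma = 1.5, 2.0
    rv = norm(loc=mu, scale=sigma)

    # Raw moments mu_n' = <x^n> from equations (33)-(36)
    formulas = {1: mu,
                2: mu**2 + sigma**2,
                3: mu * (mu**2 + 3 * sigma**2),
                4: mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4}

    for n, expected in formulas.items():
        print(n, np.isclose(rv.moment(n), expected))   # True for n = 1..4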

Now find the central moments,

mu_1=0
(37)
mu_2=sigma^2
(38)
mu_3=0
(39)
mu_4=3sigma^4.
(40)

The variance, skewness, and kurtosis excess are given by

var(x)=sigma^2
(41)
gamma_1=0
(42)
gamma_2=0.
(43)
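
SciPy reports these quantities directly; the short sketch below (illustrative parameters) asks for the mean, variance, skewness, and excess kurtosis, which match equations (37)-(43).

    from scipy.stats import norm

    mu, sigma = 1.5, 2.0
    mean, var, skew, kurt_excess = norm.stats(loc=mu, scale=sigma, moments='mvsk')

    # variance sigma^2 = 4, skewness gamma_1 = 0, kurtosis excess gamma_2 = 0
    print(mean, var, skew, kurt_excess)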

The cumulant-generating function for a normal distribution is

K(h)=ln(e^(nu_1h)e^(sigma^2h^2/2))
(44)
=nu_1h+1/2sigma^2h^2,
(45)

so

kappa_1=nu_1
(46)
kappa_2=sigma^2
(47)
kappa_r=0 for r>2.
(48)

For normal variates, kappa_r=0 for r>2, so the variance of k-statistic k_3 is

var(k_3)=(kappa_6)/N+(9kappa_2kappa_4)/(N-1)+(9kappa_3^2)/(N-1)+(6Nkappa_2^3)/((N-1)(N-2))
(49)
=(6Nkappa_2^3)/((N-1)(N-2)).
(50)

Also,

var(k_4)=(24k_2^4N(N-1)^2)/((N-3)(N-2)(N+3)(N+5))
(51)
var(g_1)=(6N(N-1))/((N-2)(N+1)(N+3))
(52)
var(g_2)=(24N(N-1)^2)/((N-3)(N-2)(N+3)(N+5)),
(53)

where

g_1=(k_3)/(k_2^(3/2))
(54)
g_2=(k_4)/(k_2^2).
(55)

The variance of the sample variance s^2 for a general distribution is given by

 var(s^2)=((N-1)[(N-1)mu_4-(N-3)mu_2^2])/(N^3),
(56)

which simplifies in the case of a normal distribution to

 var(s^2)=(2sigma^4(N-1))/(N^2)
(57)

(Kenney and Keeping 1951, p. 164).
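
Equation (57) is straightforward to verify by Monte Carlo. The sketch below (NumPy; the parameters and trial count are arbitrary) uses the biased s^2 of equation (12).

    import numpy as np

    rng = np.random.default_rng(3)
    mu, sigma, N, trials = 0.0, 2.0, 10, 200_000

    samples = rng.normal(mu, sigma, size=(trials, N))
    s2 = samples.var(axis=1, ddof=0)            # s^2 as defined in equation (12)

    empirical = s2.var()
    predicted = 2 * sigma**4 * (N - 1) / N**2   # equation (57), here 2.88
    print(empirical, predicted)                 # should agree to within a percent or so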

If P(x) is a normal distribution, then

 D(x)=1/2[1+erf((x-mu)/(sigmasqrt(2)))],
(58)

so variates X_i with a normal distribution can be generated from variates Y_i having a uniform distribution in (0,1) via

 X_i=sigmasqrt(2)erf^(-1)(2Y_i-1)+mu.
(59)

However, a simpler way to obtain numbers with a normal distribution is to use the Box-Muller transformation.
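
Both generation methods take only a few lines. The sketch below (NumPy/SciPy, arbitrary mu and sigma) draws variates via the inverse error function of equation (59) and via the Box-Muller transformation, and compares each batch with the target distribution.

    import numpy as np
    from scipy.special import erfinv
    from scipy.stats import kstest, norm

    rng = np.random.default_rng(4)
    mu, sigma, n = 1.5, 2.0, 100_000

    # Equation (59): invert the CDF using the inverse error function
    y = rng.random(n)                                    # uniform variates on [0, 1)
    x_inv = sigma * np.sqrt(2) * erfinv(2 * y - 1) + mu

    # Box-Muller transformation: two uniforms give a standard normal variate
    u1 = 1.0 - rng.random(n)                             # in (0, 1], avoids log(0)
    u2 = rng.random(n)
    z = np.sqrt(-2 * np.log(u1)) * np.cos(2 * np.pi * u2)
    x_bm = mu + sigma * z

    cdf = norm(loc=mu, scale=sigma).cdf
    print(kstest(x_inv, cdf).pvalue, kstest(x_bm, cdf).pvalue)   # neither should be tiny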

The differential equation having a normal distribution as its solution is

 (dy)/(dx)=(y(mu-x))/(sigma^2),
(60)

since

 (dy)/y=(mu-x)/(sigma^2)dx
(61)
 ln(y/(y_0))=-1/(2sigma^2)(mu-x)^2
(62)
 y=y_0e^(-(x-mu)^2/(2sigma^2)).
(63)

This equation has been generalized to yield more complicated distributions which are named using the so-called Pearson system.
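
As a check, the initial-value problem defined by equation (60) can be integrated numerically and compared with the normal density; the sketch below assumes SciPy's solve_ivp and starts at the mode x = mu.

    import numpy as np
    from scipy.integrate import solve_ivp

    mu, sigma = 0.0, 1.0

    def rhs(x, y):
        # Equation (60): dy/dx = y (mu - x) / sigma^2
        return y * (mu - x) / sigma**2

    # Start at the mode x = mu, where the normal density equals 1/(sigma sqrt(2 pi))
    y0 = [1.0 / (sigma * np.sqrt(2 * np.pi))]
    sol = solve_ivp(rhs, (mu, mu + 4 * sigma), y0, dense_output=True,
                    rtol=1e-9, atol=1e-12)

    xs = np.linspace(mu, mu + 4 * sigma, 9)
    pdf = np.exp(-(xs - mu)**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
    print(np.allclose(sol.sol(xs)[0], pdf))   # True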

The normal distribution is also a special case of the chi-squared distribution, since making the substitution

 1/2z=((x-mu)^2)/(2sigma^2)
(64)

gives

d(1/2z)=((x-mu))/(sigma^2)dx
(65)
=(sqrt(z))/sigmadx.
(66)

Now, the real line x in (-infty,infty) is mapped onto the half-infinite interval z in [0,infty) by this transformation, so an extra factor of 2 must be added to d(z/2), transforming P(x)dx into

P(z)dz=1/(sigmasqrt(2pi))e^(-z/2)sigma/(sqrt(z))2(1/2dz)
(67)
=(e^(-z/2)z^(-1/2))/(2^(1/2)Gamma(1/2))dz
(68)

(Kenney and Keeping 1951, p. 98), where use has been made of the identity Gamma(1/2)=sqrt(pi). As promised, (68) is a chi-squared distribution in z with r=1 (and also a gamma distribution with alpha=1/2 and theta=2).
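
The change of variables can be checked by simulation: squaring standardized normal variates should produce chi-squared variates with one degree of freedom. A minimal sketch (NumPy/SciPy, arbitrary parameters):

    import numpy as np
    from scipy.stats import chi2, kstest

    rng = np.random.default_rng(5)
    mu, sigma, n = 1.5, 2.0, 100_000

    x = rng.normal(mu, sigma, n)
    z = ((x - mu) / sigma)**2        # the substitution z = (x - mu)^2 / sigma^2

    # z should follow a chi-squared distribution with r = 1 degree of freedom
    print(kstest(z, chi2(df=1).cdf).pvalue)   # not tiny: consistent with chi-squared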
