Friday, January 29, 2010

Statistics, Frequency and Frequency Distributions

Statistics: - By “statistics” we mean an aggregate or combination of facts affected to a marked extent by a multiplicity of causes, numerically expressed and estimated according to reasonable standards of accuracy, collected in a systematic manner and placed in relation to each other.

There are two ways of presenting statistical data:

1. Frequency Distribution and

2. Graphical Distribution.

Frequency: - The number of times a particular value (or a value falling in a particular class) occurs in the data is called the frequency of that value or class.

That is, frequency is the total number of times a particular value is repeated in a table or data set.

Frequency Distributions: - A set of classes together with the frequencies of occurrence of values in a given set of data, presented in tabular form, is referred to as a frequency distribution.




Construct a frequency distribution from the marks that the students of EEE 36 Batch obtained for the whole trimester in the Statistics and Probability course:

97, 13, 81, 25, 37, 55, 19, 33, 59, 46, 67, 43, 12, 87, 90, 65, 76, 81, 79, 13, 5, 12, 35, 17, 46, 65, 43, 12, 85, 93

Solution:

Here, the lowest value=5

And the highest value=97

For this kind of ungrouped data we choose the number of classes k in such a way that $2^k$ is at least the number of observations (here 30).

Here, k = 5

So $2^k = 2^5 = 32 > 30$
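The solution stops after choosing k, so here is a rough Python sketch of how the remaining steps might go. The class width (range divided by k, rounded up) and the half-open classes starting at the lowest mark are my own assumptions, not something the problem prescribes; the function names are only illustrative.

```python
import math

# Marks taken from the problem statement above.
marks = [97, 13, 81, 25, 37, 55, 19, 33, 59, 46, 67, 43, 12, 87, 90,
         65, 76, 81, 79, 13, 5, 12, 35, 17, 46, 65, 43, 12, 85, 93]


def choose_class_count(n):
    """Smallest k with 2**k >= n (the rule used in the solution)."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k


def frequency_distribution(data, k):
    """Return (lower, upper, frequency) rows for k equal-width classes.

    Assumption: classes are half-open [lower, upper), start at the
    smallest observation, and the width is rounded up to a whole number.
    """
    low, high = min(data), max(data)
    width = math.ceil((high - low) / k)
    counts = [0] * k
    for x in data:
        # min(...) keeps the largest observation inside the last class
        index = min((x - low) // width, k - 1)
        counts[index] += 1
    return [(low + j * width, low + (j + 1) * width, counts[j])
            for j in range(k)]


k = choose_class_count(len(marks))
table = frequency_distribution(marks, k)
for lower, upper, f in table:
    print(f"{lower:3d} to under {upper:3d} : {f}")
```

With these assumptions the width comes out as 19 and the five classes start at 5, 24, 43, 62 and 81. A different choice of width or starting point would give a different, equally valid table.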


Empirical Relation between Mean, Median and Mode

A distribution in which the values of mean, median and mode coincide (i.e. mean = median = mode) is known as a symmetrical distribution. Conversely, when the values of mean, median and mode are not equal, the distribution is known as an asymmetrical or skewed distribution. In a moderately skewed or asymmetrical distribution a very important relationship exists among these three measures of central tendency. In such distributions the distance between the mean and median is about one-third of the distance between the mean and mode, as will be clear from diagrams 1 and 2. Karl Pearson expressed this relationship as:

$$\text{Mean} - \text{Mode} = 3\,(\text{Mean} - \text{Median}), \quad \text{i.e.} \quad \text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$


Measures of Central Tendency

Central Tendency: - The tendency of the individual items of a statistical series to cluster around a central value is called Central Tendency. A measure of central tendency is sometimes called a measure of location or a measure of representation.

Several measures of Central Tendency can be defined. The most common are:

  1. The Arithmetic Mean
  2. The Median
  3. The Mode
  4. The Geometric Mean
  5. The Harmonic Mean

The Arithmetic Mean: - The Arithmetic Mean of a grouped frequency distribution is defined as

$$\bar{X} = A + \frac{\sum f d}{n} \times i$$

Where,

A = any guessed or assumed class mark.

f = Frequency of each class interval.

n = Sum of total frequency.

i = Range of class interval.

d = Deviation of each class mark from the assumed class mark, in units of the range of class interval.

d = (Xi – A) / i

The Median: - The Median of a grouped frequency distribution is defined as

$$M_e = L + \frac{\frac{n}{2} - f_c}{f_m} \times i$$

Where,

Me = Median of the distribution.

fc = Cumulative frequency of all classes preceding the median class.

fm = Frequency of the median class.

i = range of class interval.

L = Lower class boundary of median class.

n = Sum of total frequency.

The Mode: - The Mode of a set of numbers is the value which occurs with the greatest frequency.

The Mode for grouped data (a frequency distribution) is given by

$$M_o = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

Where,

L = Lower limit of modal class interval.

Δ1 = Difference between the frequencies of the modal class and the pre-modal class.

Δ2 = Difference between the frequencies of the modal class and the post-modal class.

i = range of class interval.

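The Geometric Mean: - The Geometric Mean G for a grouped frequency distribution is given by

$$G = \left( \prod_i x_i^{f_i} \right)^{1/n}, \quad \text{or equivalently} \quad \log G = \frac{1}{n} \sum_i f_i \log x_i$$

Where,

G = Geometric Mean

n = sum of total frequency.

fi = Frequency corresponding to each class interval.

xi = Class Mark.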

The Harmonic Mean: - The Harmonic Mean H for a grouped frequency distribution is

$$H = \frac{n}{\sum_i \dfrac{f_i}{x_i}}$$

Where,

H = Harmonic Mean.

n = sum of total frequency.

fi = Frequency corresponding to each class interval.

xi = Class Mark.
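To see the five measures side by side, here is a small Python sketch using the standard grouped-data formulas for the mean (assumed-mean method), median, mode, geometric mean and harmonic mean. The frequency table in it is hypothetical, chosen only for illustration, and the function names are my own.

```python
import math

# Hypothetical grouped data (illustration only): lower class limits,
# a common class width i, and the class frequencies f.
lowers = [0, 10, 20, 30, 40, 50, 60, 70]
width = 10
freqs = [2, 8, 12, 18, 28, 22, 6, 4]

n = sum(freqs)                                    # sum of total frequency
class_marks = [lo + width / 2 for lo in lowers]   # x_i


def arithmetic_mean(assumed):
    """A + (sum(f*d) / n) * i with d = (x - A) / i."""
    total_fd = sum(f * (x - assumed) / width
                   for f, x in zip(freqs, class_marks))
    return assumed + total_fd / n * width


def median():
    """L + ((n/2 - fc) / fm) * i for the first class reaching n/2."""
    cumulative = 0
    for lo, f in zip(lowers, freqs):
        if cumulative + f >= n / 2:
            return lo + (n / 2 - cumulative) / f * width
        cumulative += f


def mode():
    """L + (d1 / (d1 + d2)) * i; assumes one clear modal class."""
    m = freqs.index(max(freqs))
    before = freqs[m - 1] if m > 0 else 0
    after = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1, d2 = freqs[m] - before, freqs[m] - after
    return lowers[m] + d1 / (d1 + d2) * width


def geometric_mean():
    """exp of the frequency-weighted mean of log(x)."""
    log_sum = sum(f * math.log(x) for f, x in zip(freqs, class_marks))
    return math.exp(log_sum / n)


def harmonic_mean():
    """n / sum(f / x)."""
    return n / sum(f / x for f, x in zip(freqs, class_marks))


mean_value = arithmetic_mean(assumed=45)
print("Mean    :", round(mean_value, 4))
print("Median  :", round(median(), 4))
print("Mode    :", round(mode(), 4))
print("G. Mean :", round(geometric_mean(), 4))
print("H. Mean :", round(harmonic_mean(), 4))
```

For any set of positive values the harmonic mean is never larger than the geometric mean, which in turn is never larger than the arithmetic mean, so the last three printed figures give a quick sanity check.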

Problem: -

The given frequency distribution shows the efficiency scores of 115 students in their 70% marks. Find the Arithmetic Mean, Median, Mode, Geometric Mean and Harmonic Mean.

Solution:

Some special positional measures related to Central Tendency:

  1. Quartiles
  2. Deciles and
  3. Percentiles

Quartiles: - The Quartiles are those values in a series which divide the total frequency into four equal parts. The rth Quartile is denoted by Qr, where

$$Q_r = L_r + \frac{\frac{r\,n}{4} - F_r}{f_r} \times i$$

Where,

r = 1, 2, 3 (position of the Quartile),

Lr = Lower limit of the rth Quartile class,

n = Sum of the total frequency,

Fr = Cumulative frequency of the classes preceding the rth Quartile class,

fr = Frequency of the rth Quartile class,

i = Range of class interval.

Deciles: - The Deciles are those values in a series which divide the total frequency into ten equal parts. The rth Decile is denoted by Dr, where

$$D_r = L_r + \frac{\frac{r\,n}{10} - F_r}{f_r} \times i$$

Where,

r = 1, 2, 3, …, 9 (position of the Decile),

Lr = Lower limit of the rth Decile class,

n = Sum of the total frequency,

Fr = Cumulative frequency of the classes preceding the rth Decile class,

fr = Frequency of the rth Decile class,

i = Range of class interval.

Percentiles: - The Percentiles are those values in a series which divide the total frequency into 100 equal parts. The rth Percentile is denoted by Pr, where

$$P_r = L_r + \frac{\frac{r\,n}{100} - F_r}{f_r} \times i$$

Where,

r = 1, 2, 3, …, 99 (position of the Percentile),

Lr = Lower limit of the rth Percentile class,

n = Sum of the total frequency,

Fr = Cumulative frequency of the classes preceding the rth Percentile class,

fr = Frequency of the rth Percentile class,

i = Range of class interval.
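Quartiles, deciles and percentiles share one interpolation pattern and differ only in the number of equal parts (4, 10 or 100). The sketch below is one possible way to write that as a single Python function; the frequency table is hypothetical and the function name `grouped_quantile` is my own.

```python
def grouped_quantile(lowers, freqs, width, r, parts):
    """r-th of `parts` quantiles: L + ((r*n/parts - F) / f) * i.

    L = lower limit of the quantile class, F = cumulative frequency of
    the classes before it, f = its frequency, i = the class width.
    """
    n = sum(freqs)
    target = r * n / parts
    cumulative = 0
    for lo, f in zip(lowers, freqs):
        if f > 0 and cumulative + f >= target:
            return lo + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("r must satisfy 0 < r < parts")


# Hypothetical grouped data, for illustration only.
lowers = [0, 10, 20, 30, 40, 50, 60, 70]
freqs = [2, 8, 12, 18, 28, 22, 6, 4]
width = 10

q1 = grouped_quantile(lowers, freqs, width, 1, 4)
q2 = grouped_quantile(lowers, freqs, width, 2, 4)
q3 = grouped_quantile(lowers, freqs, width, 3, 4)
d5 = grouped_quantile(lowers, freqs, width, 5, 10)
p10 = grouped_quantile(lowers, freqs, width, 10, 100)
p50 = grouped_quantile(lowers, freqs, width, 50, 100)
p90 = grouped_quantile(lowers, freqs, width, 90, 100)

print("Q1, Q2, Q3 :", round(q1, 4), round(q2, 4), round(q3, 4))
print("D5, P50    :", round(d5, 4), round(p50, 4))
print("P10, P90   :", p10, p90)
```

The second quartile, fifth decile and fiftieth percentile all target n/2, so they coincide with the median.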

Graphical Representation of Frequency Distributions

A frequency distribution can be presented graphically in any one of the following ways:
  1. Histogram
  2. Frequency Polygon
  3. Smooth Frequency Curve
  4. Cumulative Frequency Curve or Ogive Curve
  5. Pie-Chart

Histogram: - A histogram is an area diagram in which the frequencies corresponding to each class interval of a frequency distribution are represented by the areas of rectangles, with no gaps between consecutive rectangles.


http://www.statcan.gc.ca/edu/power-pouvoir/ch9/images/histo1.gif

Frequency Polygon: - A frequency polygon is obtained from a histogram by joining, with straight lines, the mid-points of the upper horizontal sides of adjacent rectangles.


http://www.onekobo.com/Articles/Statistics/statsImgs/24Graph-002.jpg

Smooth Frequency Curve: - A smooth frequency curve is obtained from a histogram by joining, freehand, the mid-points of the upper horizontal sides of adjacent rectangles.

Cumulative Frequency Curve or Ogive Curve: -
The total frequency of all values less than the upper class boundary of a given class interval is called the Cumulative Frequency up to and including that class interval. The graph of such a distribution is called a Cumulative Frequency Curve or Ogive.

There are two methods of constructing Ogive, namely

  1. The less than method and
  2. The more than method

The less than method: - In the “less than” method, we start with the upper boundary of each class interval and add up the frequencies cumulatively; when these cumulative frequencies are plotted, we get a rising curve.

The more than method: - In the “more than” method, we start with the lower boundary of each class interval and, from the total frequency, subtract the frequencies of the classes below it; when these frequencies are plotted, we get a declining curve.
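Both methods can be written in a few lines of Python. This is only a sketch: the class boundaries and frequencies are illustrative values of my own choosing, not a table from the problems in this post.

```python
from itertools import accumulate

# Illustrative class boundaries and frequencies (assumed values).
lower_boundaries = [0, 10, 20, 30, 40, 50, 60, 70]
upper_boundaries = [10, 20, 30, 40, 50, 60, 70, 80]
freqs = [2, 8, 12, 18, 28, 22, 6, 4]

total = sum(freqs)

# "Less than": running total, paired with each upper boundary.
less_than = list(accumulate(freqs))

# "More than": total minus the frequencies of all classes below,
# paired with each lower boundary.
more_than = [total - below for below in [0] + less_than[:-1]]

print("Less than ogive points:", list(zip(upper_boundaries, less_than)))
print("More than ogive points:", list(zip(lower_boundaries, more_than)))
```

The first list never decreases and the second never increases, which is why the two ogives rise and fall respectively.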


Draw a 'less than' ogive curve for the following data:


To Plot an Ogive:

(i) We plot the points whose abscissae are the actual upper limits and whose ordinates are the cumulative frequencies. The coordinates of the points are (10, 2), (20, 10), (30, 22), (40, 40), (50, 68), (60, 90), (70, 96) and (80, 100).

(ii) Join the points plotted by a smooth curve.

(iii) An Ogive is connected to a point on the X-axis representing the actual lower limit of the first class.

Scale:

X-axis 1 cm = 10 marks, Y-axis 1 cm = 10 c.f.


Using the data given below, construct a 'more than' cumulative frequency table and draw the Ogive.

To Plot an Ogive

(i) We plot the points whose abscissae are the actual lower limits and whose ordinates are the cumulative frequencies. The coordinates of the points are (70.5, 2), (60.5, 7), (50.5, 13), (40.5, 23), (30.5, 37), (20.5, 49), (10.5, 57) and (0.5, 60).

(ii) Join the points by a smooth curve.

(iii) An Ogive is connected to a point on the X-axis representing the actual upper limit of the last class (in this case, the point (80.5, 0)).

Scale:

X-axis 1 cm = 10 marks

Y-axis 2 cm = 10 c.f.

To reconstruct frequency distribution from cumulative frequency distribution.
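One way to do this is to take successive differences of the cumulative frequencies. The Python sketch below applies that idea to the ogive points listed in the two examples above; the function names are my own, and the class widths are assumed to be 10 as the plotted points suggest.

```python
def from_less_than(cumulative):
    """Class frequencies from 'less than' cumulative frequencies.

    cumulative[j] is the count below the j-th upper limit, in
    increasing order of the limits.
    """
    previous = [0] + cumulative[:-1]
    return [c - p for p, c in zip(previous, cumulative)]


def from_more_than(cumulative):
    """Class frequencies from 'more than' cumulative frequencies.

    cumulative[j] is the count at or above the j-th lower limit, in
    increasing order of the limits.
    """
    following = cumulative[1:] + [0]
    return [c - nxt for c, nxt in zip(cumulative, following)]


# 'Less than' points from the first example: upper limits 10, 20, ..., 80.
less_than_cf = [2, 10, 22, 40, 68, 90, 96, 100]

# 'More than' points from the second example, reordered so that the
# lower limits run 0.5, 10.5, ..., 70.5.
more_than_cf = [60, 57, 49, 37, 23, 13, 7, 2]

print(from_less_than(less_than_cf))
print(from_more_than(more_than_cf))
```

Each recovered list adds up to the total frequency of its example (100 and 60), which is a handy check on the subtraction.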


Pie-Chart: -

Problem: - Represent the following distribution by a Pie-Chart.

Distribution of 100 students classified according to the marks that they secured in a class.

Class Intervals    Frequency
60-65              8
65-70              20
70-75              27
75-80              15
80-90              10
90-100             20
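The post does not spell out the construction, so the following Python sketch assumes the usual one: each class gets a sector whose angle is its share of the total frequency times 360 degrees. The frequencies are those of the table above.

```python
# Class intervals and frequencies from the table above.
classes = ["60-65", "65-70", "70-75", "75-80", "80-90", "90-100"]
freqs = [8, 20, 27, 15, 10, 20]

total = sum(freqs)

# Assumed construction: sector angle = (frequency / total) * 360 degrees.
angles = [f / total * 360 for f in freqs]

for label, f, angle in zip(classes, freqs, angles):
    print(f"{label:>6} : frequency {f:2d}, angle {angle:5.1f} degrees")
```

The angles should add up to 360 degrees, which gives a quick check before drawing the chart.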



Measure of Dispersion

Dispersion: - Dispersion refers to the scatteredness of the individual items of statistical series from their central value. So a descriptive measure of scatter of the values about the average is called measure of Dispersion.

The following are the important measures of dispersion:

  1. The Range
  2. The average/mean Deviation
  3. Quartile Deviation
  4. The 10 – 90 Percentile Range
  5. The Standard Deviation
  6. The Variance

The Range: - The Range of a set of numbers is the difference between the largest and smallest numbers in the set.

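The average/mean Deviation: - The average or mean Deviation of a set of N numbers X1, X2, …, XN is abbreviated MD and is defined by

$$MD = \frac{1}{N} \sum_{j=1}^{N} \left| X_j - \bar{X} \right|$$

where $\bar{X}$ is the arithmetic mean of the numbers.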

Quartile Deviation: - The Quartile Deviation of a set of data is denoted by Q and defined by Q = (Q3 – Q1)/2

The 10 – 90 Percentile Range: - The 10 – 90 Percentile Range of a set of data is defined by 10 – 90 Percentile Range = P90 – P10

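The Standard Deviation: - The Standard Deviation of a set of N numbers X1, X2, …, XN is denoted by σ and is defined by

$$\sigma = \sqrt{\frac{1}{N} \sum_{j=1}^{N} \left( X_j - \bar{X} \right)^2}$$

The Variance: - The Variance of a set of data is defined as the square of the standard deviation and is thus given by

$$\sigma^2 = \frac{1}{N} \sum_{j=1}^{N} \left( X_j - \bar{X} \right)^2$$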

Co-efficient of Variation: - If the average of a set of statistical data is the mean $\bar{x}$ and the absolute dispersion is the standard deviation σ, then the relative dispersion is called the Co-efficient of Variation. It is denoted by v and is given by v = (σ / $\bar{x}$) × 100
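As a rough illustration of these measures on raw (ungrouped) data, the Python sketch below reuses the 30 marks from the frequency distribution problem at the start of this post. It uses the population forms (division by N); the variable names are my own.

```python
import math

# The 30 marks from the frequency distribution problem in this post.
marks = [97, 13, 81, 25, 37, 55, 19, 33, 59, 46, 67, 43, 12, 87, 90,
         65, 76, 81, 79, 13, 5, 12, 35, 17, 46, 65, 43, 12, 85, 93]

n = len(marks)
mean = sum(marks) / n

# Range: largest value minus smallest value.
data_range = max(marks) - min(marks)

# Mean deviation: average absolute deviation from the mean.
mean_deviation = sum(abs(x - mean) for x in marks) / n

# Variance and standard deviation (dividing by N, not N - 1).
variance = sum((x - mean) ** 2 for x in marks) / n
std_dev = math.sqrt(variance)

# Co-efficient of variation, in percent.
coeff_variation = std_dev / mean * 100

print("Range              :", data_range)
print("Mean               :", round(mean, 2))
print("Mean deviation     :", round(mean_deviation, 2))
print("Variance           :", round(variance, 2))
print("Standard deviation :", round(std_dev, 2))
print("C.V. (%)           :", round(coeff_variation, 2))
```

The mean deviation can never exceed the standard deviation, so comparing the two is a simple check on the arithmetic.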

Problem: -

Find the standard deviation and co-efficient of variation of the class test of 100 students of XYZ University.



Moments

Moments: - Moments are certain mathematical constants used to ascertain the nature and form of a frequency distribution. Moments in statistics are used to describe the various characteristics of a frequency distribution, like Central Tendency, Dispersion, Skewness and Kurtosis. A moment is symbolized by the Greek letter μ.

There are two kinds of Moments:

  1. Raw Moments and
  2. Corrected Moments

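Raw Moments for grouped data (about an assumed origin A):

$$\mu'_r = \frac{1}{n} \sum_i f_i \,(x_i - A)^r, \qquad r = 1, 2, 3, 4$$

Relation between Raw Moments and Corrected Moments for grouped data:

$$\mu_2 = \mu'_2 - (\mu'_1)^2$$

$$\mu_3 = \mu'_3 - 3\mu'_2\,\mu'_1 + 2(\mu'_1)^3$$

$$\mu_4 = \mu'_4 - 4\mu'_3\,\mu'_1 + 6\mu'_2\,(\mu'_1)^2 - 3(\mu'_1)^4$$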

Skewness

Skewness is a measure of the degree of asymmetry of a distribution. If the left tail (tail at small end of the distribution) is more pronounced than the right tail (tail at the large end of the distribution), the function is said to have negative skewness. If the reverse is true, it has positive skewness. If the two are equal, it has zero skewness.

Several types of skewness are defined, the terminology and notation of which are unfortunately rather confusing. "The" skewness of a distribution is defined to be

$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} \tag{1}$$

Positively Skewed Distribution: - If the value of the arithmetic mean is greater than the mode, the distribution is called Positively Skewed.

Negatively Skewed Distribution: - If the value of the mode is greater than the arithmetic mean, the distribution is called Negatively Skewed.
http://upload.wikimedia.org/wikipedia/commons/thumb/b/b3/Skewness_Statistics.svg/446px-

Several forms of skewness are also defined. The momental skewness is defined by

$$\alpha^{(m)} = \frac{1}{2}\gamma_1 = \frac{\mu_3}{2\mu_2^{3/2}} \tag{2}$$

The Pearson mode skewness is defined by

$$\frac{\text{mean} - \text{mode}}{\sigma} \tag{3}$$

Pearson's skewness coefficients are defined by

$$\frac{3(\text{mean} - \text{mode})}{\sigma} \tag{4}$$

and

$$\frac{3(\text{mean} - \text{median})}{\sigma} \tag{5}$$

The Bowley skewness (also known as the quartile skewness coefficient) is defined by

$$\frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_1 - 2Q_2 + Q_3}{Q_3 - Q_1} \tag{6}$$

where the Qs denote the quartiles.
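To make the different skewness measures concrete, here is a small Python sketch. The functions follow the moment, Pearson mode, Pearson median and Bowley definitions given in this section; their names and the sample numbers are my own illustrative choices.

```python
def moment_skewness(xs, fs):
    """gamma_1 = mu_3 / mu_2**(3/2) for values xs with frequencies fs."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    mu2 = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n
    mu3 = sum(f * (x - mean) ** 3 for x, f in zip(xs, fs)) / n
    return mu3 / mu2 ** 1.5


def pearson_mode_skewness(mean, mode, sigma):
    """(mean - mode) / sigma."""
    return (mean - mode) / sigma


def pearson_median_skewness(mean, median, sigma):
    """3 * (mean - median) / sigma."""
    return 3 * (mean - median) / sigma


def bowley_skewness(q1, q2, q3):
    """(Q1 - 2*Q2 + Q3) / (Q3 - Q1), built from the three quartiles."""
    return (q1 - 2 * q2 + q3) / (q3 - q1)


# Made-up examples: a symmetric set and a set with a long right tail.
symmetric = moment_skewness([1, 2, 3], [1, 2, 1])
right_tailed = moment_skewness([1, 2, 10], [5, 3, 1])

print("Symmetric data    :", symmetric)
print("Right-tailed data :", round(right_tailed, 4))
print("Bowley, even gaps :", bowley_skewness(1, 2, 3))
```

A long right tail gives a positive value and a symmetric set gives zero, in line with the definitions of positive and zero skewness above.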

Kurtosis

Kurtosis is the degree of peakedness of a distribution, defined as a normalized form of the fourth central moment $\mu_4$ of a distribution. There are several flavors of kurtosis commonly encountered, including the kurtosis proper, denoted $\beta_2$ or $\alpha_4$, defined by

$$\beta_2 = \frac{\mu_4}{\mu_2^2} \tag{1}$$

where $\mu_i$ denotes the ith central moment (and in particular, $\mu_2$ is the variance). This form is implemented in Mathematica as Kurtosis[dist].

The kurtosis "excess", denoted $\gamma_2$ or $b_2$, is defined by

$$\gamma_2 = \frac{\mu_4}{\mu_2^2} - 3 \tag{2}$$

and is implemented in Mathematica as KurtosisExcess[dist]. Kurtosis excess is commonly used because $\gamma_2$ of a normal distribution is equal to 0, while the kurtosis proper is equal to 3.

Unfortunately, Abramowitz and Stegun (1972) confusingly refer to $\beta_2$ as the "excess or kurtosis."

Lepto-Kurtic: - If a curve is more peaked than the normal curve then it is called Lepto-Kurtic.

Platy-Kurtic: - If a curve is more flat-topped than the normal curve then it is called Platy-Kurtic.

Meso-Kurtic: - The curve representing a normal shape in a frequency distribution is called Meso-Kurtic.

http://grants.hhp.coe.uh.edu/doconnor/PEP6305/KurtosisPict.jpg
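The three shapes can be told apart numerically by comparing the kurtosis proper with 3, the value for a normal curve. The Python sketch below does this for two made-up data sets; the function names, the tolerance and the data are all my own assumptions.

```python
def kurtosis(xs, fs):
    """Kurtosis proper: beta_2 = mu_4 / mu_2**2."""
    n = sum(fs)
    mean = sum(f * x for x, f in zip(xs, fs)) / n
    mu2 = sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n
    mu4 = sum(f * (x - mean) ** 4 for x, f in zip(xs, fs)) / n
    return mu4 / mu2 ** 2


def classify(beta2, tolerance=1e-9):
    """Name the shape from the kurtosis excess gamma_2 = beta_2 - 3."""
    excess = beta2 - 3
    if excess > tolerance:
        return "lepto-kurtic"
    if excess < -tolerance:
        return "platy-kurtic"
    return "meso-kurtic"


# Made-up data: a sharply peaked set and a flat two-point set.
peaked = kurtosis([-2, 0, 2], [1, 10, 1])
flat = kurtosis([-1, 1], [1, 1])

print("Peaked data:", round(peaked, 4), classify(peaked))
print("Flat data  :", round(flat, 4), classify(flat))
```

In practice a sample never gives exactly 3, so a real application would need a more generous tolerance or a formal test before calling a curve meso-kurtic.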