Statistics: - By “statistics” we mean aggregates of facts affected to a marked extent by a multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner and placed in relation to each other.

There are two ways of presenting statistical data:

1. Frequency Distribution and

2. Graphical Representation.

Frequency: - The number of times a particular value occurs, or the number of items that fall in a particular class, is called the frequency of that value or class.

That means, frequency is the total number of times that a particular value is repeated in a table or data set.

Frequency Distributions: - A set of classes together with the frequencies of occurrence of values in a given set of data, presented in a tabular form, is referred to as a frequency distribution.
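As a rough illustration of this definition, the Python sketch below tallies raw values into equal-width classes. The marks, the starting boundary and the class width are hypothetical values chosen only for the example, and the function name is an assumption of this sketch.

```python
from collections import Counter

def frequency_distribution(values, lower, width, n_classes):
    """Count the values falling in each class [lower + k*width, lower + (k+1)*width)."""
    counts = Counter((v - lower) // width for v in values)
    table = []
    for k in range(n_classes):
        lo = lower + k * width
        table.append((lo, lo + width, counts.get(k, 0)))
    return table

# Hypothetical marks, invented for illustration only.
marks = [12, 25, 37, 41, 18, 29, 33, 45, 22, 38, 27, 31]
table = frequency_distribution(marks, lower=10, width=10, n_classes=4)
for lo, hi, f in table:
    print(f"{lo}-{hi}: {f}")
```

Each row of the table is one class together with its frequency, and the frequencies add up to the number of marks.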

[Figure: frequency distribution]

Construct a frequency distribution from the class marks of the EEE 36 Batch, that is, the marks the students obtained over the whole trimester in Statistics and Probability.

A distribution in which the values of mean, median and mode coincide (i.e. mean = median = mode) is known as a symmetrical distribution. Conversely, when the values of mean, median and mode are not equal, the distribution is known as an asymmetrical or skewed distribution. In a moderately skewed or asymmetrical distribution a very important relationship exists among these three measures of central tendency. In such distributions the distance between the mean and median is about one-third of the distance between the mean and mode, as will be clear from diagrams 1 and 2. Karl Pearson expressed this relationship as:

Mean – Mode = 3 (Mean – Median), i.e. Mode = 3 Median – 2 Mean

Central Tendency: - The tendency of the individual items of a statistical series to cluster around a central value is called the Central Tendency. A measure of central tendency is sometimes called a measure of location or a measure of representation.

Several measures of Central Tendency can be defined. The common ones are:

The Arithmetic Mean

The Median

The Mode

The Geometric Mean

The Harmonic Mean

The Arithmetic Mean: - The Arithmetic Mean of a grouped frequency distribution is defined as

\bar{X} = A + (Σ f d / n) × i

Where,

A = Any guessed or assumed class mark.

f = Frequency of each class interval.

n = Sum of total frequency.

i = Range of class interval.

d = Deviation of each class mark from the assumed class mark, measured in units of the range of class interval.

d = (X_{i} – A) / i
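A minimal sketch of this assumed-mean (step-deviation) calculation, \bar{X} = A + (Σ f d / n) × i with d = (X_{i} – A) / i, is given below in Python. The class marks and frequencies are hypothetical, and the function and parameter names are assumptions of the sketch.

```python
def grouped_mean(class_marks, freqs, assumed, width):
    """Arithmetic mean of grouped data by the assumed-mean (step-deviation) method."""
    n = sum(freqs)
    # d = deviation of each class mark from the assumed mark, in units of the class width
    d = [(x - assumed) / width for x in class_marks]
    return assumed + sum(f * di for f, di in zip(freqs, d)) / n * width

# Hypothetical grouped data: classes 0-10, 10-20, ..., 40-50
class_marks = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 3, 2]

mean = grouped_mean(class_marks, freqs, assumed=25, width=10)
print(mean)
```

The result does not depend on which class mark is taken as A; a convenient choice only shortens the arithmetic when working by hand.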

The Median: - The Median of a grouped frequency distribution is defined as

M_{e} = L + ((n/2 – f_{c}) / f_{m}) × i

Where,

M_{e} = Median of the distribution.

f_{c} = Cumulative frequency of all classes preceding the median class.

f_{m} = Frequency of the median class.

i = Range of class interval.

L = Lower class boundary of median class.

n = Sum of total frequency.
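The interpolation M_{e} = L + ((n/2 – f_{c}) / f_{m}) × i can be sketched in Python as follows. The data are hypothetical, the classes are assumed to be of equal width and listed in ascending order, and the function name is an assumption of the sketch.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median of grouped data by linear interpolation inside the median class."""
    n = sum(freqs)
    cumulative = 0  # f_c: cumulative frequency of the classes before the current one
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and cumulative + f >= n / 2:
            return lower + (n / 2 - cumulative) / f * width
        cumulative += f
    raise ValueError("frequency table is empty")

# Hypothetical grouped data: classes 0-10, 10-20, ..., 40-50
lower_bounds = [0, 10, 20, 30, 40]
freqs = [2, 3, 5, 3, 2]

median = grouped_median(lower_bounds, freqs, width=10)
print(median)
```

Here n/2 = 7.5 falls in the class 20-30, whose preceding cumulative frequency is 5, so the median is 20 + (2.5 / 5) × 10.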

The Mode: - The Mode of a set of numbers is that value which occurs with the greatest frequency.

The Mode for grouped data (a frequency distribution) is given by

M_{o} = L + (∆_{1} / (∆_{1} + ∆_{2})) × i

Where,

L = Lower limit of modal class interval.

∆_{1} = Difference between the frequencies of the modal and pre-modal classes.

∆_{2} = Difference between the frequencies of the modal and post-modal classes.

i = Range of class interval.
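A short Python sketch of the grouped-data mode, L + (∆_{1} / (∆_{1} + ∆_{2})) × i, follows. It assumes equal class widths and a single modal class; the data and the function name are hypothetical.

```python
def grouped_mode(lower_bounds, freqs, width):
    """Mode of grouped data: L + d1 / (d1 + d2) * i for the highest-frequency class."""
    m = freqs.index(max(freqs))  # first class with the highest frequency
    before = freqs[m - 1] if m > 0 else 0             # pre-modal frequency
    after = freqs[m + 1] if m + 1 < len(freqs) else 0  # post-modal frequency
    d1 = freqs[m] - before
    d2 = freqs[m] - after
    return lower_bounds[m] + d1 / (d1 + d2) * width

# Hypothetical grouped data: classes 0-10, 10-20, ..., 40-50
lower_bounds = [0, 10, 20, 30, 40]
freqs = [3, 5, 9, 6, 2]

mode = grouped_mode(lower_bounds, freqs, width=10)
print(round(mode, 2))
```

With these frequencies the modal class is 20-30, ∆_{1} = 9 – 5 = 4 and ∆_{2} = 9 – 6 = 3, so the mode lies slightly above the middle of the class.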

The Geometric Mean: - The Geometric Mean for a grouped frequency distribution is given by

G = antilog[(Σ f_{i} log x_{i}) / n]

Where,

G = Geometric Mean.

n = Sum of total frequency.

f_{i} = Frequency corresponding to each class interval.

x_{i} = Class Mark.
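The logarithmic form of the geometric mean, G = antilog[(Σ f_{i} log x_{i}) / n], can be sketched in Python as below. Common (base 10) logarithms are used, all class marks are assumed to be positive, and the data and function name are hypothetical.

```python
import math

def grouped_geometric_mean(class_marks, freqs):
    """Geometric mean of grouped data via base-10 logarithms."""
    n = sum(freqs)
    log_sum = sum(f * math.log10(x) for x, f in zip(class_marks, freqs))
    return 10 ** (log_sum / n)  # antilog of the weighted mean of the logs

# Hypothetical class marks and frequencies
gm = grouped_geometric_mean([1, 10, 100], [1, 2, 1])
print(gm)
```

Working with logarithms avoids multiplying many class marks together, which is why the formula is usually written in this form.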

The Harmonic Mean: - The Harmonic Mean H for a grouped frequency distribution is

H = n / Σ (f_{i} / x_{i})

Where,

H = Harmonic Mean.

n = Sum of total frequency.

f_{i} = Frequency corresponding to each class interval.

x_{i} = Class Mark.
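A minimal Python sketch of the grouped harmonic mean, H = n / Σ (f_{i} / x_{i}), is shown below. The class marks are assumed to be non-zero; the data and the function name are hypothetical.

```python
def grouped_harmonic_mean(class_marks, freqs):
    """Harmonic mean of grouped data: total frequency over the sum of f/x."""
    n = sum(freqs)
    return n / sum(f / x for x, f in zip(class_marks, freqs))

# Hypothetical class marks and frequencies
hm = grouped_harmonic_mean([1, 2, 4], [1, 2, 4])
print(hm)
```

For positive data the harmonic mean never exceeds the geometric mean, which in turn never exceeds the arithmetic mean.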

Problem: -

The given frequency distribution shows the efficiency scores of 115 students in their 70% marks. Find the Arithmetic Mean, Median, Mode, Geometric Mean and Harmonic Mean.

Solution:

Some special positional measures related to Central Tendency:

Quartiles

Deciles and

Percentiles

Quartiles: - The Quartiles are those values in a series which divide the total frequency into four equal parts. The rth Quartile is denoted by Q_{r}, where

Q_{r} = L_{r} + ((r × n / 4 – F_{r}) / f_{r}) × i

Where,

r = Position of the Quartile (r = 1, 2, 3),

L_{r} = Lower limit of the rth Quartile class,

n = Sum of the total frequency,

F_{r} = Cumulative frequency of the class preceding the rth Quartile class,

f_{r} = Frequency of the rth Quartile class,

i = Range of class interval.

Deciles: - The Deciles are those values in a series which divide the total frequency into ten equal parts. The rth Decile is denoted by D_{r}, where

D_{r} = L_{r} + ((r × n / 10 – F_{r}) / f_{r}) × i

Where,

r = Position of the Decile (r = 1, 2, 3, ..., 9),

L_{r} = Lower limit of the rth Decile class,

n = Sum of the total frequency,

F_{r} = Cumulative frequency of the class preceding the rth Decile class,

f_{r} = Frequency of the rth Decile class,

i = Range of class interval.

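Percentiles: - The Percentiles are those values in a series which divide the total frequency into 100 equal parts. The rth Percentile is denoted by P_{r}, where

P_{r} = L_{r} + ((r × n / 100 – F_{r}) / f_{r}) × i

Where,

r = Position of the Percentile (r = 1, 2, 3, ..., 99),

L_{r} = Lower limit of the rth Percentile class,

n = Sum of the total frequency,

F_{r} = Cumulative frequency of the class preceding the rth Percentile class,

f_{r} = Frequency of the rth Percentile class,

i = Range of class interval.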
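Quartiles, Deciles and Percentiles all follow one interpolation pattern, L_{r} + ((r × n / k – F_{r}) / f_{r}) × i with k = 4, 10 or 100. The Python sketch below uses a single function for all three. The data are hypothetical, equal class widths are assumed, and the function and parameter names are assumptions of the sketch.

```python
def grouped_quantile(lower_bounds, freqs, width, r, parts):
    """r-th value dividing the total frequency into `parts` equal parts."""
    n = sum(freqs)
    target = r * n / parts
    cumulative = 0  # F_r: cumulative frequency before the current class
    for lower, f in zip(lower_bounds, freqs):
        if f > 0 and cumulative + f >= target:
            return lower + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("need 0 < r < parts and a non-empty table")

# Hypothetical grouped data: classes 0-10, 10-20, ..., 40-50
lower_bounds = [0, 10, 20, 30, 40]
freqs = [2, 3, 5, 3, 2]

q1 = grouped_quantile(lower_bounds, freqs, 10, r=1, parts=4)
q3 = grouped_quantile(lower_bounds, freqs, 10, r=3, parts=4)
d5 = grouped_quantile(lower_bounds, freqs, 10, r=5, parts=10)
p10 = grouped_quantile(lower_bounds, freqs, 10, r=10, parts=100)
p90 = grouped_quantile(lower_bounds, freqs, 10, r=90, parts=100)
print(q1, q3, d5, p10, p90)
```

The second Quartile, the fifth Decile and the fiftieth Percentile all coincide with the Median.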

Frequency distribution can be presented graphically in any one of the following ways:

Histogram

Frequency Polygon

Smooth Frequency Curve

Cumulative Frequency Curve or Ogive Curve

Pie-Chart

Histogram: - A histogram is an area diagram in which the frequencies corresponding to each class interval of a frequency distribution are represented by the areas of rectangles, with no gap left between consecutive rectangles.

Frequency Polygon: - This is a graph obtained from the histogram by joining, with straight lines, the mid points of the upper horizontal sides of adjacent rectangles.

Smooth Frequency Curve: - This is a graph obtained from the histogram by joining, with a free-hand smooth curve, the mid points of the upper horizontal sides of adjacent rectangles.

Cumulative Frequency Curve or Ogive Curve: - The total frequency of all values less than the upper class boundary of a given class interval is called the Cumulative Frequency up to and including that class interval. The graph of such a distribution is called a Cumulative Frequency Curve or Ogive.

There are two methods of constructing an Ogive, namely

The less than method and

The more than method

The less than method: - In the “less than” method, we start with the upper boundary of each class interval and add up the frequencies cumulatively; when these cumulative frequencies are plotted, we get a rising curve.

The more than method: - In the “more than” method, we start with the lower boundary of each class interval and subtract the frequency of each class in turn from the total frequency; when these frequencies are plotted, we get a declining curve.

Draw a 'less than' ogive curve for the following data:

To Plot an Ogive:

(i) We plot the points with the actual upper limits as abscissae and the cumulative frequencies as ordinates. The coordinates of the points are (10, 2), (20, 10), (30, 22), (40, 40), (50, 68), (60, 90), (70, 96) and (80, 100).

(ii) Join the plotted points by a smooth curve.

(iii) The Ogive is joined to the point on the X-axis representing the actual lower limit of the first class.

Scale:

X-axis: 1 cm = 10 marks, Y-axis: 1 cm = 10 c.f.
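The plotting points for both kinds of Ogive can be computed as in the Python sketch below. The class frequencies are obtained by differencing the cumulative frequencies listed in step (i); the lower limit of 0 for the first class and the common class width of 10 are assumptions, since the original data table is not reproduced here.

```python
from itertools import accumulate

# Frequencies recovered by differencing the cumulative frequencies 2, 10, 22, ..., 100
freqs = [2, 8, 12, 18, 28, 22, 6, 4]
lower = [0, 10, 20, 30, 40, 50, 60, 70]  # assumed actual lower limits
upper = [b + 10 for b in lower]          # assumed class width of 10

total = sum(freqs)
less_than_cf = list(accumulate(freqs))
# "More than" c.f. at a lower limit = total minus everything below that limit
more_than_cf = [total - below for below in [0] + less_than_cf[:-1]]

less_than_points = list(zip(upper, less_than_cf))  # rising curve
more_than_points = list(zip(lower, more_than_cf))  # declining curve
print(less_than_points)
print(more_than_points)
```

The "less than" points are plotted against the upper limits and the "more than" points against the lower limits; the two curves cross at the median.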

Using the data given below, construct a 'more than' cumulative frequency table and draw the Ogive.

To Plot an Ogive:

(i) We plot the points with the actual lower limits as abscissae and the cumulative frequencies as ordinates.

Dispersion: - Dispersion refers to the scatter of the individual items of a statistical series about their central value. A descriptive measure of the scatter of the values about the average is therefore called a measure of Dispersion.

The following are the important measures of dispersion:

The Range

The average/mean Deviation

Quartile Deviation

The 10 – 90 Percentile Range

The Standard Deviation

The Variance

The Range: - The Range of a set of numbers is the difference between the largest and smallest numbers in the set.

The average/mean Deviation: - The average/mean Deviation of a set of N numbers X_{1}, X_{2}, ..., X_{N} is abbreviated MD and is defined by

MD = (Σ |X_{j} – \bar{X}|) / N

where \bar{X} is the arithmetic mean of the numbers and the sum runs over j = 1, 2, ..., N.

Quartile Deviation: - The Quartile Deviation of a set of data is denoted by Q and defined by Q = (Q_{3} – Q_{1}) / 2

The 10 – 90 Percentile Range: - The 10 – 90 Percentile Range of a set of data is defined by 10 – 90 Percentile Range = P_{90} – P_{10}

The Standard Deviation: - The Standard Deviation of a set of N numbers X_{1}, X_{2}, ..., X_{N} is denoted by σ and is defined by

σ = √(Σ (X_{j} – \bar{X})^{2} / N)

The Variance: - The Variance of a set of data is defined as the square of the standard deviation. It is denoted by σ^{2} and is thus given by

σ^{2} = Σ (X_{j} – \bar{X})^{2} / N

Co-efficient of Variation: - If the average of a set of statistical data is the mean \bar{X} and the absolute dispersion is the standard deviation σ, then the relative dispersion is called the Co-efficient of Variation. It is denoted by v and is given by v = (σ / \bar{X}) × 100
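The measures defined for a set of N numbers can be computed together, as in the Python sketch below. It uses the divide-by-N definitions (mean deviation as the average of |X – mean|, variance as the average of (X – mean)², standard deviation as its square root, and co-efficient of variation as standard deviation over mean times 100). The data are hypothetical and the function name is an assumption of the sketch.

```python
import math

def dispersion_summary(xs):
    """Range, mean deviation, variance, standard deviation and CV of ungrouped data."""
    n = len(xs)
    mean = sum(xs) / n
    variance = sum((x - mean) ** 2 for x in xs) / n
    std_dev = math.sqrt(variance)
    return {
        "range": max(xs) - min(xs),
        "mean_deviation": sum(abs(x - mean) for x in xs) / n,
        "variance": variance,
        "std_dev": std_dev,
        "cv_percent": std_dev / mean * 100,
    }

# Hypothetical data
summary = dispersion_summary([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

Because the co-efficient of variation is a percentage of the mean, it allows series measured in different units to be compared.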

Problem: -

Find the standard deviation and co-efficient of variation of the class test of 100 students of XYZ University.

Moments: - Moments are certain mathematical constants used to ascertain the nature and form of a frequency distribution. Moments in statistics are used to describe the various characteristics of a frequency distribution, such as Central Tendency, Dispersion, Skewness and Kurtosis. A moment is symbolized by the Greek letter μ.

There are two kinds of Moments:

Raw Moments and

Corrected Moments

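Raw Moments for grouped data, taken about an arbitrary origin A (A = 0 gives the moments about zero):

μ'_{r} = Σ f_{i} (x_{i} – A)^{r} / n, r = 1, 2, 3, 4

Relation between Raw Moments and Corrected Moments for grouped data:

μ_{1} = 0

μ_{2} = μ'_{2} – (μ'_{1})^{2}

μ_{3} = μ'_{3} – 3 μ'_{2} μ'_{1} + 2 (μ'_{1})^{3}

μ_{4} = μ'_{4} – 4 μ'_{3} μ'_{1} + 6 μ'_{2} (μ'_{1})^{2} – 3 (μ'_{1})^{4}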
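A hedged Python sketch of the moment calculation for grouped data follows. It takes the raw moments about an arbitrary origin A, μ'_{r} = Σ f (x – A)^{r} / n, and converts them to corrected (central) moments with the standard relations μ_{2} = μ'_{2} – μ'_{1}², μ_{3} = μ'_{3} – 3 μ'_{2} μ'_{1} + 2 μ'_{1}³ and μ_{4} = μ'_{4} – 4 μ'_{3} μ'_{1} + 6 μ'_{2} μ'_{1}² – 3 μ'_{1}⁴. The data, the choice A = 15 and the function names are assumptions of the sketch.

```python
def raw_moments(class_marks, freqs, origin, max_order=4):
    """First `max_order` raw moments of grouped data about the given origin."""
    n = sum(freqs)
    return [
        sum(f * (x - origin) ** r for x, f in zip(class_marks, freqs)) / n
        for r in range(1, max_order + 1)
    ]

def central_from_raw(m1, m2, m3, m4):
    """Second, third and fourth central moments from the first four raw moments."""
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mu2, mu3, mu4

# Hypothetical grouped data: classes 0-10, 10-20, ..., 40-50
class_marks = [5, 15, 25, 35, 45]
freqs = [2, 3, 5, 3, 2]

m1, m2, m3, m4 = raw_moments(class_marks, freqs, origin=15)
mu2, mu3, mu4 = central_from_raw(m1, m2, m3, m4)
print(mu2, mu3, mu4)
```

The corrected moments come out the same whatever origin is used for the raw moments; for this symmetric example the third corrected moment is zero up to rounding.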

Skewness is a measure of the degree of asymmetry of a distribution. If the left tail (tail at small end of the distribution) is more pronounced than the right tail (tail at the large end of the distribution), the function is said to have negative skewness. If the reverse is true, it has positive skewness. If the two are equal, it has zero skewness.

Several types of skewness are defined, the terminology and notation of which are unfortunately rather confusing. "The" skewness of a distribution is defined to be

γ_{1} = μ_{3} / μ_{2}^{3/2} (1)

where μ_{i} denotes the ith central moment.

Positively Skewed Distribution: - If the value of the arithmetic mean is greater than the mode, the distribution is called Positively Skewed.

Negatively Skewed Distribution: - If the value of the mode is greater than the arithmetic mean, the distribution is called Negatively Skewed.

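Several forms of skewness are also defined. The momental skewness is defined by

α^{(m)} = μ_{3} / (2 μ_{2}^{3/2}) = μ_{3} / (2 σ^{3}) (2)

where μ_{2} and μ_{3} are the second and third central moments and σ is the standard deviation; it is half of the skewness in (1). The Pearson mode skewness is defined by

(mean – mode) / σ (3)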

Pearson's skewness coefficients are defined by

(4)

and

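3 (mean – median) / s (5)

The Bowley skewness (also known as quartile skewness coefficient) is defined by

((Q_{3} – Q_{2}) – (Q_{2} – Q_{1})) / (Q_{3} – Q_{1}) = (Q_{1} – 2 Q_{2} + Q_{3}) / (Q_{3} – Q_{1}) (6)

where Q_{1}, Q_{2} and Q_{3} denote the quartiles and s denotes the standard deviation.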
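Three of these skewness measures can be sketched in Python for ungrouped data: the moment skewness μ_{3} / μ_{2}^{3/2}, the median-based Pearson coefficient 3 (mean – median) / s, and the Bowley coefficient (Q_{1} – 2 Q_{2} + Q_{3}) / (Q_{3} – Q_{1}). The sketch assumes the divide-by-N standard deviation for s and takes the quartiles as given inputs; the data and function names are hypothetical.

```python
import statistics

def moment_skewness(xs):
    """Skewness as third central moment over the 3/2 power of the second."""
    n = len(xs)
    mean = sum(xs) / n
    mu2 = sum((x - mean) ** 2 for x in xs) / n
    mu3 = sum((x - mean) ** 3 for x in xs) / n
    return mu3 / mu2 ** 1.5

def pearson_median_skewness(xs):
    """3 (mean - median) / s, with s taken as the divide-by-N standard deviation."""
    mean = sum(xs) / len(xs)
    return 3 * (mean - statistics.median(xs)) / statistics.pstdev(xs)

def bowley_skewness(q1, q2, q3):
    """Quartile skewness coefficient from the three quartiles."""
    return (q1 - 2 * q2 + q3) / (q3 - q1)

# Hypothetical data with a long right tail
data = [1, 1, 1, 1, 6]
print(moment_skewness(data), pearson_median_skewness(data))
print(bowley_skewness(1, 2, 5))
```

All three measures are positive for a distribution with a longer right tail, negative for a longer left tail and zero for a symmetrical one, although their magnitudes are not directly comparable.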

Kurtosis is the degree of peakedness of a distribution, defined as a normalized form of the fourth central moment of a distribution. There are several flavors of kurtosis commonly encountered, including the kurtosis proper, denoted β_{2} and defined by

β_{2} = μ_{4} / μ_{2}^{2} (1)

where μ_{i} denotes the ith central moment (and in particular, μ_{2} is the variance). This form is implemented in Mathematica as Kurtosis[dist].

The kurtosis "excess", denoted γ_{2}, is defined by

γ_{2} = μ_{4} / μ_{2}^{2} – 3 (2)

and is implemented in Mathematica as KurtosisExcess[dist]. Kurtosis excess is commonly used because γ_{2} of a normal distribution is equal to 0, while the kurtosis proper is equal to 3.

Unfortunately, Abramowitz and Stegun (1972) confusingly refer to β_{2} as the "excess or kurtosis."

Lepto-Kurtic: - If a curve is more peaked than the normal curve then it is called Lepto-Kurtic.

Platy-Kurtic: - If a curve is more flat-topped than the normal curve then it is called Platy-Kurtic.

Meso-Kurtic: - The curve representing a normal shape in a frequency distribution is called Meso-Kurtic.
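The classification above can be tied to the moment-based kurtosis, μ_{4} / μ_{2}², and the kurtosis excess, which subtracts 3 so that a normal curve scores 0. The Python sketch below works on ungrouped data; the sample values, the tolerance and the function names are assumptions of the sketch.

```python
def kurtosis(xs):
    """Kurtosis proper: fourth central moment over the square of the second."""
    n = len(xs)
    mean = sum(xs) / n
    mu2 = sum((x - mean) ** 2 for x in xs) / n
    mu4 = sum((x - mean) ** 4 for x in xs) / n
    return mu4 / mu2 ** 2

def classify_kurtosis(xs, tol=1e-9):
    """Label a data set by the sign of its kurtosis excess."""
    excess = kurtosis(xs) - 3
    if excess > tol:
        return "Lepto-Kurtic"
    if excess < -tol:
        return "Platy-Kurtic"
    return "Meso-Kurtic"

# Hypothetical data sets
peaked = [1, 1, 1, 1, 6]
flat = [1, 2, 3, 4, 5]
print(kurtosis(peaked), classify_kurtosis(peaked))
print(kurtosis(flat), classify_kurtosis(flat))
```

With real data the excess is almost never exactly zero, so in practice a curve is called Meso-Kurtic when the excess is merely close to zero.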