
Frequency Distribution, Mean, Median, Mode

Frequency Distribution

                                                                                                                                        BY: M.A.SIRAJI

Frequency distribution:  A frequency distribution is any arrangement of data that shows the frequency of occurrence of different values of the variable or the frequency of occurrence of values falling within arbitrarily defined ranges of the variable called class intervals.

Rules of thumb regarding arranging a set of data into class intervals:

  1. Select a class interval of such a size that between 10 and 20 intervals cover the total range of the observations.
  2. Select class intervals with a range of 3, 5, 10 or 20 points.
  3. Start each class interval at a value that is a multiple of the size of that interval.
  4. Arrange the class intervals in order of magnitude of the values they include, with the class of largest values on top.

Daily wage of day laborers:

500           50      170       350       200      275      100     80      75      400       450      700        1000      60

325         420      725        640       553     244       290    90     328     115       400       800         60         55

346          220      210       612       673     444     590     100   99    70    44    510     498     517     430

612         110       225       370      450       565     115      100       899     960     30     210     344      287

120           20        30         82        120      227       310      200      100     99     70     60    350      75      97

330      290        177         100     600      610      443     219

Highest wage= 1000

Lowest wage=      20

Difference   =     980

Here N= 80

Class size= 50

Class Interval     Tally                      Frequency (f)
960-1009           ||                         2
910-959                                       0
860-909            |                          1
810-859                                       0
760-809            |                          1
710-759            |                          1
660-709            ||                         2
610-659            ||||                       4
560-609            |||                        3
510-559            |||                        3
460-509            ||                         2
410-459            ||||| |                    6
360-409            |||                        3
310-359            ||||| |||                  8
260-309            ||||                       4
210-259            ||||| ||                   7
160-209            ||||                       4
110-159            |||||                      5
60-109             ||||| ||||| ||||| |||      18
10-59              ||||| |                    6
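The tallying step can be sketched in Python. This is a minimal sketch, not the author's procedure: it assumes the 80 wages exactly as listed above, a class size of 50, and a lowest class of 10-59. The names `wages`, `class_interval` and `frequency` are illustrative.

```python
from collections import Counter

# Daily wages as listed in the text (N = 80).
wages = [
    500, 50, 170, 350, 200, 275, 100, 80, 75, 400, 450, 700, 1000, 60,
    325, 420, 725, 640, 553, 244, 290, 90, 328, 115, 400, 800, 60, 55,
    346, 220, 210, 612, 673, 444, 590, 100, 99, 70, 44, 510, 498, 517, 430,
    612, 110, 225, 370, 450, 565, 115, 100, 899, 960, 30, 210, 344, 287,
    120, 20, 30, 82, 120, 227, 310, 200, 100, 99, 70, 60, 350, 75, 97,
    330, 290, 177, 100, 600, 610, 443, 219,
]

CLASS_SIZE = 50    # class size used in the text
LOWEST_LIMIT = 10  # apparent lower limit of the lowest class (10-59)


def class_interval(value):
    """Return the (lower, upper) apparent limits of the class holding value."""
    steps = (value - LOWEST_LIMIT) // CLASS_SIZE
    lower = LOWEST_LIMIT + steps * CLASS_SIZE
    return lower, lower + CLASS_SIZE - 1


frequency = Counter(class_interval(w) for w in wages)

# Print the classes with the largest values on top, as rule 4 suggests.
for lower in range(960, LOWEST_LIMIT - 1, -CLASS_SIZE):
    upper = lower + CLASS_SIZE - 1
    f = frequency.get((lower, upper), 0)
    print(f"{lower:>4}-{upper:<5} {'|' * f:<20} {f}")
```

Counting by computer is a useful check on a hand tally, because the class frequencies must add up to N.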

 

Apparent and true/exact limits of class intervals:

The values of a continuous variable fall within certain limits on the measurement scale. These limits are taken as one half unit below to one half unit above the apparent (reported) value. For example, an age of 19 may be thought of as occupying the range 18.5 to 19.5 along the measurement scale. These limits are referred to as the exact (real or true) limits of a continuous variable.

The class intervals of a frequency distribution are usually also reported in exact limits, which reflect the accuracy of our measurement.

Class Interval     Exact Limit
50-99              49.5-99.5

Midpoint of a class interval:

45 – 49

40 – 44

35 – 39

30 – 34

25 – 29

20 – 24

15 – 19

To obtain the class midpoint, add half of the class size to the exact lower limit of that class. For the class 30 – 34, with a class size of 5, the midpoint is 29.5 + 2.5 = 32.
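The two rules above (half a unit either side for exact limits, lower exact limit plus half the class size for the midpoint) can be written as a short Python sketch. The function names and the `unit` parameter are illustrative assumptions; `unit` is the measurement unit, taken here as 1.

```python
def exact_limits(lower, upper, unit=1):
    """Exact limits lie half a measurement unit outside the apparent limits."""
    half = unit / 2
    return lower - half, upper + half


def midpoint(lower, upper, unit=1):
    """Midpoint = exact lower limit + half of the class size."""
    exact_lower, exact_upper = exact_limits(lower, upper, unit)
    class_size = exact_upper - exact_lower
    return exact_lower + class_size / 2


print(exact_limits(50, 99))  # (49.5, 99.5)
print(midpoint(30, 34))      # 32.0
```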

Age distribution of married women of reproductive age (MWRA)

Class Interval     Frequency (f)     Exact Limit       Class Midpoint
45 – 49            30                44.5 – 49.5       47
40 – 44            55                39.5 – 44.5       42
35 – 39            70                34.5 – 39.5       37
30 – 34            100               29.5 – 34.5       32
25 – 29            150               24.5 – 29.5       27
20 – 24            75                19.5 – 24.5       22
15 – 19            20                14.5 – 19.5       17
                   n = 500

 

 

 

Assumption about the distribution of observations within the class interval:

Table – 1: Age distribution of married women of reproductive age (MWRA)

Class Interval     Frequency (f)
45 – 49            30
40 – 44            55
35 – 39            70
30 – 34            100
25 – 29            150
20 – 24            75
15 – 19            20

Because we lose some information by arranging the data into class intervals, we need to make certain assumptions about the distribution of observations within the class intervals. There are two such assumptions.

Assumption 1: The first assumption states that the observations are uniformly distributed over the exact limits of the class interval.

From Table 1 we can see that the class 25 – 29 has a frequency of 150 and a class size of 5. Under this assumption, each unit-wide sub-interval holds 150 / 5 = 30 observations:

Class Interval     Frequency
28.5 – 29.5        30
27.5 – 28.5        30
26.5 – 27.5        30
25.5 – 26.5        30
24.5 – 25.5        30
                   N = 150

This assumption (#1) is used in the calculation of such statistics as the median, quartiles and percentiles, and in the preparation of the histogram.


Assumption 2: The second assumption states that the observations within a class interval are concentrated at the midpoint of that interval, i.e. every observation within a particular interval is treated as equal to the mid value of that interval.

Class Interval     Frequency     Midpoint (X)
45 – 49            30            47
40 – 44            55            42
35 – 39            70            37
30 – 34            100           32
25 – 29            150           27
20 – 24            75            22
15 – 19            20            17
                   N = 500

This assumption is used in the calculation of such statistics as the mean and the standard deviation, and in the preparation of the frequency polygon.

Table 2

Class Interval     Frequency     Midpoint (X)     Exact Limit       Cf      Cpf
45 – 49            30            47               44.5 – 49.5       500     100
40 – 44            55            42               39.5 – 44.5       470     94
35 – 39            70            37               34.5 – 39.5       415     83
30 – 34            100           32               29.5 – 34.5       345     69
25 – 29            150           27               24.5 – 29.5       245     49
20 – 24            75            22               19.5 – 24.5       95      19
15 – 19            20            17               14.5 – 19.5       20      4
                   N = 500

$$\mathrm{Cpf} = \frac{\mathrm{Cf}}{N} \times 100$$

Cf (cumulative frequency) = absolute number

Cpf (cumulative percentage frequency) = relative number
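The Cf and Cpf columns of Table 2 can be reproduced with a short Python sketch. It assumes the MWRA frequencies from the table, entered lowest class first, and that Cpf is the cumulative frequency expressed as a percentage of N; the variable names are illustrative.

```python
from itertools import accumulate

# (class interval, frequency) from Table 2, lowest class first.
table = [
    ("15-19", 20), ("20-24", 75), ("25-29", 150), ("30-34", 100),
    ("35-39", 70), ("40-44", 55), ("45-49", 30),
]

frequencies = [f for _, f in table]
n = sum(frequencies)                 # N = 500
cf = list(accumulate(frequencies))   # running total from the lowest class up
cpf = [100 * c / n for c in cf]      # cumulative percentage frequency

for (interval, f), c, p in zip(table, cf, cpf):
    print(f"{interval:<6} f={f:<4} Cf={c:<4} Cpf={p:g}")
```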

Graphical representation of data: A graph is a visual portrait of a set of numerical data. By simplifying the data, it helps our understanding and draws our attention to the essential elements of the data set.

Most common forms of graph:

  1. Histogram
  2. Frequency polygon
  3. Cumulative frequency polygon
  4. Cumulative percentage

  1. Histogram: A histogram is a graph in which frequencies are represented by bars in the form of areas. The width of each bar corresponds to the exact limits of the class interval and the height of each bar corresponds to the class frequency.

[Figure: histogram; horizontal axis: exact class interval]

  2. Frequency polygon: A frequency polygon is drawn by
    1. Putting a dot at the point where the class midpoint meets the class frequency.
    2. Next, joining the dots in order of the class intervals and connecting the lines to the base of the graph.

[Figure: frequency polygon; horizontal axis: midpoint]

  3. Cumulative frequency polygon: A cumulative frequency polygon differs from a frequency polygon in two respects:

a)    We put a dot corresponding to the cumulative frequency instead of the class frequency.

b)    We put the dot at the exact upper limit of the class instead of the class midpoint.

[Figure: cumulative frequency polygon; horizontal axis: exact upper limit]

How do frequency distributions differ?

Frequency distributions differ from one another in terms of four important properties:

  1. Central location: Central location refers to a value near the centre of the distribution or at the point of greatest concentration of values/observations.
  2. Variation: Variation refers to the extent of clustering of the values in a distribution about the central value.
  3. Skewness: Skewness refers to the symmetry or asymmetry of a distribution. If a distribution is asymmetrical, it can be either positively skewed (skewed to the right) or negatively skewed (skewed to the left).
  4. Kurtosis: Kurtosis refers to the flatness or peakedness of one distribution in relation to another. In terms of kurtosis, there are three types of distribution:

a)    Leptokurtic

b)    Platykurtic

c)    Mesokurtic / normal distribution

 

 

 

Measure of central tendency

Central tendency:  Central tendency refers to a tendency in the observations within a distribution to be clustered around a central value of that distribution.

Measures of central tendency: Measures of central tendency are those indexes through which the central tendency of a distribution can be quantified.

The most common measures of central tendency are the mean, median and mode; others are the harmonic mean, quadratic mean and geometric mean.

  1. Arithmetic mean or mean:

Calculation from ungrouped data:

A=20    B=50    C=100    D=1000     E=220

$$\bar{X} = \frac{\sum X}{N} = \frac{20 + 50 + 100 + 1000 + 220}{5} = \frac{1390}{5} = 278$$

Calculation for grouped data:

Class Interval     Frequency (f)     Midpoint (X)     fX
130-134            1                 132              132
125-129            1                 127              127
120-124            3                 122              366
115-119            6                 117              702
110-114            7                 112              784
105-109            12                107              1284
100-104            16                102              1632
95-99              7                 97 = A           679
90-94              17                92               1564
85-89              5                 87               435
80-84              15                82               1230
75-79              6                 77               462
70-74              3                 72               216
65-69              1                 67               67
                   N = 100                            ΣfX = 9680

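So, $\bar{X} = \frac{\sum fX}{N} = \frac{9680}{100} = 96.8$

Again, $\bar{X} = A + \frac{\sum fx'}{N} \times i$

Where, A = assumed mean, i = class size, and

$x' = \frac{X - A}{i}$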

Class Interval     Frequency (f)     Midpoint (X)     fX       Cf      x' = (X - A)/i     fx'
130-134            1                 132              132      100     7                  7
125-129            1                 127              127      99      6                  6
120-124            3                 122              366      98      5                  15
115-119            6                 117              702      95      4                  24
110-114            7                 112              784      89      3                  21
105-109            12                107              1284     82      2                  24
100-104            16                102              1632     70      1                  16
95-99              7                 97 = A           679      54      0                  0       (median class)
90-94              17                92               1564     47      -1                 -17     (modal class)
85-89              5                 87               435      30      -2                 -10
80-84              15                82               1230     25      -3                 -45
75-79              6                 77               462      10      -4                 -24
70-74              3                 72               216      4       -5                 -15
65-69              1                 67               67       1       -6                 -6
                   N = 100                            ΣfX = 9680                          Σfx' = -4

So, $\bar{X} = A + \frac{\sum fx'}{N} \times i = 97 + \frac{-4}{100} \times 5 = 96.8$
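Both ways of computing the grouped mean can be checked with a few lines of Python. This is a sketch under stated assumptions: the midpoints and frequencies are those of the table above, the assumed mean is A = 97, the class size is 5, and the variable names are illustrative. The two methods are algebraically equivalent, so they should give the same value.

```python
# Class midpoints and frequencies from the table, highest class first.
midpoints = [132, 127, 122, 117, 112, 107, 102, 97, 92, 87, 82, 77, 72, 67]
frequencies = [1, 1, 3, 6, 7, 12, 16, 7, 17, 5, 15, 6, 3, 1]

n = sum(frequencies)   # N = 100
class_size = 5         # i
assumed_mean = 97      # A

# Direct method: mean = sum(fX) / N
sum_fx = sum(f * x for f, x in zip(frequencies, midpoints))
direct_mean = sum_fx / n

# Assumed-mean (step-deviation) method: mean = A + (sum(fx') / N) * i
step_deviations = [(x - assumed_mean) // class_size for x in midpoints]
sum_fx_prime = sum(f * d for f, d in zip(frequencies, step_deviations))
short_mean = assumed_mean + (sum_fx_prime / n) * class_size

print(sum_fx, direct_mean)       # 9680 96.8
print(sum_fx_prime, short_mean)
```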

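Median

Median calculation from grouped data:

$$\text{Median} = l + \frac{\frac{N}{2} - F_b}{f_m} \times i$$

Where,

l      = exact lower limit of the median class

N     = total number of observations

Fb   = sum of the frequencies below the median class

fm   = frequency of the median class

i      = class size

So, $\text{Median} = 94.5 + \frac{50 - 47}{7} \times 5 = 96.64$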
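The interpolation can be sketched as a small Python function. It assumes the classes are supplied lowest first with their exact lower limits, and that observations are spread uniformly within the median class (assumption 1); the function name `grouped_median` is illustrative.

```python
def grouped_median(lower_limits, frequencies, class_size):
    """Median = l + ((N/2 - Fb) / fm) * i, classes given lowest first."""
    n = sum(frequencies)
    below = 0  # Fb: sum of frequencies below the current class
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and below + f >= n / 2:
            return lower + ((n / 2 - below) / f) * class_size
        below += f
    raise ValueError("empty distribution")


# Exact lower limits 64.5, 69.5, ..., 129.5 and frequencies, lowest class first.
lower_limits = [64.5 + 5 * k for k in range(14)]
frequencies = [1, 3, 6, 15, 5, 17, 7, 16, 12, 7, 6, 3, 1, 1]

median = grouped_median(lower_limits, frequencies, class_size=5)
print(round(median, 2))  # 96.64
```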

MODE

In the case of raw data, the mode is the most frequently occurring observation. In the case of a frequency distribution, the mode is the midpoint of the class having the highest frequency.

The mode provides a nominal measure.

$$\text{Mode} = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$$

Where,

L = exact lower limit of the modal class

Δ1 = the difference in frequency between the modal class and its immediate lower class

Δ2 = the difference in frequency between the modal class and its immediate higher class

i = class size

The class with the highest frequency is the modal class. Here it is 90-94, with f = 17.

Δ1 = 17 - 5 = 12

Δ2 = 17 - 7 = 10

So, $\text{Mode} = 89.5 + \frac{12}{12 + 10} \times 5 = 92.23$

Again, mode = 3 median - 2 mean

= 3 × 96.64 - 2 × 96.8 = 96.32
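A Python sketch of the grouped-data mode follows. It assumes the common textbook form of the interpolation formula, in which the numerator is the difference between the modal class and the class just below it (the class of lower values), so that the mode is pulled towards the neighbouring class with the larger frequency. The function name `grouped_mode` is illustrative, and the empirical estimate uses the mean and median reported in the text.

```python
def grouped_mode(lower_limits, frequencies, class_size):
    """Mode = L + d1 / (d1 + d2) * i, with classes given lowest first.

    d1 = modal frequency minus frequency of the class just below,
    d2 = modal frequency minus frequency of the class just above.
    """
    k = max(range(len(frequencies)), key=frequencies.__getitem__)
    f_modal = frequencies[k]
    f_below = frequencies[k - 1] if k > 0 else 0
    f_above = frequencies[k + 1] if k + 1 < len(frequencies) else 0
    d1 = f_modal - f_below
    d2 = f_modal - f_above
    return lower_limits[k] + d1 / (d1 + d2) * class_size


lower_limits = [64.5 + 5 * k for k in range(14)]
frequencies = [1, 3, 6, 15, 5, 17, 7, 16, 12, 7, 6, 3, 1, 1]

mode = grouped_mode(lower_limits, frequencies, class_size=5)
empirical_mode = 3 * 96.64 - 2 * 96.8  # mode = 3 median - 2 mean

print(round(mode, 2))            # 92.23
print(round(empirical_mode, 2))  # 96.32
```

The two estimates differ noticeably here because the distribution has two nearly equal peaks (90-94 and 100-104), which is one reason the mode is described later as ill defined.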

Empirical relationship between Mean, Median and Mode

  1. If a distribution is normal, then the mean, median and mode lie at the same point.
  2. If a distribution is not normal, i.e. asymmetrical, then the three lie at different points, with the mean pulled towards the skewed end.

Properties of Mean, Median and Mode

The mean: it is a measure for interval and ratio level variables.

The median: it is a measure for ordinal level variables.

The mode: it is a measure for nominal level variables.

Arithmetic mean

Properties:

  1. The sum of the deviations of all the measurements in a distribution from their arithmetic mean is zero, i.e. $\sum (X - \bar{X}) = 0$.
  2. The sum of squares of deviations from the arithmetic mean is less than the sum of squares of deviations from any other value, i.e. $\sum (X - \bar{X})^2 < \sum (X - c)^2$ for any value $c \neq \bar{X}$.

a)     The second property indicates that the arithmetic mean is the centre of gravity of a distribution.

b)    The second property leads to an alternative definition of the arithmetic mean: "the mean is that measure of central location about which the sum of the squares (of deviations) is a minimum" (Artil and kalton).

c)     The mean calculated from a sample of size N is an estimate of the population mean.
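The two properties can be checked numerically. The sketch below reuses the five ungrouped values from the earlier example (20, 50, 100, 1000, 220); the helper `sum_of_squares_about` is an illustrative name, and the comparison points (the median 100 and a value one unit above the mean) are arbitrary choices.

```python
values = [20, 50, 100, 1000, 220]
mean = sum(values) / len(values)  # 278.0

# Property 1: deviations from the mean sum to zero.
sum_of_deviations = sum(x - mean for x in values)


def sum_of_squares_about(center):
    """Sum of squared deviations of the values from an arbitrary point."""
    return sum((x - center) ** 2 for x in values)


# Property 2: the sum of squares is smallest when taken about the mean.
about_mean = sum_of_squares_about(mean)
about_median = sum_of_squares_about(100)
about_nearby = sum_of_squares_about(mean + 1)

print(sum_of_deviations)                     # 0.0
print(about_mean, about_median, about_nearby)
```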

Advantage of mean

  1. The mean is based upon all the observations in a distribution and cannot be calculated if even a single value is missing. Therefore it is the most representative measure of central tendency.
  2. The mean is the measure least affected by sampling fluctuations, so it is the most stable measure of central tendency.
  3. The mean is amenable to algebraic treatment, i.e. the combined mean of two or more distributions can be calculated.

The formula applied is:

$$\bar{X}_c = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$$

For group F, $N_1 = 30$ and $\bar{X}_1 = 65$; for group M, $N_2 = 15$ and $\bar{X}_2 = 70$:

$$\bar{X}_c = \frac{30 \times 65 + 15 \times 70}{30 + 15} = \frac{3000}{45} = 66.667$$
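The combined mean is a weighted average of the group means, with the group sizes as weights. A minimal Python sketch, using the two groups from the text (N1 = 30 with mean 65, N2 = 15 with mean 70); the function name `combined_mean` is illustrative.

```python
def combined_mean(groups):
    """groups: iterable of (size, mean) pairs; returns the pooled mean."""
    groups = list(groups)
    total_n = sum(n for n, _ in groups)
    return sum(n * mean for n, mean in groups) / total_n


pooled = combined_mean([(30, 65), (15, 70)])
print(round(pooled, 3))  # 66.667
```

The same function works for more than two groups, which is what makes the mean convenient for pooling results.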

Disadvantage of mean

a)     The mean is unduly affected by extremely high or low values. Therefore it becomes a poor measure of central tendency when the distribution is skewed.

b)    The mean cannot be calculated when the frequency distribution has open-ended classes at one or both ends.

Median: advantage

A)  The median can be calculated even when a distribution has open-ended classes.

B)   The median is not affected by extremely high or low values.

C)   The median is the measure of central tendency most needed in markedly skewed distributions.

Median: disadvantage

a)     The median is not amenable to algebraic treatment.

b)    It is erratic (unpredictable).

Mode: advantage

a)     It can be located/identified by mere inspection.

b)    It is not necessary to know all items in a distribution to compute mode.

c)     The mode is not affected by sampling fluctuations.

Mode: disadvantage

a)     The mode is ill defined

b)    It is not representative of distribution as it is not based on all the items

When to use mean, median and mode

The mean:

Use when

a)     The measure of central tendency having the greatest stability is wanted. The mean usually varies less from sample to sample drawn from the same population.

b)    Other statistics (e.g. measures of variability) are to be calculated. Many statistics are based on the mean.

c)     The distribution of observations is symmetrical about the centre.

The median:

Use when

a)     The exact midpoint of the distribution is wanted, i.e. we are interested in whether cases fall within the upper or lower half of the distribution and not particularly in how far they are from the central point.

b)    The distribution is markedly skewed. Extreme values markedly affect the mean, not the median.

c)     An incomplete distribution is given.

The mode:

Use when

a)     A quick and very rough estimate of central value is wanted

b)    We wish to know the most typical case of the distribution

VARIABILITY/DISPERSION

Variability: Variability is the degree to which the various observations in a distribution tend to spread about an average value

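Inadequacy of averages: scores of two distributions

1)    M: 12, 80, 60, 14, 34; $\bar{X} = 40$

2)    F: 36, 43, 42, 41, 38; $\bar{X} = 40$

Both distributions have the same mean, but the scores in M are spread far more widely than those in F, so the average alone does not describe a distribution adequately.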

Measures of dispersion/variability:

  1. Range
  2. Mean deviation
  3. Quartile deviation
  4. Variance and standard deviation

These are absolute measures.

Coefficient of variation (CV): It is a relative measure. These measures help us to know the compactness or scatteredness of the observations within a distribution.

The range: The range is defined as the difference between the largest and the smallest values. Symbolically, R = L - S.
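The range makes the difference between the two example distributions M and F visible even though their means are equal. A short Python sketch with those two sets of scores; the helper names are illustrative.

```python
m_scores = [12, 80, 60, 14, 34]
f_scores = [36, 43, 42, 41, 38]


def mean(values):
    return sum(values) / len(values)


def value_range(values):
    """R = L - S: largest value minus smallest value."""
    return max(values) - min(values)


print(mean(m_scores), value_range(m_scores))  # 40.0 68
print(mean(f_scores), value_range(f_scores))  # 40.0 7
```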

Quartile deviation: It is defined as the average distance of the quartile points from the median of the distribution. There are three quartile points: the 1st quartile is the point below which 25% of the observations lie, the 2nd quartile (the median) is the point below which 50% of the observations lie, and the 3rd quartile is the point below which 75% of the observations lie.

Class Interval     Frequency (f)     Midpoint (X)     fX       Cf
130-134            1                 132              132      100
125-129            1                 127              127      99
120-124            3                 122              366      98
115-119            6                 117              702      95
110-114            7                 112              784      89
105-109            12                107              1284     82
100-104            16                102              1632     70
95-99              7                 97               679      54
90-94              17                92               1564     47
85-89              5                 87               435      30
80-84              15                82               1230     25
75-79              6                 77               462      10
70-74              3                 72               216      4
65-69              1                 67               67       1
                   N = 100
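The quartile points can be located with the same interpolation used for the median, replacing N/2 by N/4 and 3N/4. The sketch below is an illustration under stated assumptions: it uses the frequencies of the table above (lowest class first), assumes observations are uniformly spread within each class, and takes the quartile deviation as (Q3 - Q1) / 2, which equals the average distance of Q1 and Q3 from the median. The function name `percentile_point` is illustrative.

```python
def percentile_point(lower_limits, frequencies, class_size, proportion):
    """Point below which the given proportion of observations lies."""
    n = sum(frequencies)
    target = proportion * n
    below = 0  # cumulative frequency below the current class
    for lower, f in zip(lower_limits, frequencies):
        if f > 0 and below + f >= target:
            return lower + ((target - below) / f) * class_size
        below += f
    raise ValueError("proportion must be between 0 and 1")


lower_limits = [64.5 + 5 * k for k in range(14)]
frequencies = [1, 3, 6, 15, 5, 17, 7, 16, 12, 7, 6, 3, 1, 1]

q1 = percentile_point(lower_limits, frequencies, 5, 0.25)
q3 = percentile_point(lower_limits, frequencies, 5, 0.75)
quartile_deviation = (q3 - q1) / 2

print(q1, round(q3, 2), round(quartile_deviation, 2))  # 84.5 106.58 11.04
```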

Mean deviation: Mean deviation is the arithmetic mean of the absolute deviations of the scores from the mean of the distribution.

X values     $|x| = |X - \bar{X}|$
67           15.6
33           18.4
45           6.4
50           1.4
62           10.6

$\bar{X} = 51.4$     $\sum |x| = 52.4$

Calculation of MD:

Raw data:

$$MD = \frac{\sum |x|}{N}, \qquad |x| = |X - \bar{X}|$$

$$MD = \frac{52.4}{5} = 10.48$$
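The raw-data calculation can be reproduced in a few lines of Python. The scores are the five X values from the table above; the variable names are illustrative.

```python
scores = [67, 33, 45, 50, 62]

mean = sum(scores) / len(scores)                    # 51.4
absolute_deviations = [abs(x - mean) for x in scores]
mean_deviation = sum(absolute_deviations) / len(scores)

print(round(mean, 1), round(sum(absolute_deviations), 1), round(mean_deviation, 2))
# 51.4 52.4 10.48
```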

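Grouped Data: $MD = \frac{\sum f|x|}{N}$

Class Interval     Frequency (f)     Midpoint (X)     fX       Cf      |x| = |X - mean|     f|x|
130-134            1                 132              132      100     35.2                 35.2
125-129            1                 127              127      99      30.2                 30.2
120-124            3                 122              366      98      25.2                 75.6
115-119            6                 117              702      95      20.2                 121.2
110-114            7                 112              784      89      15.2                 106.4
105-109            12                107              1284     82      10.2                 122.4
100-104            16                102              1632     70      5.2                  83.2
95-99              7                 97               679      54      0.2                  1.4
90-94              17                92               1564     47      4.8                  81.6
85-89              5                 87               435      30      9.8                  49.0
80-84              15                82               1230     25      14.8                 222.0
75-79              6                 77               462      10      19.8                 118.8
70-74              3                 72               216      4       24.8                 74.4
65-69              1                 67               67       1       29.8                 29.8
                   N = 100                            ΣfX = 9680                            Σf|x| = 1151.2

Here

$\bar{X} = \frac{\sum fX}{N} = \frac{9680}{100} = 96.8$

$MD = \frac{\sum f|x|}{N} = \frac{1151.2}{100} = 11.51$

X = any raw score or class midpoint; x = deviation from the mean, $X - \bar{X}$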
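For grouped data the same idea applies with each class midpoint weighted by its frequency. A Python sketch, assuming the midpoints and frequencies of the table above and that every observation in a class sits at the class midpoint (assumption 2); the variable names are illustrative.

```python
midpoints = [132, 127, 122, 117, 112, 107, 102, 97, 92, 87, 82, 77, 72, 67]
frequencies = [1, 1, 3, 6, 7, 12, 16, 7, 17, 5, 15, 6, 3, 1]

n = sum(frequencies)
mean = sum(f * x for f, x in zip(frequencies, midpoints)) / n  # 96.8

# Sum of f * |X - mean| over all classes, then divide by N.
sum_f_abs_dev = sum(f * abs(x - mean) for f, x in zip(frequencies, midpoints))
mean_deviation = sum_f_abs_dev / n

print(round(sum_f_abs_dev, 1), round(mean_deviation, 2))  # 1151.2 11.51
```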

 

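Variance and standard deviation:

Variance: Variance is the mean of the squared deviations from the mean of the distribution. The standard deviation is the square root of the variance.

Formula

Sample:

$$S^2 = \frac{\sum x^2}{N - 1}, \qquad S = \sqrt{\frac{\sum x^2}{N - 1}}$$

Where $x^2 = (X - \bar{X})^2$

Population:

$$\sigma^2 = \frac{\sum x^2}{N}, \qquad \sigma = \sqrt{\frac{\sum x^2}{N}}$$

Where $x^2 = (X - \mu)^2$

µ = population mean

In the sample formula, (N - 1) is the degrees of freedom; dividing by it gives an unbiased estimate of the population variance.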

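Height (in feet)

X        $x = (X - \bar{X})$     $x^2$
5.5      0.06               0.0036
5.8      0.36               0.1296
5.2      -0.24              0.0576
5.4      -0.04              0.0016
5.3      -0.14              0.0196

$\bar{X} = 5.44$     $\sum x^2 = 0.212$

$S^2 = \frac{\sum x^2}{N - 1} = \frac{0.212}{4} = 0.053$, so $S = \sqrt{0.053} \approx 0.23$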
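The height example can be checked in Python, both by hand and with the standard library's `statistics` module (`statistics.variance` and `statistics.stdev` use the N - 1 divisor; `statistics.pvariance` uses N). The variable names are illustrative.

```python
import math
import statistics

heights = [5.5, 5.8, 5.2, 5.4, 5.3]  # heights in feet, from the table
n = len(heights)

mean = sum(heights) / n                                # 5.44
sum_sq_dev = sum((x - mean) ** 2 for x in heights)     # 0.212

sample_variance = sum_sq_dev / (n - 1)                 # divisor N - 1
sample_sd = math.sqrt(sample_variance)
population_variance = sum_sq_dev / n                   # divisor N

print(round(sample_variance, 3), round(sample_sd, 2))  # 0.053 0.23
print(round(statistics.variance(heights), 3))          # 0.053
```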

 

Regression


Regression: Regression refers to the problem of predicting one unknown variable from one or several known variables.

Regression is of two types:

  1. Simple regression: X → Y
  2. Multiple regression: (X, Y) → Z

Simple regression/simple linear regression: when we can predict one variable from only one other variable.

Suppose X and Y are two variables, where Y is the dependent and X is the independent variable.

The independent variable is regarded as the predictor in regression.

The dependent variable is regarded as the criterion in regression.

Regression equation:

Regression equation of Y on X: $\hat{Y} = a_{yx} + b_{yx}X$

Regression equation of X on Y: $\hat{X} = a_{xy} + b_{xy}Y$

$\hat{Y}$ = predicted score of Y

Coefficient of determination

r = coefficient of correlation.

$r^2$ = coefficient of determination

$\sum (Y - \bar{Y})^2$ = total variation (TV)

$\sum (Y - \hat{Y})^2$ = unexplained variation (UV)

$\sum (\hat{Y} - \bar{Y})^2$ = explained variation (EV)

TV = UV + EV

Properties of $r^2$:

r = 0.5; $r^2$ = 0.25

r = -0.5; $r^2$ = 0.25

  1. The value of $r^2$ is never negative.
  2. The value of $r^2$ ranges from 0 to 1.
  3. The proportion of total variation that is explained can be expressed in terms of the magnitude of the correlation coefficient: $r^2 = EV / TV$.
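The decomposition TV = UV + EV can be illustrated with a small least-squares fit in Python. Everything in this sketch is an assumption made for illustration: the five (X, Y) pairs are hypothetical data, not from the text, and `fit_line` is an illustrative helper that uses the usual least-squares estimates for the regression of Y on X.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b for the regression of Y on X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data, chosen only to make the arithmetic easy to follow.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
predicted = [a + b * x for x in xs]
mean_y = sum(ys) / len(ys)

total_variation = sum((y - mean_y) ** 2 for y in ys)
unexplained_variation = sum((y - p) ** 2 for y, p in zip(ys, predicted))
explained_variation = sum((p - mean_y) ** 2 for p in predicted)
r_squared = explained_variation / total_variation

print(round(a, 2), round(b, 2))  # 2.2 0.6
print(round(total_variation, 2), round(unexplained_variation, 2),
      round(explained_variation, 2), round(r_squared, 2))  # 6.0 2.4 3.6 0.6
```

For a least-squares line the explained and unexplained parts always add up to the total variation, which is why r² can be read as the proportion of variation in Y accounted for by X.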
