2 Frequency Distributions: Tables, Class Intervals, and Applications
A frequency distribution is a table that shows “classes” or “intervals” of data entries with a count of the number of entries in each class.
In a frequency distribution, the observations of a sample or population are arranged according to the size (value) of the variable under consideration. The table consists of two parts: one shows the values (X) and the other shows the corresponding frequencies (f).
Example of a frequency table: suppose we have a dataset of students’ grades in a particular course.
2.0.1 Frequency Distribution Table:
Grade | Frequency |
---|---|
A | 12 |
B | 18 |
C | 10 |
D | 5 |
F | 3 |
Total | 48 |
2.0.2 Relative Frequency:
- Relative frequency is calculated as the frequency of a specific grade divided by the total number of students.
- For example, with 12 + 18 + 10 + 5 + 3 = 48 students in total, the relative frequency of grade A is \[ \frac{12}{48} = 0.25 \]
Frequency Table with Relative Frequency (rounded to two decimal places):
Grade | Frequency | Relative Frequency |
---|---|---|
A | 12 | 0.25 |
B | 18 | 0.38 |
C | 10 | 0.21 |
D | 5 | 0.10 |
F | 3 | 0.06 |
Total | 48 | 1.00 |
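A short Python sketch of the same calculation, assuming the grade counts are stored in a dictionary (the name `grade_counts` is illustrative). The total is computed from the counts themselves (12 + 18 + 10 + 5 + 3 = 48) rather than typed in by hand, so the relative frequencies always sum to 1.

```python
# Grade counts from the frequency table (dictionary name is illustrative)
grade_counts = {"A": 12, "B": 18, "C": 10, "D": 5, "F": 3}

# Total number of students, derived from the counts themselves
total = sum(grade_counts.values())

# Relative frequency = frequency of a grade / total number of students
relative = {grade: f / total for grade, f in grade_counts.items()}

for grade, f in grade_counts.items():
    print(f"{grade} | {f} | {relative[grade]:.2f} |")
print(f"Total | {total} | {sum(relative.values()):.2f} |")
```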
2.0.3 Class Interval
A class interval is a range of values within which data points in a dataset are grouped. It is an essential concept in statistics, especially when dealing with continuous data, as it helps to organize and summarize large datasets in a meaningful way. Class intervals are used in frequency distribution tables and histograms to represent data effectively.
Characteristics of Class Intervals
Range: The class interval is defined by its lower and upper boundaries. For example, in a class interval of 10-20, 10 is the lower boundary and 20 is the upper boundary.
Width: The width (or size) of a class interval is the difference between the upper and lower boundaries. For instance, if the class interval is 10-20, the width is 10 (20 - 10).
Uniformity: Ideally, all class intervals in a frequency distribution should be of the same width to ensure consistency and ease of interpretation.
Non-overlapping: Class intervals should be mutually exclusive, meaning no data point should fall into more than one interval. Each data point should belong to one and only one class interval.
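Inclusive or Exclusive: Class intervals can be either inclusive or exclusive. In an inclusive interval, the upper boundary is included in the interval (e.g., 10-20 includes 20), so consecutive inclusive classes are written without a shared endpoint (e.g., 10-19, 20-29). In an exclusive interval, the upper boundary is not included (e.g., 10-20 includes values up to but not including 20), so consecutive classes can share an endpoint (10-20, 20-30) and a value such as 20 is counted only in the higher class.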
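A minimal Python sketch of the membership test under each convention, assuming numeric data; the function name `in_class` is hypothetical.

```python
def in_class(x, lower, upper, inclusive=True):
    """Return True if x belongs to the class with the given limits.

    Inclusive classes contain both limits; exclusive classes contain the
    lower limit but not the upper one.
    """
    if inclusive:
        return lower <= x <= upper
    return lower <= x < upper

# Exclusive classes 10-20 and 20-30 share the endpoint 20,
# but 20 is counted only in the higher class.
print(in_class(20, 10, 20, inclusive=False))  # False
print(in_class(20, 20, 30, inclusive=False))  # True

# Inclusive classes 10-19 and 20-29 do not share an endpoint.
print(in_class(19, 10, 19), in_class(20, 20, 29))
```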
Steps to Create Class Intervals
Determine the Range of Data: Calculate the range of the data by subtracting the minimum value from the maximum value.
Decide the Number of Intervals: The number of intervals can be chosen based on the size of the dataset and the level of detail required. Common methods include Sturges’ rule, \(k = 1 + \log_2 n\) (equivalently \(1 + 3.322 \log_{10} n\)), and the square root choice, \(k = \sqrt{n}\), where n is the number of data points; k is then rounded to a whole number.
Calculate the Interval Width: Divide the range by the number of intervals to determine the width of each class interval. Adjust the width if necessary to get a convenient number.
Set the Boundaries: Establish the boundaries of each interval, ensuring they cover the entire range of the data without overlapping.
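The four steps can be sketched in Python. This is a minimal sketch under stated assumptions: integer-valued data, inclusive class limits, the number of classes rounded to the nearest integer (some tools round Sturges' value up instead), and a width rounded up to a whole number. The function names are hypothetical. The usage lines apply it to the student scores from the example below, with the first class starting at 45 as in that example.

```python
import math

def num_classes(n, rule="sturges"):
    """Step 2: choose the number of classes k for n observations."""
    if rule == "sturges":
        k = 1 + math.log2(n)      # Sturges' rule: k = 1 + log2(n)
    elif rule == "sqrt":
        k = math.sqrt(n)          # square-root choice
    else:
        raise ValueError(f"unknown rule: {rule!r}")
    return max(1, round(k))       # assumption: round to the nearest integer

def class_intervals(data, k=None, start=None):
    """Return inclusive class limits [(lower, upper), ...] for integer data."""
    lo, hi = min(data), max(data)
    data_range = hi - lo                        # Step 1: range of the data
    if k is None:
        k = num_classes(len(data))              # Step 2: number of intervals
    width = max(1, math.ceil(data_range / k))   # Step 3: width, rounded up
    if start is None:
        start = lo                              # first lower limit (must be <= min)
    # Step 4: consecutive, non-overlapping inclusive classes
    classes = [(start + i * width, start + (i + 1) * width - 1) for i in range(k)]
    # Add classes if rounding left the maximum uncovered
    while classes[-1][1] < hi:
        lower = classes[-1][1] + 1
        classes.append((lower, lower + width - 1))
    return classes

scores = [46, 50, 53, 56, 59, 61, 63, 64, 66, 67, 70,
          71, 72, 73, 74, 76, 78, 81, 83, 86, 92]
print(num_classes(len(scores)), num_classes(len(scores), "sqrt"))
print(class_intervals(scores, start=45))
```

The final loop is a design choice: rounding the width up and starting below the minimum can still leave the maximum outside the last class (for example, a range of exactly 50 split into five classes of width 10 starting at 45 stops at 94), so extra classes are appended until the data are covered.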
Example
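Consider a dataset of 21 student scores: 46, 50, 53, 56, 59, 61, 63, 64, 66, 67, 70, 71, 72, 73, 74, 76, 78, 81, 83, 86, 92. To create a frequency distribution table with class intervals, follow these steps:
- Range: 92 - 46 = 46
- Number of Intervals: Suppose we decide to use 5 intervals (Sturges’ rule gives \(1 + \log_2 21 \approx 5.4\) and \(\sqrt{21} \approx 4.6\), so 5 is a reasonable choice).
- Interval Width: 46 / 5 = 9.2, rounded up to the convenient width 10. Starting the first class at 45, just below the minimum score, five classes of width 10 cover all scores up to 94.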
Class Intervals:
- 45-54
- 55-64
- 65-74
- 75-84
- 85-94
Frequency Distribution Table with Class Intervals
Class Interval | Frequency |
---|---|
45-54 | 3 |
55-64 | 5 |
65-74 | 7 |
75-84 | 4 |
85-94 | 2 |
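The counts in this table can be reproduced with a short Python sketch, assuming the inclusive classes listed above; the helper name `class_frequencies` is hypothetical. It also checks the non-overlapping requirement: every score must land in exactly one class.

```python
scores = [46, 50, 53, 56, 59, 61, 63, 64, 66, 67, 70,
          71, 72, 73, 74, 76, 78, 81, 83, 86, 92]
classes = [(45, 54), (55, 64), (65, 74), (75, 84), (85, 94)]

def class_frequencies(data, classes):
    """Count how many values fall in each inclusive class (lower <= x <= upper)."""
    counts = [0] * len(classes)
    for x in data:
        hits = [i for i, (lower, upper) in enumerate(classes) if lower <= x <= upper]
        # Each value must belong to exactly one class (non-overlapping, exhaustive)
        if len(hits) != 1:
            raise ValueError(f"{x} falls in {len(hits)} classes")
        counts[hits[0]] += 1
    return counts

freqs = class_frequencies(scores, classes)
for (lower, upper), f in zip(classes, freqs):
    print(f"{lower}-{upper} | {f} |")
print(f"Total | {sum(freqs)} |")
```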
Uses of Class Intervals
Data Summarization: Class intervals help to condense large datasets into a more manageable form, making it easier to identify patterns and trends.
Visual Representation: They are crucial for creating histograms and frequency polygons, which are graphical representations of data distributions.
Statistical Analysis: Class intervals are used in calculating measures such as mean, median, mode, and standard deviation for grouped data.
The frequency f of a class is the number of data entries in the class. Each class has a “lower class limit” and an “upper class limit”, which are the lowest and highest values that can belong to the class (e.g., 45 and 54 for the class 45-54).
The range is the difference between the maximum and minimum data entries.
To build a frequency distribution, we must decide the number of class intervals and their width; together these generate the list of class intervals.
The “class width” is the distance between the lower limits of consecutive classes. For example, the classes 45-54 and 55-64 have class width 55 - 45 = 10, even though 54 - 45 = 9.
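If the number of observations is N, choose the number of classes k as the smallest integer for which \(2^k\) is greater than N; the class width is then \((\text{Max} - \text{Min}) / k\).
E.g.: If N = 200, the smallest power of 2 greater than 200 is \(2^8 = 256\) (since \(2^7 = 128\) is too small), so the number of classes is k = 8.
Assuming the data run from Min = 0 to Max = 200, the class width is \((200 - 0) / 8 = 25\).
The class intervals are then 0-25, 25-50, ..., 175-200 (exclusive classes, so a value such as 25 belongs to 25-50).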
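A minimal Python sketch of this 2^k rule, keeping the example's assumption that the data run from Min = 0 to Max = 200; the function names are hypothetical. Classes are returned as exclusive pairs (lower limit included, upper excluded), matching 0-25, 25-50, and so on.

```python
def two_to_k_classes(n):
    """Smallest k such that 2**k exceeds the number of observations n."""
    k = 0
    while 2 ** k <= n:
        k += 1
    return k

def equal_width_classes(min_value, max_value, k):
    """Split [min_value, max_value] into k exclusive classes of equal width."""
    width = (max_value - min_value) / k
    return [(min_value + i * width, min_value + (i + 1) * width) for i in range(k)]

k = two_to_k_classes(200)                 # 2**7 = 128 <= 200 < 256 = 2**8
print(k)
print(equal_width_classes(0, 200, k))     # assumes Min = 0 and Max = 200
```

Note the strict "greater than": for N = 256 the rule gives k = 9, because \(2^8 = 256\) does not exceed N.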
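The midpoint (or class mark), \(\text{Midpoint} = (\text{Lower class limit} + \text{Upper class limit}) / 2\), represents a class interval by a single number. For example, the midpoint of 45-54 is (45 + 54) / 2 = 49.5.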
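Midpoints are what make grouped-data statistics such as the mean possible. A minimal Python sketch, assuming the classes and frequencies from the scores example: the grouped mean \(\bar{x} \approx \sum f m / n\) treats every observation as if it sat at its class midpoint m, so it only approximates the mean of the raw scores.

```python
# Classes and frequencies from the scores example (inclusive limits)
classes = [(45, 54), (55, 64), (65, 74), (75, 84), (85, 94)]
freqs = [3, 5, 7, 4, 2]

# Midpoint (class mark) = (lower limit + upper limit) / 2
midpoints = [(lower + upper) / 2 for lower, upper in classes]

# Grouped mean: weight each midpoint by its class frequency
n = sum(freqs)
grouped_mean = sum(f * m for f, m in zip(freqs, midpoints)) / n

print(midpoints)
print(round(grouped_mean, 2))
```

For these scores the grouped estimate is 1429.5 / 21, about 68.07, while the 21 raw scores average 1441 / 21, about 68.62.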
\(\text{Relative frequency} = \text{class frequency} / \text{total frequency} = f/n\). It gives the proportion of all observations that fall in a given class interval; multiplying by 100 expresses it as a percentage.
2.0.4 Cumulative Frequency:
Cumulative frequencies are obtained by successively adding the frequencies of the classes.
Less Than CF (Ascending CF): accumulated from top to bottom; for each class it counts all observations up to that class's upper limit.
More Than CF (Descending CF): accumulated from bottom to top; for each class it counts all observations from that class's lower limit upward.
Frequency Distribution Table with Cumulative Frequency:
Class Interval | Frequency | Less Than CF | More Than CF | Relative Frequency |
---|---|---|---|---|
45-54 | 3 | 3 | 21 | 0.14 |
55-64 | 5 | 8 | 18 | 0.24 |
65-74 | 7 | 15 | 13 | 0.33 |
75-84 | 4 | 19 | 6 | 0.19 |
85-94 | 2 | 21 | 2 | 0.10 |
Total | 21 | | | 1.00 |
Less Than Cumulative Frequency (Less Than CF):
- This column shows the cumulative frequency of all classes up to and including the current class interval.
- It is calculated by adding the frequency of the current class interval to the cumulative frequency of the previous class interval.
- Example: For the interval 55-64, the Less Than CF is 3 (previous interval) + 5 = 8.
More Than Cumulative Frequency (More Than CF):
- This column shows the cumulative frequency of all classes from the current class interval to the last class interval.
- It is calculated by subtracting the Less Than CF of the previous class interval from the total frequency; for the first class it equals the total frequency.
- Example: For the interval 55-64, the More Than CF is 21 (total) - 3 (previous Less Than CF) = 18.
Relative Frequency:
- This column shows the proportion of the total frequency that each class interval represents.
- It is calculated by dividing the frequency of the current class interval by the total frequency.
- Example: For the interval 55-64, the relative frequency is \[ \frac{5}{21} \approx 0.24 \]
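All three derived columns can be generated from the class frequencies alone. A minimal Python sketch, assuming the five classes and frequencies of the scores example, using the standard-library `itertools.accumulate` for the running totals:

```python
from itertools import accumulate

classes = ["45-54", "55-64", "65-74", "75-84", "85-94"]
freqs = [3, 5, 7, 4, 2]
total = sum(freqs)

# Less Than CF: running total from the top of the table down
less_than = list(accumulate(freqs))

# More Than CF: running total from the bottom up, then restored to table order
more_than = list(accumulate(reversed(freqs)))[::-1]

# Relative frequency: class frequency / total frequency
relative = [f / total for f in freqs]

print("Class Interval | Frequency | Less Than CF | More Than CF | Relative Frequency |")
for row in zip(classes, freqs, less_than, more_than, relative):
    print("{} | {} | {} | {} | {:.2f} |".format(*row))
print(f"Total | {total} | | | {sum(relative):.2f} |")
```

Accumulating the reversed list is equivalent to the subtraction rule above: each More Than CF equals the total minus the previous class's Less Than CF.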