Statistics for Dummies: 103

Prajwal Khairnar
4 min read · Aug 3, 2023


Photo by Fabian Quintero on Unsplash

Measures of Central Tendency: Mean, Median, and Mode

Introduction

Central tendency describes the middle of a data set. In other words, it is the most typical value in the data. Measures of central tendency are statistics that summarize a sample or population with a single representative number. Which one you use depends on the type of data you have and on what you want to learn from it.

There are three main measures of central tendency: mean, median and mode. The mean (or average) is the most commonly used because it can be calculated quickly and easily by adding all values together and then dividing by how many there are in your data set. This calculation works for any number of observations, but the result can be misleading when the data is skewed or contains outliers. In those cases the median or the mode is often a better summary; we will discuss both later on!

Defining the Mean: The Average of a Data Set

The mean is the arithmetic average of a data set. It is calculated by adding all the values in a data set and dividing by the number of values. For example, if you have five numbers {1, 2, 3, 4, 5} then their mean would be (1+2+3+4+5)/5 = 3. If you have two sets of five numbers each, {1, 2, 3, 4, 5} and {2, 3, 4, 5, 6}, their means will be different (3 and 4) because they contain different values, even though they are equal in size (i.e., both sets contain five elements).

Because it divides by the number of values, the mean can be used to compare data sets of different sizes. However, it doesn't work well with skewed distributions or outliers, because a few extreme values can pull the mean away from where most of the data lies.
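The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original article: the function name `arithmetic_mean` and the variable names are chosen here for the example, while `statistics.mean` is the standard-library equivalent.

```python
from statistics import mean


def arithmetic_mean(values):
    """Add all values together and divide by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# The two five-element sets from the text.
first = [1, 2, 3, 4, 5]
second = [2, 3, 4, 5, 6]

print(arithmetic_mean(first))   # 3.0
print(arithmetic_mean(second))  # 4.0

# The standard library gives the same value for the first set.
print(mean(first) == arithmetic_mean(first))  # True
```

Both sets have five elements, yet the means differ because the values themselves differ.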

Understanding the Median: The Middle Value of a Data Set

The median is the value at the middle of a data set once the values are ordered from smallest to largest. If the data set has an even number of values, the median is the average of the two middle values. It is largely unaffected by outliers, and it can be used to compare data sets of different sizes.

The median is also useful for comparing two or more sets of numbers because it can always be calculated for any non-empty set of numbers. However, it only tells you where the middle lies and says nothing about how far the other values are from it, so two data sets that are shaped very differently from each other can share the same median.
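As a rough sketch of how the middle value is found, the Python snippet below sorts the data and handles both an odd and an even number of values. The helper name `middle_value` and the sample lists are made up for this illustration; `statistics.median` from the standard library does the same job.

```python
from statistics import median


def middle_value(values):
    """Return the median: the middle of the sorted values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: exactly one value sits in the middle.
        return ordered[mid]
    # Even count: average the two values around the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_sample = [7, 1, 5, 3, 9]   # sorted: 1, 3, 5, 7, 9
even_sample = [7, 1, 5, 3]     # sorted: 1, 3, 5, 7

print(middle_value(odd_sample))   # 5
print(middle_value(even_sample))  # 4.0

# The standard library agrees on both samples.
print(median(odd_sample) == middle_value(odd_sample))    # True
print(median(even_sample) == middle_value(even_sample))  # True
```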

Finding the Mode: The Most Frequent Value in a Data Set

The mode is the most frequently occurring value in a data set. It has a place in statistics because it can be used as an alternative to the mean and median when the data is categorical (where a mean or median cannot be calculated) or when many values are repeated.

The mode is easy to find: simply pick out the value that appears most often! If you have ten numbers and six of them are 2s, then 2 is your mode. In this particular case 2 is also your median, because more than half of the values are 2s, but in general the mode and the median do not have to be the same.

A data set can have more than one mode. A distribution with two modes is called bimodal because it has two humps on its histogram, and one with more than two is called multimodal. On a histogram, a unimodal distribution shows a single hump, a bimodal distribution shows two humps, a trimodal distribution shows three humps, and so on.
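Finding the mode, including the case of ties, might look like the following Python sketch. The helper name `modes` and the sample data are invented for this example; `statistics.multimode` (available in the standard library since Python 3.8) returns all of the most frequent values in the same way.

```python
from collections import Counter
from statistics import multimode


def modes(values):
    """Return every value that shares the highest frequency."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Counter keeps first-seen order, so ties come back in that order.
    return [value for value, count in counts.items() if count == highest]


# Ten numbers, six of which are 2s, as in the text.
one_mode = [2, 2, 2, 2, 2, 2, 5, 7, 8, 9]
# Two values tied for most frequent: a bimodal data set.
two_modes = [1, 1, 1, 4, 6, 6, 6, 9]

print(modes(one_mode))       # [2]
print(modes(two_modes))      # [1, 6]
print(multimode(two_modes))  # [1, 6]
```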

Comparing Mean, Median, and Mode: When to Use Each Measure

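The mean is used when the data is numerical, roughly symmetric and bell-shaped, with no extreme outliers.

The median is used when the data is skewed or contains outliers.

The mode is used when the data is categorical, or when you want to know which value occurs most often.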

The Effect of Outliers on Mean, Median, and Mode

Mean, median and mode are all measures of central tendency. They are used to describe the center of a distribution, but they do so in different ways. The mean is calculated by adding up all values in a data set and dividing by how many values there are; the median is simply the middle value once you've ordered your data from smallest to largest; and the mode is simply whichever value occurs most often.

The effect that outliers have on these measures depends on how much each measure relies on the actual size of every value:

The mean uses every value in its calculation (mean = sum/n), so a single extreme value can pull it noticeably towards that value: an unusually high number raises the mean and an unusually low number lowers it. The median depends only on the order of the values, so an outlier usually moves it very little, if at all. The mode is not affected by an outlier unless the outlying value itself turns out to be the most frequent one. Outliers also matter when the mean is used as part of other calculations like z-scores, which measure how far a value lies from the mean in units of standard deviation. Because outliers distort both the mean and the standard deviation, including them can lead to misleading results.
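To make the effect of an outlier concrete, here is a small Python sketch that compares the three measures before and after one extreme value is added. The numbers are hypothetical (think of salaries in thousands) and the helper `summarize` is named only for this example.

```python
from statistics import mean, median, mode


def summarize(values):
    """Return the three measures of central tendency for a data set."""
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
    }


# Hypothetical data, made up for illustration.
salaries = [30, 32, 32, 35, 38]
with_outlier = salaries + [500]  # one extreme value added

before = summarize(salaries)
after = summarize(with_outlier)

for name in ("mean", "median", "mode"):
    print(f"{name}: {before[name]:.1f} -> {after[name]:.1f}")
# mean: 33.4 -> 111.2
# median: 32.0 -> 33.5
# mode: 32.0 -> 32.0
```

A single extreme value more than triples the mean, while the median shifts only slightly and the mode does not move at all.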

Conclusion: Understanding the Importance of Measures of Central Tendency in Data Analysis

In conclusion, measures of central tendency are very important in data analysis. By understanding the center of a data set, we can summarize it with a single value and make comparisons between different data sets.

Measures of central tendency provide us with an understanding as to what is happening with our variable or variables at their most basic level. They are used when we need an overview of what’s going on before we dive into more complicated analysis techniques like regression analysis or correlation coefficients (which we will discuss later).

Journey Links

I will keep updating the list here when new articles are published in the series. Keep an eye on it!

  1. Statistics for Dummies: 101
    Introduction to Statistics
  2. Statistics for Dummies: 102
    Types of Data: Nominal, Ordinal, Interval, and Ratio Scales
  3. Statistics for Dummies: 103
    Measures of Central Tendency: Mean, Median, and Mode
  4. Statistics for Dummies: 104
    Measures of Variability: Range, Variance, and Standard Deviation
  5. Statistics for Dummies: 105
    Probability: Definition and Basic Concepts
  6. Statistics for Dummies: 106
    Mastering Discrete and Continuous Probability Distributions: Key Concepts and Applications
  7. Statistics for Dummies: 107
    Unlocking the Power of Sampling Distributions: Key Insights for Statistical Analysis
  8. Statistics for Dummies: 108
    Demystifying Hypothesis Testing: Essential Concepts for Statistical Analysis


Written by Prajwal Khairnar

Data Scientist | IT Engineer | Research interests include Statistics | NLP | Machine Learning, Data Science and Analytics, Clinical Trials