We’ve already looked at how spread out data can be. This time, we’re going to look at something different: the shape of the data.
Imagine Kiki is baking chocolate chip cookies. She follows the same recipe every time and tries to make every cookie exactly the same size. Of course, baking is never perfectly precise. Some cookies come out a little bigger. Some are a little smaller. But most are very close to the size she was aiming for.
Now imagine Kiki lines up every cookie from the smallest to the largest. There are only a few tiny cookies, and only a few giant ones. Most of them are somewhere in between.
Now count how many cookies fall into each size range and draw those counts as bars. A pattern begins to appear. The bars pile up around the middle, then gradually get shorter on both sides.
The result looks like a smooth hill, with the tallest point in the centre and both sides gently sloping away. That familiar bell-shaped curve is called a normal distribution.
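To see the hill take shape, here is a minimal Python sketch that simulates a batch of cookies and counts them by size range. The batch size of 1,000, the 7 cm target, the 0.5 cm spread, and the helper name count_by_size are all assumptions made for illustration, and random.gauss stands in for the small accidents of real baking.

```python
import random

random.seed(42)  # fixed seed so the example is repeatable

# Assumed for illustration: 1,000 cookies aimed at 7 cm, typically off by about 0.5 cm
diameters = [random.gauss(7.0, 0.5) for _ in range(1000)]

def count_by_size(values, bin_width=0.5):
    """Count how many values fall into each size range of width bin_width."""
    counts = {}
    for v in values:
        bin_start = (v // bin_width) * bin_width  # lower edge of the size range
        counts[bin_start] = counts.get(bin_start, 0) + 1
    return dict(sorted(counts.items()))

counts = count_by_size(diameters)

# Draw each count as a bar of '#' characters (one '#' per 10 cookies)
for bin_start, n in counts.items():
    print(f"{bin_start:4.1f} cm  {'#' * (n // 10)}")
```

The longest bars should sit around 7 cm, with shorter and shorter bars on either side.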
So what makes this curve special?
There are three things you will almost always notice.
It is balanced. The left side mirrors the right. If you could fold the bell curve down the middle, both halves would line up perfectly.
The centre does three jobs at once. The average (mean), the middle value (median), and the most common value (mode) are all the same. They meet at the highest point of the bell.
Most values stay close to the centre. As you move further away from the average, the number of observations becomes smaller and smaller. Extremely small values are rare. Extremely large values are rare too.
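These three properties can be checked with Python's built-in statistics.NormalDist. This is a small sketch rather than a proof, and the centre of 7 and spread of 0.5 are assumed values chosen only for illustration.

```python
import math
from statistics import NormalDist

# Assumed example curve: centre 7, spread 0.5
curve = NormalDist(mu=7.0, sigma=0.5)

# 1. Balanced: the curve is equally tall at equal distances either side of the centre
for offset in (0.25, 0.5, 1.0):
    left = curve.pdf(7.0 - offset)
    right = curve.pdf(7.0 + offset)
    print(f"offset {offset}: left {left:.4f}, right {right:.4f}")

# 2. Mean, median and mode all sit at the same point
print(curve.mean, curve.median, curve.mode)

# 3. The curve gets lower the further we move from the centre
heights = [curve.pdf(7.0 + steps * 0.5) for steps in range(4)]
print([round(h, 4) for h in heights])
```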
Here is where it gets interesting
Once you know the mean and the standard deviation of data that follows a normal distribution, you can estimate where most of it is likely to fall.
This leads to one of the most famous rules in statistics: the 68-95-99.7 rule.
Imagine Kiki’s cookies have an average diameter of 7 cm, with a standard deviation of 0.5 cm.
About 68% of the cookies will be between 6.5 cm and 7.5 cm, which is within one standard deviation of the average.
About 95% will be between 6 cm and 8 cm, within two standard deviations.
About 99.7% will be between 5.5 cm and 8.5 cm, within three standard deviations.
The further you move from the average, the fewer cookies you find. That is exactly what the bell curve shows.
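The three percentages can be reproduced from the mean and standard deviation alone. The sketch below uses statistics.NormalDist with Kiki's numbers (7 cm and 0.5 cm). The helper name share_within is hypothetical, introduced only for this example.

```python
from statistics import NormalDist

cookies = NormalDist(mu=7.0, sigma=0.5)  # Kiki's mean and standard deviation

def share_within(dist, k):
    """Return the range within k standard deviations of the mean and the share of values inside it."""
    low = dist.mean - k * dist.stdev
    high = dist.mean + k * dist.stdev
    return low, high, dist.cdf(high) - dist.cdf(low)

for k in (1, 2, 3):
    low, high, share = share_within(cookies, k)
    print(f"within {k} SD: {low:.1f} cm to {high:.1f} cm -> {share:.1%}")

# within 1 SD: 6.5 cm to 7.5 cm -> 68.3%
# within 2 SD: 6.0 cm to 8.0 cm -> 95.4%
# within 3 SD: 5.5 cm to 8.5 cm -> 99.7%
```

The rule's 68 and 95 are rounded versions of these figures.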
Does every dataset follow a normal distribution?
Not at all. Some datasets are skewed. Some have multiple peaks. Others have completely different shapes.
Kiki is a fictional baker, but the pattern in her cookies is very real. The same bell shape shows up, at least approximately, in plenty of real-life situations too, like people’s heights, exam scores across a large class, or measurement errors in manufacturing.
The normal distribution matters because many real-world measurements are close enough to this pattern that it becomes incredibly useful. But it is not the right model for every dataset.
One of the most important skills in statistics is recognising when data follows a normal distribution, and when it does not.
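One rough way to practise this skill is to compare a dataset against two things a bell curve promises: about 68% of values within one standard deviation of the mean, and a mean that sits almost on top of the median. The sketch below does that for two simulated datasets. It is an informal check under assumed data, not a formal statistical test, and the function name shape_summary is made up for this example.

```python
import random
from statistics import mean, median, pstdev

random.seed(7)  # fixed seed so the example is repeatable

def shape_summary(values):
    """Two quick clues about shape: share within 1 SD, and mean-median gap in SD units."""
    m = mean(values)
    sd = pstdev(values)
    within_1sd = sum(abs(v - m) <= sd for v in values) / len(values)
    gap = (m - median(values)) / sd
    return {"within_1sd": within_1sd, "mean_median_gap": gap}

# Simulated data, assumed for illustration
bell_like = [random.gauss(7.0, 0.5) for _ in range(5000)]
skewed = [random.expovariate(1.0) for _ in range(5000)]  # long tail to the right

bell_summary = shape_summary(bell_like)
skewed_summary = shape_summary(skewed)
print("bell-like:", bell_summary)
print("skewed:   ", skewed_summary)
```

For the bell-like data, the share should land near 0.68 and the gap near zero. For the skewed data, the long right tail pulls the mean above the median, and the share within one standard deviation drifts well away from 0.68.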
Try it yourself
Adjust the mean and standard deviation and watch what happens to the bell curve.
Notice how changing the mean moves the whole curve left or right, while changing the standard deviation makes the curve narrower or wider.
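One way to run this experiment is with a few lines of Python. The sketch below describes three curves by where the peak sits, how tall it is, and how wide the middle 95% of the curve is. The function name describe and the three settings are assumptions picked for illustration, so swap in your own numbers.

```python
from statistics import NormalDist

def describe(mu, sigma):
    """Return where the peak sits, how tall it is, and the width of the middle 95%."""
    dist = NormalDist(mu, sigma)
    peak_at = dist.mode
    peak_height = dist.pdf(mu)
    width_95 = dist.inv_cdf(0.975) - dist.inv_cdf(0.025)
    return peak_at, peak_height, width_95

# Try your own values here
for mu, sigma in [(7.0, 0.5), (8.0, 0.5), (7.0, 1.0)]:
    peak_at, peak_height, width_95 = describe(mu, sigma)
    print(f"mean {mu}, SD {sigma}: peak at {peak_at}, "
          f"height {peak_height:.3f}, middle 95% spans {width_95:.2f}")
```

Raising the mean from 7 to 8 moves the peak without changing its height or width. Doubling the standard deviation keeps the peak in place but makes the curve half as tall and twice as wide.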
The next time you bake cookies, remember that they probably will not all come out exactly the same size. Most will be somewhere near the middle, and only a few will be much smaller or much larger. That simple idea is what a normal distribution is all about.