What does the standard deviation of a data set tell us about the data and why is the standard deviation an important measure?
It is important because it tells you whether the average is a useful quantity for interpreting the data. If someone tells you that the average person your age dies in 50 years, that seems important, but if someone says that the average person dies in 50 years, give or take 20 years, suddenly you realize there is more to the story and maybe you should save more money, just in case. The "give or take" part of that statement is useful, but not well defined. If they say the life expectancy is 50 years with a standard deviation of 20 years, then that is precisely defined mathematically. Standard deviation is a mathematical measure of the broadness of the distribution of the data. (There are, of course, many other ways to characterize a distribution.)
The following two data sets, A and B, have the same mean (average):
A: 48, 49, 50, 51, 52
B: 30, 40, 50, 60, 70
The distribution of the data about the mean in A is very narrow, whereas the distribution about the mean in B is broad. The S.D. gives us a quantification of the broadness of the distribution.
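You can check this quickly in code. Here is a minimal sketch using Python's standard library, computing the population standard deviation of the two sets above:

```python
from statistics import mean, pstdev

A = [48, 49, 50, 51, 52]
B = [30, 40, 50, 60, 70]

# Both sets have the same mean (50), but very different spreads.
print(mean(A), mean(B))     # 50 50
print(round(pstdev(A), 2))  # 1.41  -- narrow distribution
print(round(pstdev(B), 2))  # 14.14 -- broad distribution
```

Even though the averages are identical, the standard deviation of B is ten times that of A, which is exactly the "broadness" the number is quantifying.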
In normal distributions, about 68 percent of the data will fall within one S.D. on either side of the mean. About 95 percent of the data will fall within two S.D.
Let's say your teacher gives a test to one hundred kids, and the test average is 80 points and the S.D. is 10. If the distribution is "normal," about 34 kids will score between 70 and 80, and about 34 kids will score between 80 and 90. We can also predict that about 14 kids will score between 90 and 100, and about 14 will score between 60 and 70. That leaves four kids. They fall into two groups: they either totally bombed the test, or they got the extra credit question to boost their score over 100!
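The counts in that example can be reproduced with the normal distribution's cumulative distribution function. A minimal sketch using Python's `statistics.NormalDist` (the class size and score bands are from the example above):

```python
from statistics import NormalDist

scores = NormalDist(mu=80, sigma=10)  # mean 80, S.D. 10
n = 100                               # one hundred kids

def count_between(lo, hi):
    """Expected number of kids scoring in [lo, hi]."""
    return round(n * (scores.cdf(hi) - scores.cdf(lo)))

print(count_between(70, 80))   # 34 -- within one S.D. below the mean
print(count_between(80, 90))   # 34 -- within one S.D. above the mean
print(count_between(90, 100))  # 14 -- between one and two S.D. above
print(count_between(60, 70))   # 14 -- between one and two S.D. below
```

The four remaining kids are the ones more than two standard deviations from the mean, split between the two tails.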