How can you select a sample in such a way that each member of the population has an equal probability of being included?
The short answer is "random sample," but that, unfortunately, is
neither specific nor complete. It is not specific because there are
forms of random sampling where selection probabilities are not
constant. It is not complete because there are many different ways
to conduct random sampling with equal selection probabilities.
"Simple random sampling" occurs when you can perform a process
that, for all practical purposes, behaves like writing down the
identifier of each population member on a piece of paper, putting
all the pieces into a box, mixing them thoroughly, and pulling out
a few of them one by one (without replacing them in the box).
Nowadays we use a computer to do this job, because it's faster and
more reliable (it is notoriously difficult to mix pieces of paper
perfectly randomly). The computer needs a complete list of all the
population members: this is called a <i>sampling frame</i>.
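As a minimal sketch of the "pieces of paper in a box" procedure on a
computer, the following uses Python's standard <code>random.sample</code>,
which draws without replacement. The list <code>frame</code>, the student
identifiers, and the helper name <code>simple_random_sample</code> are
made up for illustration.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct members from the sampling frame.

    random.sample selects without replacement, like pulling slips of
    paper out of a well-mixed box one by one.
    """
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(frame, n)

# Hypothetical sampling frame: identifiers for a population of 30 students.
frame = [f"student_{i:02d}" for i in range(1, 31)]

sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```

The frame has to be a complete list of the population; anyone missing
from it has no chance of being selected.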
Here is an example of random sampling that is not simple but
still selects every population member with equal probability.
Suppose you want to sample half the students in a classroom of 30.
Ask them to line up. Flip a fair coin: if it's heads, pick the
first, third, ..., 29th in line. If tails, pick the second, fourth,
..., 30th. Any individual student has a 50% chance of being part of
the sample, so each student has an equal probability of being
included. However, if you lined up the students boy-girl-boy-girl,
etc., the samples themselves wouldn't look very random: they would
be either entirely boys or entirely girls. The sample is still
random, though, because it's determined by the flip of a coin.
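The classroom procedure can be simulated in a few lines. This is only
a sketch: the alternating line-up, the labels such as "B1" and "G2",
and the function name <code>coin_flip_sample</code> are assumptions
made for illustration.

```python
import random
from collections import Counter

def coin_flip_sample(line, rng):
    """Heads: take positions 1, 3, ..., 29. Tails: take 2, 4, ..., 30."""
    heads = rng.random() < 0.5  # a fair coin
    start = 0 if heads else 1   # zero-based index of the first student picked
    return line[start::2]

# Hypothetical line-up alternating boy, girl, boy, girl, ...
line = [("B" if i % 2 == 1 else "G") + str(i) for i in range(1, 31)]

rng = random.Random(0)
trials = 10_000
counts = Counter()
for _ in range(trials):
    counts.update(coin_flip_sample(line, rng))

# Fraction of trials in which each student was included.
freq = {student: counts[student] / trials for student in line}
print(min(freq.values()), max(freq.values()))

# A single draw: every member of it comes from the same half of the line.
print(coin_flip_sample(line, rng))
```

Each student's inclusion frequency should come out close to one half,
yet only two distinct samples can ever occur.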
The example highlights a subtle but important property of a
random sample: in many cases, you want the selection of population
members to be <b>independent</b>. This means the
probability of selecting one member is not affected by which other
members are selected. In simple random sampling, independence
holds, at least to a good approximation when the sample is a small
part of the population; in the second example (a form of <i>gridded
sampling</i>), there is complete dependence: no student can
be chosen along with either of their neighbors in line, for example.
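One way to see the difference is to compute the chance that two
particular students are both selected. The sketch below does this
exactly with fractions for the 30-student classroom, comparing a
simple random sample of 15 drawn without replacement against the
coin-flip scheme. The variable and function names are illustrative.

```python
from fractions import Fraction

N, n = 30, 15  # class size and sample size from the classroom example

# Simple random sampling without replacement: every student is included
# with probability n/N, and every pair with the same joint probability.
srs_single = Fraction(n, N)
srs_pair = Fraction(n * (n - 1), N * (N - 1))

# Coin-flip scheme: the only possible samples are the odd positions
# (heads) and the even positions (tails), each with probability 1/2.
samples = [set(range(1, N + 1, 2)), set(range(2, N + 1, 2))]

def coin_pair_probability(i, j):
    """Probability that the students in positions i and j are both chosen."""
    return sum(Fraction(1, 2) for s in samples if i in s and j in s)

print("inclusion probability, either scheme:", srs_single)
print("simple random sampling, any pair:", srs_pair)
print("value under exact independence:", srs_single ** 2)
print("coin flip, neighbors 1 and 2:", coin_pair_probability(1, 2))
print("coin flip, positions 1 and 3:", coin_pair_probability(1, 3))
```

Under these assumptions the pair probability for simple random
sampling is 7/29, a little below the 1/4 that exact independence
would give, because drawing without replacement from a small class
introduces slight dependence. In the coin-flip scheme the pair
probability is either 0 or 1/2, depending only on whether the two
positions have the same parity.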
Simple random sampling is ideal for many purposes but often
cannot be carried out in practice because it is not feasible (you
might not be able to construct a sampling frame) or costs too much.
Often, more complicated procedures, such as <i>hierarchical
sampling</i>, are carried out to overcome these limitations.
(An example of hierarchical sampling is when an epidemiologist
selects a city at random, then selects households at random within
the city, then selects children at random within each household to
study. Doing it this way can require much less travel than
selecting children at random from all over the state.) These
procedures might or might not select population members with equal
probability. Usually the selection is not independent, either. When
the probabilities are unequal, they can be figured out and used as
<i>weights</i> in statistical analysis of the data.
Results can also be adjusted for lack of independence.
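A common way to use unequal selection probabilities as weights is to
weight each observation by the inverse of its selection probability.
The sketch below assumes a hypothetical three-stage design like the
epidemiologist's, where a child's overall selection probability is the
product of the probabilities at each stage. All numbers and field
names are made up for illustration, and the sketch does not address
the adjustment for lack of independence.

```python
# Hypothetical sampled children. Each record holds a measured value and
# the selection probability at each stage of the design.
sampled = [
    {"value": 10.0, "p_city": 0.10, "p_household": 0.05, "p_child": 0.50},
    {"value": 20.0, "p_city": 0.10, "p_household": 0.05, "p_child": 1.00},
    {"value": 30.0, "p_city": 0.20, "p_household": 0.05, "p_child": 0.50},
]

def selection_probability(rec):
    """Overall chance of selection: the product of the stage probabilities."""
    return rec["p_city"] * rec["p_household"] * rec["p_child"]

def weighted_mean(records):
    """Mean with each value weighted by 1 / (selection probability)."""
    weights = [1.0 / selection_probability(r) for r in records]
    total = sum(w * r["value"] for w, r in zip(weights, records))
    return total / sum(weights)

unweighted = sum(r["value"] for r in sampled) / len(sampled)
print("unweighted mean:", unweighted)
print("weighted mean:", weighted_mean(sampled))
```

With these invented numbers the first child was half as likely to be
selected as the other two, so that child stands in for twice as many
population members and pulls the weighted mean (about 17.5) below the
unweighted mean of 20.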
A good, readable, non-technical introduction to sampling and
simple random samples is the textbook <i>Statistics</i>
by Freedman, Pisani, and Purves. Any edition is fine. Steven
Thompson's book <i>Sampling</i> discusses dozens of
different sampling procedures and explains the theory behind each.