Lecture 4
Lecture 4 focuses on surveys--surveys are increasingly important in
everyday life, but are also a useful illustration of some of what we've covered
already, and some of what we'll cover in the future.
Surveys use samples to draw conclusions about populations. Gallup, for example,
may survey about 1000 individuals to draw conclusions about millions.
I. Important Concepts: Margin of Error
Suppose a survey finds that 35% of the sample approves of the job
performance of George Bush.
This percentage, of course, is just an estimate of the proportion
in the population that approve of George Bush.
How do we know how close that is to the "true" proportion in the population?
The margin of error provides a range in which the population (the
"true") percentage falls--and gives an idea of how confident we can be that
the population percentage falls within that margin. Most margins of error
provided by pollsters are determined at about a 95% confidence level--that
is, if you took an infinite number of samples, and from each sample calculated
the percentage of respondents that approved of Bush's performance as president,
the intervals built around those sample percentages would contain the "true"
population percentage 95% of the time.
A rough way to calculate margins of error at the 95% level is 100 divided
by the square root of n (recall that n is your sample size).
So, consider a hypothetical sample of 1,600 respondents.
In the sample, about 40% of individuals approved of George Bush's job
performance. What is the margin of error?
- The sample size is n=1600.
- The square root of the sample size is 40.
- So, 100/40 is the margin of error--we can say that the margin of
error is plus or minus 2.5 percentage points.
- We can also say that if you (hypothetically) took an infinite number of
samples of respondents and asked them the same question, the intervals
built around those sample percentages would contain the "true" population
percentage approval of Bush 95% of the time.
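The rough calculation above can be sketched in a few lines of Python (the function name here is ours, chosen for illustration):

```python
import math

def conservative_moe(n):
    """Rough 95% margin of error, in percentage points: 100 / sqrt(n)."""
    return 100 / math.sqrt(n)

# Hypothetical sample of 1,600 respondents, as in the example above:
print(conservative_moe(1600))  # 2.5 percentage points, so 40% plus or minus 2.5
```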
Note a few things about the margin of error:
- First, the margin of error does not depend on the population size.
So, if we have a sample size of 1,600, it doesn't matter if the population
is 2000 or 4 million--the margin of error is the same.
- Second, the margin of error does depend on the sample size.
The bigger the sample, the smaller the margin of error--and the narrower
the range in which you're "95% confident" that the true population percentage
falls. Likewise, the smaller the sample, the less robust or
stable or reliable your results are.
What would be your
confidence interval if your sample were just 64 (an unusually small sample,
but convenient for calculation)? The square root of
64 is 8--and then you would divide 100 by 8, to get 12.5. The margin
of error for your sample would be plus or minus 12.5: in other words,
instead of having a range of 37.5 to 42.5, you'd have a range of
27.5 to 52.5! You would be '95% confident' that George Bush's approval
rating fell in between those two extremes--although you probably wouldn't
even need a survey to be reasonably sure of that.
- Third, 95% confidence sounds like a lot--you're almost sure with
a sample of 1,600 that your population approval rating -- the "true"
approval rating -- falls within that fairly narrow 5-point range. But,
by definition, keep in mind that 5% of the time, it doesn't.
- And fourth, there are more precise ways to calculate margins
of error (or "confidence intervals"), and you can calculate them
at various levels of confidence -- you can get a range for a 98% confidence
level, for a 90% confidence level, and so on. We'll cover those different
methods later on--but this rough formula described above is often
what pollsters report to the public, because it's so straightforward.
Moreover, more precise methods never produce a wider range (or larger
margin of error)--so sometimes pollsters call this margin of error
the conservative margin of error.
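As a rough sketch of why the quick formula is called conservative, one can compare it against the standard normal-approximation margin of error, z * sqrt(p(1-p)/n) -- the textbook formula covered later (function names here are ours, for illustration):

```python
import math

def conservative_moe(n):
    # Quick "conservative" 95% margin of error: 100 / sqrt(n)
    return 100 / math.sqrt(n)

def normal_approx_moe(p, n, z=1.96):
    # Textbook 95% margin of error via the normal approximation,
    # expressed in percentage points: z * sqrt(p(1-p)/n) * 100
    return z * math.sqrt(p * (1 - p) / n) * 100

# The 40% approval example with n = 1,600:
print(round(conservative_moe(1600), 2))         # 2.5
print(round(normal_approx_moe(0.40, 1600), 2))  # 2.4
```

Even at the worst case, p = 0.5, the normal-approximation figure works out to 0.98 times 100/sqrt(n), slightly narrower than the conservative figure -- which is why the quick formula never understates the margin of error.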
II. Advantages and Disadvantages to Sample Surveys
It is faster and less expensive to do a survey of a population than to
gather data through a census--that is, a study of the entire population.
And, in part because you can get fairly robust results with a relatively
small sample (say, just over 1,000), resources can be spent developing
good questions, training interviewers, and taking other measures to ensure
good results.
Of course, there are limitations or challenges with surveys.
- 1. First, there may be selection bias--that is, the sample
selected is not representative of the population in which the
researcher is interested.
- 2. Second, there may be non-response bias--the sample is
representative, but a subset of the sample may not be able to be
contacted, or does not respond to the survey. An example would be the
difficulties in surveying the homeless. Likewise,
if a survey involves questions about income, the
significantly less wealthy and (in particular) the relatively wealthy
are less likely to participate.
- 3. Third, there may be response bias--question wording,
question ordering, and the race, sex, accent, and demeanor of the
interviewer may all change an individual's answers.
III. How are Samples Selected?
- A. Simple random samples
Researchers drawing simple random samples find a source of
random numbers (computers generally have programs called "random number
generators"--or pollsters use random digit dialing, which samples at
random among those with phones). Cases are then selected so that
every individual (or unit) has an equal chance of selection.
The sampling frame is the set of cases from which the
sample is selected--for example, the set of phone numbers. It may be
the population--or it may be a subset of the population that is still
larger than the sample.
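A minimal sketch of these ideas in Python (the sampling frame here is a made-up list of phone numbers, purely for illustration):

```python
import random

# Hypothetical sampling frame: the set of cases from which the sample
# is drawn -- here, 10,000 made-up phone numbers.
frame = [f"555-{i:04d}" for i in range(10000)]

# Simple random sample: every number in the frame has an equal
# chance of selection; random.sample draws without replacement.
sample = random.sample(frame, k=1000)
```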
- B. Stratified random samples
Researchers drawing a stratified random sample divide the population up into
separate groups (for instance, by sex, age, race, or region), and then
draw samples from each group. Such an approach has several advantages--
researchers can address problems of responses to interviewers (for instance,
respondents in one region may be more comfortable with interviewers from that
region), and (more important) problems with non-responses can be addressed by
"over-sampling" some groups.
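A sketch of the stratified approach, with made-up strata and sizes (everything here is hypothetical, for illustration):

```python
import random

# Hypothetical population, each unit tagged with its region (the stratum).
population = (
    [("Northeast", i) for i in range(5000)] +
    [("South", i) for i in range(8000)]
)

# Divide the population into strata...
strata = {}
for region, unit in population:
    strata.setdefault(region, []).append(unit)

# ...then draw a random sample from each stratum. "Over-sampling" a
# group simply means drawing a larger n from that stratum.
sample = {region: random.sample(units, k=100)
          for region, units in strata.items()}
```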
- C. Cluster samples
In cluster samples, like stratified random samples,
the population is divided up into groups. However, here the similarity
ends: in cluster samples, the groups are much smaller, and a random
sample of groups is selected. Then all the observations in these
selected groups are surveyed.
- D. Multistage samples
Multistage samples are a combination of stratified random samples and
cluster samples. This is a common approach for public opinion pollsters.
First, the population is stratified into groups (or strata),
and then a random sample of communities is drawn from those strata.
Then, those communities are divided up into clusters, and these clusters
are sampled--and all the units within those clusters are surveyed.
- E. Other Considerations: Timing, Response Rates
- Good polls of public opinion are generally conducted over a period of days.
Pollsters should provide information about when the poll was
conducted--so people know whether it was before or after particular events.
One potential problem in contemporary politics, however, is that there is
tremendous pressure for media to take polls to immediately assess the
effect of a particular event--the capture of Hussein, for example. These
are called quickie polls.
However, they are somewhat problematic--it is very difficult to get a
random sample on one particular day (Fridays are different from Saturdays,
and so on).
- And, observers should keep in mind that conclusions can only
be drawn about the immediate changes in public opinion--not about
any long-term effects.
IV. Ethical considerations
At times, pollsters will deliberately conduct polls that violate the ethics of
polling. Push polls, for example, are designed to elicit
particular answers. Three examples:
- 1. During the 1990s, when health care reform was under debate, some members of
Congress were sending out "polls" that asked questions similar to the
following:
Would you support an enormous government-funded managed health care system,
or would you prefer to continue to choose your own physicians?
Of course, this question is designed to elicit a particular answer--and,
to some degree, to influence how individuals vote.
- 2. A nursing home interest group (presumably) conducts telephone interviews,
outlining the most unpopular aspects of the governor's proposal to fund
stay-at-home care (such as the cost, the potential risk to seniors, and
so on)...and asks respondents' opinions about the plan. When asked to
give the name of the organization sponsoring the survey, the interviewer refuses
to identify that organization. The survey seems to be designed to reduce
support for the governor's plan, rather than to measure public opinion about
the plan.
- 3. Potentially most problematic, disreputable campaign firms will
occasionally random digit dial within a district asking the following question:
If you knew that Marvin Lewis had been arrested for theft, would that
make you more or less likely to vote for him?
Clearly, the intent is to discourage voting for Marvin Lewis; the "pollster"
also doesn't provide information about who is polling. And, it's not clear
whether the information is true, or what the circumstances of the arrest were
(was the candidate convicted? theft of what? at what age? and other
information that would clarify the circumstances).
Questions in polls should be designed to accurately measure -- not change
-- opinion. Pollsters should always be willing to give information about
who has funded and who is conducting the poll. (It is acceptable, however,
for pollsters to guarantee the confidentiality of the respondents--
and this becomes particularly important when polling about sensitive matters.)
V. Questions
- 1. Consider the following examples of polling questions, and explain why
they may not accurately measure public opinion:
- A survey asks "to what extent do you think teenagers worry about peer
pressure related to drinking alcohol?".
Then, the survey asks "Name the top five pressures you think face
teenagers today."
- In 1995, the Washington Post asked (in a weekly poll of 1,000 randomly
selected respondents): "Some people say the 1975 Public Affairs Act should be
repealed. Do you agree or disagree that it should be repealed?"
- Students are "randomly sampled" on the quad, and asked about current
events. The survey results indicate that few students know a great deal
about current events.
- Individuals are asked to participate in a television poll about
celebrity popularity. Participation is voluntary; to participate,
one calls a 1-800 number and enters in numbers to answer questions.
- A poll asks (randomly selected) respondents to indicate whether
they support withdrawal from Iraq. The two possibilities
given are "yes" and "no".
- In 1936, the magazine Literary Digest predicted a 3-2
victory for Landon. They had sent questionnaires (taken from
lists of magazine subscribers, car owners, telephone directories,
and in some cases registered voters) to 10 million people.
- 2. If you ask the following two questions, how do you think the results
would differ? Why?
- Do you believe that the government should increase the amount of
money it directs to the needy?
- Do you believe that the government should increase the amount of
money it directs to welfare recipients?
- 3. Define the following terms:
- margin of error
- conservative margin of error
- simple random sample
- stratified random sample
- cluster samples
- multistage samples
- push polling
- quickie polls
- sampling frame
- response bias
- non-response bias
- selection bias
- representative
- robust / stable / reliable
- 4. What are the advantages to sample surveys? What are the potential
limitations or challenges posed by sample surveys?