In this discussion, we'll focus on statistical tests for 2 X 2 tables.
Suppose you have two categorical variables--say, gender (male/female) and
left-handedness. The chi-square test allows us to test whether there
are statistically significant differences in one categorical variable (in
this example, left-handedness) based on the other (in this example, gender).
Suppose, for instance, you wanted to examine whether women are more likely to
be left-handed than men.
You can construct a 2 X 2 table to do that. Such a table is called a
contingency table.
Suppose that 90 men and 110 women are each asked whether they are left-handed.
Of the 200 total individuals, 20 are left-handed and 180 are right-handed.
Recall from previous lectures that the null hypothesis, H0,
is generally "no difference" or "no effect". In this case, H0 would
be that there is no association between gender and handedness. Suppose the
researcher has some reason to expect that women are in fact more likely to be
left-handed. In this case, then, H1 (the alternative
hypothesis) would be that women are more likely to be left-handed.
Let's talk first about the "null hypothesis".
If gender and left-handedness are not related, what counts would we
see in the cells of a "contingency table"?
We can start by entering the total numbers of men and women, which are given
in the example. We can also enter the total numbers of left-handed and
right-handed individuals, which are also given in the sample.
So, we can set up a partial table with these totals:
Gender | Left-Handed | Right-Handed | Total |
---|---|---|---|
Men | | | 90 |
Women | | | 110 |
Total | 20 (10%) | 180 (90%) | 200 |
Then, the expected count for each cell can be calculated as:
$$\text{Expected count} = \frac{\text{Row Total} \times \text{Column Total}}{\text{Total } n \text{ for table}}$$
For example, if there's no association between left-handedness and gender, then,
since 10% of all people in the sample are left-handed, we would expect 10% of the
110 women ($110 \times 20 / 200 = 11$) to be left-handed, and the remaining 99
women to be right-handed.
Given this, we can fill in the rest of the table:
Gender | Left-Handed | Right-Handed | Total |
---|---|---|---|
Men | 9 (10%) | 81 (90%) | 90 |
Women | 11 (10%) | 99 (90%) | 110 |
Total | 20 (10%) | 180 (90%) | 200 |
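The expected-count formula can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `expected_from_margins` is hypothetical, and the margins are the ones from the handedness example.

```python
# Expected counts under independence: E = (row total * column total) / n.
def expected_from_margins(row_totals, col_totals):
    """Return the table of expected counts implied by the margins alone."""
    n = sum(row_totals)
    if n != sum(col_totals):
        raise ValueError("row and column totals must sum to the same n")
    return [[r * c / n for c in col_totals] for r in row_totals]


# Handedness example: rows = (men, women), columns = (left, right).
expected = expected_from_margins([90, 110], [20, 180])
for label, row in zip(["Men", "Women"], expected):
    print(label, row)
```

Each cell depends only on its row and column totals, which is why the whole table can be filled in before looking at any observed cell counts.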
Of course, these are the expected counts--not the real counts.
How do we calculate a chi-square statistic?
So, in our example of handedness and gender, suppose that in reality 15 women
were left-handed and 95 were right-handed, while 10 men were left-handed and
80 were right-handed.
We can then compute $(\text{observed} - \text{expected})^2 / \text{expected}$ for
each cell and enter those values into the table:
Gender | Left-Handed | Right-Handed | Total |
---|---|---|---|
Men | $(10-9)^2/9 = .111$ | $(80-81)^2/81 = .012$ | 90 |
Women | $(15-11)^2/11 = 1.455$ | $(95-99)^2/99 = .162$ | 110 |
Total | 20 (10%) | 180 (90%) | 200 |
And, we can add the values in the cells to get the chi-square:
$\chi^2 = .111 + .012 + 1.455 + .162 = 1.740$
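The arithmetic above can be checked with a short sketch (the helper names are hypothetical). It first takes the observed and expected counts exactly as given in the text. One caveat worth flagging: the observed counts include 25 left-handers (10 men plus 15 women) rather than the 20 used to build the expected table, and a standard test of independence computes the expected counts from the observed table's own margins. The second printed value shows that version, which comes out at about 0.29.

```python
def chi_square_statistic(observed, expected):
    """Sum (O - E)^2 / E over every cell of two same-shaped tables."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


def expected_from_observed(observed):
    """Expected counts computed from the observed table's own margins."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Rows = (men, women), columns = (left-handed, right-handed).
observed = [[10, 80], [15, 95]]
expected_from_text = [[9, 81], [11, 99]]

# Expected counts as given in the text (built from 20 left-handers).
print(round(chi_square_statistic(observed, expected_from_text), 3))
# Expected counts recomputed from the observed margins (25 left-handers).
print(round(chi_square_statistic(observed, expected_from_observed(observed)), 3))
```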
Recall that when we discussed t-tests, the question was often
whether there was a significant difference between means (or whether a
mean was significantly different from 0) in the population, based
on what we saw in the sample. That is, we were using inferential
statistics, because we were looking at samples and making generalizations
to populations. In the case of means, we looked at t-tables to
determine whether there was a significant difference between our
actual mean and our "null" mean. The t-tables provided a p-value: the
probability of getting a t as large as or larger than the one we observed
if the null hypothesis were correct. A small p-value means our sample would be
surprising if there were no effect in the population. The p-value is not, strictly
speaking, the probability that the null hypothesis is true or that rejecting it is
a mistake. Rather, if we reject the null hypothesis whenever p falls below a
threshold such as .05, then in situations where the null hypothesis is actually
true, we will wrongly conclude that there is an effect in the population (a
"Type I error") only 5% of the time.
In this case, we're doing something very similar: we're looking at a chi-square
table (instead of a t-table), but we're still getting a p-value, and the p-value
still gives the probability, assuming the null hypothesis is true, of getting a
chi-square at least as large as the one we observed.
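To make that interpretation concrete, here is a small simulation. It is a sketch under assumed values, not anything from the original study: it generates many samples in which the null hypothesis is true (men and women share the same 10% chance of left-handedness, with the group sizes from the example) and counts how often the chi-square reaches 3.841, the df = 1 critical value for .05. The share should land near .05, which is the sense in which the significance threshold controls the chance of wrongly rejecting a true null hypothesis. The function names and the seed are hypothetical choices.

```python
import random


def chi_square_2x2(table):
    """Pearson chi-square for a 2 x 2 table, expected counts from its margins."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            if e > 0:  # guard against an empty row or column
                stat += (o - e) ** 2 / e
    return stat


def simulate_rejection_rate(n_men=90, n_women=110, p_left=0.10,
                            critical=3.841, reps=10_000, seed=0):
    """Share of null-true samples whose chi-square reaches the critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        # Under H0 both groups share the same chance of being left-handed.
        men_left = sum(rng.random() < p_left for _ in range(n_men))
        women_left = sum(rng.random() < p_left for _ in range(n_women))
        table = [[men_left, n_men - men_left],
                 [women_left, n_women - women_left]]
        if chi_square_2x2(table) >= critical:
            rejections += 1
    return rejections / reps


rate = simulate_rejection_rate()
print(rate)
```

The rate will not be exactly .05, both because of simulation noise and because the chi-square distribution is only an approximation for tables of counts.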
(from Utts & Heckard)
92 college students were given a form that read "Randomly choose one of the
letters S or Q". Of the 92 students, 66% (61/92) picked S. Another
98 students were given a form that read "Randomly choose one of the letters
Q or S." Of these students, 46% (45/98) picked S. Is this sufficient evidence
to generalize that individuals are more likely to pick the first letter
mentioned?
The null hypothesis is, as always, no effect.
The alternative hypothesis is that people are more likely to name an option
(out of two) if it is mentioned first--that is, order has an effect on
the answers people give.
So, we want to calculate $(\text{observed} - \text{expected})^2 / \text{expected}$
for each of the cells.
Our observed contingency table looks like:
Response | Q first in Question | S first in Question | Total |
---|---|---|---|
Respondent Named Q | 53 | 31 | 84 |
Respondent Named S | 45 | 61 | 106 |
Total | 98 | 92 | 190 |
Under the null hypothesis, the expected counts (row total X column total / 190) are:
Response | Q first in Question | S first in Question | Total |
---|---|---|---|
Respondent Named Q | 43.326 | 40.674 | 84 |
Respondent Named S | 54.674 | 51.326 | 106 |
Total | 98 | 92 | 190 |
Each cell's contribution, $(O - E)^2 / E$ with $O$ the observed and $E$ the expected count, is then:
Response | Q first in Question | S first in Question | Total |
---|---|---|---|
Respondent Named Q | $(53-43.326)^2 / 43.326 = 2.160$ | $(31-40.674)^2 / 40.674 = 2.301$ | 84 |
Respondent Named S | $(45-54.674)^2 / 54.674 = 1.712$ | $(61-51.326)^2 / 51.326 = 1.823$ | 106 |
Total | 98 | 92 | 190 |
And, the $\chi^2$ is therefore: $2.160 + 2.301 + 1.712 + 1.823 = 7.996$
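The letter-order calculation can be reproduced directly from the observed counts. The sketch below (function names are hypothetical) computes the statistic cell by cell and cross-checks it against the standard algebraic shortcut for a 2 x 2 table, $n(ad - bc)^2 / \big((a+b)(c+d)(a+c)(b+d)\big)$.

```python
def chi_square_from_table(table):
    """Pearson chi-square with expected counts E = row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for row, r in zip(table, row_totals):
        for o, c in zip(row, col_totals):
            e = r * c / n
            stat += (o - e) ** 2 / e
    return stat


def chi_square_shortcut_2x2(a, b, c, d):
    """Algebraic shortcut for a 2 x 2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Rows = respondent named Q / named S; columns = Q first / S first.
letters = [[53, 31], [45, 61]]
full = chi_square_from_table(letters)
shortcut = chi_square_shortcut_2x2(53, 31, 45, 61)
print(round(full, 4), round(shortcut, 4))
```

Both routes agree. Carried at full precision, the statistic is about 7.9955; the 7.996 above comes from rounding each cell before adding.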
What are the degrees of freedom? For a 2 X 2 table, df = 1.
In general, the degrees of freedom for a chi-square statistic = (# rows - 1) X (# columns - 1).
According to a chi-square table, for df = 1 and $\chi^2 = 7.996$, the
p-value is about .005. That means that if the "null hypothesis" were correct,
there would be only a very small chance (about 5 in 1000) of getting a chi-square
of this magnitude or larger. So, we can "reject the null hypothesis" and "accept
the alternative hypothesis that question ordering influences respondents'
answers."
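Instead of a printed chi-square table, the upper-tail probability can be computed directly; in practice a statistics library routine such as `scipy.stats.chi2.sf` would normally be used. The sketch below sticks to Python's standard library. The function name is hypothetical, and the approach (closed forms for df = 1 and df = 2, then a step-by-two recurrence for the upper regularized gamma function) is one choice among several.

```python
import math


def chi_square_p_value(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with integer df >= 1.

    Starts from the closed forms for df = 1 and df = 2 and climbs in steps of
    two using Q(k + 2, x) = Q(k, x) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1).
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 1:
        k, p = 1, math.erfc(math.sqrt(x / 2))  # df = 1: P(|Z| >= sqrt(x))
    else:
        k, p = 2, math.exp(-x / 2)             # df = 2: exponential tail
    while k < df:
        p += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return p


print(chi_square_p_value(7.996, 1))   # letter-order example
print(chi_square_p_value(3.841, 1))   # df = 1 critical value for .05
print(chi_square_p_value(1.740, 1))   # handedness example
```

By this calculation the letter-order p-value is roughly .0047, consistent with the table lookup of about .005, and the critical-value call returns approximately .05 as a sanity check. The handedness statistic gives roughly .19, which would not be significant at the .05 level.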
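Consider next an experiment comparing heart attacks in a group taking aspirin with a group taking a placebo.
Observed Results
Treatment | Heart Attacks | No Heart Attacks | Total |
---|---|---|---|
Aspirin | 104 | 10,933 | 11,037 |
Placebo | 189 | 10,845 | 11,034 |
Total | 293 | 21,778 | 22,071 |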
Expected Results
Treatment | Heart Attacks | No Heart Attacks | Total |
---|---|---|---|
Aspirin | 146.520 | 10,890.480 | 11,037 |
Placebo | 146.480 | 10,887.520 | 11,034 |
Total | 293 | 21,778 | 22,071 |
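Computing $(\text{observed} - \text{expected})^2 / \text{expected}$ for each cell gives:
Chi-Square Calculations
Treatment | Heart Attacks | No Heart Attacks | Total |
---|---|---|---|
Aspirin | 12.339 | 0.166 | 11,037 |
Placebo | 12.343 | 0.166 | 11,034 |
Total | 293 | 21,778 | 22,071 |
Adding the four cells gives $\chi^2 = 12.339 + 0.166 + 12.343 + 0.166 = 25.014$, with df = 1. This is well above 10.83, the critical value for df = 1 at the .001 level, so the p-value is far below .001.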
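The same recipe applies to the aspirin data. This minimal standard-library sketch (variable names are ours) recomputes the expected counts from the observed margins, sums the four contributions, and converts the statistic to a p-value using the df = 1 identity $P(\chi^2 \ge x) = \operatorname{erfc}(\sqrt{x/2})$.

```python
import math

# Rows = aspirin / placebo; columns = heart attack / no heart attack.
aspirin_table = [[104, 10_933], [189, 10_845]]

row_totals = [sum(row) for row in aspirin_table]
col_totals = [sum(col) for col in zip(*aspirin_table)]
n = sum(row_totals)

chi_square = 0.0
for row, r in zip(aspirin_table, row_totals):
    for o, c in zip(row, col_totals):
        e = r * c / n                      # expected count under independence
        chi_square += (o - e) ** 2 / e

# With df = 1, P(chi-square >= x) equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2))
print(round(chi_square, 2), p_value)
```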
Statistical significance does not necessarily mean that two variables are
causally related--that is, that one variable causes the other. Generally,
in order to draw conclusions about causation, you need a good theory, and often
controls for other factors that might influence your dependent variable.
So, for instance, in the case of left-handedness, it's not clear that being
female "causes" left-handedness, only that there's an association between
being female and being left-handed. There may be a third variable (some sort
of genetic predisposition, perhaps, or socialization) that is associated with
being female and that causes left-handedness.
That said, the experiments on heart attacks and aspirin might tell us something
about causation, for two reasons: first, medical researchers may have a theory
about what it is about aspirin that would cause fewer heart attacks, and second,
in an experiment, other possible factors influencing heart disease would
presumably balance out across these two very large, randomly assigned groups.
But the association between heart attacks and aspirin itself, in the absence of
other information, doesn't tell us anything about causation, just correlation
(or association, or a relationship).
When a result is not significant (that is, the p-value is greater than the
conventionally used threshold of .05, so you can't reject the
null hypothesis of "no effect" or "no relationship"), what does that tell us?
It does not tell us that there's no association between the two variables.
It merely tells us that there isn't enough evidence to conclude that such
a relationship exists in the population.
In other words, even if we found no significant association between left-handedness
and gender, it might still be possible that one exists in the population--
we just don't have the evidence to draw that conclusion.
While the examples so far all use 2 X 2 tables, one can calculate a chi-square statistic for larger tables, and the process is the same.
For example, the following table presents data on three
types of external occipital protuberance, by gender.
Type | Women | Men | Totals |
---|---|---|---|
Type I | 427 | 89 | 516 |
Type II | 52 | 94 | 146 |
Type III | 21 | 317 | 338 |
Totals | 500 | 500 | 1,000 |
The following table represents the expected counts:
Type | Women | Men |
---|---|---|
Type I | 258 | 258 |
Type II | 73 | 73 |
Type III | 169 | 169 |
Then, for each cell, calculate $(\text{observed} - \text{expected})^2 / \text{expected}$.
Summing those values of $(O - E)^2 / E$ across all cells of the table (in this case,
6 values) gives the chi-square statistic, here $\chi^2 \approx 492.70$. The degrees of
freedom are $(\#\text{rows} - 1) \times (\#\text{columns} - 1) = (3 - 1) \times (2 - 1) = 2$.
Since 492.70 is far above 13.82, the critical value for df = 2 at the .001 level,
gender is significantly associated with type of occipital protuberance.
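For a table larger than 2 x 2, the same computation just loops over more cells, and the degrees of freedom follow the (# rows - 1) X (# columns - 1) rule. The sketch below (hypothetical function name) applies this to the occipital protuberance counts. It uses the fact that for df = 2 specifically, the chi-square upper-tail probability is $e^{-x/2}$; for other df a general routine would be needed.

```python
import math


def chi_square_test(table):
    """Return (statistic, df) for an r x c table, using E = row * col / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Rows = Type I, II, III; columns = women, men.
occipital = [[427, 89], [52, 94], [21, 317]]
stat, df = chi_square_test(occipital)
# Closed-form tail probability, valid only because df == 2 here.
p_value = math.exp(-stat / 2)
print(round(stat, 1), df, p_value)
```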
1. In a retrospective observational study, researchers asked women
who were pregnant with planned pregnancies how long it took them to get
pregnant. Length of time to pregnancy was measured according to the
number of cycles between stopping birth control and getting pregnant. Women
were also categorized according to whether or not they smoked, with
smoking defined as having at least one cigarette per day for
at least the first cycle during which they were trying to get pregnant.
The observed counts are as follows:
Observed Results
Smoking Status | First Cycle | Two or More Cycles | Total |
---|---|---|---|
Smoker | 29 | 71 | 100 |
Non-Smoker | 198 | 288 | 486 |
Total | 227 | 359 | 586 |