Lecture 5
Lecture 5 shifts the focus a bit to statistics used in forensics.
Much of this will be an introduction--and in a couple of upcoming lectures
we'll revisit some of what we've already discussed, but in a more thorough
fashion, and applied to forensics.
Statistics can be used for two distinct purposes. First,
as Lucy points out, they can be used to count up (and describe) types of events or entities, for
economic, social, and scientific purposes. Second, they can be used
to provide a guide to an uncertain world.
This is very similar to the distinction we've already
made between descriptive and inferential statistics,
and to populations and samples. If you want to describe public opinion in
the United States, you could just ask every citizen whether they approved
of George Bush's job performance. But of course that wouldn't
be feasible--so we take a sample. We are interested in "counting
up" the number of citizens who approve of the President's job
performance (and the number who do not, and the number who are
undecided, etc.), but at the same time, we are using statistics
to express how uncertain we are about that count.
I. A Bit of History
Population statistics date to John Graunt's (1620-74) publication
of
Observations Made Upon the Bills of Mortality, a manuscript
that focuses on assessing the mortality caused by the plague at four
timepoints: 1592-93, 1603, 1625, and 1636. He puts
the mortality data into tables, and performs basic calculations--
comparisons of proportions (proportion of those who died to those
who were christened, proportion of those who died from the plague
to those who died, and so on).
Likewise, the first common recognition of "probability" as an important
concept becomes evident at the same time (during the Scientific Revolution
of the 17th century). For example, correspondence between Pascal
and Fermat in the mid 1600s makes reference to probability in the context
of gambling.
II. Types / Meanings of "Probability"
Several debates about probability are directly relevant to the question
of forensics.
There are several relevant terms here:
- aleatory v. epistemic
- Aleatory simply means "by chance"; aleatory probabilities are those which
are known because there is full information. We know the
probability--we don't need to determine it by observing repeated events.
For example, if you have a six-sided die that you are sure is fair,
then you know that the probability of rolling a six is 1/6. This is
a deduced probability. Note, however, that in reality, there
may never be something like complete knowledge--the die may not be
completely fair, etc. So "aleatory probabilities" is really a theoretical
construct.
Note that "deductive reasoning" refers to inferences
where the conclusion is just as certain as the premises.
- Epistemic probabilities, on the other hand, are those we learn by
observation: we induce knowledge of the system from the data. If one
were to examine a representative group of students and discover
that 10% of them had visited Facebook, one would have some knowledge
of the structure of Facebook use--but, because we did not survey
every single person in the population, we would be uncertain
about that estimate. That uncertainty can be quantified (recall
our discussion about margins of error).
With epistemic probabilities, we're employing the uniformitarian
assumption: that the processes in the present will be similar to the
processes in the past.
Note that inductive reasoning produces a conclusion
that has less certainty than the premises (that is,
we're certain of the probabilities in the sample--
but we're estimating the probabilities in the
population).
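To make the quantified uncertainty concrete, here is a minimal sketch of a
margin of error for the 10% figure above. The sample size (n = 400), the 95%
level (z = 1.96), and the function name margin_of_error are assumptions made
for illustration; the notes do not specify them. It uses the usual normal
approximation for a sample proportion.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    Uses the normal approximation: z * sqrt(p_hat * (1 - p_hat) / n).
    z = 1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey: 10% of the sampled students had visited Facebook.
p_hat = 0.10
n = 400  # assumed sample size; not given in the notes

moe = margin_of_error(p_hat, n)
print(f"estimate: {p_hat:.2f} +/- {moe:.3f}")  # prints: estimate: 0.10 +/- 0.029
```

The sample proportion itself is known exactly; the margin of error describes
how far the population proportion might plausibly be from it.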
Recall that we said that there were no true, absolute
aleatory probabilities. All probabilities are to
some degree based on observation -- all probabilities are
essentially frequencies. (Note: this is very
similar to previous discussions of samples versus populations).
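The contrast between a deduced probability and an observed frequency can be
sketched with a simulated die. This is only an illustration: the simulated die
is fair by construction, and the roll counts, the seed, and the function name
six_frequency are arbitrary choices, not part of the lecture.

```python
import random

def six_frequency(n_rolls, seed=0):
    """Roll a simulated fair die n_rolls times; return the share of sixes."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls

deduced = 1 / 6  # the "aleatory" value, known without rolling anything

for n_rolls in (60, 600, 60_000):
    observed = six_frequency(n_rolls)
    print(f"{n_rolls:>6} rolls: observed {observed:.4f}, deduced {deduced:.4f}")
```

With few rolls the observed frequency can sit well away from 1/6; with many
rolls it tends to settle close to it, which is the sense in which a
probability can be estimated from repeated events.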
- realist v. idealist
- So, given that all probabilities are to some
degree frequencies--that they can only be
estimated based on observing repeated events--
does it even make sense to talk about the
"probability" that a single event will happen?
That is, to use the Facebook example
described above, does it even make sense to say that
"if I ask exactly one student, she will be a
Facebook visitor one time out of ten"?
In order to attach a probability to a single
event, we have to think about probability
in an "idealist" sense, also called subjective
probability. On this interpretation, the
probability does not exist objectively in the
world; it expresses a degree of belief.
- A realist interpretation of probability,
on the other hand, relies on frequencies
that are derived from long runs of events.
Realist probabilities also acknowledge that
there is uncertainty--but think of the
uncertainty as something that can be modeled, or
explained.
In reality, of course, all statisticians are willing
to make probabilistic statements about single events--
so all statisticians are to some degree "idealistic".
- Frequentist v. Bayesian
This debate between "idealism" and "realism"
mirrors a conflict between the frequentist and
Bayesian approaches.
- Frequentists tend to argue against
subjective probabilities--and for long-run
frequency based interpretations of probability.
- Bayesians are in favor of subjective
probability--and are more confident about
discussing the probability of a single event.
An example: forensic scientists doing repeated
experiments in a lab--or pollsters conducting surveys
of thousands of individuals--would be "frequentists".
But a criminal case concerns a single event, so
assessing it calls for a Bayesian approach.
But keep in mind that even those who adopt a Bayesian
approach are to some degree almost always relying on
what they know about repeated events. So the distinction
can be thought of as a philosophical distinction--and a
continuum of the degree to which one is relying on what one
knows based on previous events.
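One way to picture how a Bayesian combines a prior belief about a single event
with knowledge drawn from repeated events is Bayes' rule. The sketch below is
an illustration only: the notes do not state the rule, and every number in it
(the prior and the two match probabilities) is hypothetical, as is the
function name posterior_probability.

```python
def posterior_probability(prior, p_evidence_if_true, p_evidence_if_false):
    """Bayes' rule for one hypothesis H and one piece of evidence E.

    P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not H) P(not H)]
    """
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1 - prior) * p_evidence_if_false
    return numerator / denominator

# All values below are hypothetical, chosen only to show the mechanics.
prior = 0.01                   # belief that the suspect is the source, before the evidence
p_match_if_source = 1.0        # chance of a match if the suspect is the source
p_match_if_not_source = 0.001  # chance of a coincidental match (a long-run frequency)

posterior = posterior_probability(prior, p_match_if_source, p_match_if_not_source)
print(f"posterior probability: {posterior:.3f}")
```

Note that the coincidental-match rate is itself a frequency estimated from
repeated observations, which is the point made above: the Bayesian statement
about a single case still leans on what is known from previous events.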
- instrumentalist view
The instrumentalist view sees probability as
just a useful device to employ when talking about
uncertainty.
III. Questions
Define and explain the significance of the following terms,
and provide an example (hypothetical or real):
- descriptive and inferential statistics
- populations and samples (why can almost
everything be considered a "sample"?)
- population statistics
- aleatory versus epistemic
- deduction versus induction
- uniformitarian assumption
- realist v. idealist
- subjective probability
- frequentist v. Bayesian
- instrumentalist