7964
Lecture 6

Lecture 6 focuses on distributions. There's a fair amount of review here.

I. Review--Types of Variables & Data

We've already talked about types of variables--categorical (or nominal), ordinal, and interval. Let's review.

Categorical (or nominal) variables are those whose values are categories without any order. For example, "religion" could be a categorical variable within a data set:
- Those who are Protestant coded as 1
- Those who are Jewish coded as 2
- Those who are Catholic coded as 3
- Those who are Muslim coded as 4
- Those who are "other" (not Protestant, Jewish, Catholic, or Muslim) coded as 5
Note: when you are creating variables and coding data, the categorization should be exhaustive; that is, every data point must fit into some category. That is why the example above needs an "other" category.

The categories should also generally be mutually exclusive; that is, no data point should fit into more than one category. You wouldn't want to use a coding system such as the following:
- Those who are Christian coded as 1
- Those who are Protestant coded as 2
- Those who are Jewish coded as 3
- Those who are Muslim coded as 4
- Those who are "other" (not Christian, not Protestant, not Jewish, and not Muslim) coded as 5
because Protestants are also Christians, so you wouldn't know whether to code someone who is Protestant as a 1 or a 2.

Note that there is no inherent order among the religions; the numbers are merely labels used to classify them.
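To make the two requirements concrete, here is a minimal Python sketch that treats a coding scheme as a set of membership tests and checks that each response falls into exactly one category (exhaustive means at least one, mutually exclusive means at most one). The category tests, the `CHRISTIAN` set, and the function names `matching_categories`, `codes_cleanly`, and `code_for` are hypothetical illustrations, not part of any standard tool.

```python
# Each coding scheme maps a category name to a membership test.
# The set of Christian denominations is an illustrative assumption.
CHRISTIAN = {"Protestant", "Catholic", "Orthodox"}
LISTED = {"Protestant", "Jewish", "Catholic", "Muslim"}

good_scheme = {
    "Protestant": lambda r: r == "Protestant",
    "Jewish": lambda r: r == "Jewish",
    "Catholic": lambda r: r == "Catholic",
    "Muslim": lambda r: r == "Muslim",
    "other": lambda r: r not in LISTED,
}

flawed_scheme = {
    "Christian": lambda r: r in CHRISTIAN,
    "Protestant": lambda r: r == "Protestant",
    "Jewish": lambda r: r == "Jewish",
    "Muslim": lambda r: r == "Muslim",
    "other": lambda r: r not in CHRISTIAN | {"Jewish", "Muslim"},
}


def matching_categories(response, scheme):
    """List every category whose test accepts the response."""
    return [name for name, test in scheme.items() if test(response)]


def codes_cleanly(response, scheme):
    """True if exactly one category applies to this response."""
    return len(matching_categories(response, scheme)) == 1


def code_for(response, scheme):
    """Return the 1-based code of the single matching category, or raise ValueError."""
    matches = matching_categories(response, scheme)
    if len(matches) != 1:
        raise ValueError(f"{response!r} matches {matches}")
    return list(scheme).index(matches[0]) + 1


print(code_for("Catholic", good_scheme))  # 3
print(code_for("Buddhist", good_scheme))  # 5 ("other")
print(matching_categories("Protestant", flawed_scheme))  # ['Christian', 'Protestant']
print(codes_cleanly("Protestant", flawed_scheme))  # False
```

In the flawed scheme, a Protestant respondent matches both "Christian" and "Protestant", which is exactly the ambiguity described above; `code_for` refuses to assign a code in that case rather than picking one arbitrarily.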
Ordinal variables are categories that have a meaningful order. However, one cannot assume that the distances between adjacent levels or points on the scale are equal to each other. "Ordinal" just means "rank-ordered." For example, one could code finishers in a race as first, second, third, and so on. But the gap between the first- and second-place finishers isn't necessarily the same as the gap between the third- and fourth-place finishers.
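The race example can be sketched in a few lines of Python: converting finish times to places keeps the order but throws away the distances. The runner names and times below are made up purely for illustration, and ties are not handled.

```python
# Hypothetical finish times in seconds (illustrative only).
finish_times = {"Ana": 241, "Ben": 242, "Cal": 255, "Dee": 256}


def rank_finishers(times):
    """Map each runner to a place, 1 = fastest. Ties are not handled."""
    order = sorted(times, key=times.get)
    return {runner: place for place, runner in enumerate(order, start=1)}


places = rank_finishers(finish_times)
order = sorted(finish_times, key=finish_times.get)
pairs = list(zip(order, order[1:]))

# Gaps between consecutive places are always 1...
place_gaps = [places[b] - places[a] for a, b in pairs]
# ...but the gaps in time they stand for can be very different.
time_gaps = [finish_times[b] - finish_times[a] for a, b in pairs]

print(places)      # {'Ana': 1, 'Ben': 2, 'Cal': 3, 'Dee': 4}
print(place_gaps)  # [1, 1, 1]
print(time_gaps)   # [1, 13, 1]
```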

Likert scales, which are often used in survey research, are generally considered to be ordinal scales. A Likert scale is a rank-ordered scale, generally used to measure attitudes.

For example, you could ask the following survey question:
- To what degree do you agree with the statement "statistics is fun!"?
```
1     Strongly Agree
2     Agree
3     Neither Agree nor Disagree
4     Disagree
5     Strongly Disagree
```
This is an ordinal variable because there's no assumption that the distance between 1 and 2 is the same as between (for instance) 2 and 3, 3 and 4, and so on.
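One way to see why the codes 1 to 5 are only ordinal: any other set of increasing numbers would rank respondents identically, yet would imply different distances between answers. The sketch below assumes an arbitrary alternative coding (`ALT_CODES`) and hypothetical responses; neither comes from the survey itself.

```python
# Original codes from the question above: 1 = Strongly Agree ... 5 = Strongly Disagree.
# ALT_CODES is an arbitrary, order-preserving recoding chosen only for illustration.
ALT_CODES = {1: 1, 2: 2, 3: 5, 4: 9, 5: 10}

responses = [3, 1, 5, 2, 4, 2]  # hypothetical answers from six respondents

# Ranking respondents by either coding gives the same order (sorted() is stable for ties).
by_original = sorted(range(len(responses)), key=lambda i: responses[i])
by_alt = sorted(range(len(responses)), key=lambda i: ALT_CODES[responses[i]])

# But the step between adjacent answers is 1 under one coding and uneven under the other.
original_steps = [b - a for a, b in zip(range(1, 5), range(2, 6))]
alt_steps = [ALT_CODES[k + 1] - ALT_CODES[k] for k in range(1, 5)]

print(by_original == by_alt)  # True
print(original_steps)         # [1, 1, 1, 1]
print(alt_steps)              # [1, 3, 4, 1]
```

Because nothing in the survey question fixes which set of increasing numbers is "right," conclusions that depend only on order are safe, while conclusions that depend on the size of the gaps are not.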
Interval / Ratio variables are those whose values are ordered and equally spaced: the distance between any two adjacent values is the same as the distance between any other two adjacent values.

For instance, one could code as a variable the number of children that a survey respondent has. The difference between one child and two children is the same as the difference between two children and three children, four children and five children, and so on. (Of course, the effect of going from one child to two children may differ from the effect of going from two children to three children, but the actual distance is the same.)
Continuous Versus Discrete Data
- Continuous data are data that can be broken down into smaller parts and still have meaning. That is, they can take on any value in an allowed range.
- Discrete data are data that can't be broken down into smaller units; they have to be thought of in terms of whole numbers. Categorical and ordinal variables are generally considered discrete data.
Factors are variables that are used to classify other variables. Factors are usually either categorical or ordinal variables, but can be interval-level. An example would be a restaurant database: the variable "number of customers served" can be classified by another variable, "day of the week". Day of the week could have values ranging from 1 to 7. (Is it a categorical, ordinal, or interval variable? Why?) Number of customers served
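A factor can be pictured as a grouping key. This minimal Python sketch, using invented records and assuming days are simply coded 1 to 7 (the lecture does not say which day is 1), sums the number of customers served within each level of the "day of the week" factor. The function name `total_by_factor` is hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (day of week coded 1-7, number of customers served that day).
records = [(1, 120), (2, 95), (1, 130), (6, 210), (7, 180), (6, 190)]


def total_by_factor(rows):
    """Sum the measured variable within each level of the factor, ordered by level."""
    totals = defaultdict(int)
    for day, customers in rows:
        totals[day] += customers
    return dict(sorted(totals.items()))


print(total_by_factor(records))  # {1: 250, 2: 95, 6: 400, 7: 180}
```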

II. Review--Ways to Present Data

One can present data in several ways:

- a frequency table (a table that, for a particular variable, gives the number and percentage of cases in a data set that have each possible value);
- a bar chart (a chart with vertical bars showing the frequency of each possible value of that variable); or
- a histogram (the same as a bar chart, but the vertical bars are contiguous, without space between them).
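As a rough sketch of how the counts behind these displays might be computed, the Python below builds a frequency table (count and percentage per value), prints a text version of a bar chart, and counts values into contiguous, equal-width bins as a histogram does for a continuous measurement. The function names, bin settings, and sample values are illustrative assumptions; values outside the bins are silently ignored.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows for each distinct value, sorted by value."""
    counts = Counter(values)
    n = len(values)
    return [(value, count, 100.0 * count / n) for value, count in sorted(counts.items())]


def print_bar_chart(table):
    """Print one horizontal text bar per value; bars are separate, as in a bar chart."""
    for value, count, percent in table:
        print(f"{value:>3} | {'#' * count} {count} ({percent:.1f}%)")


def histogram_counts(values, start, width, n_bins):
    """Count values in contiguous bins [start + k*width, start + (k+1)*width).

    Values outside the covered range are ignored.
    """
    counts = [0] * n_bins
    for x in values:
        k = int((x - start) // width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


# Hypothetical Likert responses (1-5) for a frequency table and bar chart.
likert = [1, 2, 2, 3, 3, 3, 4, 5]
table = frequency_table(likert)
print_bar_chart(table)

# Hypothetical heights in cm grouped into 5 cm bins starting at 155 cm.
heights = [158.2, 160.5, 161.0, 163.4, 165.0, 166.1, 170.3]
print(histogram_counts(heights, start=155, width=5, n_bins=4))  # [1, 3, 2, 1]
```

The histogram bins share their edges (each bin starts where the previous one ends), which is why its bars are drawn touching.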

III. Questions

An example above was of a restaurant database, with two variables: "number of customers served" and "day of the week" (coded 1-7).
- Is the "number of customers served" variable continuous or discrete? Is it categorical, ordinal, or interval? Why?
- Is the "day of the week" variable continuous or discrete? Is it categorical, ordinal, or interval? Why?
- What sort of variable is a DNA profile?
- Would you expect the distribution of height in a population of college-aged female students to be unimodal?
- If one measures the height of female college students and then the height of male college students, what type of variable is "sex" being used as?
- This question is from Lacy (2006): Suppose you make five measurements of refractive index on each of 20 fragments of glass from a single window pane. Is this set of measurements a population or a sample? Why?
