Chi square

One thing that all of our variables have had in common so far is that all are based on either interval, or ratio data. The next analysis that we will use in this module for hypothesis testing uses nominal data sets.

The Chi Square distribution can be used to test whether observed data differ significantly from theoretical expectations, Frequency observed ( equation image indicator ) vs. Frequency expected (). For example, based on student enrollment, can we detect whether some courses are more popular than others? Examining the enrollment of 70 students in the psychology courses at a local university, we find that 17 students enrolled in Social Psychology, 35 students enrolled in Basic Statistical Analysis, and 18 students enrolled in the Psychology of Learning. Our null hypothesis should state that the number of students observed to be enrolled in courses should not differ from the frequency of students expected to be enrolled in these courses. To evaluate this null hypothesis, we are going to have to calculate the Chi Square equation image indicator :

equation image indicator

Translated into something more manageable we get

equation image indicator one term for every group in the analysis

Step by step it would look like this.

Step 1: calculate the frequency expected

The hypotheses of interest are:

equation image indicator : there is no student preference for courses

equation image indicator : there is a student preference for courses

If there is no preference among students for one course over another, then we would expect the enrollments in the courses to be the same:

equation image indicator

So we expect the enrollment in each course to be 23.3 students. Of course, there can't actually be 23.3 students – we can't have a fraction of a student! But when calculating the expected number of students we generally carry out such calculations to the nearest tenth (i.e. one decimal digit).

We observe that the three courses actually have enrollments of 17, 35, and 18 students – clearly not the expected 23.3. So the question becomes "Are the actual enrollments significantly different from what we would expect if there was no student preference?" In other words, could the observed differences just be chance or is there more to this story?

The easiest way to work through the equation image indicator formula is to set up a table and work through each step one at a time. So far we have:

	Social	Stats	Learning
f_o	17	35	18
f_e	23.30	23.30	23.30

Step 2: calculate: equation image indicator and add it to the table

	Social	Stats	Learning
f_o	17	35	18
f_e	23.30	23.30	23.30
f_o- f_e	-6.30	11.70	-5.30

Step 3: calculate: equation image indicator and add it to the table

	Social	Stats	Learning
f_o	17	35	18
f_e	23.30	23.30	23.30
f_o- f_e	-6.30	11.70	-5.30
(f_o- f_e)²	39.69	136.89	28.09

Step 4: calculate: equation image indicator and add it to the table

	Social	Stats	Learning
f_o	17	35	18
f_e	23.30	23.30	23.30
f_o- f_e	-6.30	11.70	-5.30
(f_o- f_e)²	39.69	136.89	28.09
(f_o- f_e)²/ f_e	1.70	5.88	1.21

Step 5: calculate the sum of equation image indicator for all your groups. This is

equation image indicator

Step 6: compare the calculated equation image indicator with the critical

For the equation image indicator our df = K-1 where K is the number of categories (in this case k = 3 - the 3 classes)

For our example, df=2 at the alpha level of 0.05

Using the table below, what is our critical equation image indicator ? text annotation indicator

Chi Square Statistic graph.PNG

So one more time, before we move on...what is your decision text annotation indicator ?

Chi Square table.jpg

PRACTICE PROBLEM 10

After research into traffic violations, a hypothesis was formed stating that police citations for traffic violations are a function of the age of the driver. A random sample of 100 drivers was selected from the population of cited traffic violators during a one-month period. The drivers were categorized according to age, and the frequencies for each category were established.

18-27	28-37	38-47	48-57	58 and Over
35	25	15	13	12

Using the Chi Square procedure, evaluate the null hypothesis that there is no difference in traffic violations as a function of age of driver.

ANSWER

return to top | previous page | next page