##### SCA-32 #####
##### Chi-Square Goodness-of-Fit Test #####

### This gives a few examples of the analysis process for testing
### for goodness of fit


### Preamble
source("http://rfs.kvasaheim.com/stat200.R")


### Example 1: Majors
#
# Are the four major types equally represented in my first STAT 200
# class at Knox?
#
# The tabulated data of a random sample from that first class are
#
# Major: MNS | HUM | HSS | ART |
# Count:  18 |  10 |   9 |   9 |

majors = c(18, 10, 9, 9)
mtypes = c("MNS","HUM","HSS","ART")

binom.plot(majors, ylim=c(0,0.6), names=mtypes)
chisq.test(majors)

### Conclusion: Because the p-value of 0.1750 is greater than our
#   selected alpha, 0.05, we cannot reject the null hypothesis that
#   the majors are equally represented in that first STAT 200 course.


### Example 2: Die
#
# Is my six-sided die fair?
#
# To test this, I roll it 1000 times, keeping track of the die face
#
# The tabulated data are
#
# Face:    1 |   2 |   3 |   4 |   5 |   6
# Count: 165 | 182 | 161 | 197 | 131 | 164

results = c(165,182,161,197,131,164)

binom.plot(results, ylim=c(0,0.4))
chisq.test(results)

### Conclusion: Because the p-value of 0.0112 is less than our
#   selected alpha, 0.05, we should reject the null hypothesis that
#   the faces of the die come up with equal frequency, that the die
#   is fair.


### Example 3: Grades
#
# Do my STAT200 grades tend to follow a Normal distribution?
#
# To check, I record the current grades for all of my current
# STAT200 students and summarize them (assuming they are representative).
#
# I also determine the grade distribution that follows a
# Normal distribution.
#
# Both are in this table:
#
# Grade:                  A |    B |    C |    D |    F
# Observed Count:         5 |   12 |   10 |    3 |    1
# Expected Proportion: 0.10 | 0.25 | 0.30 | 0.25 | 0.10
#

obs = c(5,12,10,3,1)
exp = c(0.10,0.25,0.30,0.25,0.10)

# Note: with 31 students, the expected counts for A and F are 3.1 each
# (below 5), so chisq.test warns that the approximation may be incorrect.
chisq.test(obs,p=exp)

# Label the bars with the actual grades (LETTERS[1:5] would end in "E")
binom.plot(obs, names=c("A","B","C","D","F"))
points(1:5, exp, pch=3, col="saddlebrown")

### Conclusion: Because the p-value of 0.0960 is greater than our
#   selected alpha value of 0.05, we should not reject the null
#   hypothesis. There is no significant evidence that the distribution
#   of my STAT200 grades deviates from the Normal curve.


### Example 4: Representativeness
#
# I collected data and would like to check whether they are
# representative of the population. The data consist of the
# student's class (and other things).
#
# To check, I see if the class distribution in my data is
# sufficiently close to that of Knox College.
#
# The following are the observed and expected values:
#
# Year:                  1st |   2nd |   3rd |   4th
# Observed Count:         41 |    21 |    21 |    17
# Expected Proportion: 0.259 | 0.275 | 0.228 | 0.238
#

obs = c(41,21,21,17)
exp = c(0.259,0.275,0.228,0.238)

chisq.test(obs,p=exp)

binom.plot(obs, ylim=c(0,0.5))
points(1:4, exp, pch=3, col="saddlebrown")
title(xlab="Class Year")

### Conclusion: Because the p-value of 0.0061 is less than our
#   chosen alpha of 0.05, we reject the null hypothesis that the
#   observed frequencies follow the hypothesized distribution.
#   This means our sample is not representative of the population
#   in terms of class level.
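

### Check: Example 4 by hand
#
# A minimal sketch, not part of the original analysis, of what a
# chi-square goodness-of-fit test computes, using base R only.
# The names obs4, p4, expected4, stat4, df4, and pval4 are introduced
# here for illustration; the data are those of Example 4.

obs4 = c(41,21,21,17)
p4   = c(0.259,0.275,0.228,0.238)

expected4 = sum(obs4) * p4                        # expected counts under the null
stat4 = sum( (obs4 - expected4)^2 / expected4 )   # chi-square test statistic
df4   = length(obs4) - 1                          # number of categories minus one
pval4 = pchisq(stat4, df=df4, lower.tail=FALSE)   # upper-tail probability

round(c(statistic=stat4, df=df4, p.value=pval4), 4)

# These values should agree with the output of chisq.test(obs4, p=p4).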