##### SCA-32
##### 
##### Chi-Square Goodness-of-Fit Test
##### 

### This script works through a few examples of the analysis process
### for the chi-square goodness-of-fit test



### Preamble

source("http://rfs.kvasaheim.com/stat200.R")


### Example 1: Majors
#
#   Are the four major types equally represented in my first STAT 200 
#   class at Knox?
#
#   The tabulated data of a random sample from that first class are
#
#   Major: MNS | HUM | HSS | ART |
#   Count:  18 |  10 |   9 |   9 |

majors = c(18, 10, 9, 9)

mtypes = c("MNS","HUM","HSS","ART")
binom.plot(majors, ylim=c(0,0.6), names=mtypes)

chisq.test(majors)

### Conclusion: Because the p-value of 0.1750 is greater than our 
#   selected alpha, 0.05, we cannot reject the null hypothesis that
#   the majors are equally represented in that first STAT 200 course.
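

### Sketch: the test statistic by hand
#
#   A minimal sketch (not part of the original analysis) of what
#   chisq.test computes when no probabilities are supplied: each
#   category is expected to hold an equal share of the sample. The
#   names counts1, expected1, ts1, and pval1 are introduced here only
#   for illustration.

counts1   = c(18, 10, 9, 9)                                 # same counts as majors
expected1 = rep(sum(counts1)/length(counts1), length(counts1))  # equal shares
ts1       = sum( (counts1-expected1)^2 / expected1 )        # chi-square statistic
pval1     = pchisq(ts1, df=length(counts1)-1, lower.tail=FALSE)

ts1
pval1

#   These two values should agree with the X-squared and p-value
#   reported by chisq.test(majors) above.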





### Example 2: Die
#
#   Is my six-sided die fair?
#
#   To test this, I roll it 1000 times, recording the face that comes
#   up on each roll.
#
#   The tabulated data are
#
#   Face:     1 |   2 |   3 |   4 |   5 |   6
#   Count:  165 | 182 | 161 | 197 | 131 | 164 

results = c(165,182,161,197,131,164)

binom.plot(results, ylim=c(0,0.4))

chisq.test(results)

### Conclusion: Because the p-value of 0.0112 is less than our 
#   selected alpha, 0.05, we should reject the null hypothesis that
#   the faces of the die come up with equal frequency; that is, we
#   reject the claim that the die is fair.
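

### Sketch: which faces drive the result?
#
#   The test only says that the counts differ from equal frequencies.
#   One possible follow-up, sketched here as an assumption about how
#   one might proceed, is to inspect the Pearson residuals,
#   (observed - expected)/sqrt(expected), which chisq.test returns.
#   As a rough rule of thumb, residuals beyond about +/-2 mark the
#   categories that contribute most. The names rolls and fit2 are
#   introduced here only for illustration.

rolls = c(165,182,161,197,131,164)   # same counts as results
fit2  = chisq.test(rolls)

fit2$expected             # expected count per face under fairness
round(fit2$residuals, 2)  # Pearson residual per face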




### Example 3: Grades
#
#   Do my STAT200 grades tend to follow a Normal distribution?
#
#   To check, I record the current grades for all of my current
#   STAT200 students and summarize them (assuming they are representative).
#
#   I also determine the grade proportions that would be expected if
#   the grades followed a Normal distribution.
#
#   Both are in this table:
#
#   Grade:                   A |    B |    C |    D |    F 
#   Observed Count:          5 |   12 |   10 |    3 |    1   
#   Expected Proportion:  0.10 | 0.25 | 0.30 | 0.25 | 0.10 
#

obs = c(5,12,10,3,1)
exp = c(0.10,0.25,0.30,0.25,0.10)

chisq.test(obs,p=exp)

binom.plot(obs, names=c("A","B","C","D","F"))
points(1:5, exp, pch=3, col="saddlebrown")

### Conclusion: Because the p-value of 0.0960 is greater than our
#   selected alpha value of 0.05, we should not reject the null
#   hypothesis. There is no significant evidence that the distribution
#   of my STAT200 grades deviates from the Normal curve.
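

### Sketch: checking the expected counts
#
#   The chi-square approximation is usually considered reliable only
#   when every expected count is at least about 5. With 31 students,
#   the A and F categories fall below that, so chisq.test issues a
#   warning. As a hedged cross-check, the p-value can be estimated by
#   Monte Carlo simulation instead. The names grades and pNormal, the
#   seed, and the number of replicates are choices made here for
#   illustration.

grades  = c(5,12,10,3,1)                  # same counts as obs
pNormal = c(0.10,0.25,0.30,0.25,0.10)     # same proportions as exp

expected3 = sum(grades) * pNormal         # expected count per grade
expected3

set.seed(200)                             # arbitrary seed, for repeatability
chisq.test(grades, p=pNormal, simulate.p.value=TRUE, B=10000)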




### Example 4: Representativeness
#
#   I collected data and would like to check whether they are
#   representative of the population. The data include each
#   student's class year (among other things). 
#
#   To check, I see if the class distribution in my data is
#   sufficiently close to that of Knox College. 
#
#   The following are the observed and expected values:
#
#   Year:                   1st |   2nd |   3rd |   4th 
#   Observed Count:          41 |    21 |    21 |    17 
#   Expected Proportion:  0.259 | 0.275 | 0.228 | 0.238 
#

obs = c(41,21,21,17)
exp = c(0.259,0.275,0.228,0.238)

chisq.test(obs,p=exp)

binom.plot(obs, ylim=c(0,0.5))
points(1:4, exp, pch=3, col="saddlebrown")
title(xlab="Class Year")

### Conclusion: Because the p-value of 0.0061 is less than our 
#   chosen alpha of 0.05, we reject the null hypothesis that the
#   observed frequencies follow the hypothesized distribution.
#   This is evidence that our sample is not representative of the
#   population in terms of class year.
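

### Sketch: checking the hypothesized proportions
#
#   chisq.test stops with an error when the supplied probabilities do
#   not sum to 1 (unless rescale.p=TRUE is given), so it can help to
#   check the sum first. The Pearson residuals then indicate which
#   class years differ most from the hypothesized shares. The names
#   year, pKnox, and fit4 are introduced here only for illustration.

year  = c(41,21,21,17)                   # same counts as obs
pKnox = c(0.259,0.275,0.228,0.238)       # same proportions as exp

all.equal(sum(pKnox), 1)                 # TRUE if the proportions sum to 1

fit4 = chisq.test(year, p=pKnox)
round(fit4$residuals, 2)                 # positive: more students than expected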