##### Categorical dependent variables #####
##### Script: sca-52.R                 #####

## The purpose of this script is to show you how to perform
## hypothesis tests and calculate confidence intervals
## concerning population proportions.
##
## While working through this script, keep the following
## workflow in mind:
##   I have to test the proportion of some samples
##   I need to assume the counts come from a Binomial distribution
##      1 proportion,  1 population:   binom.test()
##      2 proportions, 2 populations:  prop.test()
##     >2 proportions, 1 population:   chisq.test()
##

#####
## Example 1a: I would like to estimate the proportion of Oklahomans
##   who plan on voting for the Republican candidate. To
##   determine this, I ask a random sample of 100 people.
##   Of those people, 60 state they will vote for the
##   Republican candidate.
##
##   What is a central 95% confidence interval?

binom.test(60,100)
## I am 95% confident that the true proportion of Oklahomans
##   planning on voting for the Republican candidate is between
##   49.7 and 69.7%.

## Example 1b: My friend states that the Democratic candidate will win
##   Oklahoma. Do the data support her?
##
##   Her claim is that the Republican proportion is less than 0.5,
##   so that is the alternative (research) hypothesis.

binom.test(60,100, p=0.5, alternative="less")
## Because p=0.9824 > 0.05=alpha, we fail to reject the null
##   hypothesis. The data do not support my friend's research
##   hypothesis.

#####
## Example 2: I hypothesize that the proportion of males on the
##   OSU-Stillwater campus is the same as that on the OSU-Tulsa
##   campus. To test this, I sample 100 people from
##   OSU-Stillwater and 50 people from OSU-Tulsa. 50% of the
##   OSU-Stillwater people were male, but only 40% of the
##   OSU-Tulsa people were.

prop.test( c(50,20), c(100,50) )
## Because p=0.3253 > 0.05=alpha, we cannot reject the null
##   hypothesis. We conclude that there is not sufficient
##   evidence for concluding that the proportion of males
##   differs on the two campuses.

binom.test(50,100)
## We are 95% confident that the proportion of males at
##   OSU-Stillwater is between 39.8 and 60.2%.
binom.test(20,50)
## We are 95% confident that the proportion of males at
##   OSU-Tulsa is between 26.4 and 54.8%.

prop.test( c(50,20), c(100,50) )
## The central 95% confidence interval for how much
##   greater the proportion of males is at OSU-Stillwater
##   than at OSU-Tulsa is from -8 to 28 percentage points.

#####
## Example 3: I hypothesize that my four-sided die is fair. To test
##   this, I will roll it 700 times and record my results in
##   the following table:
##
##     |   1   |   2   |   3   |   4   |
##     ---------------------------------
##     |  168  |  152  |  192  |  188  |
##
##   Do the data support my hypothesis?

chisq.test( c(168,152,192,188) )
## Because p=0.1156 > 0.05=alpha, we cannot reject the
##   null hypothesis. We cannot conclude that the die
##   is unfair.

## More generally, the command would be
chisq.test( c(168,152,192,188), p=c(0.25,0.25,0.25,0.25) )
## However, since the *expected* proportions are equal (according
##   to the null hypothesis), R does not require you to supply the
##   second argument.

#####
## Example 4: M&Ms states that the distribution of colors is
##
##     Blue   Brown   Green   Orange   Red   Yellow
##     24%    13%     16%     20%      13%   14%
##
##   To test this, I get a bag of 100 M&Ms and count the
##   frequency of each color. The results are tabulated
##   in this table:
##
##     Blue   Brown   Green   Orange   Red   Yellow
##     20     10      20      22       17    11
##
##   Do the data support the corporate contention?

chisq.test( c(20,10,20,22,17,11), p=c(0.24,0.13,0.16,0.20,0.13,0.14) )
## Because p=0.489 > 0.05=alpha, we fail to reject the null
##   hypothesis and conclude that there is not sufficient evidence
##   for claiming that the corporate claim is incorrect. In fact,
##   95% confidence intervals for the proportion of each color are:
##     Blue     0.126 to 0.292
##     Brown    0.049 to 0.176
##     Green    0.126 to 0.292
##     Orange   0.143 to 0.314
##     Red      0.102 to 0.258
##     Yellow   0.056 to 0.188
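
#####
## Sketch: The per-color intervals quoted in Example 4 are not
##   computed anywhere in this script. One possible way to obtain
##   them, assuming they are the exact binomial intervals that
##   binom.test() reports, is to run binom.test() once per color
##   and keep only the conf.int part of each result. The names
##   `counts` and `ci` are my own, chosen for illustration only.

counts <- c(Blue=20, Brown=10, Green=20, Orange=22, Red=17, Yellow=11)

## One binom.test() per color; each column of ci holds the lower
##   and upper limit of the 95% interval for that color
ci <- sapply(counts, function(x) binom.test(x, sum(counts))$conf.int)
rownames(ci) <- c("lower","upper")

## Transpose so that each color is a row, then round for display
round(t(ci), 3)
## The limits should come out close to the values listed in
##   Example 4. Note that these are six separate 95% intervals;
##   they are not adjusted for making several comparisons at once.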