##### SCA-22 #####
##### Two-Sample Proportions Tests #####

### This gives a few examples of the analysis process for comparing
### two population proportions.


### Preamble

# Import extra functionality
source("http://rfs.kvasaheim.com/stat200.R")


### Part I: Hats
#
# I would like to know if males tend to wear hats more often
# (GREATER) than females.
#
# To determine this, I measure the number of males wearing
# hats (and the number of males in my sample) and the number
# of females wearing hats (and the number of females in my
# sample).
#
# Of the 57 males I measured, 15 wore hats. Of the 92 females
# I measured, 18 wore hats.

prop.test(x=c(15,18), n=c(57,92), alternative="greater")

# n.b. Here, we are not relying on the alphabet; we explicitly
#  listed the male counts first and the female counts second.
#  Thus, the alternative hypothesis here is p_male > p_female.
#
# According to the proportions test, we were not able to conclude
# males wear hats more frequently than females (p-value = 0.2232).

binom.plot( x=c(15,18), n=c(57,92), names=c("Male","Female"),
  ylab="Proportion Wearing Hats")


### Part II: Politics
#
# I would like to know if males tend to be Republican more often
# (GREATER) than females.
#
# To test this, I ask n=1056 males and n=1056 females who identified
# as either Democrat or Republican their political party affiliation.
# Of that sample, a total of 656 males and 563 females identified
# as Republican.

prop.test(x=c(656,563), n=c(1056,1056), alternative="greater")

# According to the proportions test, we are able to conclude
# males tend to be Republican more frequently than females
# (p-value < 0.0001).
#
# In fact, we are 95% confident that the gender gap is greater
# than 5.19 percentage points.
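
# Aside: a minimal sketch of where the 5.19 figure above comes from.
# It uses only base R's prop.test(); the object names `fit` and `gap`
# are illustrative and not part of the original analysis.

fit <- prop.test(x=c(656,563), n=c(1056,1056), alternative="greater")

fit$estimate                                       # sample proportions: males, then females
gap <- unname(fit$estimate[1] - fit$estimate[2])   # observed gap, p_male - p_female
gap
fit$conf.int[1]                                    # one-sided 95% lower bound on the gap

# Because the alternative is "greater", the interval is one-sided:
# its lower end is the bound quoted above and its upper end is 1.
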

# Counts are listed males first (656), then females (563), to match
# the labels in names=.
binom.plot( x=c(656,563), n=c(1056,1056), names=c("Male","Female"),
  ylab="Proportion Identifying as Republican")

# SOURCE: http://news.gallup.com/poll/120839/Women-Likely-Democrats-Regardless-Age.aspx


### Part III: Politics
#
# I would like to know if the Galesburg Congressional district (IL-17)
# tends to be more Democratic than the Stillwater, OK, Congressional
# District (OK-3).
#
# To test this, I randomly call 2000 people in IL-17 (less expensive)
# and 500 people in OK-3. Here are the results:
#
#   District:     IL-17 | OK-3
#   Democrats:     1030 |  185
#   Sample Size:   2000 |  500

prop.test(x=c(1030,185), n=c(2000,500), alternative="greater")

# Conclusion: Because the p-value is approximately 0, we reject
# the null hypothesis. We conclude that IL-17 is significantly
# more Democratic than OK-3. The observed gap is 14.5 percentage
# points (51.5% vs. 37.0%), and we are 95% confident that IL-17
# is at least 10.4 percentage points more Democratic than OK-3.

binom.plot(x=c(1030,185), n=c(2000,500),
  xlab="Congressional District", ylab="Democratic Support",
  names=c("IL-17","OK-3"), ylim=c(0.3,0.6), boxcol="lightblue" )

# Information: https://en.wikipedia.org/wiki/Cook_Partisan_Voting_Index
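
# Aside: a hedged sketch of the arithmetic behind the Part III test,
# worked by hand. It assumes the prop.test() defaults (pooled standard
# error under the null hypothesis and Yates' continuity correction).
# All variable names below are illustrative.

x3 <- c(1030, 185)                           # Democrats in IL-17, OK-3
n3 <- c(2000, 500)                           # sample sizes

p.hat  <- x3 / n3                            # sample proportions
p.pool <- sum(x3) / sum(n3)                  # pooled proportion under H0
se0    <- sqrt(p.pool * (1 - p.pool) * sum(1 / n3))
cc     <- 0.5 * sum(1 / n3)                  # continuity correction
z      <- (p.hat[1] - p.hat[2] - cc) / se0   # test statistic, roughly 5.75
p.val  <- pnorm(z, lower.tail = FALSE)       # one-sided p-value ("greater")
c(z = z, p.value = p.val)

# Cross-check: the hand calculation should agree with prop.test().
all.equal(p.val, prop.test(x=x3, n=n3, alternative="greater")$p.value)
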