##### SCA-42b #####
#####  Correlation and Regression  #####

### This gives a single example of using regression to answer a
###  question of interest.


### Preamble
source("http://rfs.kvasaheim.com/stat200.R")


### The Topic: Unfairness in the Ruritanian election

# Is there a relationship between the invalidation rate and the candidate
#  support rate in the 2017 Ruritanian presidential election?

dt = read.csv("http://rfs.kvasaheim.com/data/xr2017pres.csv")
attach(dt)
summary(dt)

# create the proportion variables
VALID = TOTAL-INVALID      # ballots that were counted
pInv  = INVALID/TOTAL      # invalidation (rejection) rate
pKuz  = KUZNECOV/VALID     # Kuznecov's share of the valid vote

# regress the invalidation rate on the support rate
mod1 = lm(pInv~pKuz)
summary(mod1)

# or, for the correlation and its confidence interval
# cor.test(pInv,pKuz)

# There is a statistically significant relationship between the invalidation rate and
#  the support level for Kuznecov in the 2017 Ruritanian presidential election (p-value =
#  0.03). A 95% confidence interval for that correlation is from -0.624 to -0.039, with
#  a point estimate of -0.367.
#
# Electoral divisions with higher support for Kuznecov also had a lower invalidation rate.
#  This is consistent with ballot box stuffing and differential invalidation in favor
#  of Kuznecov.


## Graphic
plot(pKuz,pInv)

# better
plot(pKuz,pInv, xlim=c(0,1), ylim=c(0,0.1),
  xlab="Kuznecov Support Level", ylab="Rejection Rate")

# even better
plot(pKuz,pInv, xlim=c(0,1), ylim=c(0,0.1),
  xlab="Kuznecov Support Level", ylab="Rejection Rate")
abline(mod1)               # add the fitted regression line

# pretty graphic
par(mar=c(3,3,0,0)+1)             # tight margins
par(las=1)                        # horizontal axis labels
par(family="serif")
par(xaxs="i", yaxs="i")           # axes meet exactly at the plot limits
par(cex.lab=1.2, font.lab=2)      # larger, bold axis titles

plot.new()
plot.window( xlim=c(0,1), ylim=c(0,0.06) )

axis(1, at=0:10/10)
axis(2, at=0:6/100)
title(xlab="Kuznecov Support Level", line=2.5)
title(ylab="Rejection Rate", line=3)

abline(mod1, col="grey")                       # regression line drawn first
points(pKuz,pInv, pch=21, bg="honeydew2")      # points drawn on top of the line
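

## Extracting the reported numbers (a sketch)

# The interpretation above quotes a p-value, a point estimate, and a 95%
#  confidence interval for the correlation. What follows is a minimal sketch
#  of how those values might be pulled out of cor.test() instead of being read
#  off the printed output. The helper name corSummary is hypothetical: it is
#  an assumption made for this illustration and is not taken from stat200.R.
#
# For a simple linear regression such as mod1, the p-value for the slope in
#  summary(mod1) equals the p-value of the Pearson correlation test, so either
#  route supports the same conclusion.

corSummary = function(x, y, level=0.95) {
  ct = cor.test(x, y, conf.level=level)     # Pearson test is the default method
  c( estimate = unname(ct$estimate),        # sample correlation
     lower    = ct$conf.int[1],             # lower confidence limit
     upper    = ct$conf.int[2],             # upper confidence limit
     p.value  = ct$p.value )
}

# Usage; the guard lets this block run even if the variables above are absent
if( exists("pKuz") && exists("pInv") ) {
  print( round(corSummary(pKuz,pInv), 3) )
}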