Homework Assignment 8
The purpose of this assignment is to give you practice in doing weighted least squares regression, and in recognizing why you may need to do it. Recall that one purpose of weighted least squares regression is to allow the data values to have different levels of precision. This is important when the data are generated by cluster processes, such as voting: people who live near each other tend to vote similarly. When votes are aggregated, the vote counts are not Binomially distributed, because the individual votes (Bernoulli random variables) are not independent.
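To make the last point concrete, here is a standard calculation, offered as an illustration rather than as part of the assignment. Suppose a division has n voters, each vote X_i is a Bernoulli(p) random variable, and every pair of votes has the same correlation ρ. Then the variance of the vote count is

$$
\operatorname{Var}\Bigl(\sum_{i=1}^{n} X_i\Bigr)
= \sum_{i=1}^{n} \operatorname{Var}(X_i) + \sum_{i \ne j} \operatorname{Cov}(X_i, X_j)
= np(1-p) + n(n-1)\rho\,p(1-p)
= np(1-p)\bigl[1 + (n-1)\rho\bigr].
$$

With ρ = 0 this is the Binomial variance; with ρ > 0 the count is overdispersed by a factor that grows with n, so divisions of different sizes measure their underlying rates with different precision.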
A Ruritanian Update
Our patron, King Rudolf II, died. His grandson, Rudolph V, ascended the Leonine Throne. However, instead of continuing the autocracy, King Rudolph V decided to introduce democracy to the people. Whether or not this was done to increase the probability that Ruritania would join the European Union depends on whom you ask. Regardless, the first (allegedly) free and fair elections were held just one month into Rudolph’s reign. The election pitted his foreign minister, Vasilij Vasiljevič Kuzněcov, against the leader of the opposition, Saša Ivanović.
Kuzněcov won handily. Ivanović gracefully conceded on election night just before taking a vacation in Copenhagen. No one in the government suggested the presence of electoral fraud. The state-run newspaper, Řurité Noviny, lauded the outcome of the election and the maturity of the voters:
The voters are mature in their democracy. They have renounced personal gain and voted for the best future of this country. Long live Rudolph!
Despite this praise, Ruritania’s exile community in Denmark claimed that the election was fraudulent and that the government had stuffed many of the ballot boxes, thus ensuring Kuzněcov’s win.
To show the world that Ruritania is dedicated to democracy, King Rudolph V hired us to determine whether there is evidence that the election was unfairly tilted in favor of Kuzněcov. The available data allow us to perform an important test. Election theory tells us that if an election was unfairly tilted in favor of a candidate, then the correlation between the invalidation rate and the candidate support rate would be significant and negative.
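The test described above can be sketched as a Pearson correlation with a one-sided (negative-direction) significance test. This minimal Python version is an illustration, not the course's method: the function name correlation_test is hypothetical, and the p-value uses a normal approximation to the t distribution, which is only reasonable when there are many divisions. In simple linear regression the slope's t-statistic equals r·sqrt((n−2)/(1−r²)), so this test agrees with the slope test of an OLS fit of the invalidation rate on the support rate.

```python
import math
from statistics import NormalDist


def correlation_test(x, y):
    """Pearson r between x and y, its t-statistic on n - 2 df, and a
    one-sided p-value for a negative correlation (normal approximation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    r = sxy / math.sqrt(sxx * syy)
    # Same statistic as the slope t-test in a simple OLS regression.
    t = r * math.sqrt((n - 2) / (1 - r * r))
    # Lower-tail probability: small values support "significant and negative".
    p_lower = NormalDist().cdf(t)
    return r, t, p_lower
```

With the division-level values of pCnd and pInv as x and y, a negative r with a small one-sided p-value is the pattern the election-theory argument looks for.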
That, then, is our goal: analyze the 2017 Ruritanian presidential election.
The Assignment
For this assignment, use the xr2017pres dataset located in the expected place:
https://rfs.kvasaheim.com/data/xr2017pres.csv
This dataset contains the official results from the 2017 presidential election in Ruritania held between candidates Kuzněcov and Ivanović. The numbers refer to counts. There is no need to adjust the models below beyond what I request.
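Before fitting anything, the counts in the file have to be turned into the rates defined in the tasks. Here is a minimal sketch in Python (standard library only), assuming the file has been downloaded locally and that its columns are named INVALID, TOTAL and KUZNECOV, as in the task formulas; the function name load_rates is hypothetical.

```python
import csv
import math


def load_rates(lines):
    """Read division-level counts and derive pInv, pCnd and lInv.

    `lines` is any iterable of CSV text lines (for example, an open file).
    Assumes columns named INVALID, TOTAL and KUZNECOV; extra columns are ignored.
    """
    names = ("INVALID", "TOTAL", "KUZNECOV", "pInv", "pCnd", "lInv")
    out = {name: [] for name in names}
    for row in csv.DictReader(lines):
        invalid = float(row["INVALID"])
        total = float(row["TOTAL"])
        kuznecov = float(row["KUZNECOV"])
        p_inv = invalid / total                # invalidation rate
        p_cnd = kuznecov / (total - invalid)   # candidate's share of valid votes
        # The logit is undefined when pInv is exactly 0 or 1; store NaN then.
        l_inv = math.log(p_inv / (1 - p_inv)) if 0 < p_inv < 1 else math.nan
        for name, value in zip(names, (invalid, total, kuznecov,
                                       p_inv, p_cnd, l_inv)):
            out[name].append(value)
    return out
```

For example, `with open("xr2017pres.csv", newline="") as f: data = load_rates(f)` would give lists such as data["pCnd"] and data["lInv"]. Divisions with no invalid ballots get NaN for lInv, so check for them before fitting the logit models.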
- Create the variable pInv as the ratio of the number of rejected ballots to the total:
    pInv = INVALID/TOTAL
  This variable is the invalidation rate. Similarly, create the variable pCnd as the ratio of the number of votes for candidate Kuzněcov to the number of valid votes:
    pCnd = KUZNECOV/(TOTAL-INVALID)
  This variable is the candidate support rate. Fit a simple linear regression model using ordinary least squares regression, with the invalidation rate as the dependent variable and the candidate support rate as the explanatory variable. Name this model modA1. Provide the regression table and briefly interpret the results (a sentence or so).
- Now, create the variable lInv, which is the logit of the pInv variable:
    lInv = logit(pInv)
  Fit this logit model, modA2, with the logit of the invalidation rate as the dependent variable and the candidate support rate as the independent variable. Again, use ordinary least squares regression. Provide the regression table and briefly interpret the results.
- Next, fit the same logit model using weighted least squares (WLS) regression, weighting on the square root of the number of votes cast (TOTAL). Call this model modA3. Provide the regression table and briefly interpret the results.
- Then, fit the data using Binomial regression (GLM). Call this model modA4. Provide the regression table and briefly interpret the results.
- Create one graphic with the data shown and the regression curves for each of the models you fit. Use different colors for the curves. Provide a legend so that the reader knows which color corresponds to which model.
- For each of your models, predict the invalidation rate for a division in which the support for Ivanović is 60%. Also give a 95% confidence interval for each prediction.
- Finally, let us do what we were hired to do. Is there evidence of unfairness in this election? If so, does it favor Kuzněcov or Ivanović? Explain your answer using evidence and a nice-looking graphic. Be thorough, and make sure your argument makes sense.
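The model-fitting steps can be sketched without any statistics package. The block below is a minimal, standard-library Python illustration, not the course's reference solution, and all function names are hypothetical: wls_line solves the weighted normal equations for a straight line (unit weights give OLS), lm_fit adds the usual residual-variance covariance, binomial_glm fits a logit-link Binomial regression by iteratively reweighted least squares (IRLS), and predict returns a point estimate with a Wald interval.

```python
import math


def logit(p):
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def expit(eta):
    """Inverse logit: maps a linear predictor back to a proportion."""
    return 1.0 / (1.0 + math.exp(-eta))


def wls_line(x, y, w):
    """Solve the weighted normal equations for y = a + b*x.

    Minimizes sum(w_i * (y_i - a - b*x_i)**2) and also returns the inverse
    of X'WX as [[v_aa, v_ab], [v_ab, v_bb]].
    """
    s0 = sum(w)
    s1 = sum(wi * xi for wi, xi in zip(w, x))
    s2 = sum(wi * xi * xi for wi, xi in zip(w, x))
    t0 = sum(wi * yi for wi, yi in zip(w, y))
    t1 = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = s0 * s2 - s1 * s1
    b = (s0 * t1 - s1 * t0) / det
    a = (t0 - s1 * b) / s0
    inv = [[s2 / det, -s1 / det], [-s1 / det, s0 / det]]
    return a, b, inv


def lm_fit(x, y, w=None):
    """OLS (w=None) or WLS fit, with an estimated coefficient covariance."""
    w = [1.0] * len(x) if w is None else list(w)
    a, b, inv = wls_line(x, y, w)
    rss = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    sigma2 = rss / (len(x) - 2)  # residual variance on n - 2 df
    cov = [[sigma2 * v for v in row] for row in inv]
    return a, b, cov


def binomial_glm(x, successes, trials, tol=1e-10, max_iter=50):
    """Binomial regression with a logit link, fitted by IRLS.

    Starts from the adjusted proportions (y + 0.5) / (n + 1) and returns
    (a, b, cov), where cov is the inverse of X'WX (dispersion fixed at 1).
    """
    mu = [(s + 0.5) / (n + 1.0) for s, n in zip(successes, trials)]
    eta = [logit(m) for m in mu]
    a = b = 0.0
    for _ in range(max_iter):
        # Working weights n*mu*(1-mu) and working response for the logit link.
        w = [n * m * (1.0 - m) for n, m in zip(trials, mu)]
        z = [e + (s / n - m) / (m * (1.0 - m))
             for e, s, n, m in zip(eta, successes, trials, mu)]
        a_new, b_new, cov = wls_line(x, z, w)
        done = abs(a_new - a) < tol and abs(b_new - b) < tol
        a, b = a_new, b_new
        eta = [a + b * xi for xi in x]
        mu = [expit(e) for e in eta]
        if done:
            break
    return a, b, cov


def predict(a, b, cov, x0, z=1.96, inverse=None):
    """Estimate and Wald interval for a + b*x0 using normal quantile z.

    Pass inverse=expit to report a logit-scale fit as a proportion.
    """
    est = a + b * x0
    var = cov[0][0] + 2.0 * x0 * cov[0][1] + x0 * x0 * cov[1][1]
    se = math.sqrt(max(var, 0.0))  # guard against tiny negative rounding
    lo, hi = est - z * se, est + z * se
    if inverse is not None:
        est, lo, hi = inverse(est), inverse(lo), inverse(hi)
    return est, lo, hi
```

To mirror the tasks, x would be the list of pCnd values: modA1 corresponds to lm_fit(x, pInv), modA2 to lm_fit(x, lInv), modA3 to lm_fit(x, lInv, w=[math.sqrt(t) for t in TOTAL]), and modA4 to binomial_glm(x, INVALID, TOTAL). Pass inverse=expit to predict() for the last three so the interval is reported as an invalidation rate. Because pCnd measures Kuzněcov's share, a division with 60% support for Ivanović corresponds to x0 = 0.40, assuming the two candidates split all valid votes. The fixed z = 1.96 is a large-sample stand-in for the t quantile that an exact OLS interval would use.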
Do Not Forget
Do not forget to include your code in the appendix. I should be able to run your code and achieve the same results as you.