The R Assignments

Ultimately, this is an applied statistics course. This means you learn how to gather data, how to frame your research questions, and how to present your findings. Because of its ubiquity in the field of statistics, we use the R Statistical Environment for the analysis portion.

R is free for all. It allows you to show your work (your steps in the research process). It allows you to start an analysis and return to finish it later. It allows you to see, and to show, how certain problems can be solved. It allows you to create some stunning presentation graphics. This power and flexibility are only available if you know how to ask R in its language. It will do exactly what you tell it to do… exactly what you tell it to do.

The primary purpose of this series of assignments is to give you quick, pointed practice in using R. I expect each of these to take you about 10 minutes; that should be your goal, too. In many ways, showing your skill level is a matter of showing how quickly you can perform elementary calculations… in much the same way as how quickly you can tie your shoes.

The following provide links to, and overviews of, these R assignments. Pay attention to the course calendar for when these are due.

#4 | Introduction to R
This assignment reinforces what we did in class on Friday.
⟼ The solutions are available.

#6 | Measures of Center and Measures of Position
This assignment shows that you can easily calculate the measures of center and some measures of position for a set of real data. A short R sketch follows this entry.
⟼ The solutions are available.

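A minimal sketch of the base-R calls this assignment exercises; the vector scores is a made-up example, not course data:

    scores <- c(72, 85, 91, 64, 78, 88, 95, 70)     # hypothetical data
    mean(scores)                      # measure of center: the arithmetic mean
    median(scores)                    # measure of center: the median
    quantile(scores, c(0.25, 0.75))   # measures of position: first and third quartiles
    min(scores); max(scores)          # more measures of position: the extremes
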
#7 | Measures of Spread and the Hildebrand Rule
Here, we see if you can calculate the various measures of spread and determine which is the optimal one to use (using the Hildebrand Rule). A sketch of the spread calculations follows.
⟼ The solutions are available.

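A minimal sketch of the spread calculations, using a made-up vector x. The skewness check shown here is one common statement of the Hildebrand Rule, comparing (mean − median)/s to 0.2; verify the exact form and cutoff against the course notes:

    x <- c(3, 7, 8, 5, 12, 14, 21, 13, 18)   # hypothetical data
    sd(x)        # standard deviation
    var(x)       # variance
    IQR(x)       # interquartile range
    (mean(x) - median(x)) / sd(x)   # skewness check; assumed Hildebrand cutoff of 0.2
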
#8 | Categorical Graphics*
By this point, you should assume this R assignment would check that you can create three basic graphics that help to illustrate categorical variables in R: the pie chart, the bar chart, and the mosaic plot. However, because of the limitations of your professor, this will instead review your abilities with descriptive statistics. A sketch of the three graphics follows.
⟼ The solutions are available.

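For reference, a minimal sketch of the three categorical graphics named above, using a hypothetical two-way classification:

    # hypothetical categorical data
    class  <- factor(c("Fr", "So", "Fr", "Jr", "Sr", "So", "Fr", "Jr"))
    passed <- factor(c("Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", "No"))

    pie(table(class))                  # pie chart of one categorical variable
    barplot(table(class))              # bar chart of the same counts
    mosaicplot(table(class, passed))   # mosaic plot of two categorical variables
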
#9 | Numerical Graphics*
Again, you should assume this R assignment would check that you can illustrate your numerical data. However, this will review your abilities with descriptive statistics. A sketch of two standard numerical graphics follows.
⟼ The solutions are available.

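A minimal sketch of two standard graphics for a numeric variable, again with made-up data:

    y <- rnorm(100, mean = 50, sd = 10)   # hypothetical numeric data
    hist(y)       # histogram of the distribution
    boxplot(y)    # box-and-whisker plot showing center, spread, and outliers
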
#12 | Discrete Distributions
The Binomial, Poisson, and Hypergeometric distributions are covered in this assignment. Specifically, you are to calculate probabilities with these three discrete distributions. A sketch of the relevant R functions follows.
⟼ The solutions are available.

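A minimal sketch of the probability functions for these three distributions; the parameter values are arbitrary examples:

    dbinom(3, size = 10, prob = 0.25)   # P[X = 3] for a Binomial(10, 0.25)
    pbinom(3, size = 10, prob = 0.25)   # P[X <= 3] for the same Binomial
    dpois(2, lambda = 4)                # P[X = 2] for a Poisson with mean 4
    dhyper(1, m = 5, n = 15, k = 4)     # P[X = 1] successes when drawing 4 from 5 successes and 15 failures
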
#13 | Continuous Distributions
As a lead-in to the Normal (Gaussian) distribution, we look at the Uniform and the Exponential distributions. Each describes certain phenomena. As such, we will check that you can calculate the correct probabilities for continuous distributions. A sketch of the relevant functions follows.
⟼ The solutions are available.

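A minimal sketch of cumulative probabilities for these two distributions, with arbitrary parameter values:

    punif(0.7, min = 0, max = 2)   # P[X <= 0.7] for a Uniform(0, 2)
    pexp(3, rate = 0.5)            # P[X <= 3] for an Exponential with rate 0.5 (mean 2)
    1 - pexp(3, rate = 0.5)        # P[X > 3] for the same Exponential
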
#14 | The Normal Distribution, I
This R assignment has you handle the CDF for the Normal distribution. Make sure you can calculate these cumulative probabilities: below, above, and between. A sketch follows.
⟼ The solutions are available.

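A minimal sketch of the three types of Normal probabilities, using an arbitrary Normal(100, 15):

    pnorm(110, mean = 100, sd = 15)                                    # P[X <= 110]  (below)
    1 - pnorm(110, mean = 100, sd = 15)                                # P[X > 110]   (above)
    pnorm(120, mean = 100, sd = 15) - pnorm(90, mean = 100, sd = 15)   # P[90 <= X <= 120]  (between)
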
#15 | The Normal Distribution, II
This R assignment has you show that you can calculate quantiles. Recall that the p-th quantile is the value of x for which P[X ≤ x] = p. A sketch follows.
⟼ The solutions are available.

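A minimal sketch of quantile calculations, again with the arbitrary Normal(100, 15):

    qnorm(0.95, mean = 100, sd = 15)   # the 0.95 quantile: the x with P[X <= x] = 0.95
    qnorm(0.25, mean = 100, sd = 15)   # the first quartile of the same Normal
    qnorm(0.975)                       # the familiar 1.96 from the standard Normal
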
#16 | The Central Limit Theorem
Without question, the Central Limit Theorem is the most far-reaching theorem in statistics. It allows us to ignore the distribution of the data as long as we only care about the distribution of the sample means. Since almost all hypothesis tests concern the mean (expected value), the CLT tells us to focus on the Normal distribution. A small simulation sketch follows.
⟼ The solutions are available.

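A small simulation sketch of the CLT: sample means from a skewed (Exponential) population look approximately Normal once the sample size is moderate. The sample size and number of replications are arbitrary choices:

    set.seed(1)                                          # for reproducibility
    xbars <- replicate(1000, mean(rexp(30, rate = 1)))   # 1000 sample means, each from n = 30
    hist(xbars)                                          # roughly bell-shaped, even though the data are skewed
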
#19 | Confidence Intervals
A confidence interval is a set of reasonable values for the population parameter. Here, you are showing that you are able to calculate them for several parameters and variables. A sketch follows.
⟼ The solutions are available.

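A minimal sketch of two common confidence intervals in base R; the data and the 7-successes-in-40-trials count are made up:

    x <- rnorm(25, mean = 10, sd = 2)       # hypothetical numeric sample
    t.test(x)$conf.int                      # 95% t-interval for the population mean
    t.test(x, conf.level = 0.90)$conf.int   # the same interval at a 90% confidence level
    prop.test(7, 40)$conf.int               # approximate 95% interval for a population proportion
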
#21 | Hypothesis Testing
Now, we check that you can perform simple hypothesis tests. Here, you will need to provide p-values for appropriate hypothesis tests. A sketch follows.
⟼ The solutions are available.

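A minimal sketch of extracting a p-value from a one-sample t-test; the data and the hypothesized mean of 10 are made up:

    x <- c(9.1, 10.4, 11.2, 9.8, 10.9, 10.1, 9.5)   # hypothetical sample
    t.test(x, mu = 10)            # test of H0: the population mean is 10
    t.test(x, mu = 10)$p.value    # just the p-value
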
#23 | Binomial and Proportions Procedures
When testing hypotheses about a single population proportion, one will focus on the Binomial test (if each trial has two possible outcomes). If testing hypotheses concerning the relationship between two proportions (p1 and p2), then one will need to use the proportions test… and hope that the sample size is “large enough” for the CLT to allow the Normal approximation to the Binomial distribution to be sufficiently close. A sketch of the two tests follows.
⟼ The solutions are available.

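A minimal sketch of both procedures; the counts are made-up examples:

    binom.test(12, 40, p = 0.25)              # exact Binomial test of H0: p = 0.25, from 12 successes in 40 trials
    prop.test(c(30, 22), c(80, 75))           # approximate test of H0: p1 = p2 for two samples
    prop.test(c(30, 22), c(80, 75))$p.value   # just the p-value
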
#25 | Goodness of Fit Test
The Chi-square goodness-of-fit test is used to check if the observed discrete distribution is different from the hypothesized distribution. It was Fisher’s favorite goodness-of-fit test, and he even used it for continuous distributions. Today, we have better goodness-of-fit tests for continuous distributions (e.g., the Kolmogorov-Smirnov test). A sketch follows.
⟼ The solutions are available.

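A minimal sketch of the Chi-square goodness-of-fit test; the observed counts and the hypothesized proportions are made up:

    observed <- c(18, 25, 32, 25)                         # hypothetical counts in four categories
    chisq.test(observed, p = c(0.25, 0.25, 0.25, 0.25))   # H0: the four categories are equally likely
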
#26 | Analysis of Variance (a.k.a. ANOVA)
ANOVA is used to test one thing: whether the means of several groups are equal. However, because of equivalent meanings, it is used to test if a categorical variable and a numeric variable are independent. It is also used to test if using the categorical variable aids in understanding the numeric variable. A sketch follows.
⟼ The solutions are available.

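A minimal sketch of a one-way ANOVA in base R; the group labels and responses are made up:

    group <- factor(rep(c("A", "B", "C"), each = 5))             # hypothetical grouping variable
    score <- c(5, 6, 7, 5, 6, 8, 9, 7, 8, 9, 4, 5, 4, 6, 5)      # hypothetical numeric response
    fit <- aov(score ~ group)    # one-way ANOVA model
    summary(fit)                 # F statistic and p-value for H0: all group means are equal
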
#27 | ANOVA, II
Here, I check that you can go beyond the ANOVA procedure. “Going beyond” means both that you can determine if ANOVA is appropriate and that you can determine which group mean is different. A sketch follows.
⟼ The solutions are available.

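A minimal sketch of two common follow-ups, rebuilt from the hypothetical data in the previous sketch; whether these are the exact checks used in class is an assumption:

    group <- factor(rep(c("A", "B", "C"), each = 5))
    score <- c(5, 6, 7, 5, 6, 8, 9, 7, 8, 9, 4, 5, 4, 6, 5)
    fit <- aov(score ~ group)   # same hypothetical model as before
    plot(fit, which = 1)        # residuals vs. fitted values: a visual check of the ANOVA assumptions
    plot(fit, which = 2)        # Normal Q-Q plot of the residuals
    TukeyHSD(fit)               # pairwise comparisons to see which group means differ
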
#30 | Test of Independence
The Chi-square test of independence allows one to test if two categorical variables are related. As expected, the null hypothesis is that they are not. A sketch follows.
⟼ The solutions are available.

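A minimal sketch of the test of independence, run on a made-up table of counts:

    counts <- matrix(c(20, 15, 30, 35), nrow = 2)   # hypothetical 2 x 2 table of counts
    chisq.test(counts)                              # H0: the row and column variables are independent
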
#31 | Correlation
Correlation tests allow us to determine if two numeric variables are linearly related. Values of the correlation coefficient range from -1 (perfect negative correlation) to +1 (perfect positive correlation). A sketch follows.
⟼ The solutions are available.

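A minimal sketch of the correlation coefficient and the associated test; the paired data are made up:

    x <- c(1, 2, 3, 4, 5, 6, 7, 8)
    y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 5.9, 7.8, 8.4)   # hypothetical paired data
    cor(x, y)       # the (Pearson) correlation coefficient, between -1 and +1
    cor.test(x, y)  # test of H0: the population correlation is zero
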
#32 | Regression, I
Regression extends correlation in that it allows us to determine the effect of the independent variable on the dependent variable. A sketch follows.
⟼ The solutions are available.

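A minimal sketch of simple linear regression, using the same hypothetical paired data as the correlation sketch:

    x <- c(1, 2, 3, 4, 5, 6, 7, 8)                    # hypothetical independent variable
    y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 5.9, 7.8, 8.4)    # hypothetical dependent variable
    model <- lm(y ~ x)          # regress the dependent variable on the independent variable
    summary(model)              # slope estimate, its p-value, and R-squared
    plot(x, y); abline(model)   # scatterplot with the fitted regression line overlaid
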