The SCA Procedure
Doing real statistics requires actually doing statistics on real data. That means using a computer and a statistical program. After trying many different programs, I have found that R matches my needs as an analyst most closely. It also allows me to easily check your understanding of statistical techniques because it requires you to provide the script (that is, you show your work).
Part I: The Install
Linux, Mac, and Windows Instructions
The R Statistical Environment installs like the typical piece of computer software: download it, click on the installer, and follow the installation directions.
- Go to http://cran.r-project.org/.
- Click on the link for your operating system (Linux, MacOS, Windows).
- Click on “install R for the first time” if you are using Windows, or “R-4.3.2.pkg” if you are using Mac OS High Sierra (10.13) or newer. If you have an older version of Mac OS, you will need to use the instructions for Chromebook (below). Note that the “4.3.2” represents the current version number, which changes from term to term. Version 4.3.2 was released in October 2023.
Note that the version I have on my Windows office desktop is 4.0.2. The version I have on my Mac laptop is 4.0.2. The version I have at home is 3.3.2. In other words, version updates are frequent, but not important for the work we do.
- Install as you would any piece of software. The defaults are fine.
That is the end of the first part. By this point, you have R installed on your computer. In the third part, we will ensure that you did this correctly.
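Once R is running (Part III covers starting it), one quick way to see which version you installed is sketched below. The comparison against 3.3.2 is only an illustration, chosen because it is the oldest version mentioned above; it is not a course requirement.

```r
# Print the full version string of the R session you are in
R.version.string

# Compare the running version against a minimum.
# Assumption: "3.3.2" is used here only because it is the oldest
# version mentioned in this handout.
getRversion() >= "3.3.2"
```

If the last line prints TRUE, your version is at least as new as that one.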
Chromebook and iPad Instructions
Because Chromebooks and iPads do not (easily) allow you to download software, you will not be able to use R without being connected to the Internet. You will use R at this URL:
https://euclid.knoxds.org/rstudio/
Note that security settings require you to be on campus to use this link.
To help ensure that your experience with R is akin to what your classmates will experience, please make the following changes to the Global Options... under “Tools.” Use the Basic tab under the General section.
- Uncheck: Restore most recently opened project at startup
- Uncheck: Restore .RData into workspace at startup
- Save workspace to .RData on exit → Never
In other words, make sure your Options page looks like this.
Part II: The Folders
My experience is that 80% of the problems in doing statistics come from not knowing where files are located and not understanding the importance of the working directory. Creating this directory structure will help avoid those issues, but only if you actually follow through and use it.
- Create a folder for this course, STAT200. This is where you will save all of your work for this course. This is a good habit to get into for all courses (and projects) you are a part of.
- In that folder, create the following subfolders:
  - sca
  - labs
  - practicums
These subfolders have these names because these are the main “aspects” of this course. Other courses will have different subfolders because they have different structures.
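If you prefer, the same folder structure can be created from within R once it is running. This is only a sketch: it assumes you want STAT200 created inside R's current working directory (explained in the working directory part of this activity), and the variable names course and subfolders are mine, chosen for illustration.

```r
# Assumption: STAT200 is created inside the current working directory.
course <- "STAT200"
subfolders <- c("sca", "labs", "practicums")

# Create each subfolder; recursive = TRUE also creates STAT200 itself
for (sub in subfolders) {
  dir.create(file.path(course, sub), recursive = TRUE, showWarnings = FALSE)
}

# List what now exists directly inside the course folder
list.dirs(course, recursive = FALSE)
```

Creating the folders by hand in your file manager works just as well; the point is that they exist and that you use them.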
Part III: Testing the Install
Now, let us double-check that you did everything correctly.
- Start R. Note that the window that opens is called the “Console” window. You know this because the window title is “R Console.”
- Open a new script window (it will be titled “untitled” until it is saved).
- Type the following into that script window:
    ### Test Script
    # Set prng seed
    set.seed(3)
    # Define variables
    x = c(1, 6, 7, 2)
    y = runif(4)
    z = x+y
    # Sample statistics
    mean(x)
    median(y)
    sd(z)
- Save this into your sca folder as “0-testScript.R.”
- Quit R. It is important to quit because that actually removes it from your computer’s current memory. In Windows, just click on the red X (top-right of the window). In Mac, just press Cmd + q. Online, click on the x next to the script name, then close the browser window.
- Restart R.
- Click on “Open Script” (Windows) or “Open Document” (Mac) or “Open File” (online). This can be found in the menu under “File.”
- Open the script called “0-testScript.R.”
- Highlight all of the lines.
- Press Ctrl + r (Windows) or Cmd + Return (Mac) or Ctrl + Enter (online).
- Look at the blue numbers (Windows) or black numbers (Mac).
- They should be (in order and ignoring the echo from what you sent to the Console window):
    [1] 4
    [1] 0.3563383
    [1] 3.132832
Note that the “[1]” at the start of each output line indicates that the value next to it is the first value in that particular list of values.
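The test script fixes the seed so that everyone sees the same numbers. Here is a small sketch of why that works and of what the bracketed index means. The names first and second are only for illustration.

```r
# Setting the same seed reproduces the same "random" draws
set.seed(3)
first <- runif(4)

set.seed(3)
second <- runif(4)

# TRUE: both sets of draws started from the same seed
identical(first, second)

# A longer vector wraps across several lines in the Console. Each line
# starts with the index of its first value, so "[1]" simply marks the
# first element of the result.
seq(10, 300, by = 10)
```

How many values fit on each line depends on the width of your Console window, so the bracketed indices after the first line may differ from one computer to another.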
Part IV: The Working Directory
So, you now have the R Statistical Environment installed on your computer. You have a folder structure designed to help you understand the importance of keeping projects separate. The last thing we have to cover is the “working directory.”
The working directory is the default folder from which R will load data and/or save files. The working directory is (by default) the folder from which you start R. In Windows, opening R from the Start menu will open it in your Documents folder. In Mac, opening R from the Launchpad will open it in your home folder.
The key is that you want to open R in your working directory. In Mac OS (and online), this is rather easy. In Windows, it is less so.
- Windows: Save the R environment (.RData) to your working directory. Double-click on this to open R in this directory.
Note that copy-pasting will be helpful here. Copy-paste the big-blue-R-icon file from one folder to another to make this procedure easier. The big-blue-icon looks like the image to the right. There must not be a small arrow in the bottom-left of the icon (this would be a shortcut, which will open R in its default folder, which you do not want).
- Mac OS and Online: Save a script (extension of .R or .r) in your working directory. Open R by double-clicking on this file to open in this directory.
In all cases, R will open in your working directory. This allows you to keep all important information for the project in a single folder. Following this structure and procedure will help you structure your analyses.
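To confirm that R opened where you intended, something like the following sketch may help. It assumes you saved 0-testScript.R in your sca folder as described in Part III and opened R from that folder.

```r
# Show the folder R is currently using as its working directory
getwd()

# List the files R can see there without needing a full path
list.files()

# Check whether the test script from Part III is in this folder
file.exists("0-testScript.R")
```

If the last line prints FALSE, R most likely opened in a different folder than the one holding your script.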
Part V: You are Done
This is the end of the first Statistical Computing Activity (SCA). There is a lot of information available to you on the R Statistical Environment. A broad Internet search will usually land you in the right place.
That is where I start for a better understanding of any procedure in R. For instance, if I need to test a single population median, then I will tend to go directly to an Internet search. With that said, here are a few sources I prefer when learning about new R capabilities:
- CRAN FAQs
- R Bloggers (blog)
- R Graphics Cookbook, 2nd edition (online book)
- The R Graph Gallery (blog)
- Data Visualization in R (online course)
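Alongside these sources, R ships with its own help system, which can be a reasonable first stop before searching online. A minimal sketch, using "median" only as an example search term and hits as an illustrative variable name:

```r
# Open the help page for a function whose name you already know
help("median")          # same as typing ?median

# Search the installed help pages by keyword
help.search("median")   # same as typing ??median

# List objects whose names contain "median"
hits <- apropos("median")
hits
```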
In other words, the truth is out there.
You just need to look for it.