SCA 02b: Measures of Precision

Purpose

The purpose of this activity is to show you how to have R calculate the sample statistics for you. Remember that the computer is useful for doing the calculations. You are useful for deciding which statistics to use and how to interpret them. FOCUS ON THAT.

Functions

In this SCA, we will be using the following functions in R. It is useful to keep track of where you were introduced to the functions. By the end of the SCA, you should be able to explain what these functions do.

Those functions with a * are only available if this line is run in your script:
source("http://rfs.kvasaheim.com/stat200.R")
Note that you should not include the * when using those functions.

 


The SCA Procedure

Doing real statistics requires actually doing statistics on real data. That means using a computer and a statistical program. After trying many different programs, I have found that R matches my needs as an analyst most closely. It also allows me to easily check your understanding of statistical techniques because it requires you to provide the script (that is, you show your work).

Part 0: The Preparations

The following are common start-up instructions. You will want to always follow them when starting analyses.

  1. Start R and open a new script.
  2. Now, since we will be using some special R functions that do not exist in the base R package, we will need to import them. Making sure you have an Internet connection, run this line:
    source("http://rfs.kvasaheim.com/stat200.R")
    When you run this line, R goes to the URL you specified and runs that code. Here, the code only imports several helpful functions. From now forward, I will assume you run this line for every script in this course.
  3. Load the “crime data set” using the following two lines.
    dt = read.csv("http://rfs.kvasaheim.com/data/crime.csv")
    attach(dt)
    The first line loads the data into the variable dt. The second line “attaches the data,” which makes it easier for us to access the variables in the data file.

That is the end of the zeroth part. All analyses in this course will start in a similar way. The source line is run to give R more functionality, the data are loaded into memory using the read.csv function, and the data are attached to make it easier to access the variables in the data file.
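If you would like to see what those three start-up steps do without an Internet connection, here is a small sketch. Everything in it is made up for illustration: the function double_it, the temporary files, and the three-row data set are my stand-ins, not the course script or the crime data.

```r
# Hypothetical stand-in for the course script: a file that defines one function.
script_file <- tempfile(fileext = ".R")
writeLines("double_it <- function(x) 2 * x", script_file)

# source() runs the code in that file, so double_it() now exists in this session.
source(script_file)
double_it(21)

# Hypothetical stand-in for the crime data: a tiny CSV file with made-up values.
csv_file <- tempfile(fileext = ".csv")
writeLines(c("state,rate", "A,3.1", "B,4.0", "C,5.2"), csv_file)

# read.csv() returns a data frame; here it is stored in the variable dt.
dt <- read.csv(csv_file)

# Without attach(), a variable is reached through the data frame.
mean(dt$rate)

# After attach(), the variable names in dt can be used directly.
attach(dt)
mean_rate <- mean(rate)
detach(dt)   # undo the attach when you are finished
```

The two calls to mean give the same answer. The only thing attach changes is how much typing it takes to reach the variable.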

Part I: Measures of Position

This section looks at using R to calculate a couple of measures of position. These measures are the quantiles (a.k.a. percentiles) and the z-scores.

It turns out that both are important in most of the sciences. The z-scores are used to compare a person’s score across tests. The quantiles allow the researcher to better understand the extremes in the data set, such as the 10th percentile and the 90th percentile.
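To make the "compare across tests" idea concrete, here is a minimal sketch using the usual definition of a z-score (the value minus the mean, divided by the standard deviation). The two tests and all of the numbers are made up for illustration.

```r
# Made-up numbers for one student on two hypothetical tests.
score_a <- 80; mean_a <- 70; sd_a <- 10
score_b <- 85; mean_b <- 75; sd_b <- 20

# z-score = (value - mean) / standard deviation
z_a <- (score_a - mean_a) / sd_a   # 1.0
z_b <- (score_b - mean_b) / sd_b   # 0.5

# Compare the two z-scores, not the two raw scores.
z_a > z_b
```

The raw score is higher on test B, but the z-score is higher on test A. Relative to the other test takers, this student did better on test A.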

  1. To see how to calculate the quantiles (percentiles), let us calculate the 25th percentile of the 2000 state unemployment rate. It is 3.25%:
    quantile(unemp2000, 0.25)
    As an extension, we can see that the 90th percentile is 5.2%:
    quantile(unemp2000, 0.90)
    and that the 10th percentile is 2.7%:
    quantile(unemp2000, 0.10)
  2. The next measure of position we will cover is the z-score. Here are several ways for us to calculate the z-score of the first state (Maine) in terms of the median household income in 2000. Use the one that is most helpful for you. Note that the brackets will be used heavily in the future; they are how R allows us to subset the data.
    • zscore(medhhd00)
    • zscore(medhhd00, names=state)
    • zscore(medhhd00, names=state)[1]
    • zscore(medhhd00, names=state)[state=="Maine"]
    Note that the output from each is different. It may be helpful to think about times when each of the four will be most useful for you as the researcher.
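The sketch below may help you see why the outputs differ. It uses only base R and made-up data (five hypothetical states labeled A through E). The z-scores are calculated by hand, assuming the sample mean and the sample standard deviation; that is my assumption for illustration, and the zscore function from the course script may differ in its details.

```r
# Made-up unemployment rates for five hypothetical states.
rate  <- c(2.7, 3.1, 3.3, 4.0, 5.2)
state <- c("A", "B", "C", "D", "E")

# quantile() accepts several probabilities at once.
q <- quantile(rate, c(0.10, 0.25, 0.90))
q

# z-scores by hand, assuming the sample mean and sample standard deviation.
z <- (rate - mean(rate)) / sd(rate)
names(z) <- state    # label each z-score with its state

z                    # all five z-scores, labeled
z[1]                 # the first element only
z[state == "A"]      # the element whose state is "A"
```

Subsetting by position (z[1]) depends on the order of the data. Subsetting by a condition (z[state == "A"]) does not, which makes it the safer choice when you do not know where a state sits in the data file.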

Conclusion

That’s the SCA. Review the objectives and the list of R functions at the top of this SCA.

This page was last modified on 2 January 2024.
All rights reserved by Ole J. Forsberg, PhD, ©2008–2024. No reproduction of any of this material is allowed without explicit written permission of the copyright holder.