Introductory Statistics

 


[icon]
Lab F: Hypothesis Testing and p-values

The main purpose of this laboratory activity is to better understand p-values and some consequences of them being random variables. A secondary purpose throughout these lab activities is to understand randomness and its effects on how much we can understand about the world around us.

Finally, remember that these are designed to help you master the material. However, this also requires you to think through the steps and seek connections. No assignment will force you to learn. Learning requires a conscious decision and resulting effort on your part.

The Pre-Lab

All of these laboratory activities have pre-labs that must be completed prior to class on lab day (Moodle Quizzes). These pre-labs consist of a brief quiz covering the lab. You need to take the quiz before class on lab days. While it is possible to do well on the quiz without going through and thinking about the lab material, don’t do it. You are cheating yourself.

These pre-labs do not have to be done the night before the lab. I strongly recommend that you do them in the days leading up to the lab. There will frequently be a week between when we cover a topic in class and when there is a lab on it. Do yourself a favor and start early on the pre-labs.

Also, treat the pre-labs as being material to help you prepare. These are not busy work. You should be able to enter class on lab day and teach all of the material in the lab to us in the class. Learning takes time. Remember that you should be willing to spend at least 15 hours a week on the course material. Your primary job is to be a student. Do not cheat yourself.

✦•······················•✦•······················•✦

[moodle icon]
 Moodle

And so, with no further ado, please complete the Moodle quiz and submit it before class begins.

The quiz is a partial check on your reading of this lab. While you may be able to pass the quiz without working through the lab thoughtfully, you would be cheating yourself. Given how much you are spending on your education, be thoughtful.

Lab

This lab starts in the same way as all do: Start R, start a new script, set the working directory, and source the usual file from that script,

source("http://rfs.kvasaheim.com/stat200.R")

We will be using random numbers in this activity. So, to get you in the habit of doing this (for science to occur, the results must be replicable), you will need to set the pseudo-random number seed. Do this according to your student ID number. That is, if your ID number is 123456, you would next run this in your script window

set.seed(123456)

The purpose of this is to allow me to check that you are reporting the numbers correctly. As such, before reporting your final numbers, run the entire script from the beginning. This will ensure that your numbers are the ones I see. This is very important, muy importante, très important, sehr wichtig, etc. If your numbers do not agree with mine, yours are wrong.

Overview

The data you collect is random. Any function of the data is also a random variable— any function. Since the p-value you calculate is calculated from the data, it is a random variable with its own distribution. It can be shown that the p-value of an appropriate test follows a standard Uniform distribution as long as the null hypothesis is true.

Skippable Proof

Why? The value of α determines when we reject a null hypothesis. The Type I error rate is the proportion of time we reject a true null hypothesis. It is also α if the test is appropriate. Thus, if the test is appropriate and the data are generated under the null hypothesis, then the following is true:

P[p-value ≤ α] = α

This is just the CDF of a standard Uniform distribution. Thus, if the test is appropriate and the test’s requirements are met, then the p-values will follow a standard Uniform distribution.
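You can see this claim in action with a quick sketch. If the test statistic really is standard Normal under the null, then its two-sided p-values should fall at or below any cutoff α about α of the time. (This sketch is my own illustration; the seed value 1 is an arbitrary choice, not your student ID.)

```r
set.seed(1)                     ## arbitrary seed, for reproducibility
ts = rnorm(1e5)                 ## z statistics generated under a true null
p  = 2*pnorm(-abs(ts))          ## their two-sided p-values

## The proportion at or below each alpha should be close to that alpha
sapply(c(0.01, 0.05, 0.10), function(a) mean(p <= a))
```

Each proportion lands near its α, which is exactly the standard Uniform CDF behavior claimed above.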

Unskippable Note

Thus, we have a way of measuring how good a test is under the specified setting: We generate the data under the null hypothesis, calculate the p-value, repeat many, many, many times, and then check the distribution of those p-values. If that distribution is close to the standard Uniform distribution, then the test is appropriate.

The data you collect is random. Any function of the data is a random variable with a probability distribution— including the p-value.

Part I: Violation of Neither Requirement

As with the previous laboratory activity, let us examine the z-procedure for the population mean from Lecture d1 (“The Theory of the Z”). The assumptions of (requirements for) this procedure are twofold:

  1. the data are generated from a Normal distribution
  2. the population variance is known

Our first step is to check that the p-values do follow a standard Uniform distribution when the null hypothesis is true.

The following code will generate a set of data from a Normal distribution, calculate the test statistic and the p-value, then repeat this many, many, many times. When it is done, the variable pvals will contain 1 × 10⁴ p-values that were generated under the null hypothesis. If the z-test is appropriate in this situation, then the distribution of those p-values should be standard Uniform — or sufficiently close to it.

pvals = numeric()                       ## set aside memory

# Set Parameter Values
n     = 100                             ## sample size
mu    = 3                               ## population mean
sigma = 1                               ## population stdev
alpha = 0.05                            ## selected alpha-value

# Start simulation
for(i in 1:1e4) {
  x  = rnorm(n, m=mu, s=sigma)          ## random sample
  ts = (mean(x)-mu)/sqrt(sigma^2/n)     ## calc ts
  pvals[i] = 2*pnorm(-abs(ts))          ## calc p-value
}

# Produce graphic
hist(pvals, freq=FALSE)                 ## Relative frequency
abline(h=1, col="red")                  ## Horiz line at 1

# Calculate the observed rejection rate
mean(pvals <= alpha)                    ## Rejection rate, a

When I ran this code, I got this histogram.

[histogram2]

I also got a value of a = 0.0506. That is the value calculated from the last line of the code. It is the proportion of the time that the p-value was less than or equal to α = 0.05. In other words, a is the observed rejection rate, which should be close to α.

The Rejection Rate for All Alpha Values

We could spend our time testing each possible value of α to determine if the test works for any and every α. Or, we can directly test if the distribution is standard Uniform. To do this, let us use the Kolmogorov-Smirnov test.

The Kolmogorov-Smirnov test determines if it is reasonable to conclude that the observed data came from the hypothesized distribution. To do this, it calculates the empirical CDF of the data and the CDF of the hypothesized distribution, finds the greatest distance between the two (the test statistic), then compares that test statistic to the Kolmogorov distribution.
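To make that recipe concrete, here is a minimal sketch that computes the greatest distance by hand and checks it against what ks.test reports. (The data and the seed here are my own illustration, not part of the lab's required work.)

```r
set.seed(1)                            ## arbitrary seed for the illustration
x  = runif(1000)                       ## data hypothesized to be standard Uniform
xs = sort(x)
n  = length(xs)

## The empirical CDF jumps at each data point, so check the gap
## just above and just below each point against the hypothesized CDF
D  = max(pmax((1:n)/n - punif(xs),
              punif(xs) - (0:(n-1))/n))

D                                      ## the greatest distance: the KS statistic
ks.test(x, "punif")$statistic          ## ks.test reports the same value
```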

Performing the Kolmogorov-Smirnov test in R is quite easy. The following single line of code will test the null hypothesis that the p-values calculated above really do follow a standard Uniform distribution.

ks.test(pvals, "punif")

The first slot in the function contains the generated values. The second slot specifies the CDF of the hypothesized distribution. Here, that CDF is that of the standard Uniform.

Running the test produces a p-value of 0.6653. Because the p-value is greater than our usual alpha-value, we cannot reject the claim that the p-values were generated from a standard Uniform distribution. This means that the test appears to be appropriate for any value of α under these circumstances.

This conclusion does not really surprise us, does it? It means that when the data are from a Normal distribution and we know the population variance, then the z-test is an acceptable test to use.

This is something we proved in Lecture d1.

Part II: Violating Requirement 1: Normality

The z-test requires the data come from a Normal distribution. In the previous lab, we discovered that this assumption is not too important in calculating confidence intervals. Let’s see if we can make a similar conclusion with respect to the p-values.

And so, let us generate data with sample size n = 10 from an Exponential distribution with mean μ = 1. This also means the standard deviation is 1 (σ = 1). Before you continue reading, try to write the code in your notes. Base it on the above code.

When you have the code written in your notebook, move your cursor over the following space to see the code I wrote.

pvals = numeric()

# Set Parameter Values
n     = 10                              ## sample size
mu    = 1                               ## population mean
sigma = mu                              ## population stdev
alpha = 0.05                            ## selected alpha-value

# Start simulation
for(i in 1:1e4) {
  x  = rexp(n, rate=1/mu)
  ts = (mean(x)-mu)/sqrt(sigma^2/n)
  pvals[i] = 2*pnorm(-abs(ts))
}

# Produce graphic
hist(pvals, freq=FALSE)                 ## Relative frequency
abline(h=1, col="red")                  ## Horiz line at 1

# Calculate the observed rejection rate
mean(pvals <= alpha)                    ## Rejection rate, a

# Is the distribution standard Uniform?
ks.test(pvals, "punif")

When I ran this code, I got the following histogram.

[histogram]

According to the Kolmogorov-Smirnov test (p-value = 0.001453), there is evidence that the z-test is inappropriate for these circumstances and for all values of α.

What about only α = 0.05?

What about for our special/usual value of alpha, α = 0.05? Is the z-test appropriate under these circumstances?

To test this, we need to focus on the rejection rate for α = 0.05. From our earlier work in class, we know that we can use the Binomial test to check if the observed number of rejections is reasonable under the null hypothesis that the Type I error rate is α = 0.05.

binom.test(x=sum(pvals<=alpha), n=length(pvals), p=alpha)

Recall the following about the binom.test function. The first position is the number of successes; the second, the number of trials; and the third, the claimed rate. Here, the first is the number of rejections; the second, the number of trials; and the third, α = 0.05.

According to the Binomial test (p-value = 0.000485), we reject the hypothesis that the z-test is appropriate under these circumstances: σ known, distribution is Exponential(λ=1), and the sample size is n = 10.

Who knows? The z-test may be appropriate under different scenarios, such as n = 100. In fact, check that the z-test does work for all α at relatively large sample sizes like n = 1000. [You should be able to modify the above code quickly to check the impact of the sample size on the distribution of the p-values.] You should also understand that the Central Limit Theorem explains why larger sample sizes are better.
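One way to make such checks painless is to wrap the simulation in a function of the sample size. This is only a sketch of mine — the function name pvalSim is not from the lab, and you should substitute your own student ID as the seed.

```r
## Hypothetical helper (my own name, not from the lab): simulate the
## distribution of z-test p-values for Exponential data with mean mu
pvalSim = function(n, reps=1e4, mu=1) {
  pvals = numeric(reps)
  for(i in 1:reps) {
    x  = rexp(n, rate=1/mu)             ## Exponential data with mean mu
    ts = (mean(x)-mu)/sqrt(mu^2/n)      ## sigma = mu for the Exponential
    pvals[i] = 2*pnorm(-abs(ts))
  }
  pvals
}

set.seed(123456)                        ## use your own student ID here
ks.test(pvalSim(1000), "punif")         ## larger n: much closer to Uniform
```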

Part III: Violating Requirement 2: s, not σ

That the distribution is Normal is merely one of the two requirements. The other is that the population standard deviation, σ, is known. In this section, I give you full freedom to check if the z-test is still appropriate when this assumption is violated.

In other words: Violate that requirement and check if the test is still appropriate. Draw from the original standard Normal distribution. Look back to the previous lab for a lot of help on this. Change the sample size to finish your version of the following table:

Sample Size   Rejection Rate   KS test p-value   Binomial test p-value
     10           0.0765          ≪ 0.0001            ≪ 0.0001
     30           0.0623            0.0108            ≪ 0.0001
     50           0.0616            0.1068               —
    100           0.0563            0.5591               —
    200           0.0503            0.2238               —

This is a table for your results:

Sample Size   Rejection Rate   KS test p-value   Binomial test p-value
     10
     30
     50
    100
    200

Your rejection rates should be “close” to mine. (Why not exactly mine?) Use that to check that you did this analysis correctly.

Note that your p-values may be very different from mine. Why? In answering this question, think about the variance of the standard Uniform distribution.


Alright, one last question about this part of the lab. It seems rather similar to a question from the previous lab.

Again, many introductory statistics textbooks provide a rule of thumb that a sample greater than n = 30 is sufficient to use the z-test, even if one does not know the population variance. Does this seem like a good rule to follow? If not, what should the rule be under these conditions? Why?

My Code for this Part

Here is the code I used for the case of n = 10:

pvals = numeric()
mu = 0; sigma = 1; n = 10

for(i in 1:1e4) {
  x  = rnorm(n, m=mu, s=sigma)
  ts = (mean(x)-mu)/sqrt(var(x)/n)      ## var(x), not sigma^2
  pvals[i] = 2*pnorm(-abs(ts))
}

ks.test(pvals, "punif")
binom.test(sum(pvals <= 0.05), n=length(pvals), p=0.05)
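The same idea extends to the whole table. As a sketch (the loop arrangement is mine, not required by the lab), looping over the table's sample sizes prints all three numbers for each row at once:

```r
for(n in c(10, 30, 50, 100, 200)) {
  pvals = numeric(1e4)
  for(i in 1:1e4) {
    x  = rnorm(n)                        ## standard Normal: mu = 0, sigma = 1
    ts = mean(x)/sqrt(var(x)/n)          ## s estimated from the data, not sigma
    pvals[i] = 2*pnorm(-abs(ts))
  }
  cat(n,
      mean(pvals <= 0.05),                                  ## rejection rate
      ks.test(pvals, "punif")$p.value,                      ## KS test p-value
      binom.test(sum(pvals <= 0.05), 1e4, p=0.05)$p.value,  ## Binomial test
      "\n")
}
```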

The Post-Lab

Please answer the following post-lab questions as usual.

  1. What roles did the Central Limit Theorem and the Law of Large Numbers play in this activity? Be very specific about what each did.
    Was the usual rule of thumb[1] given in the textbook sufficient, or would you recommend a higher cutoff between “good enough” and “not good enough” when the data are non-Normal and you do not know σ? If so, what would it be? Give evidence.
  2. The Exponential distribution is highly right-skewed (nonparametric skew H = (μ − median)/σ ≈ 0.307). Would the rule of thumb you created in #1 be different if the data came from a symmetric distribution like the standard Uniform distribution (H = 0)? If so, why? Would it increase or decrease? What would it be?
    Explain fully using statistics. I want to see how well you understand the two distributions and the effect of symmetry on the Central Limit Theorem. I also want to check that you are able to use R to determine the answer (as you need to).
  3. Let us say that the null hypothesis is true and that you are using a proper test. What is the probability that you will wrongly reject the null hypothesis?

Remember that the post-lab is based on correctness as well as your ability to express yourself well. Spend time making sure that there are no errors. Include actual values from your analysis to support each of your answers. This last is important for the first two questions. Without including the statistics you calculated, you are not grounding your answers in reality, and your grade will reflect that.

Make sure you include your script from this lab— properly commented. Your script should include what is in the lab as well as what you do to answer question 2. No script = No points.

As always: your first page is the title page. Your title page should include your name, the lab title, the date of the lab, and your Knox ID. Start a new page and answer the post-lab questions. After you answer those questions, start a new page and start your code appendix.

Footnote

[1] The “usual rule of thumb” states that the sample means are essentially Normally distributed as long as the sample size is at least 30.