IS‘24: Course Calendar

The Course Calendar

This page provides the proposed calendar for the course. If changes need to be made (likely), then I will make them here and announce them during class. If you would like, you may download the calendar as an Excel file. That may allow you to see the course in a different light. Remember that one of your jobs is to see the connections among the course topics, and among your courses' topics.

This calendar was last updated on: February 28, 2024, at 8:52 am.

January 1 (Monday)

Initial Thoughts on the Course

There are no classes today; it is the last day of Winter Vacation. Today makes a great day to make your New-Term Resolutions.
Thinking back over the past several terms, how will you change to make this term a success for you? This is such an important question that I may ask it on the first day of the term in the form of a quiz. When you start a professional job somewhere, you will need to convey how you will help the company grow and how the company will help you grow. Good jobs will provide opportunities for growth to you. Make sure you take advantage of them.

Module I: Gathering Your Data

The first learning module deals with gathering data. While this may seem rather elementary, we discover that choices on collection method impact our conclusions. This is why we spend time examining how we can properly — and cheaply — collect a representative sample from the larger population.

→ Link to Module 1 Overview

January 3 (Wednesday)

Course Introduction

Before you come to our first class period, ensure that you do the following: familiarize yourself with the course website, join the course on Moodle, carefully read the syllabus, and watch the videos there that cover learning and thinking in college.
Note that the syllabus contains some important features. These include a list of what it takes to succeed in this class as well as several pointed suggestions (Keys to Success) sprinkled throughout. You are not experts in statistics (yet), and you are in this course to become much more proficient at statistics. Similarly, you are not experts on learning (yet), and you are in this course to become much more proficient at learning. Growing your skills is one of the most important reasons for education.

Slides for today: Deck a1
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 1.1
Chapter 1

January 4 (Thursday)

Sampling Schemes

Today, we think about how we can properly collect our sample.
Remember that the sample consists of the people (items) we actually measure. This sample comes from the “sampled population.” Its entire purpose (and how we evaluate the sample) is to represent the population of interest (target population). If the sample fails to represent the population, then the statistics we calculate on it will not be good estimates of the population parameters we care about. Sadness abounds.

Slides for today: Deck a2
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 1.3
Section 2.1

January 5 (Friday)

Starting with R (SCA0)

The purpose of this class is to ensure you properly install R, know what a working directory is, and understand the importance of structure in your analyses.
⟼ Link to today’s activity

Support for today:
Chapter 1

January 8 (Monday)

Types of Variables

The first thing we deal with in statistics is the variable. This is because variables are at the very center of our studies. We are trying to learn more about variables. We are measuring variables. We are summarizing those variables. We are telling others what we learned about those variables. Statistics is all about variables.
Focus on the different types of variables. You will see that different types of variables contain different amounts of information due solely to their level. Knowing the level of the variable gives us information about the types of statistics we can meaningfully calculate on it and about the types of graphics we should create to summarize it.
After looking at different types of variables (and why) we will examine how to perform a scientific study... and examine two main types, along with their strengths and weaknesses.

Slides for today: Deck a3
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 1.2
Sections 2.2–4
Section 4.1
Due today:
R Assignment 04

January 10 (Wednesday)

Biases in Statistics

Bias is (usually) bad. Today, we will look at how to ensure bias is reduced in our studies.

Slides for today: Deck a4
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 1.4

January 11 (Thursday)

Activity A: Sampling Schemes

An entire subdiscipline of statistics concerns sampling. Remember that the #1 purpose of the sample is to represent the population in every important way. Thus, I designed this activity to achieve two goals: First, it gives you more practice in using R to better understand statistics. Second, it gives you an inside view of the difference between cluster and stratified sampling, including what additional information is needed to use stratified sampling and what you are assuming when you claim to use cluster sampling.
⟼ Link to today’s Statistical Laboratory Activity A
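The difference between the two schemes can be sketched in a few lines. This is only an illustration under stated assumptions: the population of four dorms with 50 students each is made up, the function names are hypothetical, and it is written in Python's standard library rather than the R we use in class.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 4 dorms (the groups) with 50 students each.
population = [(dorm, student) for dorm in "ABCD" for student in range(50)]
groups = sorted({dorm for dorm, _ in population})

def stratified_sample(per_group):
    """A few students from EVERY dorm; requires knowing each student's dorm."""
    sample = []
    for g in groups:
        members = [unit for unit in population if unit[0] == g]
        sample.extend(rng.sample(members, per_group))
    return sample

def cluster_sample(n_groups):
    """EVERY student from a few randomly chosen dorms."""
    chosen = rng.sample(groups, n_groups)
    return [unit for unit in population if unit[0] in chosen]

strat = stratified_sample(per_group=5)  # 20 students, all four dorms appear
clust = cluster_sample(n_groups=2)      # 100 students, only two dorms appear
```

Notice what each function needs: the stratified version must know every student's dorm in advance, while the cluster version quietly assumes that the chosen dorms look like the ones left out.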

Support for today:
Section 1.3
Section 2.1
Chapter 1
Due today:
Pre-Lab A, on Moodle

January 12 (Friday)

Examination I

This is the first examination for the course. It covers everything we covered in class during the first learning module.
Why am I giving a test so soon in the course? Its main purpose is to reinforce the importance of structured learning and continual testing/re-testing. In other words, those videos you watched before the start of the term offer you an opportunity to succeed. This examination lets you see if you are actually using the skills taught in those videos. And, if you are not successful on this examination, you have an opportunity to grow as a student.

Module II: Summarizing Your Data

Before any supportable analysis can take place, the analyst must know the data. This includes summary statistics for each variable, correlations between the variables, and meanings of the values and of the missing values. This also includes being able to create graphics that “tell the story of the data.”

→ Link to Module 2 Overview

January 15 (Monday)

Graphics: Categorical Variables

Graphics are EXTREMELY important, because they allow one to “see” the data. This is important both so that we can see the data, and so that we can more easily tell the story of the data to others.
When looking at types of variables in Day 2, we learned that categorical and numeric variables have different amounts of information available. Today, we focus on graphics that handle categorical variables, either singly or in pairs. In general, a bar chart will be best for single categorical variables (although some would argue a Pareto chart would be better). However, there is no really good option for showing the relationship between two categorical variables. While the mosaic plot is the best for statisticians so far, it is really not that good (it is not intuitive for non-statisticians). The side-by-side or the stacked barplot seems to be the most intuitive.
To see the issue, try to draw a graphic by hand that shows the relationship between eye color and hair color. See how hard it is??? There is something fundamentally different between categorical and numeric variables. It is far easier to represent numeric than categorical.
This actually raises an IMPORTANT point: If you want to make a point with a graphic, best practices tell us to sketch the graphic by hand to ensure it actually communicates to us what we need it to communicate.

Slides for today: Deck b1
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 2.1
Chapter 4
Computing Activity: SCA 3
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.

January 17 (Wednesday)

Graphics: Numeric Variables

I find it much easier to use graphics to summarize numeric variables (singly and in pairs). Singly, we can use a boxplot or a histogram. In pairs, we can use a scatter plot.
Actually, if we are comparing a numeric variable across a few classes (showing the relationship between a numeric and a categorical variable), then we can use a side-by-side boxplot. These clearly show differences in distributions across different levels.
So, in reality, it is much easier to show relationships using numeric variables than it is using categorical variables.

Slides for today: Deck b2
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 2.2, 2.3
Chapters 5 and 6
Section 12.1
Computing Activity: SCA 3
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
Hawkes Learning Systems
Post-Lab A

January 18 (Thursday)

Measures of Center and of Position

If there is a single number that best represents your variable, it is the measure of center. However, there are three of them. Which is appropriate for your data, and how do we interpret the various measures? Also, what if we care about values in the tails instead of those at the center? There are many disciplines that focus their studies in the tails (those looking at the high or low achievers; those studying poverty; those studying winning teams, etc.).
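To see why the choice of measure matters, here is a minimal sketch using made-up incomes (in thousands) with one extreme value. It uses Python's `statistics` module for illustration only; in class we do this in R. The quartiles use that module's default (“exclusive”) method, which may differ slightly from other software.

```python
import statistics

incomes = [28, 31, 31, 35, 40, 47, 250]  # hypothetical values, one far in the tail

mean = statistics.mean(incomes)      # 66: pulled toward the extreme value
median = statistics.median(incomes)  # 35: the middle value
mode = statistics.mode(incomes)      # 31: the most frequent value

# Measures of position: the three quartiles split the sorted data into four parts.
quartiles = statistics.quantiles(incomes, n=4)
```

The mean lands above six of the seven values, which is a hint that it may not be the best single-number summary for data like these.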

Slides for today: Deck b3
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 3.1, 3.3
Chapter 5
Section 4.2
Computing Activity: SCA 2
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.

January 19 (Friday)

Measures of Uncertainty and of Skewness

This lecture covers two things. Most importantly, it looks at these “measures of spread” (or of dispersion or of uncertainty). There are a few that will be covered in the slide deck, but we need to focus on three: variance, standard deviation, and interquartile range. The key differences among these three are which variable types they can be used with, which one helps the reader understand the data most easily, and which one works best for mathematical work. Today, pay attention to which is which.
By the way, the Hildebrand Rule is a rule of thumb allowing us to determine which data are “too skewed” to be represented by the mean. Since the mean is the preferred measure of center (we want to use it when we can), this rule of thumb should be quite helpful in summarizing the variable. Unfortunately, as with many things in statistics, we tend to use measures that are easy to use instead of measures that have a strong mathematical foundation.
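The three measures of spread can be computed side by side. This is a small sketch on made-up data, written with Python's `statistics` module rather than the R we use in class; the quartiles again rely on that module's default method.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 11]  # hypothetical measurements

variance = statistics.variance(data)  # sample variance (divides by n - 1), in squared units
std_dev = statistics.stdev(data)      # square root of the variance, in the data's own units

q1, _, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1                         # width of the middle half of the data
```

The variance is in squared units, which is convenient for mathematics but awkward for readers; the standard deviation and the interquartile range are in the units of the data.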

Slides for today: Deck b4
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 3.2
Chapter 5
Section 4.3, 4.4
Computing Activity: SCA 2
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 06

Module III: Probability Theory

In addition to describing your data, there is a need to understand the “data-generating process.” This is the method by which the data came into being. Being able to describe this process means that the data are better known… including future data points not already collected. Understanding the data-generating process is the final step before we are able to draw conclusions about the population based on our small sample.

→ Link to Module 3 Overview

January 22 (Monday)

Discrete Probability Distributions

The purpose of statistics is to draw conclusions about the population of interest based solely on a sample. To that end, we will spend some time looking at probability distributions. The elements of a probability distribution constitute a population. Thus, knowing probability distributions allows us to identify certain types of populations… based solely on understanding how the data are created by the universe.

Slides for today: Deck c1
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 5.1
Appendix A.1
Computing Activity: SCA 5
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 07

January 24 (Wednesday)

Bernoulli and Binomial Distributions

When a distribution is encountered time and again, we bestow a name on it. This name indicates the importance of the distribution in our understanding of the world around us.
The Bernoulli distribution describes any random variable that has two possible outcomes: success and failure. The Binomial distribution describes a random variable that is the sum of independent and identically distributed Bernoullis. It is used to model the number of successes out of a known number of trials.
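The “sum of Bernoullis” description can be checked directly. The sketch below (Python standard library, for illustration; the values n = 10 and p = 0.3 are assumptions) adds up ten success/failure trials many times and compares the share of runs with exactly three successes to the Binomial formula.

```python
import math
import random

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials, each with success probability p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

rng = random.Random(42)  # fixed seed so the sketch is repeatable
n, p = 10, 0.3

def one_binomial_draw():
    # Each comparison is one Bernoulli trial; the sum of n of them is one Binomial draw.
    return sum(1 for _ in range(n) if rng.random() < p)

draws = [one_binomial_draw() for _ in range(20_000)]
simulated = draws.count(3) / len(draws)  # share of runs with exactly 3 successes
exact = binomial_pmf(3, n, p)            # about 0.2668
```

The simulated share should land close to the exact probability, and it gets closer as the number of runs grows.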

Slides for today: Deck c2
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 5.2
Appendix A.2, A.3
Computing Activity: SCA 5
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 08
Hawkes Learning Systems
Practicum Activity 1

January 25 (Thursday)

Poisson Distribution

The Poisson distribution is used to model counts of successes over a time period or an area. Note that this contrasts with the Binomial, which models counts of successes over a set number of trials.
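The contrast with the Binomial can be made concrete. In this sketch (Python standard library, illustrative only; the rate of 2 events per period is an assumption), the Poisson probability of seeing 3 events is compared with a Binomial that has many trials and a tiny success probability but the same expected count.

```python
import math

def poisson_pmf(k, lam):
    """P(exactly k events) when events occur at an average of lam per period."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

p_poisson = poisson_pmf(3, 2.0)             # no fixed number of trials, only a rate
p_binomial = binomial_pmf(3, 1000, 0.002)   # 1000 trials, expected count also 2
```

The two probabilities are nearly equal, which is one way to think of the Poisson: a Binomial in which the number of trials is huge and each trial rarely succeeds.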

Slides for today: Deck c3
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 5.3
Appendix A.7
Computing Activity: SCA 5
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 09

January 26 (Friday)

Activity B: Sampling and Discrete Distributions

This activity looks more closely at simple random sampling and how it is not as simple as we may think. We see that the distribution of the sample proportion is dependent on your rules for data collection.
If you allow for asking the same person multiple times, the distribution is Binomial. If you explicitly forbid this, then the distribution is Hypergeometric. If you stop sampling after your first success, then it is Geometric. All three are legitimate sampling rules. However, your choice has an effect on accuracy and precision (on bias and variance).
⟼ Link to today’s Statistical Laboratory Activity B
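The three sampling rules can be simulated to see their effect. This is a sketch under assumptions, not the lab itself: the population of 100 people with 30 successes is made up, the function names are hypothetical, and it is written in Python's standard library instead of R.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so the sketch is repeatable
population = [1] * 30 + [0] * 70  # hypothetical: 30 successes among 100 people
n, reps = 10, 20_000

def count_with_replacement():
    # The same person may be asked more than once: Binomial count of successes.
    return sum(rng.choice(population) for _ in range(n))

def count_without_replacement():
    # Nobody is asked twice: Hypergeometric count of successes.
    return sum(rng.sample(population, n))

def asks_until_first_success():
    # Keep asking (with replacement) until the first success: Geometric number of asks.
    asks = 1
    while rng.choice(population) == 0:
        asks += 1
    return asks

binom = [count_with_replacement() for _ in range(reps)]
hyper = [count_without_replacement() for _ in range(reps)]
geom = [asks_until_first_success() for _ in range(reps)]

var_binom = statistics.pvariance(binom)
var_hyper = statistics.pvariance(hyper)
```

Both counts average about 3 successes out of 10, but the count without replacement has the smaller variance. Same target, different precision.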

Support for today:
Appendix A.1, 2, 3, 7
Due today:
R Assignment 12
Pre-Lab B, on Moodle

January 29 (Monday)

Uniform and Exponential Distributions

Previously, we examined discrete random variables. Now, we will examine some continuous random variables. Remember that numeric variables can be discrete or continuous, so it behooves us to also study the second type. Also remember that it is usually easier to work with continuous random variables, so we will treat the random variables as continuous unless they are too discrete.
The two continuous distributions today are the Uniform and the Exponential distribution. The former is useful in modeling things that have a fixed start and end, whereas the latter can model things with a definite starting point. The former can model time spent at a stop light; the latter, time waiting for a bus.
Ultimately, the purpose of today’s lecture is to have you understand the cumulative distribution function (CDF) and how it is related to probability for continuous random variables.
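The link between the CDF and probability is that P(a < X < b) = F(b) − F(a). A minimal sketch follows, in Python for illustration only; the 60-second light cycle and the 10-minute average bus wait are assumed numbers.

```python
import math

def uniform_cdf(x, a, b):
    """F(x) = P(X <= x) for a Uniform on [a, b]."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def exponential_cdf(x, rate):
    """F(x) = P(X <= x) for an Exponential with the given rate (mean = 1 / rate)."""
    return 0.0 if x <= 0 else 1.0 - math.exp(-rate * x)

# Stop light: wait is Uniform on [0, 60] seconds. P(20 < wait < 50) = F(50) - F(20).
p_light = uniform_cdf(50, 0, 60) - uniform_cdf(20, 0, 60)

# Bus: wait is Exponential with mean 10 minutes. P(wait > 15) = 1 - F(15).
p_bus = 1 - exponential_cdf(15, rate=0.1)
```

Every probability question about a continuous random variable reduces to one or two evaluations of the CDF.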

Slides for today: Deck c7, c8
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Appendix B.1, 2, 5
Computing Activity: SCA 6
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.

January 31 (Wednesday)

Normal Distribution

Today, we are introduced to the most important distribution in statistics. The Normal distribution (a.k.a. the Gaussian distribution) is important because of the Central Limit Theorem, which states that sums (and averages) of random variables are approximately Normally distributed (as long as the sample size is “large enough”).
The previous lecture introduced us to continuous distributions (of which the Normal is one) and to the importance of the cumulative distribution function (CDF) in calculating probabilities. We will work today on calculating probabilities and on determining x-values for a given probability; that is, we will learn more about the quantile function.
Ultimately, the conceptual relationships are key here. We will see how to get the computer to do the calculations for us. We just need to make sure we know what we want.
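The CDF and the quantile function answer opposite questions, which a short sketch can show. It uses `NormalDist` from Python's standard library purely for illustration (in class we use R), and the mean of 170 and standard deviation of 10 are assumed values.

```python
from statistics import NormalDist

heights = NormalDist(mu=170, sigma=10)  # hypothetical Normal population

# CDF: start with an x-value, get a probability.
p_below_180 = heights.cdf(180)   # P(X <= 180), about 0.84

# Quantile function: start with a probability, get an x-value.
x_90 = heights.inv_cdf(0.90)     # the value with 90% of the population below it

# The two functions undo each other.
round_trip = heights.cdf(x_90)   # back to 0.90
```

Knowing which of the two questions you are asking is the hard part. The computer handles the rest.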
By the way, this lecture marks the end of the probability distributions we will be covering in class. Please avail yourself of the webapp I created that gives you practice in determining the probability distribution — and its parameters — from a graphic.
Please also read through the slidedeck “Approximating the Binomial with a Normal” (cA). It will serve as an interesting tie-in to the Central Limit Theorem.

Slides for today: Deck c9
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Sections 6.1–4, 6.5
Appendix B.3, C.1, C.2
Computing Activity: SCA 6
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 13
Hawkes Learning Systems
Post-Lab B

February 1 (Thursday)

Normal Distribution, Part Deux

Day Two of the Normal distribution has us look at ways of using the Normal distribution (continuous) to estimate probabilities of a Binomial distribution (discrete). This lecture also looks at what we mean by one distribution “approximating” another. This lecture actually takes us quite nicely into the Central Limit Theorem (next time) and will be alluded to when drawing inferences on proportions (in a couple of weeks).
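Here is one way to see what “approximating” means in numbers. The sketch is in Python for illustration only, with n = 50 and p = 0.4 as assumed values. It compares the exact Binomial probability P(X ≤ 20) with a Normal that has the same mean and standard deviation, using the usual continuity correction of one half.

```python
import math
from statistics import NormalDist

n, p = 50, 0.4

# Exact: add up the Binomial probabilities for 0, 1, ..., 20 successes.
exact = sum(math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(21))

# Approximate: a Normal with the Binomial's mean and standard deviation.
approx_dist = NormalDist(mu=n * p, sigma=math.sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(20.5)        # continuity correction: 20 becomes 20.5
uncorrected = approx_dist.cdf(20)     # without the correction, for comparison
```

In this example the corrected value is within about 0.01 of the exact one, while the uncorrected value is noticeably farther off.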

Slides for today: Deck cA
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Chapter 6
Appendix C

February 2 (Friday)

Activity C: Bootstrapping, Continuous Distributions, and the CLT

This in-class activity looks at bootstrapping as a means of estimating the population parameter. It also explicitly demonstrates the Central Limit Theorem (CLT) in that the distribution of a sum of random values converges to a Normal distribution. The activity has you think about what affects the speed of convergence.
⟼ Link to today’s Statistical Laboratory Activity C

Support for today:
Section 5.6, 5.7
Computing Activity: SCA 7
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
Pre-Lab C, on Moodle

Module IV: Introductory Inference

Now that we understand our data and the process that created our data — both our sampling method and the data-generating process — we can begin drawing conclusions about the population based on the sample. These conclusions take the form of confidence intervals and/or hypothesis tests. The former consists of a set of reasonable values for the population parameter. The latter is a conclusion regarding the claim made about reality.

→ Link to Module 4 Overview

February 5 (Monday)

The Central Limit Theorem

The most important theorem in all of statistics is the Central Limit Theorem (CLT). The conclusions of the CLT should not be surprising for us, since we have already seen it happen in some of our activities. As the sample size increases, the distribution of the sample mean becomes more and more Normal.
Why is this so important? It means that a large sample size allows us to pretend that the sample means are Normal. This simplifies methods for estimating the population mean. How? Well, you will need to wait around for the next lecture to find out. 😼
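The “more and more Normal” claim can be watched happening. This sketch (Python standard library, for illustration; the Exponential population and the sample sizes are assumptions) draws many sample means from a strongly right-skewed population and measures how skewed the sample means are. A Normal distribution has skewness 0.

```python
import random
import statistics

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

def sample_mean(n):
    """Mean of n draws from a right-skewed Exponential population."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])

def skewness(values):
    """Average cubed z-score: 0 for symmetric data, positive for a long right tail."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean([((v - m) / s) ** 3 for v in values])

skew_by_n = {n: skewness([sample_mean(n) for _ in range(5000)]) for n in (2, 10, 50)}
```

The skewness of the sample means shrinks toward 0 as the sample size grows, even though the population itself never changes.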

Slides for today: Deck cB
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Chapter 7
Section 16.1
Appendix C.3
Computing Activity: SCA 7
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 14

February 7 (Wednesday)

Day of Dialog

Today is dedicated to engendering a better understanding of each other and of ourselves in this world. Discussions will be held around campus today to support this goal.
We are all human, with all that entails. We are affected by our personal history… as well as by the history that led to us personally. There is a very interesting anime called Astra Lost in Space in which the main characters grow up in a world without history. It is very thought-provoking. How are we a product of our history? More importantly: How would we be different with either a different history or with no history?
Or, for anime fans, this is interesting because the two main characters are voiced by Yato from Noragami and Sadao Maou from The Devil is a Part-Timer!. It is a fascinating combination… along with the new up-and-coming voice-actor Ciarán Strange.

Due today:
Hawkes Learning Systems

February 8 (Thursday)

The Theory of Confidence Intervals

Today starts the second half of the course. We now know all we need to know in order to understand the population based on the sample. We just have to put it together in a coherent whole. Our first step is to understand confidence intervals.
A confidence interval is a set of “reasonable” values for the population parameter of interest. That is all it is. It contains our best estimate (“point estimate”) and an indication of our level of uncertainty (“margin of error”). It is extremely important to provide both in your analyses. Without both, you are only telling a part of the story.
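The “point estimate plus or minus margin of error” structure looks like this in a minimal sketch. Assumptions are stated loudly: the data are simulated, the interval is the large-sample Normal-based interval for a mean (small samples call for a different multiplier), and the code is Python for illustration rather than the R we use in class.

```python
import math
import random
import statistics
from statistics import NormalDist

rng = random.Random(11)  # fixed seed so the sketch is repeatable
data = [rng.gauss(50, 8) for _ in range(100)]  # hypothetical measurements

point_estimate = statistics.fmean(data)   # our single best guess
z_star = NormalDist().inv_cdf(0.975)      # about 1.96 for 95% confidence
margin_of_error = z_star * statistics.stdev(data) / math.sqrt(len(data))

interval = (point_estimate - margin_of_error, point_estimate + margin_of_error)
```

Reporting only `point_estimate` hides how uncertain we are; reporting only the interval width hides what we actually estimated.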

Slides for today: Deck d1
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Chapters 8 and 9
Chapters 11–15
Chapters 5 and 6
Due today:
R Assignment 15
Post-Lab C
Practicum Activity 2

February 9 (Friday)

Activity D: Estimators

We use estimators quite frequently in statistics. In fact, you have used several since the start of the term (e.g., sample mean, sample variance, sample median). What makes estimators “good” estimators? Why do we choose one over another? This activity checks your understanding of bias, variance, and the mean squared error. It gives you an opportunity to explain why we use an unbiased estimator in STAT 200, even though one with a lower MSE exists. It also has you think about the effect of variability of the estimator on the precision of your estimate. So much in one activity.
⟼ Link to today’s Statistical Laboratory Activity D
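The trade-off between bias and MSE can be seen in a small simulation. This is a sketch, not the lab: it assumes Normal data with true variance 4 and samples of size 5, compares three variance estimators that divide the sum of squares by n − 1, n, or n + 1, and is written in Python for illustration.

```python
import random
import statistics

rng = random.Random(3)  # fixed seed so the sketch is repeatable
n, sigma2, reps = 5, 4.0, 20_000

def sum_of_squares():
    """Sum of squared deviations from the sample mean, for one simulated sample."""
    x = [rng.gauss(0, sigma2 ** 0.5) for _ in range(n)]
    m = statistics.fmean(x)
    return sum((v - m) ** 2 for v in x)

ss = [sum_of_squares() for _ in range(reps)]

results = {}  # divisor -> (estimated bias, estimated mean squared error)
for divisor in (n - 1, n, n + 1):
    estimates = [s / divisor for s in ss]
    bias = statistics.fmean(estimates) - sigma2
    mse = statistics.fmean([(e - sigma2) ** 2 for e in estimates])
    results[divisor] = (bias, mse)
```

Dividing by n − 1 gives a bias near zero, yet dividing by n + 1 gives the smaller MSE. Which property matters more is exactly the question the activity asks you to defend.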

Support for today:
Chapter 11
Due today:
R Assignment 16
Pre-Lab D, on Moodle

February 12 (Monday)

The Practice of Confidence Intervals

Now that we understand confidence intervals, we see how to calculate them using the R Statistical Environment. As you have seen in the past, calculating things with a statistical program makes statistics much more available to all of us. There is no need for us to do math. We can just type a few lines in the script window, run them, and have the computer give us the answer.
So easy.

Slides for today: Deck d2
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Chapters 8 and 9
Chapters 11–15
Chapters 5 and 6
Computing Activity: SCA 8
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.

February 14 (Wednesday)

The Theory of Hypothesis Testing and p-values

Next, we move to the second cornerstone of inferential statistics: The Hypothesis Test. As we did with confidence intervals, let us spend the first lecture examining the theory and the second (and more) seeing how to use the computer to perform the calculations for us.
Today, we look at the hypothesis process. This means we are thinking about claims being made by people to see how to test them. A claim is a research hypothesis. From that, we create a null and an alternative hypothesis. We use the null hypothesis (and the structure of the problem) to determine the distribution of the test statistic (“under the null”).
With that distribution, the alternative hypothesis, and the value of the test statistic calculated from the data, we are able to determine the probability of observing data this extreme, or more so, given that the null hypothesis is true. In other words, with that information, we can calculate the p-value. Or, in reality, we can have the computer calculate the p-value so that we can properly interpret it.
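As a concrete instance of those three ingredients, here is a sketch of the p-value for a test about a population mean. It assumes the simplest textbook case (a Normal test statistic with a known population standard deviation); the function name and the numbers are hypothetical, and the code is Python for illustration rather than R.

```python
import math
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n, alternative="two-sided"):
    """P(data this extreme or more so, given the null hypothesis mu = mu0)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # test statistic
    null_dist = NormalDist()                          # its distribution under the null
    if alternative == "greater":
        return 1 - null_dist.cdf(z)
    if alternative == "less":
        return null_dist.cdf(z)
    return 2 * (1 - null_dist.cdf(abs(z)))            # both tails

# Hypothetical study: null says mu = 100; we observe a mean of 103 from n = 100.
p_two_sided = z_test_p_value(103, 100, sigma=15, n=100)
p_greater = z_test_p_value(103, 100, sigma=15, n=100, alternative="greater")
```

The data and the null are the same in both calls; only the alternative hypothesis changes, and the p-value changes with it.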

Slides for today: Deck d3
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Chapters 8 and 9
Chapters 11–15
Chapters 5 and 6
Due today:
R Assignment 19
Hawkes Learning Systems
Post-Lab D
Practicum Activity 3

February 15 (Thursday)

The Practice of Hypothesis Testing

Now that we understand the hypothesis process and how to interpret the p-value, let’s see how to perform the necessary calculations. Today, we look at hypothesis tests involving population means, proportions, and variances. We also learn how to make R properly reflect our alternative hypothesis. It is easy… just a few more keystrokes. And, from that paltry effort, we reap acres of benefits.
While we will see how to do some simple hypothesis testing in class today, I refer you to slidedecks e1, e2, and e3 for more examples of simple hypothesis testing.
I also strongly suggest that you draw flowcharts to illustrate the analysis procedure. There is a sample one here.

Slides for today: Deck d4
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Chapters 8 and 9
Chapters 11–15
Chapters 5 and 6
Computing Activity: SCA 9, 11, 12, 13, 21, 22, 23
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.

February 16 (Friday)

Examination II

This is the second examination for the course. It covers everything through the theory of hypothesis testing.
You have already had a test from me. You should know what types of questions I like to ask. You should also have modified your study habits to increase your mastery of the material so that you do not need to cram for this exam.

Module V: Advanced Inference

The final learning module continues our explorations into drawing data-based conclusions about the world around us. New procedures are introduced, each tailor-made for a specific type of data and hypothesis. While there does exist a generic procedure that applies to all types of confidence intervals and hypothesis tests (the bootstrap), it tends to be of low power. Thus, time should be spent leveraging all of what we have learned in this course to create procedures that are as powerful as possible.

→ Link to Module 5 Overview

February 19 (Monday)

Chi-Square Goodness-of-Fit Test

The Binomial test is for one population in which there are only two possible outcomes per person. The proportions test is for two populations in which there are only two possible outcomes per person. Today, we see how to handle cases of one population in which there are more than two possible outcomes per person.
The Chi-Square Goodness-of-Fit test was designed to test if an observed discrete random variable actually follows a claimed population distribution. It calculates the scaled difference between the observed and the expected (under the null) counts. If that difference is too large, then we can conclude that the data do not come from that claimed discrete distribution. If the difference is not too large, then we cannot conclude that the data do not come from the claimed distribution.
If you are trying to compare categorical distributions across more than one population, then you will need to use the Chi-Square Test of Independence (later in the course).
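The “scaled difference” in the Goodness-of-Fit test is the sum of (observed − expected)² / expected over the categories. A minimal sketch with made-up counts follows, in Python for illustration. One assumption to flag: the closed-form p-value used here is valid only because three categories give 2 degrees of freedom; in general, let the software compute it.

```python
import math

observed = [30, 50, 20]         # hypothetical counts in three categories
claimed = [0.25, 0.50, 0.25]    # the claimed population distribution (the null)

n = sum(observed)
expected = [n * prob for prob in claimed]  # counts we would expect under the null

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1

# With exactly 2 degrees of freedom, P(chi-square > x) equals exp(-x / 2).
p_value = math.exp(-stat / 2)
```

Here the p-value is large, so we cannot conclude that the data fail to follow the claimed distribution.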

Slides for today: Deck e4
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 10.6
Section 18.1
Computing Activity: SCA 32
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 21

February 21 (Wednesday)

The Analysis of Variance (ANOVA)

Analysis of Variance is a set of methods specifically designed to model a numeric dependent variable using a categorical independent variable with just a few levels. Surprisingly, it is an instance of “linear models,” which also includes linear regression (next week). Here, we are being introduced to the ANOVA procedure. In doing this, we need to focus on the dependent and independent variables. We also need to focus on what ANOVA can (and cannot) tell us.
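The heart of the procedure is a single ratio: variation between the group means compared with variation within the groups. The sketch below computes that F statistic for three made-up groups. It is Python for illustration only, and it stops at the statistic because the p-value requires the F distribution, which we leave to the software.

```python
import statistics

# Hypothetical data: a numeric dependent variable measured in three groups.
groups = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [6, 7, 8]}

all_values = [v for g in groups.values() for v in g]
grand_mean = statistics.fmean(all_values)
k, n_total = len(groups), len(all_values)

# Between-group variation: how far each group mean sits from the grand mean.
ss_between = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups.values())

# Within-group variation: how far each value sits from its own group mean.
ss_within = sum((v - statistics.fmean(g)) ** 2 for g in groups.values() for v in g)

f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

A large F says the group means differ by more than the within-group noise would explain. It says nothing about which group is the different one.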

Slides for today: Deck e5
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 11.6
Chapter 22
Section 7.1, 7.2
Computing Activity: SCA 31b
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 23
Hawkes Learning Systems
Practicum Activity 4

February 22 (Thursday)

Beyond ANOVA: Which is different?

As we saw in the previous lecture, ANOVA can only tell us if there is a difference in means (or if the grouping variable gives us insight into the dependent variable). That is all. One cannot tell which of the means is different. This is important information to have. Also, there are requirements to the ANOVA procedure. What should we do if those requirements are not met in our data and model? We learn that today.

Slides for today: Deck e6
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 11.6
Section 7.3
Due today:
R Assignment 25

February 23 (Friday)

Activity E: Confidence Intervals

What is a confidence interval? It is a set of numbers that contains (or “covers”) the population parameter a specified proportion of the time. This activity looks at coverage and the effects of the CLT (and the Law of Large Numbers) on the appropriateness of a statistical procedure.
⟼ Link to today’s Statistical Laboratory Activity E

Support for today:
Section 11.2
Due today:
Pre-Lab E, on Moodle

February 26 (Monday)

Chi-Square Test of Independence

The ANOVA procedure helps us determine if a categorical variable and a numeric variable are independent. The Chi-Square Goodness-of-Fit test allows us to determine if a discrete random variable could have followed a claimed distribution. Today, we learn how to determine if two categorical variables are independent.
To do this, we recall our Hypothesis Testing Theory, constructing the test statistic based on the null hypothesis of “no relationship.” This requires a quick trip back to Hawkes Chapter 4 to learn what is meant by “independent.” Once that aside is made, the test statistic is easily determined and, under the usual assumptions, we know its distribution. This allows us to calculate a p-value… and interpret the p-value accordingly.
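
A minimal sketch of that construction follows, in Python rather than the R used in the course. The table of counts is made up for illustration. Under the null hypothesis of independence, each expected count is the row total times the column total, divided by the grand total. No continuity correction is applied here, so software that applies one by default may report a slightly different value for a 2×2 table.

```python
import math

# Hypothetical 2x2 table of counts: rows are the levels of one categorical
# variable, columns are the levels of the other.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

# Expected counts if the two variables were independent.
expected = [[r * c / total for c in col_totals] for r in row_totals]

# Test statistic: sum of (observed - expected)^2 / expected over all cells.
chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For df = 1 only, the upper tail of the chi-square distribution has the
# closed form erfc(sqrt(x / 2)). Larger tables need a chi-square table or
# statistical software, so this sketch leaves the p-value as None for them.
p_value = math.erfc(math.sqrt(chi_sq / 2)) if df == 1 else None

print(f"chi-square = {chi_sq:.3f}, df = {df}, p-value = {p_value}")
```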

Slides for today: Deck e7
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 10.7
Chapter 18
Computing Activity: SCA 41
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 26

February 28 (Wednesday)

The History of Correlation

Correlation is one method used to test if two numeric variables are independent. Today, we see the history of this concept… eventually leading to a simple test of correlation.
And yet, while we can determine if there is correlation, this method can tell us little more. As with ANOVA, the original test is quite limited in what it tells us. Also like ANOVA, we will wait for the next lecture to go beyond simple correlation.
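
As a small illustration, the sample correlation and the usual test statistic can be computed by hand. This sketch uses Python (the course uses R) and made-up paired data. It assumes the simple test meant here is the standard t test for a Pearson correlation, in which the statistic is compared to a t distribution with n − 2 degrees of freedom.

```python
import math
from statistics import mean

# Hypothetical paired measurements on two numeric variables.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
s_xx = sum((a - x_bar) ** 2 for a in x)
s_yy = sum((b - y_bar) ** 2 for b in y)

# Pearson correlation coefficient.
r = s_xy / math.sqrt(s_xx * s_yy)

# Test statistic for the null hypothesis of zero correlation.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))

print(f"r = {r:.3f}, t = {t_stat:.3f} on {n - 2} degrees of freedom")
```

Note what is missing: the output says whether a linear relationship is detectable, but nothing about how much y changes when x changes.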

Slides for today: Deck e8
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Section 12.1
Section 7.1
Computing Activity: SCA 42a
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 27
Hawkes Learning Systems
Post-Lab E
Practicum Activity 5

February 29 (Thursday)

Regression: Beyond just Correlation

Now, we go beyond determining just the strength of the relationship between two numeric variables. We try to model a dependent variable with an independent variable. Framing it in such a manner allows us to calculate confidence intervals for the y-intercept and for the slope (effect). As always, the calculations are easy for the computer; the interpretations are up to you and your brain.
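
To show what the computer is doing, here is a minimal least-squares sketch in Python (the course itself uses R). The data are made up. The critical value 3.182 is the 0.975 quantile of the t distribution with 3 degrees of freedom, taken from a t table because the Python standard library has no t distribution. It only applies to this sample size of 5.

```python
import math
from statistics import mean

# Hypothetical data: x is the independent variable, y the dependent variable.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((a - x_bar) ** 2 for a in x)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

# Least-squares estimates.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Residual standard error, with n - 2 degrees of freedom.
residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Standard errors of the two estimates.
se_slope = s / math.sqrt(s_xx)
se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / s_xx)

T_CRIT = 3.182  # from a t table: 0.975 quantile, 3 df (valid only for n = 5)
slope_ci = (slope - T_CRIT * se_slope, slope + T_CRIT * se_slope)
intercept_ci = (intercept - T_CRIT * se_intercept, intercept + T_CRIT * se_intercept)

print(f"slope = {slope:.2f}, 95% CI ({slope_ci[0]:.2f}, {slope_ci[1]:.2f})")
print(f"intercept = {intercept:.2f}, 95% CI ({intercept_ci[0]:.2f}, {intercept_ci[1]:.2f})")
```

With these made-up numbers the slope interval includes zero. Deciding what that means for the relationship is the interpretation step that no software does for you.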

Slides for today: Deck e9
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Chapter 12
Chapters 7 and 24
Section 12.2, 12.3
Computing Activity: SCA 42b
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 30

March 1 (Friday)

Activity F: P-Values

What is a p-value? It is a measure of how well the data support the null hypothesis… assuming the correct statistical test is used. Also, since the p-value is a function of the (random) data, the p-value is a random variable. What is its distribution? Knowing this may help us better interpret p-values in our own research. This activity looks at p-values, their distribution, and the effects of the CLT on the appropriateness of a statistical test.
⟼ Link to today’s Statistical Laboratory Activity F
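
The question about the distribution can be explored with a short simulation. This is a minimal sketch in Python, not the lab activity itself (the course computes in R). It assumes the null hypothesis is true, the population is Normal with a known standard deviation, and a two-sided z-test is used. The seed, sample size, and number of repetitions are arbitrary choices.

```python
import random
from statistics import NormalDist, mean

rng = random.Random(7)                # fixed seed so the run is repeatable
mu0, sigma, n, reps = 0.0, 1.0, 30, 2000
std_normal = NormalDist()

p_values = []
for _ in range(reps):
    # The data are generated with the null hypothesis (mean = mu0) being true.
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    z = (mean(sample) - mu0) / (sigma / n ** 0.5)
    p_values.append(2 * (1 - std_normal.cdf(abs(z))))  # two-sided p-value

share_below_05 = sum(p < 0.05 for p in p_values) / reps
print(f"Share of p-values below 0.05: {share_below_05:.3f}")
print(f"Mean p-value: {mean(p_values):.3f}")
```

When the null hypothesis is true and the test is appropriate, the p-values should look roughly Uniform on (0, 1), so about 5% of them fall below 0.05 and their mean is near 0.5.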

Support for today:
Section 5.2.5, 5.4
Due today:
Pre-Lab F, on Moodle

March 4 (Monday)

Regression: Bells and Whistles

Finally, we find out how to use our regression model to go beyond estimating (with confidence interval) the y-intercept and the slope. We will discover how to estimate a point on the regression line… and include its confidence interval.
We will also do something rather interesting: We will predict a new y-value and provide something called a prediction interval.
It has been a while since we worked with prediction intervals (see Lab Activity C). However, these prediction intervals are frequently what we want to calculate. Where the confidence interval is an interval on the average (mean), the prediction interval is an interval on a future observation. Sometimes, we want the former, especially when we are trying to discover if there is a relationship between two variables. Other times, we really do want to estimate a future value. Prediction intervals allow for the latter.
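
The difference between the two intervals shows up in a single extra term. The following is a minimal sketch in Python (the course uses R) with made-up data and an arbitrary new x-value of 4. As a stated assumption, the critical value 3.182 is the 0.975 quantile of the t distribution with 3 degrees of freedom, taken from a t table and valid only for this sample size of 5.

```python
import math
from statistics import mean

# Hypothetical data and a hypothetical new x-value.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
x_new = 4
n = len(x)

x_bar, y_bar = mean(x), mean(y)
s_xx = sum((a - x_bar) ** 2 for a in x)
s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
fitted = intercept + slope * x_new  # point on the regression line

residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

T_CRIT = 3.182  # from a t table: 0.975 quantile, 3 df (valid only for n = 5)

# Uncertainty about the mean of y at x_new.
leverage = 1 / n + (x_new - x_bar) ** 2 / s_xx
ci_half = T_CRIT * s * math.sqrt(leverage)

# A single future observation adds its own variability: the extra "1 +".
pi_half = T_CRIT * s * math.sqrt(1 + leverage)

print(f"fitted value at x = {x_new}: {fitted:.2f}")
print(f"95% confidence interval: ({fitted - ci_half:.2f}, {fitted + ci_half:.2f})")
print(f"95% prediction interval: ({fitted - pi_half:.2f}, {fitted + pi_half:.2f})")
```

The prediction interval is always the wider of the two, because it must cover the scatter of an individual observation around the line as well as the uncertainty in the line itself.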

Slides for today: Deck eA
↠ Remember to download and read over these slides before class. This is a part of the whole “being prepared for class” thing that good students do.
Support for today:
Chapter 12
Chapters 8 and 25
Section 12.4
Computing Activity: SCA 42
↠ This statistical computing activity gives you direct practice in the statistical computing that we do in this course.
Due today:
R Assignment 31

Module VI: Course Summary

Our final day is dedicated to summarizing the entire course. It is a tall order, but we succeed. Reviewing the slides before class will help you remember all that we have done. It will give you some insight into the “story of the course.” It will also help you to think statistically.

→ Link to Module 6 Overview

March 6 (Wednesday)

Course Review

And here we are. Today is the final class lecture. I dedicate it to providing a “walk down memory lane” for all we did this term. As we revisit the topics in the course, you should be able to determine what is important for the final and what is important for your future research life.

Slides for today: Course Summation
Due today:
R Assignment 32
Hawkes Learning Systems
Post-Lab F
Practicum Activity 6

March 7 (Thursday)

Surprise Day

Let us start Reading Days a day early. This was originally built as a buffer day in case I got sick. Thankfully, I did not get sick, so let’s not have class today. Instead, let us celebrate all the work and learning you did this term.
Well done!!

Module VII: It’s the End of the Term as We Know It (and I Feel Fine)

Our classes have ended. You have two Reading Days in which you study and refresh your memory of all of your classes this term. The three-day final examination period provides you the opportunity to prove to your professors (and to yourself) that this term was valuable to you in your education.

March 8 (Friday)

Reading Day

Reading days were originally created by Knox College to provide students with days set aside for nothing other than bringing together all of the material learned during the term (and during the time at Knox College). Education is a process of changing yourself. It provides you with the opportunity to better see the relationships throughout the world.
If you are only interested in short-term gains (which is better than nothing), then today is a day to cram. If you are interested in mastering the material you have learned thus far, then today is a day to reflect on what you have learned in life.

March 9 (Saturday)

Reading Day

March 10 (Sunday)

Final Exam Period

March 11 (Monday)

Final Exam Period

March 12 (Tuesday)

Final Exam Period

March 13 (Wednesday)

Start of Spring Break

Yay! This is the start of Spring Vacation. It will be cold in Galesburg (most likely), but this week needs to be a time for you to relax and return to your usual level of energy.
Remember that the Spring Term consists of the usual 10 (or so) weeks of Knox-level learning. This piggy-backs on the 10 (or so) weeks of Winter Term. In some ways, the Winter-Spring term is a 20-week term. Keep this in mind. Even if you are not new to Knox College, please take advantage of what we offer to help you remain consistent throughout this pair of terms.
The goal, as always, is to learn the material so that you are prepared to succeed in the future. Keep this in mind.

Final Thoughts

And that is all there is to this course! Again, the key is to be proactive in your learning. Learning how to learn will serve you well into the future. In many ways, your grade in this course has less to do with your statistical abilities than it does with your abilities as a student. Since we all have things to learn about learning, spend this time polishing your skills.

Introductory Statistics


Ole J. Forsberg, PhD
Associate Professor
Chair of Data Sciences
Office: SMC E-212
Knox College
2 East South Street
Campus Box K-6
Galesburg, IL, USA, 61401-4999

Some Links
Knox College of Illinois
Department: Mathematics
Program: Data Science
Program: Statistics
R for Starters
Project Scarlet
Elections @Knox