STAT 225: This Course’s Calendar

 

The Course Calendar

 

This page provides the proposed calendar for the course. If changes need to be made, I will make them here and announce them during class. If you would like, you may download the entire calendar as an Excel file, which will let you see the course in a different light.

 

Term: Winter 2024
Update: February 22, 2024

Learning Module O: Review, Plus Ultra!

January 1 (Monday):

Initial Thoughts on the Course

Class Notes: There are no classes today; it is the last day of Winter Break. Today is a great day to make your “New-Term Resolutions.”

Thinking back over the past several terms, how will you change to make this term a success for you? This is such an important question that I may ask it on the first day of the term (Wednesday) in the form of a quiz. When you start a professional job somewhere, you will need to convey how you will help the company grow and how the company will help you grow. Good jobs will provide you with opportunities for growth. Make sure you take advantage of them.

Also:

How will you treat this course differently?

January 3 (Wednesday):

Matrix Review, Plus Ultra!

Class Notes: Today, we start our journey into linear models and what we can do with them. To ensure that you have the needed background information — or at least know where to find it — we start with a week of review (and more). The “and more” indicates that we will already be learning new facts. While most of this review will have been covered in the prerequisites, not all will have. Also, the amount of prior information will depend on the faculty member who taught you the prerequisites. Thus, this will be 60% new for some of you, but only 15% new for others. Stay aware of this as we move through this week.

Today, we start with a review on matrices. This means we will cover the basics of adding and multiplying them, as well as more advanced topics of projection matrices and idempotency. You will quickly learn that you need to read the assigned sections before class so that you are prepared for class.
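
As a preview of where the review lands, here is a minimal sketch in R, with made-up numbers, of the basic operations and of a projection matrix, checked for the two properties we will care most about: symmetry and idempotency.

```r
## A small illustrative sketch (invented values, not class data).
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)

A + B     # element-wise addition
A %*% B   # matrix multiplication

## A projection matrix onto the column space of X:  P = X (X'X)^(-1) X'
X <- cbind(1, c(1.2, 2.4, 3.1, 4.8))   # toy design matrix
P <- X %*% solve(t(X) %*% X) %*% t(X)

all.equal(P, t(P))      # symmetric
all.equal(P %*% P, P)   # idempotent: projecting twice changes nothing
```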

January 4 (Thursday):

Matrix Review, Plus Ultra!

Class Notes: We finish the review (plus ultra!) of matrices. Please remember that everyone will learn new things in this review. No one will have learned it all in the past. However, everyone should know some of this material from past courses. The purpose of prerequisites is to give you the necessary background for a course. While it is possible to succeed in a course without the required knowledge, it is much more difficult. So, for those without MATH 185, you will need to step up and work three times as hard in the beginning. Similarly, those who have had MATH 185, but remember little of it, will need to work harder than those who took it and remember it all.

Everything in this review will be used in this course. Thus, you will want to learn it to the point that you can either recall the information from your working memory or at least remember where in the appendix the information can be located.

January 5 (Friday):

Matrix Review, Plus Ultra!; LaTeX and R

Class Notes: Today, we move on to learning how to use LaTeX to properly typeset our work. While the first homework assignment does not require you to use LaTeX, all others do. The simple reason is that LaTeX is the program of choice for mathematicians, statisticians, engineers, computer scientists, etc. for presenting research. While the overhead is greater than for MS Word (and its ilk), the ease with which one can include equations, graphics, and the like is much greater.
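
If you have never seen LaTeX, the following minimal skeleton (an illustration, not a required template) shows how little is needed to produce a typeset document with an equation:

```latex
% A minimal homework skeleton; the class and packages are common choices.
\documentclass{article}
\usepackage{amsmath, graphicx}

\begin{document}

\section*{Problem 1}
The simple linear regression model can be typeset inline,
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, or displayed:
\begin{equation*}
  \hat{\beta} = \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{y} .
\end{equation*}

\end{document}
```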

January 8 (Monday):

Statistics Review, Plus Ultra!

Class Notes: Let’s move from matrices to statistics. These two appendices actually complement each other. As we move forward in the review, you will need to refer to both appendices. The reason for this integration of material is that this course focuses on linear models, which uses both matrices for calculations and statistics for interpretation.

If you have had STAT 200 from me, this review will be easier for you than for those who did not. Note, however, that no one will know all of this information coming into this course. Everyone will be reviewing something and learning something… everyone. This means y’all will need to work hard to learn that new material.

Today, we start with the basics of statistics, leaving the distributions for tomorrow. This means sample statistics and population parameters will be covered in detail. Look for the purpose of sample statistics. This will help you better understand the formulas involved. Also, use your mathematical background to understand the formulas. All formulas tell a story about what is being measured. The sooner you can read mathematical formulas, the better.
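
As one example of a formula telling a story, consider the two sample statistics we will lean on most:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i ,
\qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2
```

The variance formula reads as a story: take each observation's distance from the center, square it so that the distances do not cancel, then average, dividing by n - 1 because one degree of freedom was spent estimating the mean.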

January 10 (Wednesday):

Statistics Review, Plus Ultra!

Class Notes: We shall spend the entire day on probability distributions. Those who took STAT 200 will recognize some of them. Those who took MATH 321 will recognize others. Ultimately, the probability distributions we cover will reflect either the data or the test statistics. As such, you will need to distinguish between the two types. You will also need to be aware of the parameters involved in each distribution. These parameters include those values that define the shape of the distribution, as well as those values we care about estimating.
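
For reference, R encodes each of these distributions as a d/p/q/r family of functions (density, cumulative probability, quantile, random draws). A few illustrative calls, with arbitrary numbers:

```r
## Illustrative only; the arguments are arbitrary.
dnorm(1.96)                        # density of N(0,1) at 1.96
pnorm(1.96)                        # P(Z <= 1.96), about 0.975
qt(0.975, df = 10)                 # 97.5th percentile of a t (a test-statistic distribution)
rbinom(5, size = 10, prob = 0.3)   # random draws from a data distribution
```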

Spend time learning the definitions of the distributions. These are the wordy things, not the probability functions. Learning the definitions of the distributions will help you know when to use each distribution... and when to ignore each distribution.

January 11 (Thursday):

Statistics Review, Plus Ultra!

Class Notes: Finally, we complete the review (plus ultra!) today by covering Sections 5 and 6 in the statistics appendix. Today, in many ways, is the point of the second half of last week. We will use the definitions of sample statistics to find their distributions (under the usual assumptions), thus allowing us to estimate parameters.

Note that this is the entire point of statistics: to estimate the population parameter, given a sample. Because the sample is random, the sample statistics will be random variables. Thus, statistics uses this randomness, describes it (knows it), and estimates what we really care about. See that these distributions contain both the best estimate of the parameter and the uncertainty in our estimate. These distributions also allow us to test hypotheses about the parameters.

In fact, understanding the distribution of the statistics tells us almost everything we need to know about the parameter.
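
A short simulation, with my own illustrative numbers rather than class data, makes the point concrete: the sample mean is itself a random variable, with a center at the parameter and a spread that quantifies our uncertainty.

```r
## Sampling distribution of the sample mean (illustrative sketch).
set.seed(225)
xbar <- replicate(10000, mean(rnorm(n = 25, mean = 10, sd = 3)))

mean(xbar)   # close to 10: the estimator centers on the parameter
sd(xbar)     # close to 3 / sqrt(25) = 0.6: the uncertainty in the estimate
hist(xbar, main = "Sampling distribution of the sample mean")
```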

January 12 (Friday):

Examination 1

Class Notes: It is time for the first examination of the course. It is entirely in-class. It is also only 50 points. It covers the material in the two appendices. Be prepared to show me that you understand the mathematical and probabilistic basics.

Learning Module I: Ordinary Least Squares

January 15 (Monday):

Scalar Representation

Class Notes: And so, with the review (plus ultra!) behind us, we are able to start learning about linear models. The plan is to explore — in detail — the simplest case first. Then, once we fully understand the simplest case, we move on to modifications.

Today, we start with simple linear regression (SLR) ignoring matrices. We do this so that we better understand the inner workings of the mathematics, allowing us to focus on the assumptions made in creating the SLR model. We will need to use differential calculus to optimize our estimates. The biggest question is “What do we mean by optimize?” What are we optimizing?

We are calculating the “line of best fit,” but what do we mean by “best”? How we define best determines what we optimize and what properties the line of best fit has.
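
For reference, the usual answer, which we will derive carefully in class, is that “best” means minimizing the sum of squared residuals; setting the partial derivatives of that sum to zero yields the familiar estimators:

```latex
Q(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2
\quad \Longrightarrow \quad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)\left( y_i - \bar{y} \right)}
                     {\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2} ,
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```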

January 17 (Wednesday):

Scalar Representation

Class Notes: Today, we finish the derivation of the OLS estimators in SLR, including understanding the assumptions we had to make to achieve these results. Note that the assumptions we made are actually requirements that must be met when using this modeling paradigm. ⟵ Important point.

Once we finish the derivation using scalars, we note that we are now able to model a single dependent variable using a single independent variable… as long as certain requirements are met. However, if we have two independent variables, we will have to redo the entire derivation, running into a lot more work. And, if we have three independent variables, we have to (again) start from scratch.

So, in lieu of having to do this over and over again, we use matrices. While matrices offer a clean and extensible solution to any number of independent variables, we will focus on the SLR case to show the connection between the two and to see what the assumptions/requirements are in the extended case.

January 18 (Thursday):

The OLS Matrix

Class Notes: Today, we tackle OLS as a matrix equation. This will be useful because it means we can use as many independent variables as we wish. We are no longer constrained to just one. The drawback is... Well, I’m not sure what a drawback is in using matrices to obtain our OLS estimators.
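
A small sketch of the matrix solution, b = (X'X)^(-1) X'y, using simulated data and checked against R's built-in lm():

```r
## Illustrative sketch; the data are simulated.
set.seed(1)
x <- runif(30, 0, 10)
y <- 2 + 0.5 * x + rnorm(30)

X <- cbind(1, x)                        # design matrix with an intercept column
b <- solve(t(X) %*% X) %*% t(X) %*% y   # OLS estimators in matrix form
b
coef(lm(y ~ x))                         # the same numbers, via lm()
```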

January 19 (Friday):

The OLS Matrix

Class Notes: Last time, we found the OLS estimators in matrix form. This allowed us to derive the results from the last two days rather quickly. Ultimately, the awesome part about the matrix representation is that we are no longer limited to simple linear regression (SLR; a single independent variable). The OLS estimators can now be easily calculated for any number of independent variables.

… as long as the independent variables are linearly independent.

This requirement will be an issue if p > n… if the number of parameters to be estimated is greater than the number of data points. If you find yourself in the case where p > n, then you cannot use any of the regression covered in this course. You will need to use some specialized technique, like Bayesian analysis. In those cases, one is able to estimate more parameters than the data alone could support because strong assumptions are being made about the relationships amongst the variables. But, again, this is beyond the scope of this course.

Today, we finish estimating the parameters in OLS and examining what they mean.

January 22 (Monday):

Predictions and the Hat Matrix

Class Notes: Today, we will look at this thing called the “hat” matrix, H. Since it has a name, it must be important. Interestingly enough, it is very important in OLS regression. It is used to calculate the expected value matrix as well as the residuals matrix.

Looking closely at H, and using what we know from Appendix M.4, we will see that the geometric representation of OLS shows us that OLS gets us as close to reality as we can get while staying on the floor. This is the unexpected strength of OLS. However, its assumptions (requirements) may not be met in real life. As such, without meeting those requirements, the benefits of OLS do not exist.
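
Here is a minimal sketch, again with simulated data, of H doing its two jobs:

```r
## The hat matrix H = X (X'X)^(-1) X' (illustrative sketch).
set.seed(2)
x <- runif(20, 0, 10)
y <- 1 + 0.8 * x + rnorm(20)
X <- cbind(1, x)

H <- X %*% solve(t(X) %*% X) %*% t(X)

yhat <- H %*% y                # fitted values: the projection of y
e    <- (diag(20) - H) %*% y   # residuals: what the projection leaves over

all.equal(as.vector(yhat), unname(fitted(lm(y ~ x))))   # matches lm()
```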

January 24 (Wednesday):

Activity: GDP and Latitude

Class Notes: Today is the day to do SCA 1. The purpose of this activity is to have you check that your notes contain all of the important formulas, and to see that the output from the computer is not magic. You have the capability to calculate all of those numbers yourself. Well, at least most of the numbers. In the future, you will see where the rest of the numbers come from.

January 25 (Thursday):

Unsure

Class Notes: It appears as though today is a review day for you. I am not sure if I will do this in the future, however.

January 26 (Friday):

Examination 2

Class Notes: This is our second examination. It covers everything through Wednesday.

January 29 (Monday):

Probability Distributions and Intervals

Class Notes: Why do we care about these estimators? What do they tell us about our accuracy and our precision? These are questions concerning the (sampling) distribution of these estimators. From these sampling distributions, we are able to better understand the uncertainty in our estimates. This, fundamentally, is the goal of statistics.

Knowing the distribution of the sample statistic… which leads to knowing the distribution of the test statistic… leads to knowing the endpoints of the confidence interval. NOTE THAT the confidence interval depends on knowing the distribution of the sample statistic. This is the key purpose of the first couple of chapters of this course... KNOW THE DISTRIBUTION.
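
A quick sketch with simulated data: once lm() has the fit, confint() turns the t distribution of each estimator into interval endpoints.

```r
## Illustrative sketch; the data are simulated.
set.seed(3)
x <- runif(40, 0, 10)
y <- 3 + 1.2 * x + rnorm(40, sd = 2)

fit <- lm(y ~ x)
confint(fit, level = 0.95)   # endpoints from the sampling distribution of b0 and b1
```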

January 31 (Wednesday):

Test Statistics and Hypothesis Testing

Class Notes: Now that we know the sampling distribution, what can we know about testing hypotheses? Today's lecture covers the importance of knowing the distribution of the sample statistic (b1 or whatever).

Calculating the p-value for a given null hypothesis is an important step in understanding the randomness around us. In STAT 200, you did this for several different hypotheses, including one population mean, two population means, goodness-of-fit, and proportions (one and two). Now, we will do it for the effect parameters (the β). We will also see how to get R to perform those calculations for us.
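
As a sketch of what this looks like in R (simulated data), the coefficient table from summary() carries the estimate, its standard error, the t statistic, and the p-value for the null hypothesis that the coefficient is zero:

```r
## Illustrative sketch; the data are simulated.
set.seed(4)
x <- runif(40, 0, 10)
y <- 3 + 1.2 * x + rnorm(40, sd = 2)

fit <- lm(y ~ x)
summary(fit)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
```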

February 1 (Thursday):

Requirements of our OLS Model

Class Notes: We have gotten to this point of knowledge by using mathematics to learn all we can, then making some assumptions, and learning all we can with those assumptions, etc. However, when we make those assumptions, we need to check that these requirements are not violated by our data and model. This lecture covers the three requirements: Normality of residuals, homoskedasticity of residuals, and constant expected value of residuals. All three are about the residuals. All three can be easily tested. The tests are the Shapiro-Wilk test, the Breusch-Pagan test, and the runs test.
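
A sketch of the three checks in R; here lmtest supplies the Breusch-Pagan test and randtests supplies a runs test, though those package choices are mine and may differ from what we use in class:

```r
# install.packages(c("lmtest", "randtests"))   # if not already installed
library(lmtest)
library(randtests)

set.seed(5)
x <- runif(50, 0, 10)
y <- 2 + x + rnorm(50)   # simulated data for illustration
fit <- lm(y ~ x)

shapiro.test(residuals(fit))   # Normality of the residuals
bptest(fit)                    # homoskedasticity (Breusch-Pagan)
runs.test(residuals(fit))      # randomness, i.e., constant expected value
```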

February 2 (Friday):

Activity: GDP and Latitude, Part Deux

Class Notes: Today, we will do the second statistical computing activity of this term. In this SCA, you will again draw connections between the matrix calculations covered in class and the statistical output from R.

In many ways, this SCA is a review and continuation of the previous one. I designed it this way to help you see the importance of saving your work, commenting it (for understanding), and expanding it. In your analysis lives, you will be faced with complicated analyses. These analyses will be done over a period of several weeks or months. It is important to create good habits here so that you will have them when they count.

Learning Module II: Beyond the Ordinary

February 5 (Monday):

Adjustments for Boundedness, I

Class Notes: The topic today is to figure out how to adjust the dependent variable when it is bounded (once or twice). Remember that the residuals must come from a Normal distribution, which is unbounded. Bounded dependent variables are a violation of this requirement. Transforming the dependent variable appropriately can change a bounded variable into an unbounded one.
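
As a sketch of the idea, assume a dependent variable bounded below at zero; the logarithm maps (0, ∞) onto the whole real line, so we model log(y) and back-transform:

```r
## Illustrative sketch; the data are simulated and strictly positive.
set.seed(6)
x <- runif(50, 0, 10)
y <- exp(0.5 + 0.2 * x + rnorm(50, sd = 0.3))

fit <- lm(log(y) ~ x)    # model the unbounded log(y)
exp(predict(fit))[1:5]   # back-transform predictions to the original scale
```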

February 7 (Wednesday):

Day of Dialog

Class Notes: Today is dedicated to engendering a better understanding of each other and of ourselves in this world. Discussions will be held around campus today to support this goal.

We are all human, with all that entails. We are affected by our personal history… as well as by the history that led to us personally. There is a very interesting anime called Astra Lost in Space in which the main characters grow up in a world without history. It is very thought-provoking. How are we a product of our history? More importantly: How would we be different with either a different history or with no history?

Or, for anime fans, this is interesting because the two main characters are voiced by Yato from Noragami and Sadao Maou from The Devil is a Part-Timer!. It is a fascinating combination… along with the new up-and-coming voice-actor Ciarán Strange.

February 8 (Thursday):

Adjustments for Boundedness, II

Class Notes: The topic today is to figure out how to adjust the dependent variable when it is bounded (twice). Since it is unlikely y’all remember the logit function (or its inverse), let’s spend some time looking at the mathematics of the logit function. We will use this function to transform a dependent variable that is twice-bounded (above and below).

After doing this, we shall work through an example where we can apply the transformations (and back-transformations) learned in class. The real purpose of the transformations is to help the model better match the data-generating function (a.k.a. nature).
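
A minimal sketch of the transformation pair in R, which supplies the logit and its inverse directly:

```r
## qlogis() is the logit; plogis() is its inverse.
p <- c(0.10, 0.50, 0.90)   # illustrative proportions in (0, 1)

z <- qlogis(p)   # logit: log(p / (1 - p)), now unbounded
plogis(z)        # back-transform: exp(z) / (1 + exp(z)) recovers p

## For bounds (a, b) other than (0, 1), rescale first: (y - a) / (b - a).
```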

February 9 (Friday):

Analysis Examples

Class Notes: We take a breath and apply what we’ve done over the past few lectures to a few interesting research questions. Today will be entirely on the computer using R. What we do in class will be quite similar to what is done in Chapter 5. So, think of Chapter 5 as a supplement to today’s lecture.

February 12 (Monday):

Other Least Squares

Class Notes: We’ve adjusted for some shortcomings of OLS. Here, we become proactive. Instead of trying to adjust for things like heteroskedasticity, we can just model it properly. This solution leads to weighted least squares, WLS. While it is a very powerful technique to know, there are better options.
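
A sketch of WLS in R (simulated data; the weight choice of 1/x² is an illustrative assumption, standing in for the inverse of the modeled variance):

```r
## Weighted least squares via lm()'s weights argument.
set.seed(7)
x <- runif(50, 1, 10)
y <- 2 + 3 * x + rnorm(50, sd = x)   # the variance grows with x

fit_wls <- lm(y ~ x, weights = 1 / x^2)   # weight by the inverse variance
coef(fit_wls)
```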

This will be a math day with an ending in R. Today also marks the last time we explicitly deal with matrices. Make no mistake about it, however: matrices underlie everything in linear models. Some in the Department of Mathematics would go so far as to claim Linear Models is just Linear Algebra. I shun those people. 😸

February 14 (Wednesday):

Quantile Regression

Class Notes: OLS only allows us to model the expected value of the dependent variable. Frequently, we would prefer to model one of the quantiles. Being able to do so is very important in studying things like poverty, things that happen at one extreme or another (or even the median). This class covers quantile regression.

The mathematics are beyond the scope of this class (except for some hand-waving). Today’s lecture is to show you another option in regression (and ANOVA). In lieu of modeling the expected value (mean) of the dependent variable, you will have the skills to model one of the quantiles, the one of your choice. A handy tool for the toolbox.
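
A sketch using the quantreg package, a standard choice for quantile regression (the class may use a different tool); the engel data ship with the package:

```r
# install.packages("quantreg")   # if not already installed
library(quantreg)

data(engel)   # household income and food expenditure
rq(foodexp ~ income, tau = 0.5, data = engel)   # median regression
rq(foodexp ~ income, tau = 0.1, data = engel)   # the 10th percentile
```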

February 15 (Thursday):

Maximum Likelihood Estimation

Class Notes: Back on January 15, we decided that we wanted to minimize the sum of squared residuals. Let’s see where we would go if we decided to maximize something good… like the likelihood of observing the data, given the model parameters. This leads to Maximum Likelihood Estimation. While it relies on calculus (maximization), it is only differential calculus. Plus, maximizing the logarithm of the likelihood will tend to make the differentiation rather elementary in many cases.

While the MLE estimators of the slope and intercept look very familiar, it is MLE that is more general. OLS requires that the residuals come from a Normal distribution; MLE allows for any distribution. We will wait until Learning Module III to tackle the other distributions. The purpose here is to emphasize that we can move away from this Normality requirement.
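
A sketch of the Normal-residuals case done by brute force: write the negative log-likelihood, hand it to optim(), and watch the MLE slope and intercept match OLS (data simulated for illustration):

```r
set.seed(8)
x <- runif(40, 0, 10)
y <- 1 + 2 * x + rnorm(40)

negloglik <- function(theta) {
  mu <- theta[1] + theta[2] * x
  -sum(dnorm(y, mean = mu, sd = exp(theta[3]), log = TRUE))   # exp() keeps sd > 0
}

mle <- optim(c(0, 0, 0), negloglik)
mle$par[1:2]      # MLE intercept and slope...
coef(lm(y ~ x))   # ...agree with OLS under Normality
```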

You will also receive the take-home portion of the third examination today. It is due on Monday… the day of the in-class portion.

February 16 (Friday):

Surprise Day!

Class Notes: The in-class portion of the third examination was moved to Monday. Today, feel free to work on the take-home portion, which is due on Monday at the start of the class period.

We have no scheduled class today.

February 19 (Monday):

Examination 3

Class Notes: This is the day of our third examination. Be prepared.

Learning Module III: Beyond the Classical Model

February 21 (Wednesday):

Generalized Linear Models: Bernoulli Dependent Variable

Class Notes: Now that we understand that GLMs force us to think more about the conditional distribution of the dependent variable, we can start to use them in a variety of situations. All we need to do is know the dependent variable… which is the #1 rule of statisticians. We need to know its conditional distribution. Once we do, we can easily fit GLMs with a couple lines of code. In many cases, it will be easier to fit using GLMs than CLMs, where we have to transform the data.

Turning our focus to GLMs also allows us to explore ways of modeling non-numeric dependent variables, something that is particularly difficult with CLMs.
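
A sketch of the Bernoulli case in R, with simulated data; the logit link is glm()'s default for the binomial family:

```r
set.seed(9)
x <- runif(100, -2, 2)
y <- rbinom(100, size = 1, prob = plogis(-0.5 + 1.5 * x))   # 0/1 outcomes

fit <- glm(y ~ x, family = binomial)   # logistic regression
summary(fit)$coefficients
```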

February 22 (Thursday):

Generalized Linear Models: Bernoulli Dependent Variable

Class Notes: If the dependent variable is Bernoulli, we have a lot more things that we can do with the results. Today, we will look at the effects of using different link functions and why we should.
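
A sketch of what that comparison looks like, with simulated data; the three links below all ship with the binomial family:

```r
set.seed(10)
x <- runif(100, -2, 2)
y <- rbinom(100, size = 1, prob = plogis(-0.5 + 1.5 * x))

glm(y ~ x, family = binomial(link = "logit"))     # log-odds interpretation
glm(y ~ x, family = binomial(link = "probit"))    # Normal-CDF link
glm(y ~ x, family = binomial(link = "cloglog"))   # asymmetric link
```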

February 23 (Friday):

Generalized Linear Models: Binomial Dependent Variable

Class Notes: As the course winds down, we see that we can apply GLMs to another type of dependent variable. Here, it is the Binomial dependent variable. Earlier in the course (the Weighted Least Squares day), we saw one use of Binomial dependent variables (and the heteroskedasticity that necessarily comes with them). Now, we see a clearer way of modeling such a dependent variable.
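
A sketch of the fit; with a Binomial dependent variable, glm() takes the successes and failures as a two-column response (counts invented for illustration):

```r
x         <- c(1, 2, 3, 4, 5)
successes <- c(2, 5, 9, 14, 18)
trials    <- rep(20, 5)

fit <- glm(cbind(successes, trials - successes) ~ x, family = binomial)
coef(fit)
```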

February 26 (Monday):

Generalized Linear Models: Poisson Dependent Variable

Class Notes: The last numeric dependent variable type we cover is the count dependent variable. The default conditional distribution is the Poisson. However, because there is a chance that the data-generating process creates overdispersion (the successes are not independent), we may need to fit with a different distribution (negative binomial) or with a different estimation technique (maximum quasi-likelihood). Ultimately, it matters little in terms of us doing the calculations. We just get the computer to do them for us.
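
A sketch of the three fits, with simulated (deliberately overdispersed) counts; MASS is my package choice for the negative binomial:

```r
library(MASS)

set.seed(11)
x <- runif(100, 0, 3)
y <- rnbinom(100, mu = exp(0.5 + 0.7 * x), size = 2)   # overdispersed counts

glm(y ~ x, family = poisson)        # the default: Poisson
glm(y ~ x, family = quasipoisson)   # maximum quasi-likelihood fix
glm.nb(y ~ x)                       # negative binomial (MASS)
```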

February 28 (Wednesday):

Beyond GLMs: Nominal Dependent Variables

Class Notes: In some fields, it is important to model a variable that is nominal or ordinal. The closest we’ve come to this situation was when we modeled a dichotomous variable (Bernoulli). Going beyond this simple binary case adds a lot to the underlying mathematics.

However, it adds very little to the coding effort. Instead of using glm, we will use multinom for nominal regression. Because the mathematics are rather difficult, a lot of the flexibility of GLMs is missing. However, the basics are well-understood and work much like we would expect. The multinomial regression function requires the nnet package.
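
A sketch of a multinom() call, using R's built-in iris data purely for illustration (Species is a three-level nominal variable):

```r
library(nnet)

fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris)
summary(fit)
predict(fit, newdata = head(iris), type = "probs")   # predicted category probabilities
```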

February 29 (Thursday):

Beyond GLMs: Ordinal Dependent Variables

Class Notes: It is fitting that the last lecture covers a topic that is another extension of the Bernoulli dependent variable. In some fields, it is important to model a variable that is ordinal. Instead of using glm, we will use polr for ordinal regression. Because the mathematics are so difficult, a lot of the flexibility of GLMs is missing. However, the basics are well-understood and work much like we would expect. The ordinal regression function requires the MASS package.
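
A sketch of a polr() call with invented data; note that polr() requires the dependent variable to be an ordered factor:

```r
library(MASS)

set.seed(12)
x <- runif(150, 0, 10)
z <- 0.5 * x + rlogis(150)                    # latent scale, for simulation only
y <- cut(z, breaks = c(-Inf, 1, 3, Inf),
         labels = c("low", "medium", "high"),
         ordered_result = TRUE)               # ordered factor response

fit <- polr(y ~ x, Hess = TRUE)
summary(fit)
```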

March 1 (Friday):

Surprise Day

March 4 (Monday):

Surprise Day

Learning Module IV: End of the Term

March 6 (Wednesday):

Project Presentations

Class Notes: This is it. Today is your day to share with all of us what you learned about your dependent variable. Spending 10 minutes on your presentation is a fine goal. A formal presentation will be useful so that you remember what to tell us about the variable and how you analyzed it. After that, we want to hear what you learned and where you are going with this line of research.

The presentation order will be determined randomly. Email me your presentation before class so we do not have to spend a long time transitioning.

March 7 (Thursday):

Project Presentations

Class Notes: We finish our presentations today.

March 8 (Friday):

Reading Day

Class Notes: Reading days were originally created by Knox College to provide students with days set aside for nothing other than bringing together all of the material learned during the term — and during the time at Knox College. Education is a process of changing yourself. It provides you with the opportunity to better see the relationships throughout the world.

If you are only interested in short-term gains (which is better than nothing), then today is a day to cram. If you are interested in mastering the material you have learned thus far, then today is a day to reflect on what you have learned in life.

March 9 (Saturday):

Reading Day

March 10 (Sunday):

Final Examination Period

March 11 (Monday):

Final Examination Period

March 12 (Tuesday):

Final Examination Period

March 13 (Wednesday):

Start of Spring Break

Class Notes: Yay! This is the start of Spring Vacation. It will be cold in Galesburg (most likely), but this week needs to be a time for you to relax and return to your usual level of energy.

Remember that the Spring Term consists of the usual 10 (or so) weeks of Knox-level learning. It piggybacks on the 10 (or so) weeks of Winter Term. In some ways, the Winter-Spring term is a 20-week term. Keep this in mind. Even if you are not new to Knox College, please take advantage of what we offer to help you remain consistent throughout this pair of terms.

The goal, as always, is to learn the material so that you are prepared to succeed in the future. Keep this in mind.

This page was last modified on 13 December 2023.
All rights reserved by Ole J. Forsberg, PhD, ©2008–2023. No reproduction of any of this material is allowed without explicit written permission of the copyright holder.