STAT 225: This Course’s Calendar

 

The Course Calendar

 

This page provides the proposed calendar for the course. If changes need to be made, I will make them here and announce them during class. If you would like, you may download the entire calendar as an Excel file, which will let you see the course in a different light.

 

Term: Winter 2024
Update: February 22, 2024

Learning Module O: Review, Plus Ultra!

January 1 (Monday):

Initial Thoughts on the Course

Class Notes: There are no classes today; it is the last day of Winter Break. Today is a great day to make your “New-Term Resolutions.”

Thinking back over the past several terms, how will you change to make this term a success for you? This is such an important question that I may ask it on the first day of the term (Wednesday) in the form of a quiz. When you start a professional job somewhere, you will need to convey how you will help the company grow and how the company will help you grow. Good jobs will provide you with opportunities for growth. Make sure you take advantage of them.

Also:

How will you treat this course differently?

January 3 (Wednesday):

Matrix Review, Plus Ultra!

Class Notes: Today, we start our journey into linear models and what we can do with them. To ensure that you have the needed background information — or at least know where to find it — we start with a week of review (and more). The “and more” indicates that we will already be learning new facts. While most of this review will have been covered in the prerequisites, not all will have. Also, the amount of prior information will depend on the faculty member who taught you the prerequisites. Thus, this will be 60% new for some of you, but only 15% new for others. Stay aware of this as we move through this week.

Today, we start with a review on matrices. This means we will cover the basics of adding and multiplying them, as well as more advanced topics of projection matrices and idempotency. You will quickly learn that you need to read the assigned sections before class so that you are prepared for class.
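
As a preview of where the review lands, here is a minimal sketch in R, with made-up numbers, of the basic operations and of a projection matrix, checked for the two properties we will care most about: symmetry and idempotency.

```r
## A small illustrative sketch (invented values, not class data).
A <- matrix(c(1, 2, 3, 4), nrow = 2)
B <- matrix(c(5, 6, 7, 8), nrow = 2)

A + B     # element-wise addition
A %*% B   # matrix multiplication

## A projection matrix onto the column space of X:  P = X (X'X)^(-1) X'
X <- cbind(1, c(1.2, 2.4, 3.1, 4.8))   # toy design matrix
P <- X %*% solve(t(X) %*% X) %*% t(X)

all.equal(P, t(P))      # symmetric
all.equal(P %*% P, P)   # idempotent: projecting twice changes nothing
```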

January 4 (Thursday):

Matrix Review, Plus Ultra!

Class Notes: We finish the review (plus ultra!) of matrices. Please remember that everyone will learn new things in this review. No one will have learned it all in the past. However, everyone should know some of this material from past courses. The purpose of prerequisites is to give you the necessary background for a course. While it is possible to succeed in a course without the required knowledge, it is much more difficult. So, for those without MATH 185, you will need to step up and work three times as hard in the beginning. Similarly, those who have had MATH 185, but remember little of it, will need to work harder than those who took it and remember it all.

Everything in this review will be used in this course. Thus, you will want to learn it to the point that you can either recall the information from your working memory or at least remember where in the appendix the information can be located.

January 5 (Friday):

Matrix Review, Plus Ultra!; LaTeX and R

Class Notes: Today, we move on to learning how to use LaTeX to properly typeset our work. While the first homework assignment does not require you to use LaTeX, all others do. The simple reason is that LaTeX is the program of choice for mathematicians, statisticians, engineers, computer scientists, etc. for presenting research. While the overhead is greater than for MS Word (and its ilk), the ease with which one can include equations, graphics, and the like is much greater.
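
If you have never seen LaTeX, the following minimal skeleton (an illustration, not a required template) shows how little is needed to produce a typeset document with an equation:

```latex
% A minimal homework skeleton; the class and packages are common choices.
\documentclass{article}
\usepackage{amsmath, graphicx}

\begin{document}

\section*{Problem 1}
The simple linear regression model can be typeset inline,
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$, or displayed:
\begin{equation*}
  \hat{\beta} = \left( \mathbf{X}^\top \mathbf{X} \right)^{-1} \mathbf{X}^\top \mathbf{y} .
\end{equation*}

\end{document}
```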

January 8 (Monday):

Statistics Review, Plus Ultra!

Class Notes: Let’s move from matrices to statistics. These two appendices actually complement each other. As we move forward in the review, you will need to refer to both appendices. The reason for this integration of material is that this course focuses on linear models, which uses both matrices for calculations and statistics for interpretation.

If you have had STAT 200 from me, this review will be easier for you than for those who did not. Note, however, that no one will know all of this information coming into this course. Everyone will be reviewing something and learning something… everyone. This means y’all will need to work hard to learn that new material.

Today, we start with the basics of statistics, leaving the distributions for tomorrow. This means sample statistics and population parameters will be covered in detail. Look for the purpose of sample statistics. This will help you better understand the formulas involved. Also, use your mathematical background to understand the formulas. All formulas tell a story about what is being measured. The sooner you can read mathematical formulas, the better.
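
As one example of a formula telling a story, consider the two sample statistics we will lean on most:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i ,
\qquad
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2
```

The variance formula reads as a story: take each observation's distance from the center, square it so that the distances do not cancel, then average, dividing by n - 1 because one degree of freedom was spent estimating the mean.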

January 10 (Wednesday):

Statistics Review, Plus Ultra!

Class Notes: We shall spend the entire day on probability distributions. Those who took STAT 200 will recognize some of them. Those who took MATH 321 will recognize others. Ultimately, the probability distributions we cover will reflect either the data or the test statistics. As such, you will need to distinguish between the two types. You will also need to be aware of the parameters involved in each distribution. These parameters include those values that define the shape of the distribution, as well as those values we care about estimating.
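
For reference, R encodes each of these distributions as a d/p/q/r family of functions (density, cumulative probability, quantile, random draws). A few illustrative calls, with arbitrary numbers:

```r
## Illustrative only; the arguments are arbitrary.
dnorm(1.96)                        # density of N(0,1) at 1.96
pnorm(1.96)                        # P(Z <= 1.96), about 0.975
qt(0.975, df = 10)                 # 97.5th percentile of a t (a test-statistic distribution)
rbinom(5, size = 10, prob = 0.3)   # random draws from a data distribution
```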

Spend time learning the definitions of the distributions. These are the wordy things, not the probability functions. Learning the definitions of the distributions will help you know when to use each distribution... and when to ignore each distribution.

January 11 (Thursday):

Statistics Review, Plus Ultra!

Class Notes: Finally, we complete the review (plus ultra!) today by covering Sections 5 and 6 in the statistics appendix. Today, in many ways, is the point of the second half of last week. We will use the definitions of sample statistics to find their distributions (under the usual assumptions), thus allowing us to estimate parameters.

Note that this is the entire point of statistics: to estimate the population parameter, given a sample. Because the sample is random, the sample statistics will be random variables. Thus, statistics uses this randomness, describes it (knows it), and estimates what we really care about. See that these distributions contain both the best estimate of the parameter and the uncertainty in our estimate. These distributions also allow us to test hypotheses about the parameters.

In fact, understanding the distribution of the statistics tells us almost everything we need to know about the parameter.
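
A short simulation, with my own illustrative numbers rather than class data, makes the point concrete: the sample mean is itself a random variable, with a center at the parameter and a spread that quantifies our uncertainty.

```r
## Sampling distribution of the sample mean (illustrative sketch).
set.seed(225)
xbar <- replicate(10000, mean(rnorm(n = 25, mean = 10, sd = 3)))

mean(xbar)   # close to 10: the estimator centers on the parameter
sd(xbar)     # close to 3 / sqrt(25) = 0.6: the uncertainty in the estimate
hist(xbar, main = "Sampling distribution of the sample mean")
```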

January 12 (Friday):

Examination 1

Class Notes: It is time for the first examination of the course. It is entirely in-class. It is also only 50 points. It covers the material in the two appendices. Be prepared to show me that you understand the mathematical and probabilistic basics.

Learning Module I: Ordinary Least Squares

January 15 (Monday):

Scalar Representation

Class Notes: And so, with the review (plus ultra!) behind us, we are able to start learning about linear models. The plan is to explore — in detail — the simplest case first. Then, once we fully understand the simplest case, we move on to modifications.

Today, we start with simple linear regression (SLR) ignoring matrices. We do this so that we better understand the inner workings of the mathematics, allowing us to focus on the assumptions made in creating the SLR model. We will need to use differential calculus to optimize our estimates. The biggest question is “What do we mean by optimize?” What are we optimizing?

We are calculating the “line of best fit,” but what do we mean by “best”? How we define best determines what we optimize and what properties the line of best fit has.
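
For reference, the usual answer, which we will derive carefully in class, is that “best” means minimizing the sum of squared residuals; setting the partial derivatives of that sum to zero yields the familiar estimators:

```latex
Q(\beta_0, \beta_1) = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2
\quad \Longrightarrow \quad
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)\left( y_i - \bar{y} \right)}
                     {\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^2} ,
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
```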

January 17 (Wednesday):

Scalar Representation

Class Notes: Today, we finish the derivation of the OLS estimators in SLR, including understanding the assumptions we had to make to achieve these results. Note that the assumptions we made are actually requirements that must be met when using this modeling paradigm. ⟵ Important point.

Once we finish the derivation using scalars, we note that we are now able to model a single dependent variable using a single independent variable… as long as certain requirements are met. However, if we have two independent variables, we will have to redo the entire derivation, running into a lot more work. And, if we have three independent variables, we have to (again) start from scratch.

So, in lieu of having to do this over and over again, we use matrices. While matrices offer a clean and extensible solution to any number of independent variables, we will focus on the SLR case to show the connection between the two and to see what the assumptions/requirements are in the extended case.

January 18 (Thursday):

The OLS Matrix

Class Notes: Today, we tackle OLS as a matrix equation. This will be useful because it means we can use as many independent variables as we wish. We are no longer constrained to just one. The drawback is... Well, I’m not sure what a drawback is in using matrices to obtain our OLS estimators.
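
A small sketch of the matrix solution, b = (X'X)^(-1) X'y, using simulated data and checked against R's built-in lm():

```r
## Illustrative sketch; the data are simulated.
set.seed(1)
x <- runif(30, 0, 10)
y <- 2 + 0.5 * x + rnorm(30)

X <- cbind(1, x)                        # design matrix with an intercept column
b <- solve(t(X) %*% X) %*% t(X) %*% y   # OLS estimators in matrix form
b
coef(lm(y ~ x))                         # the same numbers, via lm()
```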

January 19 (Friday):

The OLS Matrix

Class Notes: Last time, we found the OLS estimators in matrix form. This allowed us to derive the results from the last two days rather quickly. Ultimately, the awesome part about the matrix representation is that we are no longer limited to simple linear regression (SLR; a single independent variable). The OLS estimators can now be easily calculated for any number of independent variables.

… as long as the independent variables are linearly independent.

This requirement will be an issue if p > n… if the number of parameters to be estimated is greater than the number of data points. If you find yourself in the case where p > n, then you cannot use any of the regression covered in this course. You will need to use some specialized technique, like Bayesian analysis. In those cases, one is able to estimate more parameters than the data alone could support because strong assumptions are being made about the relationships amongst the variables. But, again, this is beyond the scope of this course.

Today, we finish estimating the parameters in OLS and examining what they mean.

January 22 (Monday):

Predictions and the Hat Matrix

Class Notes: Today, we will look at this thing called the “hat” matrix, H. Since it has a name, it must be important. Interestingly enough, it is very important in OLS regression. It is used to calculate the expected value matrix as well as the residuals matrix.

Looking closely at H, and using what we know from Appendix M.4, we will see that the geometric representation of OLS shows us that OLS gets us as close to reality as we can get while staying on the floor. This is the unexpected strength of OLS. However, its assumptions (requirements) may not be met in real life. As such, without meeting those requirements, the benefits of OLS do not exist.
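
Here is a minimal sketch, again with simulated data, of H doing its two jobs:

```r
## The hat matrix H = X (X'X)^(-1) X' (illustrative sketch).
set.seed(2)
x <- runif(20, 0, 10)
y <- 1 + 0.8 * x + rnorm(20)
X <- cbind(1, x)

H <- X %*% solve(t(X) %*% X) %*% t(X)

yhat <- H %*% y                # fitted values: the projection of y
e    <- (diag(20) - H) %*% y   # residuals: what the projection leaves over

all.equal(as.vector(yhat), unname(fitted(lm(y ~ x))))   # matches lm()
```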

January 24 (Wednesday):

Activity: GDP and Latitude

Class Notes: Today is the day to do SCA 1. The purpose of this activity is to have you check that your notes contain all of the important formulas, and to see that the output from the computer is not magic. You have the capability to calculate all of those numbers yourself. Well, at least most of the numbers. In the future, you will see where the rest of the numbers come from.

January 25 (Thursday):

Unsure

Class Notes: It appears as though today is a review day for you. I am not sure if I will do this in the future, however.

January 26 (Friday):

Examination 2

Class Notes: This is our second examination. It covers everything through Wednesday.

January 29 (Monday):

Probability Distributions and Intervals

Class Notes: Why do we care about these estimators? What do they tell us about our accuracy and our precision? These are questions concerning the (sampling) distribution of these estimators. From these sampling distributions, we are able to better understand the uncertainty in our estimates. This, fundamentally, is the goal of statistics.

Knowing the distribution of the sample statistic… which leads to knowing the distribution of the test statistic… leads to knowing the endpoints of the confidence interval. NOTE THAT the confidence interval depends on knowing the distribution of the sample statistic. This is the key purpose of the first couple of chapters of this course... KNOW THE DISTRIBUTION.
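
A quick sketch with simulated data: once lm() has the fit, confint() turns the t distribution of each estimator into interval endpoints.

```r
## Illustrative sketch; the data are simulated.
set.seed(3)
x <- runif(40, 0, 10)
y <- 3 + 1.2 * x + rnorm(40, sd = 2)

fit <- lm(y ~ x)
confint(fit, level = 0.95)   # endpoints from the sampling distribution of b0 and b1
```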

January 31 (Wednesday):

Test Statistics and Hypothesis Testing

Class Notes: Now that we know the sampling distribution, what can we know about testing hypotheses? Today's lecture covers the importance of knowing the distribution of the sample statistic (b1 or whatever).

Calculating the p-value for a given null hypothesis is an important step in understanding the randomness around us. In STAT 200, you did this for several different hypotheses, including one population mean, two population means, goodness-of-fit, and proportions (one and two). Now, we will do it for the effect parameters (the β). We will also see how to get R to perform those calculations for us.
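
As a sketch of what this looks like in R (simulated data), the coefficient table from summary() carries the estimate, its standard error, the t statistic, and the p-value for the null hypothesis that the coefficient is zero:

```r
## Illustrative sketch; the data are simulated.
set.seed(4)
x <- runif(40, 0, 10)
y <- 3 + 1.2 * x + rnorm(40, sd = 2)

fit <- lm(y ~ x)
summary(fit)$coefficients   # Estimate, Std. Error, t value, Pr(>|t|)
```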

February 1 (Thursday):

Requirements of our OLS Model

Class Notes: We have gotten to this point of knowledge by using mathematics to learn all we can, then making some assumptions, and learning all we can with those assumptions, etc. However, when we make those assumptions, we need to check that these requirements are not violated by our data and model. This lecture covers the three requirements: Normality of residuals, homoskedasticity of residuals, and constant expected value of residuals. All three are about the residuals. All three can be easily tested. The tests are the Shapiro-Wilk test, the Breusch-Pagan test, and the runs test.
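
A sketch of the three checks in R; here lmtest supplies the Breusch-Pagan test and randtests supplies a runs test, though those package choices are mine and may differ from what we use in class:

```r
# install.packages(c("lmtest", "randtests"))   # if not already installed
library(lmtest)
library(randtests)

set.seed(5)
x <- runif(50, 0, 10)
y <- 2 + x + rnorm(50)   # simulated data for illustration
fit <- lm(y ~ x)

shapiro.test(residuals(fit))   # Normality of the residuals
bptest(fit)                    # homoskedasticity (Breusch-Pagan)
runs.test(residuals(fit))      # randomness, i.e., constant expected value
```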

February 2 (Friday):

Activity: GDP and Latitude, Part Deux

Class Notes: Today, we will do the second statistical computing activity of this term. In this SCA, you will again draw connections between the matrix calculations covered in class and the statistical output from R.

In many ways, this SCA is a review and continuation of the previous one. I designed it this way to help you see the importance of saving your work, commenting it (for understanding), and expanding it. In your analysis lives, you will be faced with complicated analyses. These analyses will be done over a period of several weeks or months. It is important to create good habits here so that you will have them when they count.

Learning Module II: Beyond the Ordinary

February 5 (Monday):

Adjustments for Boundedness, I

Class Notes: The topic today is to figure out how to adjust the dependent variable when it is bounded (once or twice). Remember that the residuals must come from a Normal distribution, which is unbounded. Bounded dependent variables are a violation of this requirement. Transforming the dependent variable appropriately can change a bounded variable into an unbounded one.
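
As a sketch of the idea, assume a dependent variable bounded below at zero; the logarithm maps (0, ∞) onto the whole real line, so we model log(y) and back-transform:

```r
## Illustrative sketch; the data are simulated and strictly positive.
set.seed(6)
x <- runif(50, 0, 10)
y <- exp(0.5 + 0.2 * x + rnorm(50, sd = 0.3))

fit <- lm(log(y) ~ x)    # model the unbounded log(y)
exp(predict(fit))[1:5]   # back-transform predictions to the original scale
```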

February 7 (Wednesday):

Day of Dialog

Class Notes: Today is dedicated to engendering a better understanding of each other and of ourselves in this world. Discussions will be held around campus today to support this goal.

We are all human, with all that entails. We are affected by our personal history… as well as by the history that led to us personally. There is a very interesting anime called Astra Lost in Space in which the main characters grow up in a world without history. It is very thought-provoking. How are we a product of our history? More importantly: How would we be different with either a different history or with no history?

Or, for anime fans, this is interesting because the two main characters are voiced by Yato from Noragami and Sadao Maou from The Devil is a Part-Timer!. It is a fascinating combination… along with the new up-and-coming voice-actor Ciarán Strange.

February 8 (Thursday):

Adjustments for Boundedness, II

Class Notes: The topic today is to figure out how to adjust the dependent variable when it is bounded (twice). Since it is unlikely y’all remember the logit function (or its inverse), let’s spend some time looking at the mathematics of the logit function. We will use this function to transform a dependent variable that is twice-bounded (above and below).

After doing this, we shall work through an example where we can apply the transformations (and back-transformations) learned in class. The real purpose of the transformations is to help the model better match the data-generating function (a.k.a. nature).
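
A minimal sketch of the transformation pair in R, which supplies the logit and its inverse directly:

```r
## qlogis() is the logit; plogis() is its inverse.
p <- c(0.10, 0.50, 0.90)   # illustrative proportions in (0, 1)

z <- qlogis(p)   # logit: log(p / (1 - p)), now unbounded
plogis(z)        # back-transform: exp(z) / (1 + exp(z)) recovers p

## For bounds (a, b) other than (0, 1), rescale first: (y - a) / (b - a).
```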

February 9 (Friday):

Analysis Examples

Class Notes: We take a breath and apply what we’ve done over the past few lectures to a few interesting research questions. Today will be entirely on the computer using R. What we do in class will be quite similar to what is done in Chapter 5. So, think of Chapter 5 as a supplement to today’s lecture.

February 12 (Monday):

Other Least Squares

Class Notes: We’ve adjusted for some shortcomings of OLS. Here, we become proactive. Instead of trying to adjust for things like heteroskedasticity, we can just model it properly. This solution leads to weighted least squares, WLS. While it is a very powerful technique to know, there are better options.
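
A sketch of WLS in R (simulated data; the weight choice of 1/x² is an illustrative assumption, standing in for the inverse of the modeled variance):

```r
## Weighted least squares via lm()'s weights argument.
set.seed(7)
x <- runif(50, 1, 10)
y <- 2 + 3 * x + rnorm(50, sd = x)   # the variance grows with x

fit_wls <- lm(y ~ x, weights = 1 / x^2)   # weight by the inverse variance
coef(fit_wls)
```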

This will be a math day with an ending in R. Today also marks the last time we explicitly deal with matrices. Make no mistake about it, however: matrices underlie everything in linear models. Some in the Department of Mathematics would go so far as to claim Linear Models is just Linear Algebra. I shun those people. 😸

February 14 (Wednesday):

Quantile Regression

Class Notes: OLS only allows us to model the expected value of the dependent variable. Frequently, we would prefer to model one of the quantiles. Being able to do so is very important in studying things like poverty, things that happen at one extreme or another (or even the median). This class covers quantile regression.

The mathematics are beyond the scope of this class (except for some hand-waving). Today’s lecture is to show you another option in regression (and ANOVA). In lieu of modeling the expected value (mean) of the dependent variable, you will have the skills to model one of the quantiles, the one of your choice. A handy tool for the toolbox.
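
A sketch using the quantreg package, a standard choice for quantile regression (the class may use a different tool); the engel data ship with the package:

```r
# install.packages("quantreg")   # if not already installed
library(quantreg)

data(engel)   # household income and food expenditure
rq(foodexp ~ income, tau = 0.5, data = engel)   # median regression
rq(foodexp ~ income, tau = 0.1, data = engel)   # the 10th percentile
```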

February 15 (Thursday):

Maximum Likelihood Estimation

Class Notes: Back on January 15, we decided that we wanted to minimize the sum of squared residuals. Let’s see where we would go if we decided to maximize something good… like the likelihood of observing the data, given the model parameters. This leads to Maximum Likelihood Estimation. While it relies on calculus (maximization), it is only differential calculus. Plus, maximizing the logarithm of the likelihood will tend to make the differentiation rather elementary in many cases.

While the MLE estimators of the slope and intercept look very familiar, it is MLE that is more general. OLS requires that the residuals come from a Normal distribution; MLE allows for any distribution. We will wait until Learning Module III to tackle the other distributions. The purpose here is to emphasize that we can move away from this Normality requirement.
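
A sketch of the Normal-residuals case done by brute force: write the negative log-likelihood, hand it to optim(), and watch the MLE slope and intercept match OLS (data simulated for illustration):

```r
set.seed(8)
x <- runif(40, 0, 10)
y <- 1 + 2 * x + rnorm(40)

negloglik <- function(theta) {
  mu <- theta[1] + theta[2] * x
  -sum(dnorm(y, mean = mu, sd = exp(theta[3]), log = TRUE))   # exp() keeps sd > 0
}

mle <- optim(c(0, 0, 0), negloglik)
mle$par[1:2]      # MLE intercept and slope...
coef(lm(y ~ x))   # ...agree with OLS under Normality
```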

You will also receive the take-home portion of the third examination today. It is due on Monday… the day of the in-class portion.

February 16 (Friday):

Surprise Day!

Class Notes: The in-class portion of the third examination was moved to Monday. Today, feel free to work on the take-home portion, which is due on Monday at the start of the class period.

We have no scheduled class today.

February 19 (Monday):

Examination 3

Class Notes: This is the day of our third examination. Be prepared.

Learning Module III: Beyond the Classical Model

February 21 (Wednesday):

Generalized Linear Models: Bernoulli Dependent Variable

Class Notes: Now that we understand that GLMs force us to think more about the conditional distribution of the dependent variable, we can start to use them in a variety of situations. All we need to do is know the dependent variable… which is the #1 rule of statisticians. We need to know its conditional distribution. Once we do, we can easily fit GLMs with a couple lines of code. In many cases, it will be easier to fit using GLMs than CLMs, where we have to transform the data.

Turning our focus to GLMs also allows us to explore ways of modeling non-numeric dependent variables, something that is particularly difficult with CLMs.
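
A sketch of the Bernoulli case in R, with simulated data; the logit link is glm()'s default for the binomial family:

```r
set.seed(9)
x <- runif(100, -2, 2)
y <- rbinom(100, size = 1, prob = plogis(-0.5 + 1.5 * x))   # 0/1 outcomes

fit <- glm(y ~ x, family = binomial)   # logistic regression
summary(fit)$coefficients
```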

February 22 (Thursday):

Generalized Linear Models: Bernoulli Dependent Variable

Class Notes: If the dependent variable is Bernoulli, we have a lot more things that we can do with the results. Today, we will look at the effects of using different link functions and why we should.
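
A sketch of what that comparison looks like, with simulated data; the three links below all ship with the binomial family:

```r
set.seed(10)
x <- runif(100, -2, 2)
y <- rbinom(100, size = 1, prob = plogis(-0.5 + 1.5 * x))

glm(y ~ x, family = binomial(link = "logit"))     # log-odds interpretation
glm(y ~ x, family = binomial(link = "probit"))    # Normal-CDF link
glm(y ~ x, family = binomial(link = "cloglog"))   # asymmetric link
```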

February 23 (Friday):

Generalized Linear Models: Binomial Dependent Variable

Class Notes: As the course winds down, we see that we can apply GLMs to another type of dependent variable. Here, it is the Binomial dependent variable. Earlier in the course (the Weighted Least Squares day), we saw one use of Binomial dependent variables (and the heteroskedasticity that necessarily comes with them). Now, we see a clearer way of modeling such a dependent variable.
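
A sketch of the fit; with a Binomial dependent variable, glm() takes the successes and failures as a two-column response (counts invented for illustration):

```r
x         <- c(1, 2, 3, 4, 5)
successes <- c(2, 5, 9, 14, 18)
trials    <- rep(20, 5)

fit <- glm(cbind(successes, trials - successes) ~ x, family = binomial)
coef(fit)
```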

February 26 (Monday):

Generalized Linear Models: Poisson Dependent Variable

Class Notes: The last numeric dependent variable type we cover is the count dependent variable. The default conditional distribution is the Poisson. However, because there is a chance that the data-generating process creates overdispersion (the successes are not independent), we may need to fit with a different distribution (negative binomial) or with a different estimation technique (maximum quasi-likelihood). Ultimately, it matters little in terms of us doing the calculations. We just get the computer to do them for us.
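
A sketch of the three fits, with simulated (deliberately overdispersed) counts; MASS is my package choice for the negative binomial:

```r
library(MASS)

set.seed(11)
x <- runif(100, 0, 3)
y <- rnbinom(100, mu = exp(0.5 + 0.7 * x), size = 2)   # overdispersed counts

glm(y ~ x, family = poisson)        # the default: Poisson
glm(y ~ x, family = quasipoisson)   # maximum quasi-likelihood fix
glm.nb(y ~ x)                       # negative binomial (MASS)
```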

February 28 (Wednesday):

Beyond GLMs: Nominal Dependent Variables

Class Notes: In some fields, it is important to model a variable that is nominal or ordinal. The closest we’ve come to this situation was when we modeled a dichotomous variable (Bernoulli). Going beyond this simple binary case adds a lot to the underlying mathematics.

However, it adds very little to the coding effort. Instead of using glm, we will use multinom for nominal regression. Because the mathematics are rather difficult, a lot of the flexibility of GLMs is missing. However, the basics are well-understood and work much like we would expect. The multinomial regression function requires the nnet package.
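
A sketch of a multinom() call, using R's built-in iris data purely for illustration (Species is a three-level nominal variable):

```r
library(nnet)

fit <- multinom(Species ~ Sepal.Length + Sepal.Width, data = iris)
summary(fit)
predict(fit, newdata = head(iris), type = "probs")   # predicted category probabilities
```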

February 29 (Thursday):

Beyond GLMs: Ordinal Dependent Variables

Class Notes: It is fitting that the last lecture covers a topic that is another extension of the Bernoulli dependent variable. In some fields, it is important to model a variable that is ordinal. Instead of using glm, we will use polr for ordinal regression. Because the mathematics are so difficult, a lot of the flexibility of GLMs is missing. However, the basics are well-understood and work much like we would expect. The ordinal regression function requires the MASS package.
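
A sketch of a polr() call with invented data; note that polr() requires the dependent variable to be an ordered factor:

```r
library(MASS)

set.seed(12)
x <- runif(150, 0, 10)
z <- 0.5 * x + rlogis(150)                    # latent scale, for simulation only
y <- cut(z, breaks = c(-Inf, 1, 3, Inf),
         labels = c("low", "medium", "high"),
         ordered_result = TRUE)               # ordered factor response

fit <- polr(y ~ x, Hess = TRUE)
summary(fit)
```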

March 1 (Friday):

Surprise Day

March 4 (Monday):

Surprise Day

Learning Module IV: End of the Term

March 6 (Wednesday):

Project Presentations

Class Notes: This is it. Today is your day to share with all of us what you learned about your dependent variable. Spending 10 minutes on your presentation is a fine goal. A formal presentation will be useful so that you remember what to tell us about the variable and how you analyzed it. After that, we want to hear what you learned and where you are going with this line of research.

The presentation order will be determined randomly. Email me your presentation before class so we do not have to spend a long time transitioning.

March 7 (Thursday):

Project Presentations

Class Notes: We finish our presentations today.

March 8 (Friday):

Reading Day

Class Notes: Reading days were originally created by Knox College to provide students with days set aside for nothing other than bringing together all of the material learned during the term — and during the time at Knox College. Education is a process of changing yourself. It provides you with the opportunity to better see the relationships throughout the world.

If you are only interested in short-term gains (which is better than nothing), then today is a day to cram. If you are interested in mastering the material you have learned thus far, then today is a day to reflect on what you have learned in life.

March 9 (Saturday):

Reading Day

March 10 (Sunday):

Final Examination Period

March 11 (Monday):

Final Examination Period

March 12 (Tuesday):

Final Examination Period

March 13 (Wednesday):

Start of Spring Break

Class Notes: Yay! This is the start of Spring Vacation. It will be cold in Galesburg (most likely), but this week needs to be a time for you to relax and return to your usual level of energy.

Remember that the Spring Term consists of the usual 10 (or so) weeks of Knox-level learning. It piggybacks on the 10 (or so) weeks of Winter Term. In some ways, the Winter-Spring term is a 20-week term. Keep this in mind. Even if you are not new to Knox College, please take advantage of what we offer to help you remain consistent throughout this pair of terms.

The goal, as always, is to learn the material so that you are prepared to succeed in the future. Keep this in mind.

This page was last modified on 13 December 2023.
All rights reserved by Ole J. Forsberg, PhD, ©2008–2023. No reproduction of any of this material is allowed without explicit written permission of the copyright holder.