Introductory Statistics

 

The Learning Modules

This course is broken up into five separate learning modules (LM). While each module covers a specific set of topics, the material in this course builds on itself. Thus, time spent earlier in the course will make later parts easier to master.

I. Getting Your Data

Statistics is the study of data. Before we can study your data, it must be collected. While this part of the course provides a lot of terminology to learn, its most important topic is how to properly collect data. We will return to this several times over the course, simply because sampling is so important.

LM1: Sampling. The first part deals with gathering data. While this may seem elementary, the choice of collection method affects the conclusions we can draw. This is why we spend time examining how to properly (and cheaply) collect a representative sample from the larger population.
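As a small illustration of one collection method, the sketch below draws a simple random sample, in which every unit in the sampling frame has the same chance of selection. The population of 1,000 ID numbers, the function name `simple_random_sample`, and the sample size are all hypothetical and chosen only for this example; the course itself may use different tools.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # a local generator keeps the draw reproducible
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical sampling frame: ID numbers for 1,000 students.
population = list(range(1, 1001))

sample = simple_random_sample(population, n=50, seed=42)
print(len(sample), "units drawn;", len(set(sample)), "are distinct")
```

Fixing the seed makes the draw repeatable, which helps when checking work. In a real study the randomness of the draw is what protects against selection bias.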

II. Understanding Your Data

This part of the course is dedicated to helping you understand your data. “Understanding” your variables requires you to be able to numerically summarize them, provide appropriate graphics illustrating them, and posit the statistical mechanism generating them. In these two learning modules, we cover these three topics in detail.

LM2: Knowing Your Data. Before any supportable analysis can take place, the analyst must know the data. This includes summary statistics for each variable, correlations between the variables, and meanings of the values and of the missing values. This also includes being able to create graphics that “tell the story of the data.”
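A minimal sketch of what "knowing the data" can look like in practice follows. The two variables (`hours` and `score`), their values, and the helper names `summarize` and `pearson_r` are invented for illustration, and `None` is assumed to mark a missing value. Dropping incomplete pairs before computing the correlation is one simple choice, not the only defensible one.

```python
import math
import statistics

# Hypothetical data: hours studied and exam score; None marks a missing value.
hours = [2.0, 3.5, None, 5.0, 1.0, 4.0]
score = [65.0, 72.0, 70.0, 88.0, None, 80.0]


def summarize(values):
    """Summary statistics for one variable, reporting missing values separately."""
    present = [v for v in values if v is not None]
    return {
        "n": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "stdev": statistics.stdev(present),  # sample standard deviation
    }


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed from its definition."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Use only the rows where both variables were observed.
pairs = [(h, s) for h, s in zip(hours, score) if h is not None and s is not None]
xs, ys = zip(*pairs)
r = pearson_r(xs, ys)

print(summarize(hours))
print(summarize(score))
print("complete pairs:", len(pairs), "correlation:", round(r, 3))
```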

LM3: Probability Theory. In addition to describing your data, there is a need to understand the “data-generating process.” This is the method by which the data came into being. Being able to describe this process means that the data are better understood, including future data points not yet collected. Understanding the data-generating process is the final step before we can draw conclusions about the population based on our small sample.

III. Analyzing Your Data

This final part of the course is dedicated to learning how to draw conclusions about the population when all you have is a representative sample from that population. This will allow you to estimate population parameters and provide indicators of precision. It will also allow you to test hypotheses made about the data.

LM4: Introductory Inference. Now that we understand our data and the process that created our data (both our sampling method and the data-generating process), we can begin drawing conclusions about the population based on the sample. These conclusions take the form of confidence intervals and/or hypothesis tests. The former consists of a set of reasonable values for the population parameter. The latter is a conclusion regarding the claim made about reality.
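To make the two kinds of conclusion concrete, the sketch below computes a confidence interval and a two-sided p-value for a population mean. It assumes a large-sample normal (z) approximation, which is only one of several possible procedures and not necessarily the one taught in this module. The simulated sample, the hypothesized value of 100, and the function names are all hypothetical.

```python
import math
import random
import statistics
from statistics import NormalDist


def z_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean (normal approximation)."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(len(sample))  # estimated standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # critical value, about 1.96 for 95%
    return mean - z * se, mean + z * se


def z_test_p_value(sample, mu0):
    """Two-sided p-value for the claim that the population mean equals mu0."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    z = (statistics.mean(sample) - mu0) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical sample of 60 measurements, simulated for illustration only.
rng = random.Random(7)
sample = [rng.gauss(100, 15) for _ in range(60)]

lower, upper = z_interval(sample)
p_value = z_test_p_value(sample, mu0=100)
print("95% interval:", round(lower, 2), "to", round(upper, 2))
print("p-value for mu0 = 100:", round(p_value, 3))
```

Under this approximation the two procedures agree: a hypothesized value lies inside the 95% interval exactly when its two-sided p-value exceeds 0.05.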

LM5: Advanced Inference. The final learning module continues our exploration of drawing data-based conclusions about the world around us. New procedures are introduced, each tailor-made for a specific type of data and hypothesis. While there does exist a generic procedure that applies to all types of confidence intervals and hypothesis tests (the bootstrap), it tends to be of low power. Thus, time should be spent leveraging all of what we have learned in this course to create procedures that are as powerful as possible.
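For readers curious about the bootstrap mentioned above, here is a minimal sketch of one common variant, the percentile bootstrap, applied to a median. The simulated data, the number of resamples, and the function name `bootstrap_ci` are assumptions made for this example. It illustrates the general idea only and is not presented as the procedure used in the course.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.median, reps=2000, confidence=0.95, seed=None):
    """Percentile bootstrap interval for any statistic of a single sample."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample the data with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(reps))
    alpha = 1 - confidence
    lo_idx = round(alpha / 2 * reps)
    hi_idx = round((1 - alpha / 2) * reps) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Hypothetical right-skewed sample of 40 waiting times, simulated for illustration.
data_rng = random.Random(3)
sample = [data_rng.expovariate(1 / 20) for _ in range(40)]

lower, upper = bootstrap_ci(sample, seed=11)
print("bootstrap 95% interval for the median:", round(lower, 2), "to", round(upper, 2))
```

The same function works for other statistics by passing a different `stat`, which is what makes the approach generic.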

This page was last modified on 2 January 2024.
All rights reserved by Ole J. Forsberg, PhD, ©2008–2024. No reproduction of any of this material is allowed without explicit written permission of the copyright holder.