Practicum Assignment 2: The Data-Generating Process
As you may have guessed by now, knowing the distribution behind a data-generating process is key to understanding that process, and understanding the process is key to predicting what will happen in the future. Predictions based on an Exponential distribution may be vastly different from those based on a Uniform or Binomial distribution. Knowing the difference helps the researcher (a.k.a. the explorer) better understand the relationships between, and among, the variables.
This practicum activity has you examine three variables and determine the most likely distribution that gave rise to each. For this practicum, you are limited to the following five distributions we covered: Binomial, Exponential, Normal, Poisson, and Uniform.
Before continuing, make sure you know the parameter(s) of each of the five distributions. The parameters of a distribution are the values that determine its shape, location, and spread. For example, the Binomial distribution has two parameters, n and p; the Poisson has one parameter, λ; the Exponential has one parameter, λ; and so on.
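As a compact reference, the five distributions, their parameters, and their means and variances can be summarized as follows. The notation is conventional (a and b for the Uniform, μ and σ for the Normal) and may differ slightly from your notes; some texts write the Normal with σ² instead of σ. Here λ for the Exponential is the rate, which matches the `rate` argument of R's `dexp()`.

```latex
\begin{aligned}
&\text{Binomial}(n, p): && P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k},\ k = 0, 1, \dots, n && E[X] = np, && \operatorname{Var}(X) = np(1 - p) \\
&\text{Poisson}(\lambda): && P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!},\ k = 0, 1, 2, \dots && E[X] = \lambda, && \operatorname{Var}(X) = \lambda \\
&\text{Exponential}(\lambda): && f(x) = \lambda e^{-\lambda x},\ x \ge 0 && E[X] = 1/\lambda, && \operatorname{Var}(X) = 1/\lambda^2 \\
&\text{Normal}(\mu, \sigma): && f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x - \mu)^2 / (2\sigma^2)} && E[X] = \mu, && \operatorname{Var}(X) = \sigma^2 \\
&\text{Uniform}(a, b): && f(x) = \frac{1}{b - a},\ a \le x \le b && E[X] = \frac{a + b}{2}, && \operatorname{Var}(X) = \frac{(b - a)^2}{12}
\end{aligned}
```

Note that the Binomial and Poisson describe whole-number counts, while the Exponential, Normal, and Uniform describe continuous values.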
Collecting the Data
You can download the data for this activity here. You also downloaded it for the previous practicum, so you should already have a copy in your Practicum folder on your computer (or USB drive) as well as in your Practicum 1 folder. Place another copy in your Practicum 2 folder, and work from this latest copy.
This dataset contains ten variables. Six indicate the date: fullDate, day, date (day of month), month, year, and doy (day of year). For each day at my Galesburg restaurant, The Lamplighter (such a wonderful view!), which is located on the southwest corner of Losey St. and Lake St., I have measured four other variables: grossSales (the amount of sales made that evening, before subtracting expenses), netProfit (the evening’s profit after expenses), customers (the number of customers served that evening), and servers (the number of servers starting that evening). The 876 dates run from August 6, 2021, until December 30, 2023.
Analyzing the Data
For each of the following variables, create a professional-looking histogram, determine the most likely distribution for the data, and estimate the value(s) of the distribution’s parameter(s). The three variables are the gross sales, the net profits, and the number of customers.
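Once you have a candidate distribution, its parameters can usually be estimated from a few summary statistics. The sketch below is in Python (standard library only) purely for illustration; your submitted code must be in R, where the same quantities come from `mean()`, `sd()`, `min()`, and `max()`. The function name `estimate_parameters` is hypothetical, and the estimators are common textbook choices (sample mean and standard deviation for the Normal, observed minimum and maximum for the Uniform, sample mean for the Poisson, one over the sample mean for the Exponential, and the method of moments for the Binomial), not necessarily the ones used in lecture.

```python
import math
import statistics


def estimate_parameters(values):
    """Return simple point estimates for each of the five candidate distributions.

    Hypothetical helper: these are common textbook estimators, which may
    differ slightly from the ones used in lecture.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (divisor n - 1)
    var = sd ** 2

    estimates = {
        # Normal: mu and sigma from the sample mean and standard deviation
        "Normal": {"mu": mean, "sigma": sd},
        # Uniform: the smallest and largest observed values estimate a and b
        "Uniform": {"a": min(values), "b": max(values)},
        # Poisson: E[X] = lambda, so the sample mean estimates lambda
        "Poisson": {"lambda": mean},
        # Exponential: E[X] = 1/lambda, so 1 / (sample mean) estimates lambda;
        # the estimate is undefined when the mean is not positive
        "Exponential": {"lambda": 1 / mean} if mean > 0 else None,
        "Binomial": None,
    }

    # Binomial (method of moments): E[X] = n*p and Var(X) = n*p*(1 - p),
    # so p = 1 - Var/E and n = E/p. This only works when the variance
    # is smaller than the mean.
    if mean > 0 and 0 < var < mean:
        p_raw = 1 - var / mean
        # n must be a whole number no smaller than the largest observed count
        n_hat = max(1, round(mean / p_raw), math.ceil(max(values)))
        # recompute p so that n * p still equals the sample mean
        estimates["Binomial"] = {"n": n_hat, "p": mean / n_hat}
    return estimates
```

A `None` entry means that distribution's estimator does not apply to the data (for example, a Binomial fit is impossible when the variance exceeds the mean), which is itself evidence against that candidate.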
Make sure that you defend your choice of distribution. In fact, spend at least one separate paragraph defending your selection (important). This means identifying characteristics of that distribution (for example, whether it is discrete or continuous, what range of values it allows, and what shape it has) and explicitly tying them to characteristics of the data.
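The characteristics worth tying to the data can be computed directly. The following Python sketch (standard library only; again, your submission must use R) gathers them in one place. The function name `describe_shape` and its output keys are hypothetical, and the thresholds mentioned in the comments are rules of thumb rather than formal tests.

```python
import statistics


def describe_shape(values):
    """Summarize features of the data that help tell the five candidates apart.

    Hypothetical helper; the rules of thumb in the comments are guides, not tests.
    """
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)

    # Moment-based skewness: about 0 for Normal or Uniform data,
    # about 2 for Exponential data
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0

    return {
        # Binomial and Poisson counts are whole numbers; the others are continuous
        "all_integers": all(float(x).is_integer() for x in values),
        # Binomial, Poisson, and Exponential values cannot be negative
        "any_negative": min(values) < 0,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "median": statistics.median(values),
        "skewness": skewness,
        # Poisson: variance equals the mean (ratio near 1); Binomial: ratio below 1
        "variance_to_mean": sd ** 2 / mean if mean != 0 else float("nan"),
        # Uniform: the range is about sqrt(12), or 3.46, standard deviations wide;
        # Normal samples of several hundred values usually span noticeably more
        "range_to_sd": (max(values) - min(values)) / sd if sd > 0 else float("nan"),
    }
```

For instance, whole-number data with a variance-to-mean ratio near 1 point toward a Poisson distribution, while continuous, nonnegative data with skewness near 2 and a mean close to the standard deviation point toward an Exponential distribution. In your defense paragraph, state such findings in plain language and connect them to what the histogram shows.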
Checklist
In addition to the general checklist (see the Overview), here is the particular checklist for this practicum activity.
- For gross sales:
  - Proper histogram created
  - Most likely distribution named and defended
  - Parameter(s) of that distribution properly estimated
  - Explanation of what this information tells us about the gross sales
- For net profits:
  - Proper histogram created
  - Most likely distribution named and defended
  - Parameter(s) of that distribution properly estimated
  - Explanation of what this information tells us about the net profits
- For the number of customers:
  - Proper histogram created
  - Most likely distribution named and defended
  - Parameter(s) of that distribution properly estimated
  - Explanation of what this information tells us about the number of customers
- Additional notes:
  - All R code is included in the appendix; there should be no code in the prose.
  - In the prose, use the names of the variables (e.g., net profit), not their codes (e.g., netProfit).
  - In the code, use their codes (e.g., netProfit).
  - Make sure the code is properly commented (you saw an example of this in the first activity).
Special Note
For this practicum activity only, provide the title page, as usual, but start a new page for each variable. Thus, there should be about six pages in the document for this assignment (do not forget the script). Make sure the graphics are large enough for the reader to read. Also, make sure they look professional. Check the example Practicum 2 Activity to make sure you are presenting your results fully and in a meaningful manner. Again, follow the structure carefully, even if the example covers different data and variables.
Again, the example is not perfect. Thus, if all you do is echo the example, you will not receive 100%. As the course progresses, I expect you to make these submissions even more professional looking.