Handling Probabilities in R
Purpose
This page provides a list of the distributions we cover in class, how to use R with them, and where to get additional information. It is up to you to know which distribution is to be used and whether a probability, a cumulative probability, or a quantile is to be calculated, or a random number is to be drawn.
Discrete Distributions
A discrete random variable is only able to take specific values. These values may be integers or decimals, and there may be finitely or infinitely many of them. Ultimately, discrete random variables allow for the concept of “next.” These are the discrete distributions discussed on this page:
- Binomial
- Poisson
- Hypergeometric
- Geometric
Binomial Distribution
X ~ Bin(n,p)
A Binomial random variable models the number of successes in a specific number of trials. There are five requirements for a random variable to follow a Binomial distribution:
- The number of trials, n, is known.
- Each trial results in either a success or a failure.
- The probability of a success is constant across the trials.
- The trials are independent (of each other).
- The random variable is the number of successes.
If your random variable follows a Binomial distribution, then it has two parameters that define it. These are the number of trials and the success probability. In class, these are symbolized as n and p. In R, they are symbolized as size and prob.
- To calculate a point probability, P[X = x] = p, run dbinom(x, size, prob) to get the value of p.
- To calculate a cumulative probability, P[X ≤ q] = p, run pbinom(q, size, prob) to get the value of p.
- To calculate a quantile, P[X ≤ q] = p, run qbinom(p, size, prob) to get the value of q. Because the distribution is discrete, R returns the smallest q with P[X ≤ q] ≥ p.
- To create a random sample of size n, run rbinom(n, size, prob). Here n is the number of random values to draw, not the number of trials (that is size).
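As a quick illustration of the four calls, here is a minimal sketch. The scenario (10 trials with success probability 0.5) and the variable names are assumptions chosen for this example, not values from class.

```r
# Assumed example: X ~ Bin(n = 10, p = 0.5), e.g. heads in 10 fair coin flips
size <- 10   # number of trials (n in class)
prob <- 0.5  # success probability (p in class)

p_point <- dbinom(3, size, prob)    # P[X = 3]  = 120/1024
p_cumul <- pbinom(3, size, prob)    # P[X <= 3] = 176/1024
q_med   <- qbinom(0.5, size, prob)  # smallest q with P[X <= q] >= 0.5

set.seed(1)                         # only to make the draws repeatable
draws <- rbinom(5, size, prob)      # 5 random values, each between 0 and 10

print(c(p_point, p_cumul, q_med))
print(draws)
```

Note that the first argument of rbinom is the number of values you want back, while size stays the number of trials.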
Poisson Distribution
X ~ Pois(λ)
A Poisson random variable models the number of successes in an area or a time period — not in a given number of trials.
If your random variable follows a Poisson distribution, then it needs just one parameter to define it. It is the average rate. In class, this was symbolized as λ. In R, it is lambda.
- To calculate a point probability, P[X = x] = p, run dpois(x, lambda) to get the value of p.
- To calculate a cumulative probability, P[X ≤ q] = p, run ppois(q, lambda) to get the value of p.
- To calculate a quantile, P[X ≤ q] = p, run qpois(p, lambda) to get the value of q. Because the distribution is discrete, R returns the smallest q with P[X ≤ q] ≥ p.
- To create a random sample of size n, run rpois(n, lambda).
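The following sketch shows the Poisson functions in use. The rate of 2 events per time period is an assumed value for illustration only.

```r
# Assumed example: X ~ Pois(lambda = 2), e.g. 2 arrivals per hour on average
lambda <- 2

p_zero  <- dpois(0, lambda)    # P[X = 0]  = exp(-2)
p_le1   <- ppois(1, lambda)    # P[X <= 1] = exp(-2) * (1 + 2)
q_med   <- qpois(0.5, lambda)  # smallest q with P[X <= q] >= 0.5

set.seed(1)                    # only to make the draws repeatable
draws <- rpois(5, lambda)      # 5 random counts (non-negative integers)

print(c(p_zero, p_le1, q_med))
print(draws)
```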
Hypergeometric Distribution
X ~ Hyper(N, k, n)
A Hypergeometric random variable models the number of successes in a specific number of trials in which the population is finite and repetition is not allowed (the same element can be selected at most once). There are five requirements for a random variable to follow a Hypergeometric distribution:
- The number of trials, k, is known.
- Each trial results in either a success or a failure.
- The population is finite and repetition is not allowed.
- The trials are not independent: because nothing is put back, the probability of a success changes from one draw to the next.
- The random variable is the number of successes in the trials.
If your random variable follows a Hypergeometric distribution, then it needs three parameters to define it. These are m, the number of successes in the population; n, the number of failures in the population; and k, the sample size.
- To calculate a point probability, P[X = x] = p, run dhyper(x, m, n, k) to get the value of p.
- To calculate a cumulative probability, P[X ≤ q] = p, run phyper(q, m, n, k) to get the value of p.
- To calculate a quantile, P[X ≤ q] = p, run qhyper(p, m, n, k) to get the value of q. Because the distribution is discrete, R returns the smallest q with P[X ≤ q] ≥ p.
- To create a random sample of size nn, run rhyper(nn, m, n, k).
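Because R describes the population by its successes (m) and failures (n) rather than by its total size, a small worked case may help. The urn below (4 successes, 6 failures, 3 draws) is an assumed example.

```r
# Assumed example: an urn with 4 "success" items and 6 "failure" items,
# from which 3 items are drawn without replacement
m <- 4  # successes in the population
n <- 6  # failures in the population
k <- 3  # sample size (number of draws)

p_point <- dhyper(1, m, n, k)    # P[X = 1]
p_cumul <- phyper(1, m, n, k)    # P[X <= 1]
q_med   <- qhyper(0.5, m, n, k)  # smallest q with P[X <= q] >= 0.5

# The same point probability from the counting formula
by_hand <- choose(m, 1) * choose(n, k - 1) / choose(m + n, k)

set.seed(1)                      # only to make the draws repeatable
draws <- rhyper(5, m, n, k)      # 5 random values, each between 0 and 3

print(c(p_point, by_hand, p_cumul, q_med))
print(draws)
```

The total population size never appears as its own argument; it is implied by m + n.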
Geometric Distribution
X ~ Geom(p)
A Geometric random variable models the number of failures until the first success. There are four requirements for a random variable to follow a Geometric distribution:
- Each trial results in either a success or a failure.
- The probability of a success is constant across the trials.
- The trials are independent (of each other).
- The random variable is the number of failures until the first success.
If your random variable follows a Geometric distribution, then it has just one parameter that defines it. It is the success probability. In class, this is symbolized as p. In R, it is symbolized as prob.
- To calculate a point probability, P[X = x] = p, run dgeom(x, prob) to get the value of p.
- To calculate a cumulative probability, P[X ≤ q] = p, run pgeom(q, prob) to get the value of p.
- To calculate a quantile, P[X ≤ q] = p, run qgeom(p, prob) to get the value of q. Because the distribution is discrete, R returns the smallest q with P[X ≤ q] ≥ p.
- To create a random sample of size n, run rgeom(n, prob).
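Since the random variable counts failures before the first success, x = 2 below means "two failures, then a success." The success probability of 0.25 is an assumed value for this sketch.

```r
# Assumed example: X ~ Geom(p = 0.25), failures before the first success
prob <- 0.25

p_point <- dgeom(2, prob)    # P[X = 2]  = 0.75^2 * 0.25
p_cumul <- pgeom(2, prob)    # P[X <= 2] = 1 - 0.75^3
q_med   <- qgeom(0.5, prob)  # smallest q with P[X <= q] >= 0.5

set.seed(1)                  # only to make the draws repeatable
draws <- rgeom(5, prob)      # 5 random failure counts (0, 1, 2, ...)

print(c(p_point, p_cumul, q_med))
print(draws)
```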
Continuous Distributions
A continuous random variable can take on any value in an interval. These are the continuous distributions discussed on this page:
- Uniform
- Exponential
- Normal (Gaussian)
Probability Statements
Before we get started, let us look at four possible probability statements and how to calculate them in general. Remember that F(x) is the cumulative distribution function (CDF).
| Probability statement | In terms of cumulative probabilities | In terms of the CDF |
|---|---|---|
| P[X ≤ a] | P[X ≤ a] | F(a) |
| P[a < X] | 1 − P[X ≤ a] | 1 − F(a) |
| P[a < X ≤ b] | P[X ≤ b] − P[X ≤ a] | F(b) − F(a) |
| P[X ≤ a or b < X] | 1 − P[a < X ≤ b] | 1 − (F(b) − F(a)) |
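In R, the CDF F is whichever p-function matches your distribution. The sketch below works through the four statements using pnorm with its default mean of 0 and standard deviation of 1; the distribution and the cut points a = -1 and b = 1 are assumptions picked for illustration.

```r
# Assumed example: X is standard Normal, with cut points a < b
a <- -1
b <- 1

F_a <- pnorm(a)  # F(a) = P[X <= a]
F_b <- pnorm(b)  # F(b) = P[X <= b]

p_left    <- F_a               # P[X <= a]
p_right   <- 1 - F_a           # P[a < X]
p_between <- F_b - F_a         # P[a < X <= b]
p_outside <- 1 - (F_b - F_a)   # P[X <= a or b < X]

print(c(p_left, p_right, p_between, p_outside))
```

The p-functions also accept lower.tail = FALSE, so pnorm(a, lower.tail = FALSE) gives the same value as 1 - pnorm(a).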
Uniform Distribution
X ~ Unif(a, b)
A Uniform random variable models random variables that have a constant likelihood between two specified values. If your random variable follows a Uniform distribution, then it needs two parameters to define it: the lowest possible value and the highest possible value. In class, these were symbolized as a and b. In R, they are min and max.
- To calculate a density, f(x) = p, run dunif(x, min, max) to get the value of p.
- To calculate a cumulative probability, F(q) = P[X ≤ q] = p, run punif(q, min, max) to get the value of p.
- To calculate a quantile, F(q) = P[X ≤ q] = p, run qunif(p, min, max) to get the value of q.
- To create a random sample of size n, run runif(n, min, max).
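A short sketch of these calls follows, assuming X ~ Unif(0, 10); the endpoints are illustrative only. Notice that the quantile call undoes the cumulative probability call.

```r
# Assumed example: X ~ Unif(a = 0, b = 10)
lo <- 0   # lowest possible value (a in class, min in R)
hi <- 10  # highest possible value (b in class, max in R)

dens   <- dunif(3, min = lo, max = hi)    # f(3) = 1 / (10 - 0)
p_le3  <- punif(3, min = lo, max = hi)    # F(3) = P[X <= 3] = 3/10
q_30   <- qunif(0.3, min = lo, max = hi)  # the q with F(q) = 0.3

set.seed(1)                               # only to make the draws repeatable
draws <- runif(5, min = lo, max = hi)     # 5 random values between 0 and 10

print(c(dens, p_le3, q_30))
print(draws)
```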
Exponential Distribution
X ~ Exp(λ)
An Exponential random variable models the time until a success. If your random variable follows an Exponential distribution, then it needs just one parameter to define it. It is the average rate. In class, this was symbolized as λ. In R, it is rate.
- To calculate a density, f(x) = p, run dexp(x, rate) to get the value of p.
- To calculate a cumulative probability, F(q) = P[X ≤ q] = p, run pexp(q, rate) to get the value of p.
- To calculate a quantile, F(q) = P[X ≤ q] = p, run qexp(p, rate) to get the value of q.
- To create a random sample of size n, run rexp(n, rate).
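Here is one way these calls might look in practice, assuming a rate of 0.5 successes per unit of time (so an average wait of 1/0.5 = 2 time units). The rate is an assumed value, not one from class.

```r
# Assumed example: X ~ Exp(lambda = 0.5), waiting time until a success
rate <- 0.5

dens  <- dexp(2, rate)    # f(2) = 0.5 * exp(-0.5 * 2)
p_le2 <- pexp(2, rate)    # F(2) = P[X <= 2] = 1 - exp(-0.5 * 2)
q_med <- qexp(0.5, rate)  # median: the q with F(q) = 0.5, i.e. log(2) / rate

set.seed(1)               # only to make the draws repeatable
draws <- rexp(5, rate)    # 5 random waiting times (all positive)

print(c(dens, p_le2, q_med))
print(draws)
```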
Normal (Gaussian) Distribution
X ~ Norm(μ, σ)
A Normal (or Gaussian) random variable is useful for modeling measures of center. It can also be used to approximate the sum or average of many independent random variables, thanks to the Central Limit Theorem (CLT).
If your random variable follows a Normal distribution, then it needs two parameters to define it. The two parameters are the expected value and the standard deviation. In class, we symbolized them as μ and σ. In R, they are mean and sd.
- To calculate a density, f(x) = p, run dnorm(x, mean, sd) to get the value of p.
- To calculate a cumulative probability, F(q) = P[X ≤ q] = p, run pnorm(q, mean, sd) to get the value of p.
- To calculate a quantile, F(q) = P[X ≤ q] = p, run qnorm(p, mean, sd) to get the value of q.
- To create a random sample of size n, run rnorm(n, mean, sd).
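To round things out, here is a minimal sketch for the Normal functions. The values μ = 100 and σ = 15 are assumptions for illustration. Keep in mind that the second parameter in R is the standard deviation, not the variance.

```r
# Assumed example: X ~ Norm(mu = 100, sigma = 15)
mu    <- 100  # expected value (mean in R)
sigma <- 15   # standard deviation (sd in R), not the variance

dens   <- dnorm(100, mean = mu, sd = sigma)    # f(100), the peak of the density
p_le   <- pnorm(115, mean = mu, sd = sigma)    # F(115) = P[X <= 115]
q_975  <- qnorm(0.975, mean = mu, sd = sigma)  # the q with F(q) = 0.975

set.seed(1)                                    # only to make the draws repeatable
draws <- rnorm(5, mean = mu, sd = sigma)       # 5 random values

print(c(dens, p_le, q_975))
print(draws)
```

Since 115 is one standard deviation above the mean, pnorm(115, 100, 15) equals pnorm(1) for the standard Normal.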