Statistics on Models for missing data

Non-ignorable missingness

Statistics is basically a missing data problem!

– Little 2013

Nearly all samples – whether by design or by accident – are incomplete. We very rarely make a complete census of all individuals in a population or all sites on a landscape. Sometimes we don’t collect, or can’t collect, complete information for individual samples or measures. For instance, we might know an animal was alive when it was last seen, so we know it survived at least that long, but know nothing about its current status. Or we might have information on the coverage of an invasive species down to a certain patch size, beyond which patches are too small or numerous to survey.

Sampling and populations

We sample for a very practical reason: it's usually impossible to get information on the whole population, so we use a sample to make inferences about the population. In our case, the population is typically all sites in a stratum, or all sites across all strata at the scale of an entire park. The inference we seek usually entails three questions.

What’s the best estimate of the population mean?

We can compute a sample mean, \(\bar{x}\), from our sample. When every site has the same chance of being sampled, this is our best estimate of the population mean.
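As a minimal sketch of the idea above, the R snippet below draws a simple random sample from a simulated population and compares the sample mean with the population mean. The population, its distribution, and the sample size are all made up for illustration; in real work the population mean is unknown.

```r
set.seed(1)  # reproducible; the "population" below is simulated, not real data

# Hypothetical population: one measurement at each of 1,000 sites in a park.
population <- rgamma(1000, shape = 2, rate = 0.5)

# Simple random sample of 50 sites, drawn without replacement.
sampled <- sample(population, size = 50)

x_bar <- mean(sampled)     # sample mean: our estimate of the population mean
mu <- mean(population)     # known here only because we simulated the population

c(sample_mean = x_bar, population_mean = mu)
```

Rerunning with a different seed gives a different \(\bar{x}\) each time, which is why the estimate is usually reported together with a measure of its uncertainty.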

Stratum-varying fixed effects

Assume we have three strata, \(s_0\), \(s_1\), and \(s_2\), where \(s_0\) is the “reference” stratum. In the indicator matrix below, each row is one stratum: the first column is the intercept, and the second and third columns are the 0/1 indicators for \(s_1\) and \(s_2\). The reference stratum \(s_0\) is the first row, where both indicators are 0:

\[\begin{bmatrix} 1 & 0 & 0 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{bmatrix}\]
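One way to see where a matrix like this comes from: under R's default treatment contrasts, `model.matrix()` builds it from a factor whose first level is the reference. This is a sketch only; the data frame and the column name `stratum` are hypothetical.

```r
# Hypothetical data: one row per stratum, with s0 as the first (reference) level.
strata <- data.frame(
  stratum = factor(c("s0", "s1", "s2"), levels = c("s0", "s1", "s2"))
)

# Intercept column plus one 0/1 indicator column for each non-reference stratum.
X_strata <- model.matrix(~ stratum, data = strata)
X_strata
```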

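# general form, where s1 and s2 are the 0/1 stratum indicators
B_0 + (B_1 + B_1_s1_offset * s1 + B_1_s2_offset * s2) * x_1

# in stratum s0 (s1 = 0, s2 = 0)
B_0 + (B_1) * x_1

# in stratum s1 (s1 = 1, s2 = 0)
B_0 + (B_1 + B_1_s1_offset * s1) * x_1

# in stratum s2 (s1 = 0, s2 = 1)
B_0 + (B_1 + B_1_s2_offset * s2) * x_1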

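library(tibble)  # provides tibble()
# design matrix that lm(y ~ x1 * x2) would build: intercept, x1, x2, and x1:x2
model.matrix(~ x1 * x2, tibble(x1 = runif(5), x2 = runif(5)))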
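The stratum-specific expressions above can be sketched in R with a formula that has a common intercept and a slope on `x1` that varies by stratum. Everything here is illustrative: the data, the column names, and the coefficient values are invented, and `~ x1 + x1:stratum` is one possible way to encode this structure, not necessarily the one used elsewhere.

```r
# Hypothetical data: covariate x1 observed at two sites in each of three strata.
dat <- data.frame(
  x1 = c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0),
  stratum = factor(c("s0", "s0", "s1", "s1", "s2", "s2"),
                   levels = c("s0", "s1", "s2"))  # s0 is the reference level
)

# Common intercept, reference slope on x1, and slope offsets for s1 and s2.
X <- model.matrix(~ x1 + x1:stratum, data = dat)
colnames(X)

# Made-up coefficients, listed in the same order as the columns of X.
beta <- c(B_0 = 1, B_1 = 2, B_1_s1_offset = 0.5, B_1_s2_offset = -1)

# Linear predictor for each row: B_0 + (B_1 + stratum offset) * x_1.
eta <- drop(X %*% beta)
eta
```

Each value of `eta` matches the corresponding stratum expression: rows in `s0` use only `B_1` as the slope, while rows in `s1` and `s2` add their own offset to it.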

The offset term

Counts of things naturally scale with the length or duration of observation, the area sampled, and sampling intensity (McElreath, 2018; Statistical Rethinking: A Bayesian Course with Examples in R and Stan, Chapman and Hall/CRC). For instance, the longer the river stretch we survey, the more fish we’ll tend to find.

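Offset terms are used to model rates, e.g., counts per unit area or time. In a count model with a log link, such as a Poisson regression, the offset is the log of the exposure (area, length, or duration), added to the linear predictor with its coefficient fixed at 1. The response itself stays a count; the offset converts the modeled rate into an expected count.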
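To make the river example concrete, here is a minimal sketch of a Poisson regression with an offset, assuming a log link so that \(\log(\mu_i) = \log(\text{length}_i) + \beta_0\). The data are simulated, and the names `stretch_km`, `fish`, and `true_rate` are hypothetical.

```r
set.seed(42)  # reproducible simulated data; all values here are made up

n_sites <- 200
stretch_km <- runif(n_sites, min = 0.5, max = 5)  # surveyed river length (the exposure)
true_rate <- 3                                    # assumed fish per km
fish <- rpois(n_sites, lambda = true_rate * stretch_km)

# log(stretch_km) enters the linear predictor with its coefficient fixed at 1.
fit <- glm(fish ~ 1 + offset(log(stretch_km)), family = poisson(link = "log"))

# exp(intercept) is the estimated rate in fish per km.
est_rate <- exp(coef(fit)[["(Intercept)"]])

# For this intercept-only model, the estimate equals total fish / total km.
pooled_rate <- sum(fish) / sum(stretch_km)

c(estimated = est_rate, pooled = pooled_rate, truth = true_rate)
```

The response passed to `glm()` is still the raw count. Dropping the offset would instead model counts per survey, which mixes the underlying rate with how long each stretch happened to be.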