MAR check using logistic regression — check.mar • describe

We can informally check whether data are missing at random (MAR) by creating a binary indicator that is 1 where a value is missing and 0 where it is observed. A logistic regression with this indicator as the target then yields p-values. The comparison is repeated for each unique pair of variables, and the p-values are adjusted with the Dunn-Šidák correction.

This is not a formal test of MAR. Tests for missingness also have limited practical value: they do not substitute for knowledge of the field or for other, much better joint tests that deal with missingness.
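The general idea can be sketched in base R. This is an illustration under stated assumptions, not the source of check.mar: the helper name sketch_mar_pvalues is hypothetical, each variable's missingness indicator is regressed on one other variable at a time, and the Dunn-Šidák adjustment is taken as 1 - (1 - p)^m, with m the number of comparisons that produced a p-value. The actual function may pair variables and count comparisons differently.

```r
# Hypothetical helper (not part of the package): a base-R sketch of the idea.
sketch_mar_pvalues <- function(data) {
  data <- as.data.frame(data)
  vars <- names(data)
  p <- matrix(NA_real_, length(vars), length(vars),
              dimnames = list(vars, vars))

  for (target in vars) {
    # 1 = value is missing, 0 = value is observed
    miss <- as.integer(is.na(data[[target]]))
    if (length(unique(miss)) < 2) next  # nothing to model without both classes

    for (pred in setdiff(vars, target)) {
      d <- data.frame(miss = miss, x = data[[pred]])
      # Rows where the predictor is NA are dropped by glm's default na.action
      fit <- stats::glm(miss ~ x, data = d, family = stats::binomial())
      coefs <- summary(fit)$coefficients
      # Row 2 is the slope for x; it is absent if the coefficient is aliased
      if (nrow(coefs) >= 2) p[target, pred] <- coefs[2, "Pr(>|z|)"]
    }
  }

  # Dunn-Sidak adjustment over all comparisons that produced a p-value
  m <- sum(!is.na(p))
  1 - (1 - p)^m
}

# Toy data (assumed for illustration only)
set.seed(42)
toy <- data.frame(a = stats::rnorm(200), b = stats::rnorm(200))
toy$a[sample(200, 30)] <- NA
toy$b[sample(200, 30)] <- NA
adj <- sketch_mar_pvalues(toy)
```

A small adjusted p-value in row a, column b would suggest that missingness in a is associated with the observed values of b, which is evidence against data being missing completely at random rather than proof of MAR.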

Usage

check.mar(data, digits = 3)

Arguments

data: a data frame or matrix with at least two columns
digits: significant figures used in decimals

Value

A square matrix of adjusted p-values, one for each pairwise comparison.

Examples

set.seed(123)
data <- data.frame(x1 = stats::rnorm(100), x2 = stats::rnorm(100), y = stats::rnorm(100))
data$x1[sample(1:100, 20)] <- NA
data$x2[sample(1:100, 15)] <- NA
data$y[sample(1:100, 10)] <- NA

check.mar(data, digits = 2)
#> Adjusted p-values of pair-wise comparisons 
#>         x1 x2    y is_na
#> x1      NA  1 0.69     1
#> x2    1.00 NA 1.00     1
#> y     0.69  1   NA     1
#> is_na 1.00  1 1.00    NA
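
One way to read the result is to list the unique pairs whose adjusted p-value falls below a chosen threshold. The sketch below re-enters the printed values by hand so that it runs without the package; the 0.05 cutoff and the object names p_adj and flagged are assumptions made for illustration.

```r
# Values copied from the printed output above
nms <- c("x1", "x2", "y", "is_na")
p_adj <- matrix(
  c(  NA,  1, 0.69,  1,
       1, NA, 1,     1,
    0.69,  1, NA,    1,
       1,  1, 1,    NA),
  nrow = 4, byrow = TRUE,
  dimnames = list(nms, nms)
)

# The matrix is symmetric, so the upper triangle covers each unique pair once.
# which() skips NA entries, and upper.tri() already excludes the diagonal.
alpha <- 0.05  # assumed significance threshold
flagged <- which(p_adj < alpha & upper.tri(p_adj), arr.ind = TRUE)
nrow(flagged)
```

No pair is flagged here, since the smallest adjusted p-value in the output is 0.69. That fits the example, where the missing values were placed at random.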