Generate a dataset containing values missing not at random (MNAR). Specify the column length, correlation coefficients, standard deviations, number of columns and desired probability of missing values to obtain a data frame of correlated observations with missing values.
Arguments
- len
number of rows per column
- rho
desired correlation coefficient of generated variables. The length of rho must be equal to n_vars * (n_vars - 1) / 2, that is, one coefficient for each pair of variables.
- sigma
desired standard deviation for each generated variable.
- n_vars
total number of variables to be generated. At least two variables must be provided.
- na_prob
desired probability of missingness in each variable; set to 10% by default.
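The length requirement on rho can be checked before calling the function. The sketch below is only an illustration: the helper name n_pairs is hypothetical and is not part of the package.

```r
# Hypothetical helper (not part of the package): number of pairwise
# correlations needed for n_vars variables, i.e. n_vars * (n_vars - 1) / 2.
n_pairs <- function(n_vars) {
  stopifnot(n_vars >= 2)  # at least two variables are required
  n_vars * (n_vars - 1) / 2
}

rho <- c(.25, .75, .044)

# For three variables, rho must hold three coefficients.
length(rho) == n_pairs(3)
```

The same count is returned by base R's choose(n_vars, 2).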
Details
The MNAR mechanism requires missingness to be related to events that are not measured or observed.
This type of missingness cannot be identified through statistical analysis of the observed data and will
produce biased estimates. gen.mnar uses a self-selection mechanism to create values missing
not at random. Other methods, such as logistic regression models and exponential decay models,
may also be suitable for generating missing values, but the self-selection mechanism provides a
straightforward implementation of the MNAR mechanism.
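To illustrate what self-selection means, the sketch below deletes values based on the values themselves. It is an assumption-laden illustration in base R, not the code used by gen.mnar: the deterministic quantile cutoff, the Cholesky construction, and all variable names are chosen here for demonstration only.

```r
# Illustrative sketch only; NOT the implementation used by gen.mnar.
set.seed(42)

n <- 200
na_prob <- 0.15
rho <- 0.5

# Two correlated standard-normal variables built from a Cholesky factor.
R <- matrix(c(1, rho, rho, 1), nrow = 2)
z <- matrix(rnorm(n * 2), ncol = 2) %*% chol(R)
dat <- as.data.frame(z)  # columns are named V1 and V2

# Self-selection (assumed form): whether V1 is missing depends on the
# value of V1 itself. Here the lowest na_prob share of V1 is removed.
true_mean <- mean(dat$V1)
cutoff <- quantile(dat$V1, probs = na_prob)
dat$V1[dat$V1 < cutoff] <- NA

# The mean of the remaining values is biased upwards.
observed_mean <- mean(dat$V1, na.rm = TRUE)
c(true = true_mean, observed = observed_mean)
```

Because the deleted values are exactly the smallest ones, the mean of the observed values overstates the true mean, which is the kind of bias the MNAR mechanism produces.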
Examples
syn_na <- gen.mnar(50,c(.25,.75,.044),c(1.1,.56,1.56),3,.15)
summary(syn_na)
#> V1 V2 V3
#> Min. :-0.8659 Min. :-0.68721 Min. :-3.5148
#> 1st Qu.:-0.3421 1st Qu.:-0.42436 1st Qu.:-0.7494
#> Median : 0.2887 Median :-0.03281 Median :-0.1010
#> Mean : 0.2908 Mean :-0.02013 Mean : 0.1588
#> 3rd Qu.: 0.7156 3rd Qu.: 0.25424 3rd Qu.: 1.0034
#> Max. : 1.8796 Max. : 1.01316 Max. : 3.7811
#> NA's :8 NA's :8