
Generate a dataset containing values missing not at random. Specify the column length, correlation coefficient, standard deviation, number of columns and desired probability of missing values to obtain a data frame of correlated observations with missing values.

Usage

gen.mnar(len, rho, sigma, n_vars, na_prob = 0.1)

Arguments

len

number of rows per column

rho

desired correlation coefficients of the generated variables. The length of rho must equal n_vars * (n_vars - 1) / 2, that is, one coefficient for each pair of variables.

sigma

desired standard deviation for each generated variable.

n_vars

total number of variables to be generated. Must be at least 2.

na_prob

desired probability of missingness in each variable. Defaults to 0.1 (10%).
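The length rule for rho can be checked before calling the function. The sketch below is only an illustration: rho_length is a hypothetical helper that is not part of the package, and the rho values are the ones used in the Examples section.

```r
# Hypothetical helper (not part of the package): number of pairwise
# correlations needed for n_vars variables.
rho_length <- function(n_vars) {
  stopifnot(n_vars >= 2)
  n_vars * (n_vars - 1) / 2  # same value as choose(n_vars, 2)
}

n_vars <- 3
rho <- c(.25, .75, .044)     # values taken from the Examples section

# TRUE when rho has one coefficient per pair of variables
length(rho) == rho_length(n_vars)
```

With four variables the same rule asks for six coefficients, so the required length of rho grows quickly as n_vars increases.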

Value

a data frame with len rows and n_vars columns (at least 2 columns)

Details

The MNAR mechanism requires missingness to be related to events that are not measured or observed. This type of missingness cannot be identified from the observed data alone, and analyses that ignore it will generally produce biased estimates. gen.mnar uses a self-selection mechanism to create values missing not at random. Other methods, such as logistic regression models and exponential decay models, may be suitable for generating missing values, but the self-selection mechanism provides a straightforward implementation of the MNAR mechanism.
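To make the idea of self-selection concrete, here is a minimal base R sketch in which the values themselves decide which observations go missing. It is not the internal implementation of gen.mnar. The rule used here (drop the largest values) and the names x, n_missing, drop_idx and x_mnar are assumptions chosen for illustration.

```r
set.seed(1)

# One simulated variable; the mean and sd are arbitrary illustration values
x <- rnorm(50, mean = 0, sd = 1.1)
na_prob <- 0.15

# Assumed self-selection rule: the largest values are the ones that go
# missing, so missingness depends on the unobserved value itself.
n_missing <- round(na_prob * length(x))
drop_idx <- order(x, decreasing = TRUE)[seq_len(n_missing)]

x_mnar <- x
x_mnar[drop_idx] <- NA

# The observed mean is pulled downward because only large values were lost
mean(x)
mean(x_mnar, na.rm = TRUE)
```

Because the deleted values are systematically the largest ones, the mean of the remaining observations underestimates the mean of the complete data, which is the kind of bias the MNAR mechanism is meant to reproduce.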

Examples

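# 50 rows, 3 variables, 15% missingness per variable
syn_na <- gen.mnar(len = 50, rho = c(.25, .75, .044), sigma = c(1.1, .56, 1.56), n_vars = 3, na_prob = .15)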
summary(syn_na)
#>        V1                V2                 V3         
#>  Min.   :-0.8659   Min.   :-0.68721   Min.   :-3.5148  
#>  1st Qu.:-0.3421   1st Qu.:-0.42436   1st Qu.:-0.7494  
#>  Median : 0.2887   Median :-0.03281   Median :-0.1010  
#>  Mean   : 0.2908   Mean   :-0.02013   Mean   : 0.1588  
#>  3rd Qu.: 0.7156   3rd Qu.: 0.25424   3rd Qu.: 1.0034  
#>  Max.   : 1.8796   Max.   : 1.01316   Max.   : 3.7811  
#>  NA's   :8         NA's   :8