Stochastic regression imputation (SRI) — stoc.impute • describe

This method corrects the lack of variability in conditional mean imputation (CMI) by adding a random error term to each conditional mean. SRI is more effective than CMI at reducing bias in the imputed values and works well with data that are missing completely at random (MCAR) or missing at random (MAR).
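The difference between CMI and SRI can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy data, the single-predictor `lm` model, and the choice to draw the error term from a normal distribution with the residual standard deviation are all assumptions made for illustration.

```r
set.seed(42)  # fixed seed so the sketch is repeatable

# Toy data (assumed): y depends linearly on x; 30 of 200 y values are set to missing.
n <- 200
x <- rnorm(n)
y <- 2 + 1.5 * x + rnorm(n)
miss <- sample(n, 30)
y_obs <- y
y_obs[miss] <- NA

# Fit the regression on complete cases only (lm drops rows with NA by default).
fit <- lm(y_obs ~ x)
cond_mean <- predict(fit, newdata = data.frame(x = x[miss]))

# CMI: fill each gap with its conditional mean.
y_cmi <- y_obs
y_cmi[miss] <- cond_mean

# SRI: conditional mean plus a random draw scaled by the residual SD (assumed normal).
y_sri <- y_obs
y_sri[miss] <- cond_mean + rnorm(length(miss), mean = 0, sd = sigma(fit))

# Spread of the imputed values around the fitted line.
sd_cmi_resid <- sd(y_cmi[miss] - cond_mean)  # zero: CMI values sit on the line
sd_sri_resid <- sd(y_sri[miss] - cond_mean)  # positive: SRI restores scatter
```

Comparing `sd_cmi_resid` with `sd_sri_resid` shows why CMI understates variability: its imputed values lie exactly on the regression line, whereas the SRI values scatter around it.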

Usage

stoc.impute(
  data,
  family = "AUTO",
  tol = NULL,
  robust = FALSE,
  char_to_factor = FALSE,
  verbose = FALSE
)

Arguments

data: a numeric matrix or data frame of at least 2 columns.
family: the distribution family of your observations. The family argument defaults to 'AUTO', in which case the function automatically selects a distribution family (gaussian, binomial, multinomial) based on the type of variable (numeric or factor). The distribution family dictates the regression model used (lm, glm, multinom). However, the user can change the family argument to match the distribution of the response variable, and the function will adapt by using a generalized linear model or beta regression.
tol: tolerance, a numeric vector of length 1 used as a multiplicative factor on the standard deviation for generalized linear models. As the sample size increases, the tolerance value should be decreased to reflect the decreasing variability of the sample estimate.
robust: logical indicating whether to use robust estimation methods. If set to 'TRUE', the function uses robust linear and generalized linear models to make its predictions.
char_to_factor: logical indicating whether to transform character variables to unordered factor variables.
verbose: logical indicating whether to use verbose error handling.
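As a rough illustration of what `family = "AUTO"` might do, the sketch below maps each column's type to a family name. The helper `pick_family` is hypothetical and not part of the package, and the rule that two-level factors map to binomial while factors with more levels map to multinomial is an assumption; the package's actual selection logic may differ.

```r
# Hypothetical helper (not part of the package): guess a family from a column's type.
pick_family <- function(x) {
  if (is.numeric(x)) {
    "gaussian"     # numeric response, linear model
  } else if (is.factor(x) && nlevels(x) == 2) {
    "binomial"     # assumed rule: two-level factor, logistic regression
  } else if (is.factor(x)) {
    "multinomial"  # assumed rule: factor with more than two levels
  } else {
    stop("unsupported column type: ", class(x)[1])
  }
}

# Toy data frame (assumed) with one column of each kind.
df <- data.frame(
  age = c(31, 45, NA, 52),
  smoker = factor(c("no", "yes", "no", NA)),
  region = factor(c("north", "south", "east", NA))
)

# One family name per column, named after the columns.
families <- vapply(df, pick_family, character(1))
```

Under these assumptions, `families` holds "gaussian" for `age`, "binomial" for `smoker`, and "multinomial" for `region`. Character columns would trigger the error branch, which is the situation the `char_to_factor` argument appears intended to handle.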

Value

a matrix or data frame containing the imputed dataset.

Examples

# Simulated data: x1 has 13 missing values; x2 and y are complete.
data <- data.frame(
  x1 = c(stats::rnorm(87), rep(NA, 13)),
  x2 = stats::rnorm(100),
  y = stats::rnorm(100)
)
sri_data <- stoc.impute(data, tol = 1e-3)
summary(sri_data)
#>        x1                  x2                y           
#>  Min.   :-2.809775   Min.   :-2.6017   Min.   :-2.21063  
#>  1st Qu.:-0.557526   1st Qu.:-0.8379   1st Qu.:-0.51026  
#>  Median : 0.075834   Median :-0.1039   Median : 0.19840  
#>  Mean   : 0.007388   Mean   :-0.1360   Mean   : 0.07453  
#>  3rd Qu.: 0.583927   3rd Qu.: 0.6140   3rd Qu.: 0.72072  
#>  Max.   : 2.430227   Max.   : 2.0867   Max.   : 2.69171