Hot deck imputation — hotdeck.impute • describe

Hot deck imputation (HDI) is a univariate imputation technique: for each respondent (recipient) with a missing value, we find a donor with similar values across a subset of categorical or numerical predictors and use the donor's observed value to fill the recipient's missing observation. Because donors are matched on similarity, HDI has been used with stratification across categorical variables.
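
The donor-matching idea described above can be illustrated with a minimal base R sketch. The helper name `nearest_donor_impute`, the toy data, and the choice of Euclidean distance are assumptions made for illustration only; this is not the package's implementation, which may measure similarity differently.

```r
# Hypothetical helper, not part of the package: impute one column by copying
# the observed value of the closest donor (Euclidean distance on predictors).
nearest_donor_impute <- function(df, target, predictors) {
  x <- as.matrix(df[, predictors, drop = FALSE])
  missing_target <- is.na(df[[target]])
  complete_x <- stats::complete.cases(x)

  donors <- which(!missing_target & complete_x)     # rows with an observed target
  recipients <- which(missing_target & complete_x)  # rows that need a value
  stopifnot(length(donors) > 0)

  for (i in recipients) {
    # Squared Euclidean distance from recipient i to every donor
    diffs <- sweep(x[donors, , drop = FALSE], 2, x[i, ])
    d <- rowSums(diffs^2)
    # Copy the target value of the nearest donor
    df[[target]][i] <- df[[target]][donors[which.min(d)]]
  }
  df
}

toy <- data.frame(
  age    = c(25, 31, 47, 52, 29),
  income = c(30, 42, 80, NA, 35)
)
imputed <- nearest_donor_impute(toy, target = "income", predictors = "age")
imputed$income
#> [1] 30 42 80 80 35
```

Here the recipient (age 52) is closest to the donor aged 47, so it receives that donor's income of 80.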

The current implementation of HDI lets the user choose one of four selection methods: deterministic selection, random sampling from all possible donors, random sampling from the k nearest neighbors, and random sampling using weights as probabilities. The function iteratively imputes missing values across all variables with missing observations, using the selection method specified in the function arguments.
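
To make the four selection rules concrete, the following base R sketch picks a donor index from a vector of recipient-to-donor distances. The helper `pick_donor` is hypothetical, and the inverse-distance weighting used for the weighted rule is an assumption; the package may derive its weights differently.

```r
# Hypothetical helper: choose a donor index given distances to each donor.
# The rule names mirror the documented `method` values, but the logic is an
# illustrative assumption, not the package's code.
pick_donor <- function(dist, method = "deterministic", k = NULL) {
  n <- length(dist)
  ord <- order(dist)  # donor indices, closest first
  switch(method,
    deterministic  = ord[1],                         # always the closest donor
    rand_from_all  = sample.int(n, 1),               # any donor, equal chance
    rand_nearest_k = ord[sample.int(min(k, n), 1)],  # one of the k closest
    weighted_rand  = {
      w <- 1 / (dist + 1e-8)  # assumed weights: closer donors are likelier
      sample.int(n, 1, prob = w)
    },
    stop("unknown method: ", method)
  )
}

dist <- c(2.0, 0.5, 3.1, 0.9)
pick_donor(dist)                            # always donor 2
set.seed(1)
pick_donor(dist, "rand_nearest_k", k = 2)   # donor 2 or donor 4
```

Setting a seed before the random rules makes the draw repeatable, while the deterministic rule returns the same donor regardless of the seed.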

Usage

hotdeck.impute(
  data,
  method = "deterministic",
  k = NULL,
  seed = NULL,
  na.rm = TRUE
)

Arguments

data

a matrix or data frame containing missing values in at least one predictor

method

selection method for imputing missing values based on donor similarity. Can be one of:

"deterministic": Select the same donor value for multiple repetitions of the HDI.
"rand_from_all": Select a different donor value for each repetition of the HDI.
"rand_nearest_k": Select one random donor value from a subset of k nearest neighbors for each repetition of the HDI.
"weighted_rand": Select one random donor through a probability-weighted choice for each repetition of the HDI.

k

number of nearest neighbors to select from when using the rand_nearest_k method.

seed

a numeric seed that makes results reproducible for every method except deterministic selection, which involves no random sampling

na.rm

logical indicating whether NA values are removed from every row in the matrix or data frame

Value

a matrix or data frame of imputed values

Examples

data <- gen.mcar(100, rho = c(.56, .23, .18), sigma = c(1, 2, .5), n_vars = 3, na_prob = .18)
hot_data <- hotdeck.impute(data)
summary(hot_data)
#>        V1                V2                V3          
#>  Min.   :-2.8766   Min.   :-4.8334   Min.   :-1.13091  
#>  1st Qu.:-1.0105   1st Qu.:-1.3556   1st Qu.:-0.26354  
#>  Median :-0.1079   Median :-0.5331   Median : 0.02834  
#>  Mean   :-0.1892   Mean   :-0.3364   Mean   : 0.01080  
#>  3rd Qu.: 0.4801   3rd Qu.: 0.5525   3rd Qu.: 0.30642  
#>  Max.   : 2.3067   Max.   : 5.1222   Max.   : 0.96153