WBM STATS Solutions

There are four well-defined types of missing data:

structurally missing
missing completely at random
missing at random
nonignorable missing

Structurally missing

Data that are missing for a logical reason are called structurally missing: the value is missing because it does not exist. In Table 1, variables such as cooking_fuel and no._of_cigarretes have missing values. Respondents who do not have a kitchen cannot report what cooking fuel they use, so we can exclude them from any analysis of cooking fuel and continue. Similarly, a person who does not smoke will not answer the question of how many cigarettes he or she smokes daily. In this case, however, we can assign a value of 0 to non-smokers and proceed with the analysis.

Table 1: Example of structurally missing
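Both fixes for structurally missing values can be sketched in a few lines. This is a minimal illustration on hypothetical records: the field names (has_kitchen, smokes, no_of_cigarettes) are assumptions standing in for the columns of Table 1, and None marks an unanswered question.

```python
# Hypothetical survey rows mirroring the kind of variables in Table 1.
# None marks an unanswered question; the field names are illustrative only.
respondents = [
    {"id": 1, "has_kitchen": True, "cooking_fuel": "LPG", "smokes": True, "no_of_cigarettes": 10},
    {"id": 2, "has_kitchen": False, "cooking_fuel": None, "smokes": False, "no_of_cigarettes": None},
    {"id": 3, "has_kitchen": True, "cooking_fuel": "firewood", "smokes": False, "no_of_cigarettes": None},
    {"id": 4, "has_kitchen": True, "cooking_fuel": "LPG", "smokes": True, "no_of_cigarettes": 5},
]


def fill_structural_zeros(rows):
    """Return copies of rows where non-smokers' missing cigarette counts become 0."""
    filled = []
    for row in rows:
        row = dict(row)  # copy so the original records are left untouched
        if not row["smokes"] and row["no_of_cigarettes"] is None:
            row["no_of_cigarettes"] = 0
        filled.append(row)
    return filled


def cooking_fuel_rows(rows):
    """Keep only respondents for whom cooking fuel is defined (they have a kitchen)."""
    return [row for row in rows if row["has_kitchen"]]


cleaned = fill_structural_zeros(respondents)
print([row["no_of_cigarettes"] for row in cleaned])  # [10, 0, 0, 5]
print([row["id"] for row in cooking_fuel_rows(cleaned)])  # [1, 3, 4]
```

The zero is a real value here (a non-smoker smokes no cigarettes), whereas a cooking fuel for a household without a kitchen has no sensible value, which is why one variable is recoded and the other is subset.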

Missing completely at random (MCaR)

Looking at Table 2, one might ask what the incomes of the third and fourth respondents could be. The easiest way to answer this question is to assume that 50% of the respondents have high incomes and the remaining 50% have low incomes, stratified by gender. Therefore, the female respondents will have high incomes, and the male respondents will have low incomes. This is known as assuming that the values are missing completely at random. Under this assumption, whether or not a person has a missing value is entirely unrelated to the other information in the data.

Identifying MCaR is relatively simple. If the other variables in the data can predict whether a value is missing, the data are not MCaR. MCaR can be tested formally using Little's test.
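One informal way to look for departures from MCaR is to check whether missingness in income is related to other observed variables, for example by comparing the missing rate across genders or the mean asset_index of respondents with and without an income. The sketch below does this on a small hypothetical dataset. It is not Little's test, which needs a dedicated implementation; it only shows the idea that, under MCaR, these comparisons should show no systematic difference.

```python
from statistics import mean

# Hypothetical data; None marks a missing income.
rows = [
    {"gender": "F", "asset_index": 0.8, "income": 5200},
    {"gender": "F", "asset_index": 0.6, "income": None},
    {"gender": "F", "asset_index": 0.9, "income": 6100},
    {"gender": "F", "asset_index": 0.7, "income": 4800},
    {"gender": "M", "asset_index": 0.2, "income": None},
    {"gender": "M", "asset_index": 0.3, "income": None},
    {"gender": "M", "asset_index": 0.4, "income": 3100},
    {"gender": "M", "asset_index": 0.5, "income": 3600},
]


def missing_rate_by_group(rows, target, group):
    """Share of rows with `target` missing, within each level of `group`."""
    rates = {}
    for level in sorted({row[group] for row in rows}):
        subset = [row for row in rows if row[group] == level]
        rates[level] = sum(row[target] is None for row in subset) / len(subset)
    return rates


def covariate_mean_by_missingness(rows, target, covariate):
    """Mean of `covariate` where `target` is missing vs. where it is observed."""
    missing = [row[covariate] for row in rows if row[target] is None]
    observed = [row[covariate] for row in rows if row[target] is not None]
    return mean(missing), mean(observed)


print(missing_rate_by_group(rows, "income", "gender"))  # {'F': 0.25, 'M': 0.5}
print(covariate_mean_by_missingness(rows, "income", "asset_index"))
```

With only eight rows such differences mean little on their own; on real data they would need a formal test before concluding that the data are not MCaR.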

If the data are MCaR, we can proceed with the analysis by ignoring the cases with missing values, provided the remaining sample is large enough. MCaR holds only when the values are missing because of a truly random phenomenon.
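A small simulation illustrates why dropping incomplete cases is safe under MCaR but not in general. The model below is a hypothetical assumption: income rises with an asset index, and two missingness mechanisms are applied to the same incomes, one purely random (MCaR) and one where low-asset respondents skip the question more often. The complete-case mean is then compared with the mean of the full simulated sample.

```python
import random
from statistics import mean


def simulate(n=20000, seed=1):
    """Simulate incomes and two missingness mechanisms (hypothetical model)."""
    rng = random.Random(seed)
    asset = [rng.gauss(0, 1) for _ in range(n)]
    # Assumed model: income rises with the asset index, plus noise.
    income = [50 + 10 * a + rng.gauss(0, 5) for a in asset]
    # MCaR: every income has the same 30% chance of being missing.
    mcar = [y if rng.random() >= 0.3 else None for y in income]
    # Not MCaR: respondents with a low asset index skip the income question more often.
    mar = [
        y if rng.random() >= (0.6 if a < 0 else 0.1) else None
        for a, y in zip(asset, income)
    ]
    return income, mcar, mar


def complete_case_mean(values):
    """Mean of the observed values only, i.e. missing cases are dropped."""
    return mean([v for v in values if v is not None])


income, mcar, mar = simulate()
print(f"full-sample mean:      {mean(income):.2f}")
print(f"complete cases (MCaR): {complete_case_mean(mcar):.2f}")
print(f"complete cases (MaR):  {complete_case_mean(mar):.2f}")
```

With these settings the MCaR complete-case mean stays close to the full-sample mean, while the second mechanism pushes it up by roughly 3 units, because the respondents who remain are disproportionately those with high asset indices and therefore high incomes.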

Missing at random (MaR)

In the case of MaR, we assume that the missing values can be predicted with the help of other variables in the data. In Table 2, a simple predictive model would predict income from asset_index, age, and gender, or from asset_index alone. Prediction here does not mean that we can predict the missing values perfectly; all that is required is a probabilistic relationship.
Table 2: Example of missing completely at random and missing at random
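The "predict income from asset_index alone" idea can be sketched as a straight line fitted by ordinary least squares to the complete cases, with its predictions used for the missing incomes. The values below are hypothetical and only in the spirit of Table 2; the helper names are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def impute_from_asset_index(asset_index, income):
    """Fill missing incomes with predictions from a line fitted to complete cases."""
    pairs = [(a, y) for a, y in zip(asset_index, income) if y is not None]
    intercept, slope = fit_line([a for a, _ in pairs], [y for _, y in pairs])
    return [y if y is not None else intercept + slope * a
            for a, y in zip(asset_index, income)]


# Hypothetical data in the spirit of Table 2; None marks a missing income.
asset_index = [0.1, 0.3, 0.4, 0.6, 0.8, 0.9]
income = [2000, 3100, None, 4200, None, 5600]
filled = impute_from_asset_index(asset_index, income)
print([round(v) for v in filled])  # [2000, 3100, 3396, 4200, 5151, 5600]
```

Filling each gap with a single predicted value treats the prediction as if it were known exactly, which understates the uncertainty in later estimates; multiple imputation addresses this by generating several plausible values per gap.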

When the data are MaR, we can use an advanced imputation method, such as multiple imputation, to impute the missing values. Alternatively, we can use analytical methods specifically designed for handling MaR data.
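A simplified multiple-imputation sketch, assuming a linear model of income on a hypothetical asset index: each of m imputations refits the line on a bootstrap resample of the complete cases, fills every gap with a prediction plus random residual noise, and estimates the mean income. The m estimates are then pooled with Rubin's rules (total variance = average within-imputation variance + (1 + 1/m) × between-imputation variance). The bootstrap step is one common way of reflecting parameter uncertainty; real analyses would normally use dedicated tools such as the R package mice rather than this hand-rolled version.

```python
import random
from statistics import mean, variance


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def impute_once(asset, income, rng):
    """One stochastic regression imputation with bootstrap-drawn parameters."""
    observed = [(a, y) for a, y in zip(asset, income) if y is not None]
    # Refit on a bootstrap resample so each imputation reflects parameter uncertainty.
    boot = [rng.choice(observed) for _ in observed]
    xs, ys = [a for a, _ in boot], [y for _, y in boot]
    b0, b1 = fit_line(xs, ys)
    resid_sd = (sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (len(xs) - 2)) ** 0.5
    # Prediction plus random residual noise, so imputed values vary between datasets.
    return [y if y is not None else b0 + b1 * a + rng.gauss(0, resid_sd)
            for a, y in zip(asset, income)]


def pooled_mean(asset, income, m=20, seed=0):
    """Estimate mean income from m imputed datasets, pooled with Rubin's rules."""
    rng = random.Random(seed)
    estimates, within = [], []
    for _ in range(m):
        completed = impute_once(asset, income, rng)
        estimates.append(mean(completed))
        within.append(variance(completed) / len(completed))  # squared SE of the mean
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(within)             # average within-imputation variance
    b = variance(estimates)          # between-imputation variance
    total = u_bar + (1 + 1 / m) * b  # Rubin's total variance
    return q_bar, total ** 0.5


# Simulated MaR data (hypothetical model): low-asset respondents skip income more often.
rng = random.Random(42)
asset = [rng.gauss(0, 1) for _ in range(2000)]
true_income = [50 + 10 * a + rng.gauss(0, 5) for a in asset]
income = [y if rng.random() >= (0.6 if a < 0 else 0.1) else None
          for a, y in zip(asset, true_income)]

estimate, se = pooled_mean(asset, income)
print(f"pooled mean {estimate:.2f} (SE {se:.2f}); full-sample mean {mean(true_income):.2f}")
```

Because the missingness here depends only on the observed asset index, the pooled estimate lands close to the full-sample mean, while simply averaging the observed incomes would overstate it.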

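Notably, any analysis that is valid under MaR is also valid under MCaR, because MCaR is a special case of MaR. The reverse is not true: complete-case analysis, for example, is valid under MCaR but can be biased under MaR.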

Nonignorable missing

Nonignorable missing data are also known as missing not at random (MNAR). This occurs when we cannot confidently explain why the data are missing, for example when respondents refuse to answer specific questions for reasons related to the answer itself, such as high earners declining to report their income. None of the standard methods for handling missing values is valid when the missingness is nonignorable. See Tang & Ju (2018) for more information on handling nonignorable missing data.
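A short simulation shows why the standard approaches break down in this case. Under a hypothetical model, the chance that an income is missing depends on the income itself (incomes above 55 are withheld 80% of the time). Both the complete-case mean and regression imputation from the asset index then fall below the full-sample mean. This only illustrates the problem; it does not reproduce the methods discussed by Tang & Ju (2018).

```python
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate_mnar(n=20000, seed=7):
    """Hypothetical MNAR mechanism: high incomes are withheld more often."""
    rng = random.Random(seed)
    asset = [rng.gauss(0, 1) for _ in range(n)]
    income = [50 + 10 * a + rng.gauss(0, 5) for a in asset]
    # The chance of a missing income depends on the (unseen) income itself.
    observed = [y if rng.random() >= (0.8 if y > 55 else 0.05) else None for y in income]
    return asset, income, observed


asset, income, observed = simulate_mnar()
pairs = [(a, y) for a, y in zip(asset, observed) if y is not None]
b0, b1 = fit_line([a for a, _ in pairs], [y for _, y in pairs])
imputed = [y if y is not None else b0 + b1 * a for a, y in zip(asset, observed)]

print(f"full-sample mean:        {mean(income):.2f}")
print(f"complete-case mean:      {mean([y for _, y in pairs]):.2f}")
print(f"regression-imputed mean: {mean(imputed):.2f}")
```

Regression imputation narrows the gap but does not close it, and nothing in the observed data reveals the remaining bias, because its cause lies in the values that were never recorded.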

Monday, 10 April 2023

Different types of missing data
