2025-04-22

NargesSadat Bathaeian

Academic rank: Instructor
Education: MSc.
Faculty: Faculty of Engineering

Research

Title: Using imputation algorithms when missing values appear in the test data in contrast with the training data
Type: Journal paper
Keywords: missing values; imputation algorithms; regression; kNN; MICE; random forest; tree; EM
Year: 2018
Journal: International Journal of Data Analysis Techniques and Strategies
Researchers: NargesSadat Bathaeian

Abstract

Real-world datasets often suffer from missing data, and imputation is a common solution to this problem. Most research applies imputation algorithms to the training data, so the output variable of the samples may influence the imputation model. This paper compares different imputation algorithms when they are applied to test data and to training data. First, the relationship between the output variable and different imputation algorithms is investigated. Then six different types of imputation algorithms are applied to both the training data and the test data. The chosen datasets are publicly available and cover both classification and regression tasks; missing values are injected into them artificially. The results show that the performance of all algorithms declines when the output variable is excluded. In particular, the decline of the algorithm that uses k nearest neighbours (kNN) for imputation on the classification datasets is not negligible. By contrast, algorithms based on random forests decline less and perform better than the other five types of algorithms.
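The abstract's central point is that an imputer run on training data can use the output variable, while that variable is unavailable for test rows. This can be illustrated with a minimal pure-Python sketch. It assumes a simple kNN imputer (root-mean-square distance over the columns observed in both rows, no feature scaling, mean of the k nearest donors) and missing-completely-at-random (MCAR) injection. The function names `inject_mcar` and `knn_impute` are hypothetical and do not reproduce the paper's implementation, which may differ in distance metric, scaling, and donor selection.

```python
import math
import random
from typing import List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing cell


def inject_mcar(rows: Sequence[Row], rate: float, cols: Sequence[int],
                seed: int = 0) -> List[Row]:
    """Hypothetical helper: return a copy of `rows` in which each cell of
    `cols` is set to None independently with probability `rate` (MCAR)."""
    rng = random.Random(seed)
    out = [list(r) for r in rows]
    for r in out:
        for c in cols:
            if rng.random() < rate:
                r[c] = None
    return out


def _distance(a: Row, b: Row, cols: Sequence[int]) -> float:
    """Root-mean-square difference over the columns in `cols` that are
    observed in both rows; inf if they share no observed column."""
    diffs = [(a[c] - b[c]) ** 2 for c in cols
             if a[c] is not None and b[c] is not None]
    return math.sqrt(sum(diffs) / len(diffs)) if diffs else math.inf


def knn_impute(rows: Sequence[Row], k: int,
               dist_cols: Sequence[int]) -> List[Row]:
    """Hypothetical kNN imputer: fill each missing cell with the mean of that
    column over the k nearest donor rows, measuring distance on `dist_cols`
    only. The input is not modified."""
    out = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for c, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows that observe column c.
            donors = [r for j, r in enumerate(rows) if j != i and r[c] is not None]
            # list.sort is stable, so ties keep the original donor order.
            donors.sort(key=lambda r: _distance(row, r, dist_cols))
            nearest = donors[:k]
            if nearest:
                out[i][c] = sum(r[c] for r in nearest) / len(nearest)
    return out


# Columns: 0 = x1, 1 = x2, 2 = y (the output variable).
train = [[1.0, 0.0, 1.0],
         [10.0, 0.0, 10.0],
         [None, 0.0, 10.0]]

# Training-data setting: y is known, so it can steer the donor search.
with_y = knn_impute(train, k=1, dist_cols=[0, 1, 2])
# Test-data setting: y is unknown, so only the features guide the search.
without_y = knn_impute(train, k=1, dist_cols=[0, 1])
print(with_y[2][0], without_y[2][0])  # 10.0 1.0
```

Excluding y from the distance changes which donor is chosen for the third row, so the imputed x1 moves from 10.0 to 1.0. This is the kind of dependence on the output variable that the paper examines. With real data, the effect would be estimated over many cells masked with `inject_mcar`, not over one hand-built row.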