Abstract
One important goal in building a model is to increase its accuracy while reducing computation time, and feature selection helps achieve this. Feature selection determines the importance of the available features in a dataset using Information Gain (IG), which measures the amount of information each feature contains; features with high values are selected to accelerate the performance of an algorithm. To select informative features, IG relies on a threshold (cut-off) value. This research therefore aims to determine the time and accuracy performance obtained by improving feature selection through the integration of IG, the Fast Fourier Transform (FFT), and the Synthetic Minority Oversampling Technique (SMOTE). The feature selection model is then applied to Random Forest, a tree-based machine learning algorithm with random feature selection. Eight datasets were used in this research: three balanced and five imbalanced. SMOTE was applied to the imbalanced datasets to balance the data. The results showed that feature selection using Information Gain, FFT, and SMOTE improved the accuracy of Random Forest.
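The threshold-based selection step described in the abstract can be illustrated with a minimal sketch. It assumes discrete feature values and a fixed, hand-picked cut-off. The function names, the toy data, and the threshold of 0.5 are hypothetical and are not taken from the paper. The FFT and SMOTE steps are not shown.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum_v P(X = v) * H(Y | X = v) for a discrete feature X."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_features(columns, labels, threshold):
    """Return the features whose IG is at least `threshold`, plus all IG scores."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    selected = [name for name, gain in gains.items() if gain >= threshold]
    return selected, gains


# Toy data, invented for illustration only.
labels = ["yes", "yes", "no", "no"]
columns = {
    "informative": ["a", "a", "b", "b"],  # perfectly separates the two classes
    "noise": ["x", "y", "x", "y"],        # independent of the class
}

# The cut-off of 0.5 is an arbitrary assumption, not the paper's value.
selected, gains = select_features(columns, labels, threshold=0.5)
print(selected, gains)
```

In this toy case only the feature that separates the classes passes the cut-off. The selected columns would then be passed to a classifier such as Random Forest.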
Categories
ICT BASED
Publication Date
14/07/2022
Year
2022
DOI/ISSN Journal/Link
10.7717/peerj-cs.1041
Authors
Maria Irmina Prasetiyowati
Surendro
Author Affiliations
Ronaldo Ismael
Maria Irmina Prasetiyowati
Source Title
PeerJ Computer Science
Fields of Research (ANZSRC 2020)
46 INFORMATION AND COMPUTING SCIENCES
4605 Data management and data science