A new data augmentation method to use in machine learning algorithms using statistical measurements
Abstract
Cancer is among the leading causes of death in the world today. It has been scientifically proven that if the disease is detected in its first stages, the treatment success rate can be close to 100%. Accordingly, the problems caused by breast cancer can be solved to a great extent through early diagnosis. Early diagnosis requires data sets that can be processed. Researchers use artificial intelligence techniques to develop systems that assist specialist doctors, so it is of great importance to have a data set on which researchers can work. The more parameters these data sets contain, and the more values are available for those parameters, the better the machine learning process performs. In this study, a new data augmentation method was presented for the two classes in a breast cancer database, and breast cancer was diagnosed using this method. First, the database was analyzed statistically with the proposed data augmentation method. Then, the new values obtained from the data augmentation process were added to the database to produce 5-times and 10-times augmented versions. Experimental studies compared the raw database with the augmented databases. The proposed model was tested with 5 different Machine Learning Algorithms (MLA): k Nearest Neighbors (k-NN), Decision Tree (DT), Naive Bayes (NB), Random Forest (RF), and Support Vector Machine (SVM). The accuracy rates in the train and test operations increased as follows (train and test, respectively): for k-NN, unchanged and 15%; for DT, 0.1% and 13.01%; for RF, unchanged and 14.03%; for NB, 14.7% and 19.7%; for SVM, 15.9% and 14.06%. The rates obtained from the experimental results are very promising. The proposed method can be used as an alternative data augmentation method by researchers seeking more accurate results in their studies.
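The abstract does not state which statistical measurements drive the augmentation, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that synthetic rows are drawn per class, feature by feature, from a normal distribution with that class's mean and standard deviation. The function name `augment_by_class_stats` and the toy data are hypothetical and exist only for illustration.

```python
import random
import statistics

def augment_by_class_stats(rows, labels, factor, seed=0):
    """Return (rows, labels) grown to `factor` times the original size.

    Assumption: synthetic rows are drawn feature by feature from a normal
    distribution with the per-class mean and standard deviation. The
    abstract does not specify which statistics the authors used.
    """
    rng = random.Random(seed)
    out_rows, out_labels = list(rows), list(labels)
    for cls in sorted(set(labels)):
        # Collect the original rows that belong to this class.
        members = [r for r, y in zip(rows, labels) if y == cls]
        columns = list(zip(*members))
        means = [statistics.mean(c) for c in columns]
        # pstdev is defined even for a single-sample class (it gives 0.0).
        stds = [statistics.pstdev(c) for c in columns]
        # Add (factor - 1) synthetic rows per original row of this class.
        for _ in range(len(members) * (factor - 1)):
            out_rows.append([rng.gauss(m, s) for m, s in zip(means, stds)])
            out_labels.append(cls)
    return out_rows, out_labels

# Toy two-class data (made-up values, not from the breast cancer database).
X = [[1.0, 2.0], [1.2, 1.8], [0.9, 2.1], [5.0, 6.0], [5.5, 6.2]]
y = [0, 0, 0, 1, 1]

X5, y5 = augment_by_class_stats(X, y, factor=5)
X10, y10 = augment_by_class_stats(X, y, factor=10)
print(len(X5), len(X10))
```

The original rows are kept at the front of the output and the class proportions are preserved, so the raw and augmented versions could be passed to the same classifiers for a comparison like the one the abstract describes.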
Source
Measurement: Journal of the International Measurement Confederation
Volume
180
Issue
-
Link
https://dx.doi.org/10.1016/j.measurement.2021.109577
https://hdl.handle.net/20.500.12451/8246