Filters oversampled examples from a binary-class dataset, using game theory to decide whether each example is worth keeping.
neater( dataset, newSamples, k = 3, iterations = 100, smoothFactor = 1, classAttr = "Class" )
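Argument | Description |
---|---|
dataset | The original dataset, as a data.frame. |
newSamples | A data.frame with the oversampled examples to filter. |
k | Integer. Number of nearest neighbours to use in the KNN algorithm to rule out samples. By default, 3. |
iterations | Integer. Number of iterations for the algorithm. By default, 100. |
smoothFactor | A positive number. By default, 1. |
classAttr | Name of the class attribute, which must be the same in dataset and newSamples. By default, "Class". |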
Filtered samples as a data.frame with the same structure as newSamples.
Uses game theory and Nash equilibria to estimate the probability that each oversampled example truly belongs to the minority class. Examples that, at the final stage of the algorithm, are more likely to be majority examples than minority ones are discarded.
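The idea described above can be illustrated with a small, simplified sketch in base R. This is not the package's implementation: the function name `neater_sketch`, the distance weighting `1 / (1 + smooth_factor * d^2)`, and the replicator-style update are assumptions chosen for illustration. Each new example holds a pair of probabilities (minority, majority), original examples keep fixed labels, and the probabilities are updated repeatedly from the k nearest neighbours.

```r
# Simplified, illustrative sketch; NOT the code behind neater().
# X: numeric matrix of original examples; is_minority: logical vector, one per row of X.
# X_new: numeric matrix of oversampled (synthetic minority) examples.
# Assumes k is smaller than the total number of points.
neater_sketch <- function(X, is_minority, X_new, k = 3,
                          iterations = 100, smooth_factor = 1) {
  all_points <- rbind(X, X_new)
  n_orig <- nrow(X)
  new_idx <- n_orig + seq_len(nrow(X_new))
  # Column 1 = P(minority), column 2 = P(majority).
  # Original examples play fixed pure strategies; new ones start undecided.
  probs <- rbind(cbind(as.numeric(is_minority), as.numeric(!is_minority)),
                 matrix(0.5, nrow = nrow(X_new), ncol = 2))
  d <- as.matrix(dist(all_points))
  for (it in seq_len(iterations)) {
    updated <- probs
    for (i in new_idx) {
      # k nearest neighbours of example i, excluding itself
      nn <- order(d[i, ])
      nn <- nn[nn != i][seq_len(k)]
      # Assumed weighting: closer neighbours contribute more
      w <- 1 / (1 + smooth_factor * d[i, nn]^2)
      # Payoff of each pure strategy: weighted agreement with the neighbours
      payoff <- colSums(w * probs[nn, , drop = FALSE])
      total <- sum(probs[i, ] * payoff)
      # Discrete replicator-style update; renormalises to sum to 1
      if (total > 0) updated[i, ] <- probs[i, ] * payoff / total
    }
    probs <- updated
  }
  # Discard examples that are more likely majority than minority
  keep <- probs[new_idx, 1] >= probs[new_idx, 2]
  X_new[keep, , drop = FALSE]
}

# Toy data: majority cluster near (0, 0), minority cluster near (5, 5)
X <- rbind(c(0, 0), c(0, 1), c(1, 0), c(1, 1), c(5, 5), c(5, 6), c(6, 5))
is_minority <- c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
X_new <- rbind(c(5.2, 5.2), c(0.5, 0.5))
kept <- neater_sketch(X, is_minority, X_new, k = 3, iterations = 10)
print(kept)  # only the synthetic point near the minority cluster survives
```

In this toy run, the synthetic point lying inside the majority cluster is surrounded only by majority neighbours, so its minority probability collapses and it is filtered out.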
Almogahed, B.A.; Kakadiaris, I.A. Neater: Filtering of Over-Sampled Data Using Non-Cooperative Game Theory. Soft Computing 19 (2014), Nr. 11, p. 3301–3322.
data(iris0)
newSamples <- smotefamily::SMOTE(iris0[,-5], iris0[,5])$syn_data
# SMOTE overrides Class attr turning it into class
# and dataset must have same class attribute as newSamples
names(newSamples) <- c(names(newSamples)[-5], "Class")
neater(iris0, newSamples, k = 5, iterations = 100,
       smoothFactor = 1, classAttr = "Class")
#> [1] "0 samples filtered by NEATER"
#>    SepalLength SepalWidth PetalLength PetalWidth    Class
#> 1     5.248315   3.725843    1.500000  0.2258426 positive
#> 2     4.903154   3.000000    1.406307  0.2000000 positive
#> 3     5.079625   3.459250    1.400000  0.2000000 positive
#> 4     4.600000   3.423051    1.353897  0.2884743 positive
#> 5     5.072576   3.454847    1.472576  0.2000000 positive
#> 6     5.100000   3.752343    1.552343  0.2953143 positive
#> 7     5.100000   3.309857    1.680287  0.4704301 positive
#> 8     4.889947   3.400000    1.765080  0.2899468 positive
#> 9     5.064951   3.629901    1.535049  0.4700988 positive
#> 10    5.711875   4.058750    1.258750  0.2000000 positive
#> 11    5.400000   3.900000    1.535231  0.4000000 positive
#> 12    4.825951   3.100000    1.574049  0.1740491 positive
#> 13    4.900000   3.100000    1.500000  0.1000000 positive
#> 14    5.198913   3.500000    1.498913  0.2010872 positive
#> 15    5.100000   3.800000    1.573616  0.2263842 positive
#> 16    4.659512   3.400000    1.459512  0.2702440 positive
#> 17    5.076126   3.400000    1.500000  0.2000000 positive
#> 18    5.419602   3.580398    1.380398  0.2000000 positive
#> 19    5.200000   3.475025    1.475025  0.2000000 positive
#> 20    4.900000   3.072419    1.472419  0.1275805 positive
#> 21    5.400000   3.791234    1.591234  0.2912338 positive
#> 22    4.364505   3.000000    1.229011  0.1645055 positive
#> 23    4.372902   2.927098    1.318707  0.1729022 positive
#> 24    4.800000   3.000000    1.400000  0.2703486 positive
#> 25    5.585683   3.838106    1.700000  0.3381057 positive
#> 26    4.792180   3.107820    1.600000  0.2000000 positive
#> 27    4.900000   3.019294    1.419294  0.1807057 positive
#> 28    4.800000   3.396018    1.600000  0.2000000 positive
#> 29    5.060846   3.360846    1.460846  0.2000000 positive
#> 30    4.435322   2.688069    1.364678  0.2353219 positive
#> 31    5.149647   3.400000    1.533098  0.2000000 positive
#> 32    5.220429   3.739786    1.500000  0.2397855 positive
#> 33    5.111363   3.834088    1.500000  0.2772748 positive
#> 34    5.100000   3.460622    1.439378  0.2606222 positive
#> 35    5.000000   3.524249    1.324249  0.2757507 positive
#> 36    5.386332   3.406834    1.500000  0.3863317 positive
#> 37    4.800000   3.000000    1.400000  0.1013764 positive
#> 38    5.285373   4.128458    1.471542  0.1284577 positive
#> 39    5.373330   3.673330    1.500000  0.2000000 positive
#> 40    5.189412   3.500000    1.489412  0.2000000 positive
#> 41    4.625698   3.200000    1.374302  0.2000000 positive
#> 42    5.100000   3.800000    1.723400  0.3558501 positive
#> 43    4.689869   3.189869    1.320263  0.2000000 positive
#> 44    5.000000   3.465370    1.600000  0.5307399 positive
#> 45    5.700000   4.384025    1.505325  0.3973375 positive
#> 46    4.770835   3.100000    1.585417  0.2000000 positive
#> 47    5.000000   3.220583    1.241166  0.2000000 positive
#> 48    4.900000   3.073814    1.473814  0.1261857 positive
#> 49    4.356085   3.000000    1.212170  0.1560850 positive
#> 50    4.400000   3.184334    1.300000  0.2000000 positive
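Because the result shares its structure with newSamples, and newSamples must share its class attribute with dataset, one plausible next step is to append the filtered samples to the original data and check the class balance. The sketch below uses small made-up data frames as stand-ins (`dataset`, `filtered`, and the column `x` are hypothetical), so it runs without any package installed.

```r
# Toy stand-ins: in practice `dataset` would be the original data and
# `filtered` the data.frame returned by neater().
dataset <- data.frame(x = c(0, 1, 5),
                      Class = c("negative", "negative", "positive"))
filtered <- data.frame(x = 5.2, Class = "positive")

# The two data frames must have the same columns before they are combined
stopifnot(identical(names(dataset), names(filtered)))

balanced <- rbind(dataset, filtered)
table(balanced$Class)  # class counts after adding the filtered samples
```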