Filters oversampled examples from a binary-class dataset, using game theory to decide whether each example is worth keeping.

neater(
  dataset,
  newSamples,
  k = 3,
  iterations = 100,
  smoothFactor = 1,
  classAttr = "Class"
)

Arguments

dataset

The original data.frame. All columns except the classAttr column must be numeric or coercible to numeric.

newSamples

A data.frame containing the samples to be filtered. Must have the same structure as dataset.

k

Integer. Number of nearest neighbours used by the KNN algorithm to rule out samples. By default, 3.

iterations

Integer. Number of iterations for the algorithm. By default, 100.

smoothFactor

A positive numeric. By default, 1.

classAttr

Character. Name of the class attribute in dataset and newSamples. Must exist in both.
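
The requirements on the arguments above can be checked before calling neater. The following is a minimal sketch in base R; check_neater_inputs is a hypothetical helper written for illustration, not a function of this package, and it assumes that "same structure" means the same column names in the same order.

```r
# Hypothetical helper, not part of the package: verifies the documented
# requirements on dataset, newSamples and classAttr.
check_neater_inputs <- function(dataset, newSamples, classAttr = "Class") {
  stopifnot(is.data.frame(dataset), is.data.frame(newSamples))

  # classAttr must exist in both data frames
  if (!classAttr %in% names(dataset) || !classAttr %in% names(newSamples)) {
    stop("classAttr must be a column of both dataset and newSamples")
  }

  # Assumption: "same structure" = same column names in the same order
  if (!identical(names(dataset), names(newSamples))) {
    stop("newSamples must have the same columns as dataset")
  }

  # Every non-class column must be numeric or coercible to numeric
  features <- setdiff(names(dataset), classAttr)
  coercible <- vapply(features, function(col) {
    values <- suppressWarnings(as.numeric(as.character(dataset[[col]])))
    # A value that was present but became NA could not be coerced
    !anyNA(values[!is.na(dataset[[col]])])
  }, logical(1))
  if (!all(coercible)) {
    stop("non-numeric columns: ", paste(features[!coercible], collapse = ", "))
  }

  invisible(TRUE)
}

# Toy data for illustration only
df  <- data.frame(x = c(1, 2, 3), y = c("4", "5", "6"),
                  Class = c("a", "a", "b"))
syn <- data.frame(x = 1.5, y = "4.5", class = "a")  # lower-case "class"

bad <- try(check_neater_inputs(df, syn), silent = TRUE)  # fails: no "Class"
names(syn)[3] <- "Class"
ok <- check_neater_inputs(df, syn)                       # passes
```

The lower-case class column mimics what smotefamily::SMOTE returns, which is why the example at the end of this page renames it before calling neater.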

Value

Filtered samples as a data.frame with same structure as newSamples.

Details

Uses game theory and Nash equilibria to calculate the probability that each minority example truly belongs to the minority class. It discards the examples that, at the final stage of the algorithm, are more likely to be majority examples than minority ones.
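
The final discard step described above can be illustrated as follows. This is only a sketch of the decision rule, not of the game itself: the probabilities in p_minority are made-up values standing in for whatever the algorithm produces after its last iteration, and newSamples_demo is a toy data.frame. Keeping ties (probability exactly 0.5) follows from the wording "more probability" and is otherwise an assumption.

```r
# Hypothetical final-stage probabilities of belonging to the minority class,
# one per synthetic example (illustrative values, not real NEATER output).
p_minority <- c(0.91, 0.48, 0.73, 0.12, 0.50)

# With two classes, the majority probability is the complement.
p_majority <- 1 - p_minority

# Discard an example only when it is more likely majority than minority.
keep <- !(p_majority > p_minority)

# Toy synthetic samples, same row order as p_minority
newSamples_demo <- data.frame(x = c(5.1, 6.3, 4.9, 6.8, 5.5),
                              Class = "positive")

filtered <- newSamples_demo[keep, , drop = FALSE]
n_removed <- sum(!keep)
print(paste(n_removed, "samples filtered"))
```

Here the second and fourth examples are removed, and the remaining rows keep the structure of newSamples_demo, matching the return value documented above.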

References

Almogahed, B.A.; Kakadiaris, I.A. Neater: Filtering of Over-Sampled Data Using Non-Cooperative Game Theory. Soft Computing 19 (2014), no. 11, pp. 3301–3322.

Examples

data(iris0)

newSamples <- smotefamily::SMOTE(iris0[,-5], iris0[,5])$syn_data
# SMOTE overrides Class attr turning it into class
# and dataset must have same class attribute as newSamples
names(newSamples) <- c(names(newSamples)[-5], "Class")

neater(iris0, newSamples, k = 5, iterations = 100,
       smoothFactor = 1, classAttr = "Class")
#> [1] "0 samples filtered by NEATER"
#>    SepalLength SepalWidth PetalLength PetalWidth    Class
#> 1     5.248315   3.725843    1.500000  0.2258426 positive
#> 2     4.903154   3.000000    1.406307  0.2000000 positive
#> 3     5.079625   3.459250    1.400000  0.2000000 positive
#> 4     4.600000   3.423051    1.353897  0.2884743 positive
#> 5     5.072576   3.454847    1.472576  0.2000000 positive
#> 6     5.100000   3.752343    1.552343  0.2953143 positive
#> 7     5.100000   3.309857    1.680287  0.4704301 positive
#> 8     4.889947   3.400000    1.765080  0.2899468 positive
#> 9     5.064951   3.629901    1.535049  0.4700988 positive
#> 10    5.711875   4.058750    1.258750  0.2000000 positive
#> 11    5.400000   3.900000    1.535231  0.4000000 positive
#> 12    4.825951   3.100000    1.574049  0.1740491 positive
#> 13    4.900000   3.100000    1.500000  0.1000000 positive
#> 14    5.198913   3.500000    1.498913  0.2010872 positive
#> 15    5.100000   3.800000    1.573616  0.2263842 positive
#> 16    4.659512   3.400000    1.459512  0.2702440 positive
#> 17    5.076126   3.400000    1.500000  0.2000000 positive
#> 18    5.419602   3.580398    1.380398  0.2000000 positive
#> 19    5.200000   3.475025    1.475025  0.2000000 positive
#> 20    4.900000   3.072419    1.472419  0.1275805 positive
#> 21    5.400000   3.791234    1.591234  0.2912338 positive
#> 22    4.364505   3.000000    1.229011  0.1645055 positive
#> 23    4.372902   2.927098    1.318707  0.1729022 positive
#> 24    4.800000   3.000000    1.400000  0.2703486 positive
#> 25    5.585683   3.838106    1.700000  0.3381057 positive
#> 26    4.792180   3.107820    1.600000  0.2000000 positive
#> 27    4.900000   3.019294    1.419294  0.1807057 positive
#> 28    4.800000   3.396018    1.600000  0.2000000 positive
#> 29    5.060846   3.360846    1.460846  0.2000000 positive
#> 30    4.435322   2.688069    1.364678  0.2353219 positive
#> 31    5.149647   3.400000    1.533098  0.2000000 positive
#> 32    5.220429   3.739786    1.500000  0.2397855 positive
#> 33    5.111363   3.834088    1.500000  0.2772748 positive
#> 34    5.100000   3.460622    1.439378  0.2606222 positive
#> 35    5.000000   3.524249    1.324249  0.2757507 positive
#> 36    5.386332   3.406834    1.500000  0.3863317 positive
#> 37    4.800000   3.000000    1.400000  0.1013764 positive
#> 38    5.285373   4.128458    1.471542  0.1284577 positive
#> 39    5.373330   3.673330    1.500000  0.2000000 positive
#> 40    5.189412   3.500000    1.489412  0.2000000 positive
#> 41    4.625698   3.200000    1.374302  0.2000000 positive
#> 42    5.100000   3.800000    1.723400  0.3558501 positive
#> 43    4.689869   3.189869    1.320263  0.2000000 positive
#> 44    5.000000   3.465370    1.600000  0.5307399 positive
#> 45    5.700000   4.384025    1.505325  0.3973375 positive
#> 46    4.770835   3.100000    1.585417  0.2000000 positive
#> 47    5.000000   3.220583    1.241166  0.2000000 positive
#> 48    4.900000   3.073814    1.473814  0.1261857 positive
#> 49    4.356085   3.000000    1.212170  0.1560850 positive
#> 50    4.400000   3.184334    1.300000  0.2000000 positive