Modification of the SMOTE technique that addresses one of its weaknesses: when the data contain noisy instances, SMOTE tends to generate further noisy instances from them.

mwmote(
  dataset,
  numInstances,
  kNoisy = 5,
  kMajority = 3,
  kMinority,
  threshold = 5,
  cmax = 2,
  cclustering = 3,
  classAttr = "Class"
)

Arguments

dataset

data.frame to treat. All columns except the classAttr one must be numeric or coercible to numeric.

numInstances

Integer. Number of new minority examples to generate.

kNoisy

Integer. Parameter of Euclidean KNN used to detect noisy examples, i.e. those whose whole kNoisy-neighbourhood belongs to the opposite class.
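
The noise criterion described for kNoisy can be illustrated with a minimal base R sketch. The helper isNoisyMinority, its minorityLabel argument, and the toy data are hypothetical and written only for illustration; this is not the package's internal implementation, which may differ in detail.

```r
# Hypothetical helper (not part of the package): flags minority rows whose
# k nearest neighbours (Euclidean distance) all belong to another class.
isNoisyMinority <- function(dataset, k = 5, classAttr = "Class",
                            minorityLabel = "positive") {
  features <- as.matrix(dataset[, names(dataset) != classAttr, drop = FALSE])
  labels <- as.character(dataset[[classAttr]])
  distances <- as.matrix(dist(features))  # pairwise Euclidean distances
  diag(distances) <- Inf                  # a point is not its own neighbour
  minorityRows <- which(labels == minorityLabel)
  noisy <- vapply(minorityRows, function(i) {
    neighbours <- order(distances[i, ])[seq_len(k)]
    all(labels[neighbours] != minorityLabel)
  }, logical(1))
  setNames(noisy, minorityRows)
}

# Toy data: the last positive point (row 8) sits inside the negative cluster.
toy <- data.frame(
  x = c(0, 0.1, 0.2, 5, 5.1, 5.2, 5.3, 5.15),
  y = c(0, 0.1, 0.0, 5, 5.1, 5.0, 5.2, 5.05),
  Class = c("positive", "positive", "positive",
            "negative", "negative", "negative", "negative", "positive")
)
isNoisyMinority(toy, k = 3)  # only row 8 is flagged
```

A larger k makes the filter more conservative, since every one of the k neighbours has to come from the other class before an example is treated as noise.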

kMajority

Integer. Parameter of Euclidean KNN used to detect majority borderline examples, i.e. those that lie in the kMajority-neighbourhood of some minority instance. Should be a low integer.

kMinority

Integer. Parameter of Euclidean KNN used to detect minority borderline examples, i.e. those that lie in the kMinority-neighbourhood of the majority borderline ones. Should be a large integer. If no value is supplied, it defaults to \(|S^{+}|/2\), where \(S^{+}\) is the set of minority examples.
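
As a rough illustration of the default \(|S^{+}|/2\), the sketch below computes it for a toy data frame. The helper defaultKMinority is hypothetical; taking the least frequent class as the minority and rounding down are both assumptions, and the package may handle either differently.

```r
# Hypothetical helper: |S+| / 2, where S+ is taken to be the least frequent class.
# Rounding down is an assumption; the package may round differently.
defaultKMinority <- function(dataset, classAttr = "Class") {
  classCounts <- table(dataset[[classAttr]])
  minoritySize <- min(classCounts)
  max(1L, as.integer(floor(minoritySize / 2)))
}

toy <- data.frame(
  x = rnorm(30),
  Class = rep(c("positive", "negative"), times = c(10, 20))
)
defaultKMinority(toy)  # 10 minority rows, so 5
```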

threshold

Numeric. A positive real that sets the tolerance for how close to the boundary a minority boundary example must be. A larger value allows a wider margin of distance for an example to be considered an important boundary one.

cmax

Numeric. A positive real that sets the tolerance for how close to the boundary a minority boundary example must be. The larger this number, the more weight is given to boundary examples.

cclustering

Numeric. A positive real for tuning the output of an internal clustering. The larger this parameter, the more area-focused the oversampling will be.

classAttr

Character. Name of the class attribute in dataset. Must exist in it.

Value

A data.frame with the same structure as dataset, containing the generated synthetic examples.

References

Barua, Sukarna; Islam, Md. M.; Yao, Xin; Murase, Kazuyuki. MWMOTE–Majority Weighted Minority Oversampling Technique for Imbalanced Data Set Learning. IEEE Transactions on Knowledge and Data Engineering 26 (2014), no. 2, pp. 405–425.

Examples

data(iris0)
# Generates new minority examples
newSamples <- mwmote(iris0, numInstances = 100, classAttr = "Class")
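
Since the returned data.frame contains only the synthetic examples, a typical next step is to append them to the original data. The following sketch assumes that the 'imbalance' package, which provides mwmote and iris0, is installed; the name augmented is chosen here for illustration only.

```r
# Assumes the 'imbalance' package (providing mwmote and iris0) is installed;
# the block does nothing if it is not available.
if (requireNamespace("imbalance", quietly = TRUE)) {
  data(iris0, package = "imbalance")
  newSamples <- imbalance::mwmote(iris0, numInstances = 100, classAttr = "Class")
  # Append the synthetic minority rows to the original data
  augmented <- rbind(iris0, newSamples)
  # Compare class counts before and after oversampling
  print(table(iris0$Class))
  print(table(augmented$Class))
}
```

Comparing the two class tables is a quick check that numInstances gives the class balance you intended.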