Generates synthetic minority examples by approximating their probability distribution until the sensitivity of wrapper over validation cannot be improved further. Works only on discrete numeric datasets.

wracog(
  train,
  validation,
  wrapper,
  slideWin = 10,
  threshold = 0.02,
  classAttr = "Class",
  ...
)

Arguments

train

data.frame. An initial dataset used to build the first model. All columns except the classAttr one must be numeric or coercible to numeric.

validation

data.frame. A dataset on which the results of consecutive classifiers are compared. Must have the same structure as train.

wrapper

An S3 object. A trainWrapper method must be implemented for the class of the object, and a predict method must be implemented for the class of the model returned by trainWrapper. Alternatively, it can be the name of one of the wrappers distributed with the package, "KNN" or "C5.0".

slideWin

Number of most recent sensitivities taken into account by the stopping criterion. By default, 10.

threshold

Threshold that the mean of the last slideWin sensitivities should reach. By default, 0.02.

classAttr

character. Indicates the class attribute from train and validation. Must exist in them.

...

further arguments for wrapper.
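
The requirements on train and validation can be checked before calling wracog. The following is a minimal sketch in base R; checkInputs is a hypothetical helper, not part of the package, and it is stricter than the documented contract because it rejects columns that are merely coercible to numeric.

```r
# Hypothetical helper (not part of the package): verifies that both
# datasets share the same columns and that every feature is numeric.
checkInputs <- function(train, validation, classAttr = "Class") {
  stopifnot(
    is.data.frame(train), is.data.frame(validation),
    classAttr %in% names(train),
    identical(names(train), names(validation))
  )
  features <- setdiff(names(train), classAttr)
  # Collect feature columns that are not already numeric
  nonNumeric <- features[!vapply(train[features], is.numeric, logical(1))]
  if (length(nonNumeric) > 0) {
    stop("Non-numeric columns: ", paste(nonNumeric, collapse = ", "))
  }
  invisible(TRUE)
}

# Toy data invented for illustration only
train <- data.frame(Age = c(30, 45), Nodes = c(1, 3),
                    Class = c("negative", "positive"))
validation <- data.frame(Age = c(52, 38), Nodes = c(0, 7),
                         Class = c("negative", "positive"))
checkInputs(train, validation)
```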

Value

A data.frame with the same structure as train, containing the generated synthetic examples.

Details

The algorithm keeps generating samples with a Gibbs sampler and adds to the train dataset those samples that are misclassified by the model built on the previous train dataset. It stops once the last slideWin executions of wrapper over the validation dataset reach a mean sensitivity lower than threshold. The initial model is built on the initial train dataset.
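
The two quantities involved in the stopping rule can be sketched in base R. This is an illustration under assumptions, not the package's internal code: sensitivity is the usual TP / (TP + FN) for the minority class, and windowBelowThreshold is a hypothetical helper that reads the description above literally; the exact score the package feeds into the window may differ.

```r
# Sensitivity (recall) of the minority class: TP / (TP + FN).
sensitivity <- function(predicted, actual, minorityLabel) {
  isMinority <- actual == minorityLabel
  sum(predicted[isMinority] == minorityLabel) / sum(isMinority)
}

# Hypothetical helper: TRUE once at least slideWin scores are available
# and the mean of the last slideWin scores is below threshold.
windowBelowThreshold <- function(scores, slideWin = 10, threshold = 0.02) {
  if (length(scores) < slideWin) {
    return(FALSE)
  }
  mean(tail(scores, slideWin)) < threshold
}

# Toy labels invented for illustration only
actual    <- c("positive", "positive", "negative", "positive", "negative")
predicted <- c("positive", "negative", "negative", "positive", "positive")

# 2 of the 3 positive cases are recovered, so this evaluates to 2/3
sensitivity(predicted, actual, "positive")
```

A larger slideWin makes the rule less sensitive to a single noisy iteration, at the cost of more wrapper executions before the loop can stop.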

References

Das, Barnan; Krishnan, Narayanan C.; Cook, Diane J. Racog and Wracog: Two Probabilistic Oversampling Techniques. IEEE Transactions on Knowledge and Data Engineering 27(2015), Nr. 1, p. 222–234.

Examples

data(haberman)

# Create train and validation partitions of haberman
trainFold <- sample(1:nrow(haberman), nrow(haberman)/2, FALSE)
trainSet <- haberman[trainFold, ]
validationSet <- haberman[-trainFold, ]

# Defines our own wrapper with a C5.0 tree
myWrapper <- structure(list(), class="TestWrapper")
trainWrapper.TestWrapper <- function(wrapper, train, trainClass){
  C50::C5.0(train, trainClass)
}

# Execute wRACOG with our own wrapper
newSamples <- wracog(trainSet, validationSet, myWrapper, classAttr = "Class")
#> Error in UseMethod("trainWrapper"): no applicable method for 'trainWrapper' applied to an object of class "TestWrapper"

# Execute wRACOG with predefined wrappers for "KNN" or "C5.0"
KNNSamples <- wracog(trainSet, validationSet, "KNN")
C50Samples <- wracog(trainSet, validationSet, "C5.0")