Treats imbalanced discrete numeric datasets by generating synthetic minority examples that approximate the probability distribution of the minority class.

racog(dataset, numInstances, burnin = 100, lag = 20, classAttr = "Class")

Arguments

dataset

data.frame to treat. All columns except the classAttr one must be numeric or coercible to numeric.

numInstances

Integer. Number of new minority examples to generate.

burnin

Integer. Number of initial examples generated from each minority example that are discarded before any example is kept. By default, 100.

lag

Integer. Number of iterations between two consecutive kept examples generated from the same minority example. By default, 20.

classAttr

character. Name of the class attribute in dataset. It must exist in dataset.
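A minimal sketch of how a data.frame with continuous columns might be prepared to meet the requirement on dataset (numeric and, as noted under Details, discretized). The helper discretize_numeric and its bins argument are hypothetical and not part of the package; equal-width binning with base R's cut() is only one possible choice.

```r
# Hypothetical helper (not part of the package): bins every non-class column
# into `bins` equal-width intervals and stores the bin index as an integer.
discretize_numeric <- function(dataset, classAttr = "Class", bins = 5) {
  stopifnot(classAttr %in% names(dataset))
  features <- setdiff(names(dataset), classAttr)
  dataset[features] <- lapply(dataset[features], function(col) {
    as.integer(cut(as.numeric(col), breaks = bins))
  })
  dataset
}

# Toy data, invented for illustration only
toy <- data.frame(
  x = c(0.1, 0.4, 2.5, 9.9),
  y = c(10, 20, 30, 40),
  Class = c("positive", "negative", "negative", "negative")
)
discretized <- discretize_numeric(toy, bins = 3)
str(discretized)
```

The class column is left untouched, and every other column ends up holding integer codes between 1 and bins.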

Value

A data.frame with the same structure as dataset, containing the generated synthetic examples.

Details

Approximates the minority class distribution using a Gibbs sampler. The dataset must be discretized and numeric. In each iteration, it builds a new sample using a Markov chain. It discards the first burnin iterations and, from then on, every lag iterations it accepts the current sample as a new minority example. It generates \(d \cdot (iterations - burnin)/lag\) examples, where \(d\) is the number of minority examples and \(iterations\) is the number of iterations run for each of them.
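The burn-in and lag rule described above can be sketched as a small schedule calculation. This is an illustration only, not the package's internal code: the function names are hypothetical, and it assumes the first kept sample falls at iteration burnin + lag.

```r
# Iterations whose sample is kept, for one chain started at one minority example.
# Assumption: the first kept sample is at iteration burnin + lag.
kept_iterations <- function(iterations, burnin = 100, lag = 20) {
  its <- seq_len(iterations)
  its[its > burnin & (its - burnin) %% lag == 0]
}

# Total synthetic examples when d chains are run, one per minority example.
n_generated <- function(d, iterations, burnin = 100, lag = 20) {
  d * length(kept_iterations(iterations, burnin, lag))
}

kept_iterations(200)                   # 120 140 160 180 200
n_generated(d = 50, iterations = 200)  # 250
```

With 200 iterations, burnin = 100 and lag = 20, each chain keeps (200 - 100)/20 = 5 samples, so 50 minority examples yield 250 synthetic ones, in line with the formula.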

References

Das, Barnan; Krishnan, Narayanan C.; Cook, Diane J. RACOG and wRACOG: Two Probabilistic Oversampling Techniques. IEEE Transactions on Knowledge and Data Engineering 27(1), 2015, pp. 222–234.

Examples

data(iris0)

# Generates new minority examples
newSamples <- racog(iris0, numInstances = 40, burnin = 20, lag = 10,
                    classAttr = "Class")
# \donttest{
newSamples <- racog(iris0, numInstances = 100)
# }
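Since the returned data.frame has the same structure as dataset, the synthetic examples can be appended to the original data with rbind(). The sketch below assumes the package providing racog and iris0 is the imbalance package and only runs if it is installed; the variable name balanced is chosen here for illustration.

```r
# Runs only when the package is available (assumed to be 'imbalance')
if (requireNamespace("imbalance", quietly = TRUE)) {
  library(imbalance)
  data(iris0)

  # Generate 40 synthetic minority examples
  newSamples <- racog(iris0, numInstances = 40, burnin = 20, lag = 10,
                      classAttr = "Class")

  # Append them to the original data and inspect the class counts
  balanced <- rbind(iris0, newSamples)
  print(table(balanced$Class))
}
```

Comparing table(iris0$Class) with table(balanced$Class) shows how much the class ratio has moved toward balance.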