Generates synthetic minority examples for a numerical dataset approximating a Gaussian multivariate distribution which best fits the minority data.

pdfos(dataset, numInstances, classAttr = "Class")

Arguments

dataset

data.frame to treat. All columns, except classAttr one, have to be numeric or coercible to numeric.

numInstances

Integer. Number of new minority examples to generate.

classAttr

character. Indicates the class attribute from dataset. Must exist in it.

Value

A data.frame with the same structure as dataset, containing the generated synthetic examples.

Details

To generate the synthetic data, it approximates a normal distribution with mean a given example belonging to the minority class, and whose variance is the minority class variance multiplied by a constant; that constant is computed so that it minimizes the mean integrated squared error of a Gaussian multivariate kernel function.

References

Gao, Ming; Hong, Xia; Chen, Sheng; Harris, Chris J.; Khalaf, Emad. Pdfos: Pdf Estimation Based Oversampling for Imbalanced Two-Class Problems. Neurocomputing 138 (2014), p. 248–259

Silverman, B. W. Density Estimation for Statistics and Data Analysis. Chapman & Hall, 1986. – ISBN 0412246201

Examples

data(iris0) newSamples <- pdfos(iris0, numInstances = 100, classAttr = "Class")