smartdata is an R package that integrates preprocessing algorithms for oversampling, instance and feature selection, normalization, discretization, space transformation, and cleaning of outliers, missing values, and noise.

Installation

You can install the latest stable release of smartdata from CRAN with:

install.packages("smartdata")

and load it into an R session with:

library("smartdata")

Examples

smartdata provides the following wrappers:

  • instance_selection
  • feature_selection
  • normalize
  • discretize
  • space_transformation
  • clean_outliers
  • impute_missing
  • clean_noise

To list the methods available for a given wrapper:

which_options("clean_noise")

To get information about the parameters of a particular method:

which_options("clean_noise", "AENN")

First let’s load a bunch of datasets:

data(iris0,  package = "imbalance")
data(ecoli1, package = "imbalance")
data(nhanes, package = "mice")

Oversampling

super_iris <- iris0 %>% oversample(method = "MWMOTE", ratio = 0.8, filtering = TRUE)
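To make the role of `ratio` more concrete, here is a minimal base R sketch of plain random oversampling. It is a much simpler stand-in for MWMOTE, not what `oversample` does internally. It assumes `ratio` means the desired minority-to-majority size ratio, and the toy data frame and its `Class` column are made up for illustration.

```r
set.seed(42)  # make the random duplication reproducible

# Toy imbalanced data (hypothetical): 50 majority rows, 10 minority rows
toy <- rbind(
  iris[iris$Species == "versicolor", ],
  iris[iris$Species == "setosa", ][1:10, ]
)
toy$Class <- factor(ifelse(toy$Species == "setosa", "positive", "negative"))

ratio    <- 0.8
minority <- toy[toy$Class == "positive", ]
n_major  <- sum(toy$Class == "negative")

# Number of extra minority rows needed to reach the target ratio
n_new <- ceiling(ratio * n_major) - nrow(minority)

# Duplicate randomly chosen minority rows (sampling with replacement)
extra    <- minority[sample(nrow(minority), n_new, replace = TRUE), ]
balanced <- rbind(toy, extra)

table(balanced$Class)
```

Methods such as MWMOTE synthesize new minority instances instead of duplicating existing ones, which is why they are usually preferred over this naive approach.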

Instance selection
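As a rough illustration of the idea behind instance selection, the sketch below applies an edited-nearest-neighbours style filter using `knn.cv` from the `class` package: every row whose label disagrees with its leave-one-out 3-nearest-neighbour prediction is dropped. This is a generic sketch under those assumptions and is not tied to any particular `instance_selection` method.

```r
library(class)  # recommended package shipped with standard R installations

set.seed(1)  # knn.cv breaks ties at random

features <- iris[, 1:4]

# Leave-one-out 3-NN prediction for every row
loo_pred <- knn.cv(train = features, cl = iris$Species, k = 3)

# Keep only the rows whose label agrees with their neighbourhood
edited_iris <- iris[loo_pred == iris$Species, ]

nrow(iris) - nrow(edited_iris)  # number of rows dropped
```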

Feature selection

super_ecoli <- ecoli1 %>% feature_selection("Boruta", class_attr = "Class")

Normalization

super_iris <- iris %>% normalize("min_max", exclude = c("Sepal.Length", "Species"))
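For readers unfamiliar with min-max scaling, the following base R sketch shows the arithmetic: each numeric column is rescaled to [0, 1], and the excluded columns are left untouched. The helper name `min_max_sketch` is hypothetical and the code is only an approximation of what `normalize("min_max", ...)` is expected to produce.

```r
# Hypothetical helper: rescale numeric columns to [0, 1], skipping `exclude`
min_max_sketch <- function(df, exclude = character(0)) {
  cols <- setdiff(names(df)[sapply(df, is.numeric)], exclude)
  for (col in cols) {
    rng <- range(df[[col]], na.rm = TRUE)
    # Assumes the column is not constant, i.e. rng[2] > rng[1]
    df[[col]] <- (df[[col]] - rng[1]) / (rng[2] - rng[1])
  }
  df
}

scaled_iris <- min_max_sketch(iris, exclude = c("Sepal.Length", "Species"))

range(scaled_iris$Petal.Width)   # 0 1
range(scaled_iris$Sepal.Length)  # unchanged, because it was excluded
```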

Discretization

super_iris <- iris %>% discretize("ameva", class_attr = "Species")
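To show what a discretized data frame looks like, here is a base R sketch using equal-width binning with `cut`. Note that this is an unsupervised stand-in: "ameva" is a supervised method that uses the class attribute to choose cut points, which this sketch does not do. The choice of three bins is arbitrary.

```r
binned_iris  <- iris
numeric_cols <- names(iris)[sapply(iris, is.numeric)]

for (col in numeric_cols) {
  # Split each numeric column into 3 intervals of equal width
  binned_iris[[col]] <- cut(iris[[col]], breaks = 3, include.lowest = TRUE)
}

# Every numeric column is now a factor whose levels are intervals
str(binned_iris)
```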

Space transformation

super_ecoli <- ecoli1 %>% space_transformation("lle_knn", k = 3, num_features = 2)
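The shape of the result, a dataset projected onto a small number of new features, can be illustrated with PCA from base R. PCA is a linear technique and is used here only as a simpler stand-in for locally linear embedding, and `iris` replaces `ecoli1` so that the sketch needs no extra packages.

```r
# Project the four numeric iris columns onto 2 principal components
pca <- prcomp(iris[, 1:4], center = TRUE, scale. = TRUE)

reduced <- as.data.frame(pca$x[, 1:2])  # keep PC1 and PC2 only
reduced$Species <- iris$Species         # re-attach the class column

head(reduced)
```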

Outliers

super_iris <- iris %>% clean_outliers("multivariate", type = "adj")
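One common way to flag multivariate outliers is the Mahalanobis distance, sketched below in base R. This uses the classical mean and covariance with a fixed chi-squared cutoff, which is an assumption made for illustration. The "multivariate" method in smartdata, and its `type = "adj"` option in particular, may rely on different estimators and thresholds.

```r
x <- as.matrix(iris[, sapply(iris, is.numeric)])

# Squared Mahalanobis distance of each row from the column means
d2 <- mahalanobis(x, center = colMeans(x), cov = cov(x))

# Flag rows beyond the 97.5% quantile of a chi-squared distribution
cutoff     <- qchisq(0.975, df = ncol(x))
clean_iris <- iris[d2 <= cutoff, ]

nrow(iris) - nrow(clean_iris)  # number of rows flagged as outliers
```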

Missing values