Outliers cleaning wrapper

clean_outliers(dataset, method, ...)

Arguments

dataset

Dataset to clean of outliers

method

Method used to clean outliers. Possible values are:

  • "univariate" detects outliers column by column (an outlier is an abnormal value within a column) and replaces them with the mean or median of the corresponding column

  • "multivariate" detects outliers using a multicolumn approach, so that an outlier is a whole observation (row), and deletes those observations

...

Further arguments passed to the selected method (for example type, prob, fill or lim, as used in the Examples section)
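
To make the difference between the two methods, and the role of the extra arguments, more concrete, here is a minimal base-R sketch. It is not the smartdata implementation: the functions toy_clean, toy_univariate and toy_multivariate, and their threshold and level arguments, are hypothetical names made up for illustration. The detection rules (a z-score cut-off and a Mahalanobis distance cut-off) are common textbook choices assumed here, not necessarily the ones the package uses.

```r
# Hypothetical helpers for illustration only; none of these are part of smartdata.

# Univariate idea: work column by column, flag values whose z-score is large,
# and overwrite them with the column mean or median. Row count is unchanged.
toy_univariate <- function(dataset, fill = c("mean", "median"), threshold = 2.5) {
  fill <- match.arg(fill)
  numeric_cols <- vapply(dataset, is.numeric, logical(1))
  dataset[numeric_cols] <- lapply(dataset[numeric_cols], function(x) {
    z <- (x - mean(x)) / sd(x)
    x[abs(z) > threshold] <- if (fill == "mean") mean(x) else median(x)
    x
  })
  dataset
}

# Multivariate idea: score each whole row by its squared Mahalanobis distance
# from the column means and drop rows beyond a chi-squared quantile.
toy_multivariate <- function(dataset, level = 0.975) {
  num <- as.matrix(dataset[vapply(dataset, is.numeric, logical(1))])
  d2 <- mahalanobis(num, colMeans(num), cov(num))
  dataset[d2 <= qchisq(level, df = ncol(num)), , drop = FALSE]
}

# Wrapper: pick the method, then forward every extra argument through `...`.
toy_clean <- function(dataset, method = c("univariate", "multivariate"), ...) {
  method <- match.arg(method)
  switch(method,
    univariate   = toy_univariate(dataset, ...),
    multivariate = toy_multivariate(dataset, ...)
  )
}

filled  <- toy_clean(iris, method = "univariate", fill = "median")
dropped <- toy_clean(iris, method = "multivariate", level = 0.99)

nrow(filled)   # same number of rows as iris: values were replaced in place
nrow(dropped)  # may be smaller than nrow(iris): flagged rows were removed
```

In this sketch the wrapper itself knows nothing about fill or level; each helper validates its own arguments, which is why a wrapper of this shape can leave method-specific options to `...`.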

Value

The treated dataset, with outliers either replaced ("univariate") or removed ("multivariate")

Examples

library("smartdata")

super_iris <- clean_outliers(iris, method = "multivariate", type = "adj")
#> Registered S3 methods overwritten by 'car':
#>   method                          from
#>   influence.merMod                lme4
#>   cooks.distance.influence.merMod lme4
#>   dfbeta.influence.merMod         lme4
#>   dfbetas.influence.merMod        lme4
#> Registered S3 method overwritten by 'GGally':
#>   method from
#>   +.gg   ggplot2
#> sROC 0.1-2 loaded
super_iris <- clean_outliers(iris, method = "multivariate", type = "quan")

# Use mean as method to substitute outliers
super_iris <- clean_outliers(iris, method = "univariate", type = "z", prob = 0.9, fill = "mean")

# Use median as method to substitute outliers
super_iris <- clean_outliers(iris, method = "univariate", type = "z", prob = 0.9, fill = "median")

# Use chi-sq instead of z p-values
super_iris <- clean_outliers(iris, method = "univariate", type = "chisq", prob = 0.9, fill = "median")

# Use interquartile range instead (lim argument is mandatory when using it)
super_iris <- clean_outliers(iris, method = "univariate", type = "iqr", lim = 0.9, fill = "median")