Bins a data.frame into quantile bins of the variable x in data.

qbin(data, x = NULL, n = 100, min_bin_size = NULL, overlap = NULL, ...)

Arguments

data

a data.frame to be binned

x

character variable name used for the quantile binning

n

integer number of quantile bins.

min_bin_size

integer minimum number of rows (data points) that a quantile bin should contain. If NULL, it is initially set to sqrt(nrow(data)).

overlap

logical; if TRUE, the quantile bins will overlap. The default is FALSE.

...

reserved for future use

Value

a qbin object with:

  • $x the variable name used for binning

  • $bin a vector of bin numbers

  • $n the number of bins

  • $num_cols a vector of numeric column names

  • $cat_cols a vector of categorical column names

  • $data a list of data.tables with the collected information

Details

Each numeric variable in the data.frame is binned into n quantile bins, and fivenum() and mean() are calculated for each bin.
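The per-bin summary of a numeric column can be illustrated in base R. This is a minimal sketch, not the qbin implementation: the rank-based bin assignment, the helper summarise_bin and the object names are assumptions made for illustration, and iris merely stands in for data.

```r
# Illustrative sketch only; qbin's internals may differ.
data <- iris
x <- "Sepal.Length"   # binning variable
n <- 5                # number of quantile bins

# Assumption: bins are assigned by rank, so each bin holds
# nrow(data) / n rows even when x contains ties.
r <- rank(data[[x]], ties.method = "first")
bin <- as.integer(ceiling(r * n / nrow(data)))

# fivenum() and mean() of one numeric column within a bin
# (summarise_bin is a hypothetical helper).
summarise_bin <- function(v) {
  c(setNames(fivenum(v), c("min", "q1", "median", "q3", "max")),
    mean = mean(v))
}

# One row per bin, one column per statistic.
petal_summary <- t(sapply(split(data$Petal.Length, bin), summarise_bin))
print(petal_summary)
```

The same summary would be repeated for every numeric column; here only Petal.Length is shown.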

When nrow(data)/n is less than min_bin_size, qbin gives a warning and n is adjusted to nrow(data)/min_bin_size. Each categorical variable is binned into n quantile bins, and the frequency of each level is calculated for each bin.
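The adjustment of n and the level frequencies can be sketched in base R as follows. This is an illustration under stated assumptions rather than the actual qbin code: rounding down with floor(), the rank-based bin assignment and the wording of the warning are all assumptions.

```r
# Illustrative sketch only; qbin's internals may differ.
data <- iris
n <- 100

# Assumption: the default sqrt(nrow(data)) is rounded down to an integer.
min_bin_size <- floor(sqrt(nrow(data)))

# Too few rows per bin: warn and reduce the number of bins.
if (nrow(data) / n < min_bin_size) {
  n_new <- floor(nrow(data) / min_bin_size)  # assumption: rounded down
  warning("n = ", n, " gives fewer than ", min_bin_size,
          " rows per bin; using n = ", n_new, call. = FALSE)
  n <- n_new
}

# Assumption: rank-based assignment of rows to the n bins.
r <- rank(data$Sepal.Length, ties.method = "first")
bin <- as.integer(ceiling(r * n / nrow(data)))

# Level frequency of a categorical column within each bin.
species_freq <- table(bin = bin, Species = data$Species)
print(species_freq)
```

With the 150 rows of iris, min_bin_size is 12 and the requested n = 100 is reduced to 12 bins.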