Bins a data.frame into quantile bins for variable x in data.
qbin(data, x = NULL, n = 100, min_bin_size = NULL, overlap = NULL, ...)
data: a data.frame to be binned
x: character, variable name used for the quantile binning
n: integer, number of quantile bins
min_bin_size: integer, minimum number of rows/data points that should be in a quantile bin. If NULL it is initially sqrt(nrow(data))
overlap: logical, if TRUE the quantile bins will overlap. Default value will be FALSE.
...: reserved for future use
a qbin object with:
$x the variable name used for binning
$bin a vector of bin numbers
$n the number of bins
$num_cols a vector of numeric column names
$cat_cols a vector of categorical column names
$data a list of data.tables with the collected information
Each numeric variable in the data.frame is binned into n quantile bins, for which fivenum() and mean() are calculated.
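The per-bin summary described above can be sketched in base R. This is an illustration only, not the package's internal code: the data frame `df`, the rank-based bin assignment, and the column labels are assumptions made for the example.

```r
# Illustrative sketch only; qbin's own implementation may differ.
set.seed(1)
df <- data.frame(x = rnorm(1000), y = runif(1000))  # hypothetical data
n <- 10                                             # number of quantile bins

# Assign each row to one of n quantile bins of x.
# Ranking first gives bins of equal size even when x has ties.
bin <- ceiling(rank(df$x, ties.method = "first") * n / nrow(df))

# Tukey five-number summary plus the mean of y within each bin.
stats <- t(sapply(split(df$y, bin), function(v) c(fivenum(v), mean(v))))
colnames(stats) <- c("min", "lower_hinge", "median", "upper_hinge", "max", "mean")

head(stats)
```

Each row of `stats` corresponds to one bin, ordered by increasing x.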
When the number of rows per bin, nrow(data)/n, is less than min_bin_size, qbin gives a warning and n is adjusted to nrow(data)/min_bin_size.
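The adjustment of n can be sketched as below, reading the condition as the number of rows per bin (nrow(data)/n) falling below min_bin_size. The helper `adjust_n` is hypothetical and not part of the package, and rounding down with floor() is an assumption, since the text does not say how a fractional result is handled.

```r
# Hypothetical helper, not part of the package: mimics the documented rule.
adjust_n <- function(nrows, n = 100, min_bin_size = NULL) {
  # Documented default: sqrt of the number of rows.
  if (is.null(min_bin_size)) min_bin_size <- sqrt(nrows)

  # Too few rows per bin: warn and reduce the number of bins.
  if (nrows / n < min_bin_size) {
    n_new <- floor(nrows / min_bin_size)  # floor() is an assumption
    warning("n adjusted from ", n, " to ", n_new)
    n <- n_new
  }
  n
}

adjust_n(10000)                    # 100 rows per bin, min_bin_size 100: n stays 100
suppressWarnings(adjust_n(1000))   # 10 rows per bin < sqrt(1000): n becomes 31
```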
Each categorical variable is binned into n quantile bins, for which the level frequency is calculated.
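For a categorical column, the level frequency per bin amounts to a cross-tabulation of bin number against factor level. The sketch below uses base R's table() on made-up data; it illustrates the idea and is not the package's own code.

```r
# Illustrative sketch only; data and bin assignment are assumptions.
df <- data.frame(
  x = 1:12,
  g = factor(rep(c("a", "b", "c"), times = 4))  # hypothetical categorical column
)
n <- 3  # number of quantile bins

# Rank-based assignment of rows to n quantile bins of x.
bin <- ceiling(rank(df$x, ties.method = "first") * n / nrow(df))

# Count how often each level of g occurs within each bin.
freq <- table(bin = bin, level = df$g)
freq
```

Dividing each row by its total, for example with prop.table(freq, margin = 1), gives relative frequencies per bin.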