Find differences with a reference data set. The resulting diff can be applied with patch_data, stored for documentation purposes with write_diff, or visualized with render_diff.

diff_data(
  data_ref,
  data,
  always_show_header = TRUE,
  always_show_order = FALSE,
  columns_to_ignore = c(),
  count_like_a_spreadsheet = TRUE,
  ids = c(),
  ignore_whitespace = FALSE,
  never_show_order = FALSE,
  ordered = TRUE,
  padding_strategy = c("auto", "smart", "dense", "sparse"),
  show_meta = TRUE,
  show_unchanged = FALSE,
  show_unchanged_columns = FALSE,
  show_unchanged_meta = FALSE,
  unchanged_column_context = 1L,
  unchanged_context = 1L
)

Arguments

data_ref

data.frame reference data frame

data

data.frame to check for changes

always_show_header

logical Should we always give a table header in diffs? This defaults to TRUE, and - frankly - you should leave it at TRUE for now.

always_show_order

logical Diffs for tables where row/column order has been permuted may include an extra row/column specifying the changes in row/column numbers. If you'd like that extra row/column to always be included, turn on this flag, and turn off never_show_order.

columns_to_ignore

character List of columns to ignore in all calculations. Changes related to these columns are ignored.

count_like_a_spreadsheet

logical Should column numbers, if present, be rendered spreadsheet-style as A, B, C, ..., AA, AB, AC? Defaults to TRUE.

ids

character List of columns that make up a primary key, if known. Otherwise heuristics are used to find a decent key (or a set of decent keys).

ignore_whitespace

logical Should whitespace be omitted from comparisons? Defaults to FALSE.

never_show_order

logical Diffs for tables where row/column order has been permuted may include an extra row/column specifying the changes in row/column numbers. If you'd like to be sure that this row/column is never included, turn on this flag, and turn off always_show_order.

ordered

logical Is the order of rows and columns meaningful? Defaults to TRUE.

padding_strategy

character Strategy to use when padding columns. Valid values are "auto", "smart", "dense", and "sparse". Leave unset for a sensible default.

show_meta

logical Show changes in column properties, not just data, if available. Defaults to TRUE.

show_unchanged

logical Should we show all rows in diffs? We default to showing just rows that have changes (and some context rows around them, if row order is meaningful), but you can override this here.

show_unchanged_columns

logical Should we show all columns in diffs? We default to showing just columns that have changes (and some context columns around them, if column order is meaningful), but you can override this here. Irrespective of this flag, you can rely on index/key columns needed to identify rows to be included in the diff.

show_unchanged_meta

logical Show all column properties, if available, even if unchanged. Defaults to FALSE.

unchanged_column_context

integer When showing context columns around a changed column, what is the minimum number of such columns we should show?

unchanged_context

integer When showing context rows around a changed row, what is the minimum number of such rows we should show?
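
As an illustration of how some of the arguments above interact, here is a minimal sketch. The data frames ref and new and their columns (id, name, score) are made up for this example and are not part of the package. No output is shown because the exact rendering depends on the flags used.

```r
library(daff)

# Hypothetical example data: a small reference table and a modified copy.
ref <- data.frame(
  id    = c(1, 2, 3),
  name  = c("a", "b", "c"),
  score = c(10, 20, 30),
  stringsAsFactors = FALSE
)
new <- ref
new$score[2] <- 25   # change one value
new$name[3]  <- "C"  # change a value in a column we will ignore

# Match rows on the 'id' column instead of relying on the key heuristics,
# and leave the 'name' column out of the comparison.
dd <- diff_data(ref, new, ids = "id", columns_to_ignore = "name")
dd

# Same key, but show every row rather than only changed rows plus context.
dd_all <- diff_data(ref, new, ids = "id", show_unchanged = TRUE)
dd_all
```

Supplying ids is mainly useful when the key is known in advance, since the heuristics then do not have to guess which columns identify a row.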

Value

difference object

See also

differs_from

Examples

library(daff)
x <- iris
x[1,1] <- 10
diff_data(x, iris)
#> Daff Comparison: ‘x’ vs. ‘iris’ 
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> ->  10->5.1      3.5         1.4          0.2         setosa 
#>     4.9          3           1.4          0.2         setosa 
#> ... ...          ...         ...          ...         ...    
#> 

dd <- diff_data(x, iris)
#write_diff(dd, "diff.csv")
summary(dd)
#> 
#> Data diff: ‘x’ vs. ‘iris’ 
#>           # Modified Reordered Deleted Added
#> Rows    150        1         0       0     0
#> Columns   5        1         0       0     0
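
The following sketch shows one way the difference object might be used afterwards with patch_data, write_diff, and render_diff. The file locations are temporary files chosen for this example, and the view = FALSE setting is an assumption made so that no browser or viewer is opened.

```r
library(daff)

x <- iris
x[1, 1] <- 10
dd <- diff_data(x, iris)

# Apply the diff to the reference data; the result should correspond to iris.
patched <- patch_data(x, dd)

# Store the diff as a CSV file and read it back later.
csv_file <- tempfile(fileext = ".csv")
write_diff(dd, csv_file)
dd_stored <- read_diff(csv_file)

# Render the diff as an HTML file without opening a viewer.
html_file <- tempfile(fileext = ".html")
render_diff(dd, file = html_file, view = FALSE)
```

Column types in the patched data frame may not match the original exactly, so compare values with care rather than relying on identical().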