Do a data diff

Finds the differences between a data set and a reference data set. The resulting diff can be applied with patch_data, stored for documentation purposes with write_diff, or visualized with render_diff.

diff_data(
  data_ref,
  data,
  always_show_header = TRUE,
  always_show_order = FALSE,
  columns_to_ignore = c(),
  count_like_a_spreadsheet = TRUE,
  ids = c(),
  ignore_whitespace = FALSE,
  never_show_order = FALSE,
  ordered = TRUE,
  padding_strategy = c("auto", "smart", "dense", "sparse"),
  show_meta = TRUE,
  show_unchanged = FALSE,
  show_unchanged_columns = FALSE,
  show_unchanged_meta = FALSE,
  unchanged_column_context = 1L,
  unchanged_context = 1L
)

Arguments

data_ref: data.frame reference data frame
data: data.frame to check for changes
always_show_header: logical Should we always give a table header in diffs? This defaults to TRUE, and - frankly - you should leave it at TRUE for now.
always_show_order: logical Diffs for tables where row/column order has been permuted may include an extra row/column specifying the changes in row/column numbers. If you'd like that extra row/column to always be included, turn on this flag, and turn off never_show_order.
columns_to_ignore: character List of columns to ignore in all calculations. Changes related to these columns are ignored.
count_like_a_spreadsheet: logical Should column numbers, if present, be rendered spreadsheet-style as A,B,C,...,Z,AA,AB,AC? Defaults to TRUE.
ids: character List of columns that make up a primary key, if known. Otherwise heuristics are used to find a decent key (or a set of decent keys).
ignore_whitespace: logical Should whitespace be omitted from comparisons? Defaults to FALSE.
never_show_order: logical Diffs for tables where row/column order has been permuted may include an extra row/column specifying the changes in row/column numbers. If you'd like to be sure that that row/column is never included, turn on this flag, and turn off always_show_order.
ordered: logical Is the order of rows and columns meaningful? Defaults to TRUE.
padding_strategy: character Strategy to use when padding columns. Valid values are "auto", "smart", "dense", and "sparse". Leave unset for a sensible default.
show_meta: logical Show changes in column properties, not just data, if available. Defaults to TRUE.
show_unchanged: logical Should we show all rows in diffs? We default to showing just rows that have changes (and some context rows around them, if row order is meaningful), but you can override this here.
show_unchanged_columns: logical Should we show all columns in diffs? We default to showing just columns that have changes (and some context columns around them, if column order is meaningful), but you can override this here. Irrespective of this flag, you can rely on index/key columns needed to identify rows to be included in the diff.
show_unchanged_meta: logical Show all column properties, if available, even if unchanged. Defaults to FALSE.
unchanged_column_context: integer When showing context columns around a changed column, what is the minimum number of such columns we should show?
unchanged_context: integer When showing context rows around a changed row, what is the minimum number of such rows we should show?
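
As a minimal sketch of how the key and ignore arguments might be combined: the data frames and column names below (region, id, value, note, updated) are made up for illustration, and the call uses only arguments listed above. Rows are matched on a two-column key, and changes in the two bookkeeping columns are left out of the comparison.

```r
library(daff)

# Hypothetical reference data; all column names are illustrative only
ref <- data.frame(
  region  = c("north", "north", "south"),
  id      = c(1L, 2L, 1L),
  value   = c(10, 20, 30),
  note    = c("a", "b", "c"),
  updated = c("2020-01-01", "2020-01-01", "2020-01-01"),
  stringsAsFactors = FALSE
)

# Changed copy: one real change plus changes in the bookkeeping columns
new <- ref
new$value[2]   <- 25
new$note[3]    <- "edited"
new$updated[3] <- "2020-02-01"

# Match rows on (region, id) and ignore the bookkeeping columns
d <- diff_data(
  ref, new,
  ids               = c("region", "id"),
  columns_to_ignore = c("note", "updated")
)
summary(d)
```

Supplying ids is mainly useful when the key is known in advance; otherwise the heuristics described under ids pick one.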

Value

A difference object, which can be passed to patch_data, write_diff, or render_diff.

Examples

library(daff)
x <- iris
x[1, 1] <- 10        # change one value in the copy
diff_data(x, iris)   # x is the reference; iris is checked for changes
#> Daff Comparison: ‘x’ vs. ‘iris’ 
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> ->  10->5.1      3.5         1.4          0.2         setosa 
#>     4.9          3           1.4          0.2         setosa 
#> ... ...          ...         ...          ...         ...    
#> 

dd <- diff_data(x, iris)
#write_diff(dd, "diff.csv")
summary(dd)
#> 
#> Data diff: ‘x’ vs. ‘iris’ 
#>           # Modified Reordered Deleted Added
#> Rows    150        1         0       0     0
#> Columns   5        1         0       0     0
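
The description mentions patching, storing, and rendering a diff. A rough sketch of that workflow follows, reusing the iris example; the temporary file paths are placeholders, and the file and view arguments of render_diff are used here on the assumption that they behave as in current daff releases.

```r
library(daff)

x <- iris
x[1, 1] <- 10

# x is the reference; iris is the data checked for changes
dd <- diff_data(x, iris)

# Apply the diff to the reference to obtain the changed data
patched <- patch_data(x, dd)

# Store the diff as CSV in a temporary file and read it back
diff_file <- tempfile(fileext = ".csv")
write_diff(dd, diff_file)
dd_from_disk <- read_diff(diff_file)

# Render the diff to an HTML file without opening a viewer
html_file <- tempfile(fileext = ".html")
render_diff(dd, file = html_file, view = FALSE)
```

Writing to tempfile() keeps the example from leaving files in the working directory; replace the paths with real ones to keep the diff.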
