Find differences with a reference data set. The resulting diff can be applied
to a data frame with patch_data, stored for documentation purposes with
write_diff, or visualized with render_diff.
diff_data(
data_ref,
data,
always_show_header = TRUE,
always_show_order = FALSE,
columns_to_ignore = c(),
count_like_a_spreadsheet = TRUE,
ids = c(),
ignore_whitespace = FALSE,
never_show_order = FALSE,
ordered = TRUE,
padding_strategy = c("auto", "smart", "dense", "sparse"),
show_meta = TRUE,
show_unchanged = FALSE,
show_unchanged_columns = FALSE,
show_unchanged_meta = FALSE,
unchanged_column_context = 1L,
unchanged_context = 1L
)
data_ref
data.frame
Reference data frame.
data
data.frame
Data frame to check for changes.
always_show_header
logical
Should we always give a table header in diffs? This defaults
to TRUE and, frankly, you should leave it at TRUE for now.
always_show_order
logical
Diffs for tables where row/column order has been permuted may include
an extra row/column specifying the changes in row/column numbers.
If you'd like that extra row/column to always be included,
turn on this flag, and turn off never_show_order.
columns_to_ignore
character
List of columns to ignore in all calculations. Changes
related to these columns are ignored.
count_like_a_spreadsheet
logical
Should column numbers, if present, be rendered spreadsheet-style
as A,B,C,...,AA,BB,CC? Defaults to TRUE.
ids
character
List of columns that make up a primary key, if known. Otherwise
heuristics are used to find a decent key (or a set of decent keys).
ignore_whitespace
logical
Should whitespace be omitted from comparisons? Defaults to FALSE.
never_show_order
logical
Diffs for tables where row/column order has been permuted may include
an extra row/column specifying the changes in row/column numbers.
If you'd like to be sure that that row/column is never
included, turn on this flag, and turn off always_show_order.
ordered
logical
Is the order of rows and columns meaningful? Defaults to TRUE.
padding_strategy
character
Strategy to use when padding columns. Valid values are "auto",
"smart", "dense", and "sparse". Leave unset for a sensible default.
show_meta
logical
Show changes in column properties, not just data, if available. Defaults to TRUE.
show_unchanged
logical
Should we show all rows in diffs? We default to showing
just rows that have changes (and some context rows around
them, if row order is meaningful), but you can override
this here.
show_unchanged_columns
logical
Should we show all columns in diffs? We default to showing
just columns that have changes (and some context columns around
them, if column order is meaningful), but you can override
this here. Irrespective of this flag, you can rely
on index/key columns needed to identify rows to be included
in the diff.
show_unchanged_meta
logical
Show all column properties, if available, even if unchanged.
Defaults to FALSE.
unchanged_column_context
integer
When showing context columns around a changed column, what
is the minimum number of such columns we should show?
unchanged_context
integer
When showing context rows around a changed row, what
is the minimum number of such rows we should show?
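As a rough illustration of how the key and ordering arguments interact, the sketch below compares two small data frames whose rows are in a different order. The data frames ref and new and the id column are hypothetical and exist only for this example; no output is shown because the exact rendering depends on the flags chosen.

```r
library(daff)

# Hypothetical toy data: 'id' is assumed to identify each row uniquely.
ref <- data.frame(id = 1:4, value = c("a", "b", "c", "d"),
                  stringsAsFactors = FALSE)

# Same rows in a different order, with one value changed.
new <- ref[c(3, 1, 4, 2), ]
new$value[new$id == 2] <- "B"

# Match rows on 'id' and treat row/column order as not meaningful.
dd <- diff_data(ref, new, ids = "id", ordered = FALSE)
summary(dd)

# Same comparison, but keep unchanged rows in the printed diff.
dd_all <- diff_data(ref, new, ids = "id", ordered = FALSE,
                    show_unchanged = TRUE)
dd_all
```

Supplying ids avoids relying on the key heuristics, which may matter when several columns look like plausible keys.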
difference object
differs_from
library(daff)
x <- iris
x[1,1] <- 10
diff_data(x, iris)
#> Daff Comparison: ‘x’ vs. ‘iris’
#>     Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#> ->  10->5.1      3.5         1.4          0.2         setosa
#>     4.9          3           1.4          0.2         setosa
#> ... ...          ...         ...          ...         ...
#>
dd <- diff_data(x, iris)
#write_diff(dd, "diff.csv")
summary(dd)
#>
#> Data diff: ‘x’ vs. ‘iris’
#>         #   Modified Reordered Deleted Added
#> Rows    150 1        0         0       0
#> Columns 5   1        0         0       0
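The three uses of a diff named in the description can be sketched together as follows. This is a minimal sketch, assuming the file and view arguments of render_diff behave as in current daff releases; the temporary file names are placeholders chosen for this example.

```r
library(daff)

x <- iris
x[1, 1] <- 10

# The diff describes how to get from x (the reference) to iris.
dd <- diff_data(x, iris)

# Store the diff as CSV in a temporary file (placeholder path).
diff_file <- tempfile(fileext = ".csv")
write_diff(dd, diff_file)

# Apply the diff to x; the modified cell should be restored.
x_patched <- patch_data(x, dd)
x_patched[1, 1]

# Write an HTML rendering without opening a viewer.
html_file <- tempfile(fileext = ".html")
render_diff(dd, file = html_file, view = FALSE)
```

Because the diff is computed with x as the reference, patching x moves it toward iris, not the other way round.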