Get data from Statistics Netherlands (CBS)

Retrieves data from a table of Statistics Netherlands. A list of available tables can be retrieved with cbs_get_datasets(). Use the Identifier column of cbs_get_datasets() as id in cbs_get_data() and cbs_get_meta().

cbs_get_data(
  id,
  ...,
  catalog = "CBS",
  select = NULL,
  typed = TRUE,
  add_column_labels = TRUE,
  dir = tempdir(),
  verbose = FALSE,
  base_url = getOption("cbsodataR.base_url", BASE_URL),
  include_ID = FALSE
)

Arguments

id: Identifier of the table, as listed in the Identifier column of cbs_get_datasets()
...: optional filter statements, see details.
catalog: catalog id, as listed by cbs_get_datasets() (set catalog = NULL to see all catalogs)
select: optional character vector of columns to select
typed: Should the data automatically be converted into integer and numeric columns?
add_column_labels: Should column titles be added as labels (TRUE)? These labels are visible in View().
dir: Directory where the table should be downloaded. Defaults to a temporary directory.
verbose: Should extra messages about what is happening be printed?
base_url: optionally specify a different server. Useful for third-party data services implementing the same protocol, see details.
include_ID: Should the data include the ID column for the rows?

Value

data.frame with the requested data. Note that a csv copy of the data is stored in dir.

Details

To reduce the download time, the data can optionally be filtered on category values: for large tables (> 100k records) this is advisable.

The filter is specified with (see examples below):

<column_name> = <values> in which <values> is a character vector. Rows with values that are not part of the character vector are not returned. Note that the values have to be values from the $Key column of the corresponding meta data. These may contain trailing spaces...
<column_name> = has_substring(x) in which x is a character vector. Rows with values that do not contain any of the substrings in x are not returned. Useful substrings are "JJ", "KW", "MM" for Periods (years, quarters, months) and "PV", "CR" and "GM" for Regions (provinces, COROP regions, municipalities).
<column_name> = eq(<values>) | has_substring(x), which combines the two statements above.
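To make the three filter forms concrete, here is a minimal base-R sketch that emulates which rows each form keeps, using a toy vector of period keys. It does not call cbsodataR: the keys and the helper `has_any_substring()` are hypothetical, and the package presumably applies the filter while building the download rather than locally like this.

```r
# Toy stand-in for a Periods key column (illustrative values only)
periods <- c("2000JJ00", "2000KW01", "2000MM01", "2000MM03", "2001JJ00")

# Emulates `Periods = c("2000MM03", "2001JJ00")`: keep exact key matches
keep_eq <- periods %in% c("2000MM03", "2001JJ00")

# Hypothetical helper emulating `has_substring(x)`:
# TRUE where a value contains at least one of the substrings in x
has_any_substring <- function(values, x) {
  Reduce(`|`, lapply(x, function(s) grepl(s, values, fixed = TRUE)))
}
keep_sub <- has_any_substring(periods, "JJ")

# Emulates `eq("2000MM01") | has_substring("JJ")`: union of both conditions
keep_both <- periods %in% "2000MM01" | has_any_substring(periods, "JJ")

periods[keep_both]  # "2000JJ00" "2000MM01" "2001JJ00"
```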

By default the columns are converted to their type (typed = TRUE). CBS uses multiple types of missing values (unknown, suppressed, not measured, missing): users who want to keep all these nuances can use typed = FALSE, which returns character columns.
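With typed = FALSE the missing-value codes survive as text, so you can decide yourself how to treat them. A minimal sketch, assuming a hypothetical character column whose markers are placeholders rather than official CBS codes, of deriving a numeric column while keeping the original code next to it:

```r
# Hypothetical character column as it might look with typed = FALSE;
# "." and "x" are placeholder markers, not documented CBS codes.
raw <- c("12.5", ".", "x", "7")

value  <- suppressWarnings(as.numeric(raw))          # non-numeric markers become NA
marker <- ifelse(is.na(value), raw, NA_character_)   # keep the code where the value is missing

data.frame(raw, value, marker)
```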

Note

All data are downloaded using cbs_download_table().

Specify different server

Besides the official CBS data, there are also third-party and preview data services implementing the same protocol. The base_url parameter allows you to specify a different server, either explicitly or globally with options(cbsodataR.base_url = "http://example.com"). Some further tweaking may be necessary for third-party services; the download URLs are constructed as follows:

<base_url>/<BULK>/<id>/... for data
<base_url>/<API>/<id>/?$format=json for metadata

Default values for base_url, BULK and API are set in the package options, but can be changed with:

options(
 cbsodataR.base_url = "https://opendata.cbs.nl",
 cbsodataR.BULK = "ODataFeed/odata",
 cbsodataR.API = "ODataAPI/odata"
)

which are the default values set in the package.
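The URL patterns above can be sketched in base R as follows. `build_cbs_urls()` is a hypothetical helper, not part of cbsodataR; it reads the same three options and falls back to the documented defaults when they are unset. The data URL stops at `<id>` because the remainder of that path is not specified here.

```r
# Hypothetical helper mirroring the URL patterns described above
build_cbs_urls <- function(id) {
  base_url <- getOption("cbsodataR.base_url", "https://opendata.cbs.nl")
  bulk     <- getOption("cbsodataR.BULK", "ODataFeed/odata")
  api      <- getOption("cbsodataR.API", "ODataAPI/odata")
  list(
    data = paste(base_url, bulk, id, sep = "/"),                     # <base_url>/<BULK>/<id>
    meta = paste0(paste(base_url, api, id, sep = "/"), "/?$format=json")  # metadata URL
  )
}

build_cbs_urls("7196ENG")$meta
# "https://opendata.cbs.nl/ODataAPI/odata/7196ENG/?$format=json" (with default options)
```

Setting options(cbsodataR.base_url = ...) changes the host in both URLs while keeping the BULK and API paths.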

Copyright use

The content of CBS opendata is subject to Creative Commons Attribution (CC BY 4.0). This means that the re-use of the content is permitted, provided Statistics Netherlands is cited as the source. For more information see: https://www.cbs.nl/en-gb/about-us/website/copyright

Examples

if (FALSE) { # \dontrun{
cbs_get_data( id      = "7196ENG"      # table id
            , Periods = "2000MM03"     # March 2000
            , CPI     = "000000"       # Category code for total 
            )

# useful substrings:
## Periods: "JJ": years, "KW": quarters, "MM": months
## Regions: "NL", "PV": provinces, "CR": COROP regions, "GM": municipalities
  
cbs_get_data( id      = "7196ENG"      # table id
            , Periods = has_substring("JJ")     # all years
            , CPI     = "000000"       # Category code for total 
            )

cbs_get_data( id      = "7196ENG"      # table id
            , Periods = c("2000MM03","2001MM12")     # March 2000 and Dec 2001
            , CPI     = "000000"       # Category code for total 
            )

# combine either this
cbs_get_data( id      = "7196ENG"      # table id
            , Periods = has_substring("JJ") | "2000MM01" # all years and Jan 2000
            , CPI     = "000000"       # Category code for total 
            )

# or this: note the "eq" function
cbs_get_data( id      = "7196ENG"      # table id
            , Periods = eq("2000MM01") | has_substring("JJ") # Jan 2000 and all years
            , CPI     = "000000"       # Category code for total 
            )
} # }