Package 'ipumseasyr'

Title: Easy Access to IPUMS Data
Description: A package with helper functions extending the ipumsr package for accessing NHGIS and other IPUMS data sources.
Authors: Eli Pousson [aut, cre, cph]
Maintainer: Eli Pousson
License: MIT + file LICENSE
Version: 0.1.0
Built: 2024-11-21 16:27:25 UTC
Source: https://github.com/elipousson/ipumseasyr

Help Index


Define an NHGIS time series extract using ipumsr::define_extract_nhgis

Description

define_nhgis_ts_extract() is a wrapper for ipumsr::define_extract_nhgis() with defaults that support the creation of tidy data using read_nhgis_data() or pivot_nhgis_data().

Usage

define_nhgis_ts_extract(
  year = NULL,
  tables = NULL,
  geography = c("county", "state"),
  extent = "us",
  output = c("tidy", "wide", "file"),
  shape_year = NULL,
  basis = 2008,
  geometry = FALSE,
  ...,
  time_series_tables = NULL,
  description = NULL,
  shapefiles = NULL,
  data_format = "csv_no_header",
  validate = TRUE,
  api_key = Sys.getenv("IPUMS_API_KEY")
)

Arguments

output

Sets the tst_layout value. One of "tidy", "wide", or "file", corresponding to "time_by_row_layout", "time_by_column_layout", or "time_by_file_layout", respectively.

geometry

If TRUE, include shapefiles in the defined extract. If shapefiles is NULL, the function uses list_nhgis_shapefiles() with shape_year as the year parameter.

...

Arguments passed on to ipumsr::define_extract_nhgis

datasets

List of dataset specifications for any datasets to include in the extract request. Use ds_spec() to create a ds_spec object containing a dataset specification. See the examples in ipumsr::define_extract_nhgis().

geographic_extents

Vector of geographic extents to use for all of the datasets in the extract definition (for instance, to obtain data within a particular state). Use "*" to select all available extents.

Required when any of the datasets included in the extract definition include geog_levels that require extent selection. See get_metadata_nhgis() to determine if a geographic level requires extent selection. At the time of writing, NHGIS supports extent selection only for blocks and block groups.

breakdown_and_data_type_layout

The desired layout of any datasets that have multiple data types or breakdown values.

  • "single_file" (default) keeps all data types and breakdown values in one file

  • "separate_files" splits each data type or breakdown value into its own file

Required if any datasets included in the extract definition consist of multiple data types (for instance, estimates and margins of error) or have multiple breakdown values specified. See get_metadata_nhgis() to determine whether a requested dataset has multiple data types.

time_series_tables

List of time series table specifications for any time series tables to include in the extract request. Use tst_spec() to create a tst_spec object containing a time series table specification. See the examples in ipumsr::define_extract_nhgis().

description

Description of the extract.

shapefiles

Names of any shapefiles to include in the extract request.

data_format

The desired format of the extract data file.

  • "csv_no_header" (default) includes only a minimal header in the first row

  • "csv_header" includes a second, more descriptive header row.

  • "fixed_width" provides data in a fixed width format

Note that by default, read_nhgis() removes the additional header row in "csv_header" files.

Required when an extract definition includes any datasets or time_series_tables.

api_key

API key associated with your user account. Defaults to the value of the IPUMS_API_KEY environment variable. See set_ipums_api_key().
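
Examples

A minimal sketch of how the output shorthand could be translated into a tst_layout value, following the mapping given for the output argument. The helper name output_to_tst_layout() is hypothetical and is not part of ipumseasyr or ipumsr; the package's own implementation may differ.

```r
# Hypothetical helper (not part of ipumseasyr): translate the `output`
# shorthand into the `tst_layout` string expected by ipumsr.
output_to_tst_layout <- function(output = c("tidy", "wide", "file")) {
  # match.arg() picks the first choice ("tidy") when no value is supplied
  # and errors on anything outside the allowed set.
  output <- match.arg(output)
  switch(output,
    tidy = "time_by_row_layout",
    wide = "time_by_column_layout",
    file = "time_by_file_layout"
  )
}

output_to_tst_layout()
output_to_tst_layout("wide")
```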


Download IPUMS extract using ipumsr::wait_for_extract and ipumsr::download_extract

Description

download_ipumsr_extract() is a wrapper for ipumsr::wait_for_extract() and ipumsr::download_extract() that waits until an extract is ready before attempting to download it.

Usage

download_ipumsr_extract(
  extract = NULL,
  download_dir = getwd(),
  overwrite = FALSE,
  progress = TRUE,
  ...,
  api_key = Sys.getenv("IPUMS_API_KEY")
)

Arguments

extract

One of:

  • An ipums_extract object

  • The data collection and extract number formatted as a string of the form "collection:number" or as a vector of the form c("collection", number)

  • An extract number to be associated with your default IPUMS collection. See set_ipums_default_collection()

For a list of codes used to refer to each collection, see ipums_data_collections().

download_dir

Path to the directory where the files should be written. Defaults to current working directory.

overwrite

If TRUE, overwrite any conflicting files that already exist in download_dir. Defaults to FALSE.

progress

If TRUE, output progress bar showing the status of the download request. Defaults to TRUE.

...

Arguments passed on to ipumsr::wait_for_extract

initial_delay_seconds

Seconds to wait before first status check. The wait time will automatically increase by 10 seconds between each successive check.

max_delay_seconds

Maximum interval to wait between status checks. When the wait interval reaches this value, checks will continue to occur at max_delay_seconds intervals until the extract is complete or timeout_seconds is reached. Defaults to 300 seconds (5 minutes).

timeout_seconds

Maximum total number of seconds to continue waiting for the extract before throwing an error. Defaults to 10,800 seconds (3 hours).

verbose

If TRUE, print status updates to the R console at the beginning of each wait interval and upon extract completion. Defaults to TRUE.

api_key

API key associated with your user account. Defaults to the value of the IPUMS_API_KEY environment variable. See set_ipums_api_key().
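
Examples

A rough sketch of the polling schedule implied by the wait_for_extract() arguments above: the delay starts at initial_delay_seconds, grows by 10 seconds per check, is capped at max_delay_seconds, and checking stops once timeout_seconds would be exceeded. The helper wait_schedule() is hypothetical, the initial_delay_seconds default of 0 is an assumption, and the sketch ignores the time each status request takes, so it only approximates what ipumsr does.

```r
# Hypothetical helper (not part of ipumsr or ipumseasyr): list the delays,
# in seconds, that would occur before each status check.
wait_schedule <- function(initial_delay_seconds = 0, # assumed default
                          max_delay_seconds = 300,
                          timeout_seconds = 10800,
                          step_seconds = 10) {
  stopifnot(max_delay_seconds > 0, step_seconds > 0)
  delays <- numeric(0)
  delay <- initial_delay_seconds
  elapsed <- 0
  # Keep scheduling checks while the next wait still fits in the timeout.
  while (elapsed + delay <= timeout_seconds) {
    delays <- c(delays, delay)
    elapsed <- elapsed + delay
    delay <- min(delay + step_seconds, max_delay_seconds)
  }
  delays
}

# A short schedule that is easy to read:
wait_schedule(max_delay_seconds = 30, timeout_seconds = 100)
# 0 10 20 30 30

# Total seconds of waiting under the documented defaults:
sum(wait_schedule())
```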


Get file paths for an extract, with optional support for cached extract files

Description

Download an extract with download_ipumsr_extract() and return a named list of file paths for the data and shape files.

Usage

get_ipumsr_extract_paths(
  extract = NULL,
  data_file = NULL,
  shape_file = NULL,
  submit_extract = TRUE,
  download_extract = TRUE,
  download_dir = getwd(),
  overwrite = FALSE,
  progress = TRUE,
  refresh = FALSE,
  api_key = Sys.getenv("IPUMS_API_KEY")
)

Arguments

extract

An ipums_extract object.

submit_extract

If extract is not NULL and submit_extract = TRUE, use ipumsr::submit_extract() to submit the extract.

download_dir

Path to the directory where the files should be written. Defaults to current working directory.

overwrite

If TRUE, overwrite any conflicting files that already exist in download_dir. Defaults to FALSE.

progress

If TRUE, output progress bar showing the status of the download request. Defaults to TRUE.

api_key

API key associated with your user account. Defaults to the value of the IPUMS_API_KEY environment variable. See set_ipums_api_key().

Value

A named list with "data" and "shape" elements containing extract file paths.
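
Examples

A small sketch of the shape of the return value, built by scanning a directory for previously downloaded archives. The helper find_extract_paths() is hypothetical, and the "_csv.zip" and "_shape.zip" suffixes are an assumption about how NHGIS archives are named; the package's actual caching logic is not documented here and may work differently.

```r
# Hypothetical helper (not part of ipumseasyr): look for downloaded NHGIS
# archives in a directory and return their paths as a named list with
# "data" and "shape" elements.
find_extract_paths <- function(download_dir = getwd()) {
  zips <- list.files(download_dir, pattern = "\\.zip$", full.names = TRUE)
  list(
    data = zips[grepl("_csv\\.zip$", zips)],    # assumed data archive suffix
    shape = zips[grepl("_shape\\.zip$", zips)]  # assumed shape archive suffix
  )
}

# Demonstrate with empty placeholder files in a temporary directory.
demo_dir <- file.path(tempdir(), "nhgis_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(
  file.path(demo_dir, c("nhgis0001_csv.zip", "nhgis0001_shape.zip"))
))

paths <- find_extract_paths(demo_dir)
names(paths)
basename(paths$data)
basename(paths$shape)
```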


Get NHGIS time series data

Description

Use define_nhgis_ts_extract(), ipumsr::submit_extract(), ipumsr::download_extract(), and read_nhgis_files() to define, submit, download, and read an NHGIS time series extract. This function is recommended only for interactive use and is not recommended when requesting a large number of tables or geographies.

Usage

get_nhgis_ts_data(
  year = NULL,
  tables = NULL,
  geography = c("county", "state"),
  extent = "us",
  output = c("tidy", "wide", "file"),
  basis = 2008,
  shape_year = NULL,
  geometry = FALSE,
  extract = NULL,
  data_file = NULL,
  shape_file = NULL,
  state = NULL,
  ...,
  time_series_tables = NULL,
  description = NULL,
  shapefiles = NULL,
  data_format = "csv_no_header",
  validate = TRUE,
  submit_extract = TRUE,
  download_extract = TRUE,
  read_files = TRUE,
  download_dir = getwd(),
  overwrite = FALSE,
  progress = TRUE,
  verbose = progress,
  api_key = Sys.getenv("IPUMS_API_KEY")
)

Arguments

output

Sets the tst_layout value. One of "tidy", "wide", or "file", corresponding to "time_by_row_layout", "time_by_column_layout", or "time_by_file_layout", respectively.

geometry

If TRUE, include shapefiles in the defined extract. If shapefiles is NULL, the function uses list_nhgis_shapefiles() with shape_year as the year parameter.

extract

An ipums_extract object.

data_file

Path to a .zip archive containing an NHGIS extract or a single file from an NHGIS extract.

shape_file

Path to a single .shp file or a .zip archive containing at least one .shp file. See the Details section of ipumsr::read_ipums_sf().

time_series_tables

List of time series table specifications for any time series tables to include in the extract request. Use tst_spec() to create a tst_spec object containing a time series table specification. See the examples in ipumsr::define_extract_nhgis().

description

Description of the extract.

shapefiles

Names of any shapefiles to include in the extract request.

data_format

The desired format of the extract data file.

  • "csv_no_header" (default) includes only a minimal header in the first row

  • "csv_header" includes a second, more descriptive header row.

  • "fixed_width" provides data in a fixed width format

Note that by default, read_nhgis() removes the additional header row in "csv_header" files.

Required when an extract definition includes any datasets or time_series_tables.

download_dir

Path to the directory where the files should be written. Defaults to current working directory.

overwrite

If TRUE, overwrite any conflicting files that already exist in download_dir. Defaults to FALSE.

progress

If TRUE, output progress bar showing the status of the download request. Defaults to TRUE.

verbose

Logical controlling whether to display output when loading data. If TRUE, displays IPUMS conditions, a progress bar, and column types. Otherwise, all are suppressed.

Will be overridden by readr.show_progress and readr.show_col_types options, if they are set.

api_key

API key associated with your user account. Defaults to the value of the IPUMS_API_KEY environment variable. See set_ipums_api_key().
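
Examples

To illustrate the difference between the "wide" and "tidy" output options, the sketch below reshapes a small made-up table from a time-by-column layout into a time-by-row layout using base R. The column names imitate NHGIS naming (a variable code followed by a year), but the values are invented, and this is not the code the package uses.

```r
# Made-up data in a "wide" (time by column) layout: one column per year.
wide <- data.frame(
  GISJOIN = c("G0100010", "G0100030"),
  A00AA1990 = c(100, 200),
  A00AA2000 = c(110, 260)
)

# Reshape to a "tidy" (time by row) layout: one row per place and year.
tidy <- reshape(
  wide,
  direction = "long",
  idvar = "GISJOIN",
  varying = c("A00AA1990", "A00AA2000"),
  v.names = "A00AA",
  timevar = "YEAR",
  times = c(1990, 2000)
)
rownames(tidy) <- NULL

tidy
```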


Join the percent change in a variable relative to a reference year

Description

join_nhgis_percent_change() adds a column with the percent change in a value relative to a reference year. Optionally, it also joins a rank from the reference year computed with dplyr::ntile().

Usage

join_nhgis_percent_change(
  data,
  reference_year = NULL,
  value_col = "value",
  reference_prefix = "reference_",
  variable_col = "variable",
  year_col = "YEAR",
  rank_col = "rank",
  rank = NULL,
  rank_n = NULL,
  rank_by = NULL,
  ...,
  perc_prefix = "perc_change_",
  digits = 2
)

Arguments

reference_year

Reference year to use when calculating a percent change column.

rank, rank_n

Passed to the x and n arguments of dplyr::ntile() to join a reference rank value.

rank_by

Used as the .by argument of dplyr::mutate() if rank_n is not NULL.
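
Examples

A base R sketch of the kind of calculation this function describes, using invented values. It assumes the change is computed as (value - reference) / reference and rounded to digits places, and that the new columns are named by pasting reference_prefix and perc_prefix onto value_col; both are assumptions, and the package may scale or name the result differently.

```r
# Invented tidy data: one row per place, variable, and year.
nhgis <- data.frame(
  GISJOIN = rep(c("G010", "G020"), each = 3),
  YEAR = rep(c(1990, 2000, 2010), times = 2),
  variable = "A00AA",
  value = c(100, 110, 125, 200, 180, 210)
)

reference_year <- 1990

# Pull out the reference-year values and rename the value column.
reference <- nhgis[nhgis$YEAR == reference_year, c("GISJOIN", "variable", "value")]
names(reference)[names(reference) == "value"] <- "reference_value"

# Join the reference values back on, then compute the relative change.
joined <- merge(nhgis, reference, by = c("GISJOIN", "variable"))
joined$perc_change_value <- round(
  (joined$value - joined$reference_value) / joined$reference_value,
  digits = 2
)
joined <- joined[order(joined$GISJOIN, joined$YEAR), ]

joined
```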


Label ggplot2 plots with the appropriate credit caption for NHGIS data

Description

labs_nhgis() adds a standard credit caption for NHGIS data to make consistent attribution easier.

Usage

labs_nhgis(
  ...,
  caption = NULL,
  credit = "IPUMS NHGIS, University of Minnesota, www.nhgis.org.",
  prefix = "Source: ",
  collapse = " ",
  width = 80
)

Arguments

...

Arguments passed on to ggplot2::labs

title

The text for the title.

subtitle

The text for the subtitle for the plot which will be displayed below the title.

caption

The text for the caption which will be displayed in the bottom-right of the plot by default.

tag

The text for the tag label which will be displayed at the top-left of the plot by default.

alt, alt_insight

Text used for the generation of alt-text for the plot. See ggplot2::get_alt_text() for examples.

credit

Credit line for IPUMS.

collapse

String used to join caption and credit. Defaults to " ". Set to "\n" to place the credit line on a separate line following the caption. Ignored if caption is NULL.

width

Maximum width of caption line passed to stringr::str_wrap().
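
Examples

A sketch of how the caption text might be assembled from the arguments above. The helper build_caption() is hypothetical; it assumes the credit is prefixed, joined to any caption with collapse, and then wrapped to width. Base strwrap() stands in for stringr::str_wrap() so the example has no dependencies, and the two functions can break lines slightly differently.

```r
# Hypothetical helper (not part of ipumseasyr): combine an optional caption
# with the prefixed credit line, then wrap the result.
build_caption <- function(caption = NULL,
                          credit = "IPUMS NHGIS, University of Minnesota, www.nhgis.org.",
                          prefix = "Source: ",
                          collapse = " ",
                          width = 80) {
  # c(NULL, x) drops the NULL, so `collapse` only matters when a caption
  # is supplied.
  text <- paste(c(caption, paste0(prefix, credit)), collapse = collapse)
  paste(strwrap(text, width = width), collapse = "\n")
}

build_caption()
build_caption("County population, 1990 to 2010.")
```

The resulting string could then be passed as the caption argument of ggplot2::labs().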


List NHGIS time series tables using ipumsr::get_metadata_nhgis

Description

Use ipumsr::get_metadata_nhgis() with type = "time_series_tables" to return a data frame of time series tables. Optionally filter by geographical integration type "nominal" or "standardized" ("2010" or "standardized to 2010" also work).

Usage

list_nhgis_ts_tables(
  ...,
  cache = TRUE,
  cache_file = "nhgis_time_series_tables.rds",
  refresh = FALSE,
  integration = NULL
)

Arguments

...

Additional parameters passed to ipumsr::get_metadata_nhgis()

refresh

If TRUE, ignore any cached file and request the table metadata again. If FALSE (the default), read the file from cache if it exists at the supplied path.

integration

Optional filter for geographical integration.
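
Examples

A sketch of how the integration filter could accept the alternative spellings mentioned in the description. Both helpers are hypothetical, and the stand-in table below uses placeholder table names; the geographic_integration column name and its values are assumptions about the metadata returned by ipumsr::get_metadata_nhgis().

```r
# Hypothetical helper: map the accepted spellings onto two canonical values.
normalize_integration <- function(integration = NULL) {
  if (is.null(integration)) {
    return(NULL)
  }
  integration <- tolower(as.character(integration))
  if (integration %in% c("standardized", "2010", "standardized to 2010")) {
    "standardized"
  } else if (integration == "nominal") {
    "nominal"
  } else {
    stop("`integration` must be \"nominal\" or \"standardized\".")
  }
}

# Hypothetical helper: keep rows whose integration type matches.
filter_by_integration <- function(tables, integration = NULL) {
  integration <- normalize_integration(integration)
  if (is.null(integration)) {
    return(tables)
  }
  keep <- startsWith(tolower(tables$geographic_integration), integration)
  tables[keep, , drop = FALSE]
}

# Stand-in metadata with placeholder names (not real NHGIS tables).
tables <- data.frame(
  name = c("TABLE_A", "TABLE_B", "TABLE_C"),
  geographic_integration = c("Nominal", "Standardized to 2010", "Nominal")
)

filter_by_integration(tables, "2010")$name
filter_by_integration(tables, "nominal")$name
```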


NHGIS Time Series Table names

Description

A character vector of NHGIS time series table names, with table descriptions as the vector names.

Usage

nhgis_ts_tables

Format

A character vector with 389 time series table names.


Read IPUMS geometry using ipumsr::read_ipums_sf

Description

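read_ipums_geometry() reads IPUMS geometry using ipumsr::read_ipums_sf(). By default (vars = "GISJOIN"), only the GISJOIN variable is included in the output.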

Usage

read_ipums_geometry(
  shape_file = NULL,
  path = NULL,
  file_select = NULL,
  vars = "GISJOIN",
  encoding = NULL,
  bind_multiple = TRUE,
  add_layer_var = NULL,
  verbose = FALSE
)

Arguments

shape_file

Path to a single .shp file or a .zip archive containing at least one .shp file. See the Details section of ipumsr::read_ipums_sf().

file_select

If shape_file is a .zip archive that contains multiple files, an expression identifying the files to load. Accepts a character string specifying the file name, a tidyselect selection, or index position. If multiple files are selected, bind_multiple must be equal to TRUE.

vars

Names of variables to include in the output. Accepts a character vector of names or a tidyselect selection. If NULL, includes all variables in the file.

encoding

Encoding to use when reading the shape file. If NULL, defaults to "latin1" unless the file includes a .cpg metadata file with encoding information. The default value should generally be appropriate.

bind_multiple

If TRUE and shape_file contains multiple .shp files, row-bind the files into a single sf object. Useful when shape_file contains multiple files that represent the same geographic units for different extents (e.g. block-level data for multiple states).

add_layer_var

If TRUE, add a variable to the output data indicating the file that each row originates from. Defaults to FALSE unless bind_multiple = TRUE and multiple files exist in shape_file.

The column name will always be prefixed with "layer", but will be adjusted to avoid name conflicts if another column named "layer" already exists in the data.

verbose

If TRUE, report additional progress information on load.
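
Examples

A sketch of the underlying ipumsr call with the defaults shown in the usage above, run against an example shapefile that ships with ipumsr. It assumes the ipumsr and sf packages are installed and does nothing otherwise. It calls ipumsr::read_ipums_sf() directly, so it shows what the wrapper is described as doing rather than the wrapper itself.

```r
# Only run when the required packages are available.
if (requireNamespace("ipumsr", quietly = TRUE) &&
    requireNamespace("sf", quietly = TRUE)) {
  # Example NHGIS shapefile archive bundled with ipumsr.
  shape_file <- ipumsr::ipums_example("nhgis0972_shape_small.zip")

  # Keep only the GISJOIN identifier, matching the wrapper's default `vars`.
  geometry <- ipumsr::read_ipums_sf(
    shape_file,
    vars = "GISJOIN",
    verbose = FALSE
  )

  names(geometry)
}
```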


Read NHGIS data and geometry

Description

Read NHGIS data and geometry to return a named list or a combined sf object.

Usage

read_nhgis_files(
  path = NULL,
  data_file = NULL,
  data_file_select = NULL,
  shape_file = NULL,
  shape_file_select = NULL,
  verbose = FALSE,
  geometry = FALSE,
  ...
)

Arguments

path

Optional if data_file is supplied. A named list with "data" and "shape" elements containing the paths passed to the data_file and shape_file arguments of ipumsr::read_nhgis() and ipumsr::read_ipums_sf().

data_file

Path to a .zip archive containing an NHGIS extract or a single file from an NHGIS extract.

data_file_select, shape_file_select

Passed to file_select parameter of read_nhgis_data() or read_ipums_geometry().

shape_file

Path to a single .shp file or a .zip archive containing at least one .shp file. See the Details section of ipumsr::read_ipums_sf().

verbose

Logical controlling whether to display output when loading data. If TRUE, displays IPUMS conditions, a progress bar, and column types. Otherwise, all are suppressed.

Will be overridden by readr.show_progress and readr.show_col_types options, if they are set.

Value

A named list with "data" and "shape" elements or a combined sf data frame.
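
Examples

A sketch of the named-list form of the return value, assembled directly with ipumsr from its bundled example files. It assumes the ipumsr and sf packages are installed and does nothing otherwise. The path list mirrors the path argument described above; how read_nhgis_files() combines the two elements into a single sf object is not shown here.

```r
# Only run when the required packages are available.
if (requireNamespace("ipumsr", quietly = TRUE) &&
    requireNamespace("sf", quietly = TRUE)) {
  # A named list of paths, in the form described for the `path` argument.
  path <- list(
    data = ipumsr::ipums_example("nhgis0972_csv.zip"),
    shape = ipumsr::ipums_example("nhgis0972_shape_small.zip")
  )

  # Read each file and keep the results under the same names.
  files <- list(
    data = ipumsr::read_nhgis(path$data, verbose = FALSE),
    shape = ipumsr::read_ipums_sf(path$shape, verbose = FALSE)
  )

  names(files)
}
```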


U.S. States Reference data

Description

Reference data with U.S. state names, USPS abbreviations, Census divisions, and Census regions. Includes the 50 U.S. states and the District of Columbia.

Usage

usa_states

Format

A data frame with 51 rows and 4 variables:

STATE

State name

STUSPS

State USPS abbreviation

division

U.S. Census Division name

region

U.S. Census Region name
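
Examples

A sketch of a table with the same shape as usa_states, built from the state vectors in R's datasets package plus a hand-added row for the District of Columbia. This is an approximation for illustration only: the packaged data may use different labels (for example, the built-in vectors call the Midwest "North Central").

```r
# Built-in vectors cover the 50 states; factors are converted to character.
states <- data.frame(
  STATE = state.name,
  STUSPS = state.abb,
  division = as.character(state.division),
  region = as.character(state.region),
  stringsAsFactors = FALSE
)

# The District of Columbia is not in the built-in vectors, so add it by hand.
dc <- data.frame(
  STATE = "District of Columbia",
  STUSPS = "DC",
  division = "South Atlantic",
  region = "South",
  stringsAsFactors = FALSE
)

usa_states_sketch <- rbind(states, dc)

dim(usa_states_sketch)
# 51 4

# Look up states in a single Census region.
usa_states_sketch$STUSPS[usa_states_sketch$region == "West"]
```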