From e6bcee6b4d9e4c9f254d47fe741a9406169b89cb Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 10:28:04 -0500 Subject: [PATCH 01/19] remove usage from landing page; add licensing info and more description --- README.Rmd | 84 ++++++--------------------- README.md | 164 +++++++++++++++++++---------------------------------- 2 files changed, 75 insertions(+), 173 deletions(-) diff --git a/README.Rmd b/README.Rmd index cd43d3c9..f1801944 100644 --- a/README.Rmd +++ b/README.Rmd @@ -21,86 +21,34 @@ ggplot2::theme_set(ggplot2::theme_bw()) [![codecov](https://codecov.io/gh/dsweber2/epidatr/branch/dev/graph/badge.svg?token=jVHL9eHZNZ)](https://app.codecov.io/gh/dsweber2/epidatr) -The [Delphi Epidatr package](https://cmu-delphi.github.io/epidatr/) is an R front-end for the [Delphi Epidata API](https://cmu-delphi.github.io/delphi-epidata/), which provides real-time access to epidemiological surveillance data for influenza, COVID-19, and other diseases for the USA at various geographical resolutions, both from official government sources such as the [Center for Disease Control (CDC)](https://www.cdc.gov/datastatistics/index.html) and [Google Trends](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/google-symptoms.html) and private partners such as [Facebook](https://delphi.cmu.edu/blog/2020/08/26/covid-19-symptom-surveys-through-facebook/) and [Change Healthcare](https://www.changehealthcare.com/). It is built and maintained by the Carnegie Mellon University [Delphi research group](https://delphi.cmu.edu/). - -This package is designed to streamline the downloading and usage of data from the [Delphi Epidata -API](https://cmu-delphi.github.io/delphi-epidata/). It provides a simple R interface to the API, including functions for downloading data, parsing the results, and converting the data into a tidy format. 
The API stores a historical record of all data, including corrections and updates, which is particularly useful for accurately backtesting forecasting models. We also provide packages for downstream data processing ([epiprocess](https://github.com/cmu-delphi/epiprocess)) and modeling ([epipredict](https://github.com/cmu-delphi/epipredict)). - -## Usage - -You can find detailed docs here: - -```{r} -library(epidatr) -# Obtain the smoothed covid-like illness (CLI) signal from the -# Facebook survey as it was on April 10, 2021 for the US -epidata <- pub_covidcast( - source = "fb-survey", - signals = "smoothed_cli", - geo_type = "nation", - time_type = "day", - geo_values = "us", - time_values = epirange(20210101, 20210601), - as_of = "2021-06-01" -) -epidata -``` +The [Delphi `epidatr` package](https://cmu-delphi.github.io/epidatr/) is an R front-end for the [Delphi Epidata API](https://cmu-delphi.github.io/delphi-epidata/), which provides real-time access to epidemiological surveillance data for influenza, COVID-19, and other diseases. `epidatr` is built and maintained by the Carnegie Mellon University [Delphi research group](https://delphi.cmu.edu/). -```{r fb-cli-signal} -# Plot this data -library(ggplot2) -ggplot(epidata, aes(x = time_value, y = value)) + - geom_line() + - labs( - title = "Smoothed CLI from Facebook Survey", - subtitle = "US, 2021", - x = "Date", - y = "CLI" - ) -``` +Data is available for the United States and a handful of other countries at various geographical resolutions, both from official government sources such as the [US Center for Disease Control (CDC)](https://www.cdc.gov/datastatistics/index.html), and private partners such as [Facebook](https://delphi.cmu.edu/blog/2020/08/26/covid-19-symptom-surveys-through-facebook/) and [Change Healthcare](https://www.changehealthcare.com/). The API stores a historical record of all data, including corrections and updates, which is particularly useful for accurately backtesting forecasting models. 
+`epidatr` is designed to streamline the downloading and usage of data from the Epidata API. The package provides a simple R interface to the API, including functions for downloading data, parsing the results, and converting the data into a tidy format. We also provide the [epiprocess](https://github.com/cmu-delphi/epiprocess) package for downstream data processing and [epipredict](https://github.com/cmu-delphi/epipredict) for modeling. -## Installation +Consult the [Epidata API documentation](https://cmu-delphi.github.io/delphi-epidata/) for details on the data included in the API, API key registration, licensing, and how to cite this data in your work. The documentation lists all the data sources and signals available through this API for [COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) and for [other diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters). -You can install the stable version of this package from CRAN: +**To get started** using this package, view the Getting Started guide at `vignette("epidatr")`. -```R -install.packages("epidatr") -pak::pkg_install("epidatr") -renv::install("epidatr") -``` +## Get updates -Or if you want the development version, install from GitHub: +**You should consider subscribing to the [API mailing list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api)** to be notified of package updates, new data sources, corrections, and other updates. -```R -# Install the dev version using `pak` or `remotes` -pak::pkg_install("cmu-delphi/epidatr") -remotes::install_github("cmu-delphi/epidatr") -renv::install("cmu-delphi/epidatr") -``` +## For users of the `covidcast` R package + +`epidatr` is a complete rewrite of the [`covidcast` package](https://cmu-delphi.github.io/covidcast/covidcastR/), with a focus on speed, reliability, and ease of use. The `covidcast` package is deprecated and will no longer be updated. 
-### API Keys +## Usage terms and citation -The Delphi API requires a (free) API key for full functionality. To generate -your key, register for a pseudo-anonymous account -[here](https://api.delphi.cmu.edu/epidata/admin/registration_form) and see more -discussion on the [general API -website](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html). See the -`save_api_key()` function documentation for details on how to use your API key. +We request that if you use the `epidatr` package in your work, or use any of the data provided by the Delphi Epidata API through non-`covidcast` endpoints, that you cite us using the citation given by [`citation("epidatr")`](https://cmu-delphi.github.io/epidatr/dev/authors.html#citation). If you use any of the data from the `covidcast` endpoint, please use the [COVIDcast citation](https://cmu-delphi.github.io/covidcast/covidcastR/authors.html#citation) as well. See the [COVIDcast licensing documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_licensing.html) and the [licensing documentation for other endpoints](https://cmu-delphi.github.io/delphi-epidata/api/README.html#data-licensing) for information about citing the datasets provided by the API. + +**Warning:** If you use data from the Epidata API to power a product, dashboard, app, or other service, please download the data you need and store it centrally rather than making API requests for every user. Our server resources are limited and cannot support high-volume interactive use. + +See also the [Terms of Use](https://delphi.cmu.edu/covidcast/terms-of-use/), noting that the data is a research product and not warranted for a particular purpose. -Note that the private endpoints (i.e. those prefixed with `pvt_`) require a -separate key that needs to be passed as an argument. These endpoints require -specific data use agreements to access. 
[mit-image]: https://img.shields.io/badge/License-MIT-yellow.svg [mit-url]: https://opensource.org/license/mit/ [github-actions-image]: https://github.com/cmu-delphi/epidatr/workflows/ci/badge.svg [github-actions-url]: https://github.com/cmu-delphi/epidatr/actions - -## Get updates - -You should consider subscribing to the [API mailing list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api) to be notified of package updates, new data sources, corrections, and other updates. - -## For users of the `covidcast` R package - -The `epidatr` package is a complete rewrite of the [`covidcast` package](https://cmu-delphi.github.io/covidcast/covidcastR/), with a focus on speed, reliability, and ease of use. The `covidcast` package is deprecated and will no longer be updated. diff --git a/README.md b/README.md index ea27a6fd..dabaf5a9 100644 --- a/README.md +++ b/README.md @@ -12,126 +12,80 @@ Actions](https://github.com/cmu-delphi/epidatr/workflows/ci/badge.svg)](https:// [![codecov](https://codecov.io/gh/dsweber2/epidatr/branch/dev/graph/badge.svg?token=jVHL9eHZNZ)](https://app.codecov.io/gh/dsweber2/epidatr) -The [Delphi Epidatr package](https://cmu-delphi.github.io/epidatr/) is +The [Delphi `epidatr` package](https://cmu-delphi.github.io/epidatr/) is an R front-end for the [Delphi Epidata API](https://cmu-delphi.github.io/delphi-epidata/), which provides real-time access to epidemiological surveillance data for influenza, -COVID-19, and other diseases for the USA at various geographical -resolutions, both from official government sources such as the [Center -for Disease Control -(CDC)](https://www.cdc.gov/datastatistics/index.html) and [Google -Trends](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/google-symptoms.html) -and private partners such as -[Facebook](https://delphi.cmu.edu/blog/2020/08/26/covid-19-symptom-surveys-through-facebook/) -and [Change Healthcare](https://www.changehealthcare.com/). 
It is built -and maintained by the Carnegie Mellon University [Delphi research +COVID-19, and other diseases. `epidatr` is built and maintained by the +Carnegie Mellon University [Delphi research group](https://delphi.cmu.edu/). -This package is designed to streamline the downloading and usage of data -from the [Delphi Epidata -API](https://cmu-delphi.github.io/delphi-epidata/). It provides a simple -R interface to the API, including functions for downloading data, -parsing the results, and converting the data into a tidy format. The API +Data is available for the United States and a handful of other countries +at various geographical resolutions, both from official government +sources such as the [US Center for Disease Control +(CDC)](https://www.cdc.gov/datastatistics/index.html), and private +partners such as +[Facebook](https://delphi.cmu.edu/blog/2020/08/26/covid-19-symptom-surveys-through-facebook/) +and [Change Healthcare](https://www.changehealthcare.com/). The API stores a historical record of all data, including corrections and updates, which is particularly useful for accurately backtesting -forecasting models. We also provide packages for downstream data -processing ([epiprocess](https://github.com/cmu-delphi/epiprocess)) and -modeling ([epipredict](https://github.com/cmu-delphi/epipredict)). 
- -## Usage - -You can find detailed docs here: - -``` r -library(epidatr) -# Obtain the smoothed covid-like illness (CLI) signal from the -# Facebook survey as it was on April 10, 2021 for the US -epidata <- pub_covidcast( - source = "fb-survey", - signals = "smoothed_cli", - geo_type = "nation", - time_type = "day", - geo_values = "us", - time_values = epirange(20210101, 20210601), - as_of = "2021-06-01" -) -epidata -#> # A tibble: 151 × 15 -#> geo_value signal source geo_type time_type time_value direction issue -#> -#> 1 us smoothed… fb-su… nation day 2021-01-01 NA 2021-01-06 -#> 2 us smoothed… fb-su… nation day 2021-01-02 NA 2021-01-07 -#> 3 us smoothed… fb-su… nation day 2021-01-03 NA 2021-01-08 -#> 4 us smoothed… fb-su… nation day 2021-01-04 NA 2021-01-09 -#> 5 us smoothed… fb-su… nation day 2021-01-05 NA 2021-01-10 -#> 6 us smoothed… fb-su… nation day 2021-01-06 NA 2021-01-29 -#> 7 us smoothed… fb-su… nation day 2021-01-07 NA 2021-01-29 -#> 8 us smoothed… fb-su… nation day 2021-01-08 NA 2021-01-29 -#> 9 us smoothed… fb-su… nation day 2021-01-09 NA 2021-01-29 -#> 10 us smoothed… fb-su… nation day 2021-01-10 NA 2021-01-29 -#> # ℹ 141 more rows -#> # ℹ 7 more variables: lag , missing_value , missing_stderr , -#> # missing_sample_size , value , stderr , sample_size -``` - -``` r -# Plot this data -library(ggplot2) -ggplot(epidata, aes(x = time_value, y = value)) + - geom_line() + - labs( - title = "Smoothed CLI from Facebook Survey", - subtitle = "US, 2021", - x = "Date", - y = "CLI" - ) -``` - - - -## Installation - -You can install the stable version of this package from CRAN: - -``` r -install.packages("epidatr") -pak::pkg_install("epidatr") -renv::install("epidatr") -``` - -Or if you want the development version, install from GitHub: - -``` r -# Install the dev version using `pak` or `remotes` -pak::pkg_install("cmu-delphi/epidatr") -remotes::install_github("cmu-delphi/epidatr") -renv::install("cmu-delphi/epidatr") -``` - -### API Keys - -The Delphi API 
requires a (free) API key for full functionality. To -generate your key, register for a pseudo-anonymous account -[here](https://api.delphi.cmu.edu/epidata/admin/registration_form) and -see more discussion on the [general API -website](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html). -See the `save_api_key()` function documentation for details on how to -use your API key. - -Note that the private endpoints (i.e. those prefixed with `pvt_`) -require a separate key that needs to be passed as an argument. These -endpoints require specific data use agreements to access. +forecasting models. + +`epidatr` is designed to streamline the downloading and usage of data +from the Epidata API. The package provides a simple R interface to the +API, including functions for downloading data, parsing the results, and +converting the data into a tidy format. We also provide the +[epiprocess](https://github.com/cmu-delphi/epiprocess) package for +downstream data processing and +[epipredict](https://github.com/cmu-delphi/epipredict) for modeling. + +Consult the [Epidata API +documentation](https://cmu-delphi.github.io/delphi-epidata/) for details +on the data included in the API, API key registration, licensing, and +how to cite this data in your work. The documentation lists all the data +sources and signals available through this API for +[COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) +and for [other +diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters). + +**To get started** using this package, view the Getting Started guide at +`vignette("epidatr")`. 
## Get updates -You should consider subscribing to the [API mailing -list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api) +**You should consider subscribing to the [API mailing +list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api)** to be notified of package updates, new data sources, corrections, and other updates. ## For users of the `covidcast` R package -The `epidatr` package is a complete rewrite of the [`covidcast` +`epidatr` is a complete rewrite of the [`covidcast` package](https://cmu-delphi.github.io/covidcast/covidcastR/), with a focus on speed, reliability, and ease of use. The `covidcast` package is deprecated and will no longer be updated. + +## Usage terms and citation + +We request that if you use the `epidatr` package in your work, or use +any of the data provided by the Delphi Epidata API through +non-`covidcast` endpoints, that you cite us using the citation given by +[`citation("epidatr")`](https://cmu-delphi.github.io/epidatr/dev/authors.html#citation). +If you use any of the data from the `covidcast` endpoint, please use the +[COVIDcast +citation](https://cmu-delphi.github.io/covidcast/covidcastR/authors.html#citation) +as well. See the [COVIDcast licensing +documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_licensing.html) +and the [licensing documentation for other +endpoints](https://cmu-delphi.github.io/delphi-epidata/api/README.html#data-licensing) +for information about citing the datasets provided by the API. + +**Warning:** If you use data from the Epidata API to power a product, +dashboard, app, or other service, please download the data you need and +store it centrally rather than making API requests for every user. Our +server resources are limited and cannot support high-volume interactive +use. + +See also the [Terms of +Use](https://delphi.cmu.edu/covidcast/terms-of-use/), noting that the +data is a research product and not warranted for a particular purpose. 
From 637f6f4d4115f6af91e5a242c4ecc8096fb7f813 Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 10:56:32 -0500 Subject: [PATCH 02/19] move mailing list ad down --- README.Rmd | 8 ++++---- README.md | 14 +++++++------- 2 files changed, 11 insertions(+), 11 deletions(-) diff --git a/README.Rmd b/README.Rmd index f1801944..792ed89f 100644 --- a/README.Rmd +++ b/README.Rmd @@ -31,14 +31,14 @@ Consult the [Epidata API documentation](https://cmu-delphi.github.io/delphi-epid **To get started** using this package, view the Getting Started guide at `vignette("epidatr")`. -## Get updates - -**You should consider subscribing to the [API mailing list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api)** to be notified of package updates, new data sources, corrections, and other updates. - ## For users of the `covidcast` R package `epidatr` is a complete rewrite of the [`covidcast` package](https://cmu-delphi.github.io/covidcast/covidcastR/), with a focus on speed, reliability, and ease of use. The `covidcast` package is deprecated and will no longer be updated. +## Get updates + +**You should consider subscribing to the [API mailing list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api)** to be notified of package updates, new data sources, corrections, and other updates. + ## Usage terms and citation We request that if you use the `epidatr` package in your work, or use any of the data provided by the Delphi Epidata API through non-`covidcast` endpoints, that you cite us using the citation given by [`citation("epidatr")`](https://cmu-delphi.github.io/epidatr/dev/authors.html#citation). If you use any of the data from the `covidcast` endpoint, please use the [COVIDcast citation](https://cmu-delphi.github.io/covidcast/covidcastR/authors.html#citation) as well. 
See the [COVIDcast licensing documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_licensing.html) and the [licensing documentation for other endpoints](https://cmu-delphi.github.io/delphi-epidata/api/README.html#data-licensing) for information about citing the datasets provided by the API. diff --git a/README.md b/README.md index dabaf5a9..c44270a8 100644 --- a/README.md +++ b/README.md @@ -51,13 +51,6 @@ diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-spe **To get started** using this package, view the Getting Started guide at `vignette("epidatr")`. -## Get updates - -**You should consider subscribing to the [API mailing -list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api)** -to be notified of package updates, new data sources, corrections, and -other updates. - ## For users of the `covidcast` R package `epidatr` is a complete rewrite of the [`covidcast` @@ -65,6 +58,13 @@ package](https://cmu-delphi.github.io/covidcast/covidcastR/), with a focus on speed, reliability, and ease of use. The `covidcast` package is deprecated and will no longer be updated. +## Get updates + +**You should consider subscribing to the [API mailing +list](https://lists.andrew.cmu.edu/mailman/listinfo/delphi-covidcast-api)** +to be notified of package updates, new data sources, corrections, and +other updates. 
+ ## Usage terms and citation We request that if you use the `epidatr` package in your work, or use From 3ba38367a7be0e7c960e7c3612b606dfee29672b Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 12:16:06 -0500 Subject: [PATCH 03/19] move install, api keys to intro vignette; add more examples, take out giant endpt list --- vignettes/epidatr.Rmd | 426 +++++++++++------------------------------- 1 file changed, 105 insertions(+), 321 deletions(-) diff --git a/vignettes/epidatr.Rmd b/vignettes/epidatr.Rmd index 6a512bd2..1b10fa55 100644 --- a/vignettes/epidatr.Rmd +++ b/vignettes/epidatr.Rmd @@ -11,379 +11,163 @@ vignette: > ```{r, echo = FALSE, message = FALSE} knitr::opts_chunk$set(collapse = TRUE, comment = "#>") options(tibble.print_min = 4L, tibble.print_max = 4L, max.print = 4L) -library(epidatr) -library(dplyr) ``` The epidatr package provides access to all the endpoints of the [Delphi Epidata API](https://cmu-delphi.github.io/delphi-epidata/), and can be used to make -requests for specific signals on specific dates and in selected geographic +requests for specific signals on specific dates and in select geographic regions. -We recommend you register for an API key. While most endpoints are available -without one, there are [limits on API usage for anonymous -users](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html), including -a rate limit. See `save_api_key()` for details on how to obtain an API key and -set this package to use it. -## Basic Usage +## Setup -Fetching some data from the Delphi Epidata API is simple. Suppose we are -interested in the [`covidcast` -endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html), which -provides access to a range of data on COVID-19. Reviewing the endpoint -documentation, we see that we need to specify a data source name, a signal name, -a geographic level, a time resolution, and the location and times of interest. 
+### Installation -In this case, the `pub_covidcast()` function lets us specify these parameters for -the endpoint and returns a tibble with the results: +You can install the stable version of this package from CRAN: -```{r} -epidata <- pub_covidcast( - "fb-survey", "smoothed_cli", "state", "day", "pa", - epirange(20210105, 20210410) -) -epidata -``` - -We can then easily plot the data using ggplot2: - -```{r, out.height="65%"} -library(ggplot2) -ggplot(epidata, aes(x = time_value, y = value)) + - geom_line() + - labs( - title = "Smoothed CLI from Facebook Survey", - subtitle = "PA, 2021", - x = "Date", - y = "CLI" - ) +```R +install.packages("epidatr") +pak::pkg_install("epidatr") +renv::install("epidatr") ``` -The [Delphi Epidata API documentation](https://cmu-delphi.github.io/delphi-epidata/) has more information on the available endpoints and arguments. You can also use the `avail_endpoints()` function to get a table of endpoint functions: +Or if you want the development version, install from GitHub: -```{r} -avail_endpoints() +```R +# Install the dev version using `pak` or `remotes` +pak::pkg_install("cmu-delphi/epidatr@dev") +remotes::install_github("cmu-delphi/epidatr", ref = "dev") +renv::install("cmu-delphi/epidatr@dev") ``` -Example queries with all the endpoint functions available in this package are given [below](#example-queries). +### API Keys -## Advanced Usage (Experimental) +The Delphi API requires a (free) API key for full functionality. While most +endpoints are available without one, there are +[limits on API usage for anonymous users](https://cmu-delphi.github.io/delphi-epidata/api/api_keys.html), +including a rate limit. -The [COVIDcast -endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html) of the -Epidata API contains many separate data sources and signals. 
It can be difficult -to find the name of the signal you're looking for, so you can use -`covidcast_epidata()` to get help with finding sources and functions without -leaving R. +To generate your key, +[register for a pseudo-anonymous account](https://api.delphi.cmu.edu/epidata/admin/registration_form). +See the `save_api_key()` function documentation for details on how to set up +`epidatr` to use your API key. -The `covidcast_epidata()` function fetches a list of all signals, and returns an -object containing fields for every signal: +_Note_ that private endpoints (i.e. those prefixed with `pvt_`) require a +separate key that needs to be passed as an argument. These endpoints require +specific data use agreements to access. -```{r} -epidata <- covidcast_epidata() -epidata$signals -``` -If you use an editor that supports tab completion, such as RStudio, type -`epidata$signals$` and wait for the tab completion popup. You will be able to -type the name of signals and have the autocomplete feature select them from the -list for you. Note that some signal names have dashes in them, so to access them -we rely on the backtick operator: +## Basic Usage -```{r} -epidata$signals$`fb-survey:smoothed_cli` -``` +Fetching data from the Delphi Epidata API is simple. Suppose we are +interested in the +[`covidcast` endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html), +which provides access to a +[wide range of data](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) +on COVID-19. Reviewing the endpoint documentation, we see that we +[need to specify](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html#constructing-api-queries) +a data source name, a signal name, a geographic level, a time resolution, and +the location and times of interest. -These objects can be used directly to fetch data, without requiring us to use -the `pub_covidcast()` function. 
Simply use the `$call` attribute of the object: +The `pub_covidcast()` function lets us access the `covidcast` endpoint: ```{r} -epidata$signals$`fb-survey:smoothed_cli`$call("state", "pa", epirange(20210405, 20210410)) -``` - -## Advanced Usage (Debugging) - -We can obtain the [`epidata_call`] object underlying a request by setting the -`dry_run` argument to `TRUE` in `fetch_args_list()`: +library(epidatr) +library(dplyr) -```{r} -pub_covidcast( - "fb-survey", "smoothed_cli", "state", "day", "pa", - epirange(20210405, 20210410), - fetch_args = fetch_args_list(dry_run = TRUE) +# Obtain the most up-to-date version of the smoothed covid-like illness (CLI) +# signal from the COVID-19 Trends and Impact survey for the US +epidata <- pub_covidcast( + source = "fb-survey", + signals = "smoothed_cli", + geo_type = "nation", + time_type = "day", + geo_values = "us", + time_values = epirange(20210105, 20210410) ) +knitr::kable(head(epidata)) ``` -## Example Queries - -(Some endpoints allow for the use of `*` to access data at all locations. Check the help for a given endpoint to see if it supports `*`.) - -### COVIDcast Main Endpoint - -API docs: +`pub_covidcast()` returns a `tibble`. (Here we’re using `knitr::kable()` to make +it more readable.) Each row represents one observation for the US on one +day. The location code is given in the `geo_value` column, the date in +the `time_value` column. Here `value` is the requested signal -- in this +case, the smoothed estimate of the percentage of people with COVID-like +illness, based on the symptom surveys, and `stderr` is its standard error. -County geo_values are [FIPS codes](https://en.wikipedia.org/wiki/List_of_United_States_FIPS_codes_by_county) and are discussed in the API docs [here](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html). The example below is for Orange County, California. +The Epidata API makes signals available at different geographic levels, +depending on the endpoint.
To request signals for all states instead of the +entire US, we use the `geo_type` argument paired with `*` for the +`geo_values` argument. (Only some endpoints allow for the use of `*` to +access data at all locations. Check the help for a given endpoint to see if +it supports `*`.) -```{r} +```{r, eval = FALSE} +# Obtain the most up-to-date version of the smoothed covid-like illness (CLI) +# signal from the COVID-19 Trends and Impact survey for all states pub_covidcast( source = "fb-survey", - signals = "smoothed_accept_covid_vaccine", - geo_type = "county", + signals = "smoothed_cli", + geo_type = "state", time_type = "day", - time_values = epirange(20201221, 20201225), - geo_values = "06059" + geo_values = "*", + time_values = epirange(20210105, 20210410) ) ``` -The `covidcast` endpoint supports `*` in its time and geo fields: +We can fetch a subset of states by listing out the desired locations: -```{r} +```{r, eval = FALSE} +# Obtain the most up-to-date version of the smoothed covid-like illness (CLI) +# signal from the COVID-19 Trends and Impact survey for PA, CA, and FL pub_covidcast( source = "fb-survey", - signals = "smoothed_accept_covid_vaccine", - geo_type = "county", + signals = "smoothed_cli", + geo_type = "state", time_type = "day", - time_values = epirange(20201221, 20201225), - geo_values = "*" + geo_values = c("pa", "ca", "fl"), + time_values = epirange(20210105, 20210410) ) ``` -### Other Covid Endpoints - -#### COVID-19 Hospitalization: Facility Lookup - -API docs: - -```{r, eval = FALSE} -pub_covid_hosp_facility_lookup(city = "southlake") -pub_covid_hosp_facility_lookup(state = "WY") -# A non-example (there is no city called New York in Wyoming) -pub_covid_hosp_facility_lookup(state = "WY", city = "New York") -``` - -#### COVID-19 Hospitalization by Facility - -API docs: - -```{r, eval = FALSE} -pub_covid_hosp_facility( - hospital_pks = "100075", - collection_weeks = epirange(20200101, 20200501) -) -``` - -#### COVID-19 Hospitalization by State
- -API docs: - -```{r, eval = FALSE} -pub_covid_hosp_state_timeseries(states = "MA", dates = "20200510") -``` - -### Flu Endpoints - -#### Delphi's ILINet forecasts - -API docs: - -```{r, eval = FALSE} -del <- pub_delphi(system = "ec", epiweek = 201501) -names(del[[1L]]$forecast) -``` - -#### FluSurv hospitalization data - -API docs: - -```{r, eval = FALSE} -pub_flusurv(locations = "ca", epiweeks = 202001) -``` - -#### Fluview data - -API docs: - -```{r, eval = FALSE} -pub_fluview(regions = "nat", epiweeks = epirange(201201, 202001)) -``` - -#### Fluview virological data from clinical labs - -API docs: - -```{r, eval = FALSE} -pub_fluview_clinical(regions = "nat", epiweeks = epirange(201601, 201701)) -``` - -#### Fluview metadata - -API docs: - -```{r, eval = FALSE} -pub_fluview_meta() -``` - -#### Google Flu Trends data - -API docs: +We can also request data for a single location at a time, via the `geo_values` argument. -```{r, eval = FALSE} -pub_gft(locations = "hhs1", epiweeks = epirange(201201, 202001)) -``` - -#### ECDC ILI - -API docs: - -```{r, eval = FALSE} -pub_ecdc_ili(regions = "Armenia", epiweeks = 201840) -``` - -#### KCDC ILI - -API docs: - -```{r, eval = FALSE} -pub_kcdc_ili(regions = "ROK", epiweeks = 200436) -``` - -#### NIDSS Flu - -API docs: - -```{r, eval = FALSE} -pub_nidss_flu(regions = "taipei", epiweeks = epirange(200901, 201301)) -``` - -#### ILI Nearby Nowcast - -API docs: - -```{r, eval = FALSE} -pub_nowcast(locations = "ca", epiweeks = epirange(202201, 202319)) -``` - -### Dengue Endpoints - -#### Delphi's Dengue Nowcast - -API docs: - -```{r, eval = FALSE} -pub_dengue_nowcast(locations = "pr", epiweeks = epirange(201401, 202301)) -``` - -#### NIDSS dengue - -API docs: - -```{r, eval = FALSE} -pub_nidss_dengue(locations = "taipei", epiweeks = epirange(200301, 201301)) -``` - -### PAHO Dengue - -API docs: - -```{r, eval=FALSE} -pub_paho_dengue(regions = "ca", epiweeks = epirange(200201, 202319)) -``` - -### Other Endpoints - -#### 
Wikipedia Access - -API docs: - -```{r, eval = FALSE} -pub_wiki(language = "en", articles = "influenza", epiweeks = epirange(202001, 202319)) -``` - -### Private methods - -These require private access keys to use (separate from the Delphi Epidata API key). -To actually run these locally, you will need to store these secrets in your `.Reviron` file, or set them as environmental variables. - -#### CDC - -API docs: - -```{r, eval=FALSE} -pvt_cdc(auth = Sys.getenv("SECRET_API_AUTH_CDC"), epiweeks = epirange(202003, 202304), locations = "ma") -``` - -#### Dengue Digital Surveillance Sensors - -API docs: - -```{r, eval=FALSE} -pvt_dengue_sensors( - auth = Sys.getenv("SECRET_API_AUTH_SENSORS"), - names = "ght", - locations = "ag", - epiweeks = epirange(201404, 202004) -) -``` - -#### Google Health Trends - -API docs: - -```{r, eval=FALSE} -pvt_ght( - auth = Sys.getenv("SECRET_API_AUTH_GHT"), - epiweeks = epirange(199301, 202304), - locations = "ma", - query = "how to get over the flu" +```{r} +# Obtain the most up-to-date version of the smoothed covid-like illness (CLI) +# signal from the COVID-19 Trends and Impact survey for Pennsylvania +epidata <- pub_covidcast( + source = "fb-survey", + signals = "smoothed_cli", + geo_type = "state", + time_type = "day", + geo_values = "pa", + time_values = epirange(20210105, 20210410) ) +knitr::kable(head(epidata)) ``` -#### NoroSTAT metadata - -API docs: - -```{r, eval=FALSE} -pvt_meta_norostat(auth = Sys.getenv("SECRET_API_AUTH_NOROSTAT")) -``` - -#### NoroSTAT data +## Plotting -API docs: +Because the output data is in a standard `tibble` format, we can easily plot +it using `ggplot2`: -```{r, eval=FALSE} -pvt_norostat(auth = Sys.getenv("SECRET_API_AUTH_NOROSTAT"), locations = "1", epiweeks = 201233) -``` - -#### Quidel Influenza testing - -API docs: - -```{r, eval=FALSE} -pvt_quidel(auth = Sys.getenv("SECRET_API_AUTH_QUIDEL"), locations = "hhs1", epiweeks = epirange(200301, 202105)) +```{r, out.height="65%"} +library(ggplot2) 
+ggplot(epidata, aes(x = time_value, y = value)) + + geom_line() + + labs( + title = "Smoothed CLI from Facebook Survey", + subtitle = "PA, 2021", + x = "Date", + y = "CLI" + ) ``` -#### Sensors +## Signal discovery -API docs: +The [Delphi Epidata API documentation](https://cmu-delphi.github.io/delphi-epidata/) has more information on available endpoints. You can also use the `avail_endpoints()` function to get a table of endpoint functions: -```{r, eval=FALSE} -pvt_sensors( - auth = Sys.getenv("SECRET_API_AUTH_SENSORS"), - names = "sar3", - locations = "nat", - epiweeks = epirange(200301, 202105) -) +```{r} +avail_endpoints() ``` -#### Twitter - -API docs: - -```{r, eval=FALSE} -pvt_twitter( - auth = Sys.getenv("SECRET_API_AUTH_TWITTER"), - locations = "nat", - epiweeks = epirange(200301, 202105) -) -``` From 6a8a224d9e9601ac0def5a56b82721ce364c33b0 Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 12:19:48 -0500 Subject: [PATCH 04/19] signal discovery blurb and vignette stub --- vignettes/epidatr.Rmd | 7 ++++++- vignettes/signal-discovery.Rmd | 0 2 files changed, 6 insertions(+), 1 deletion(-) create mode 100644 vignettes/signal-discovery.Rmd diff --git a/vignettes/epidatr.Rmd b/vignettes/epidatr.Rmd index 1b10fa55..ceddab59 100644 --- a/vignettes/epidatr.Rmd +++ b/vignettes/epidatr.Rmd @@ -165,9 +165,14 @@ ggplot(epidata, aes(x = time_value, y = value)) + ## Signal discovery -The [Delphi Epidata API documentation](https://cmu-delphi.github.io/delphi-epidata/) has more information on available endpoints. You can also use the `avail_endpoints()` function to get a table of endpoint functions: +Above we used data from [Delphi’s symptom surveys](https://delphi.cmu.edu/covid19/ctis/), but the Epidata API includes numerous data streams: medical claims data, cases and deaths, mobility, and many others. This can make it a challenge to find the data stream that you are most interested in. 
+ +The Epidata documentation lists all the data sources and signals available through the API for [COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) and for [other diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters). + +You can also use the `avail_endpoints()` function to get a table of endpoint functions: ```{r} avail_endpoints() ``` +See `vignette("signal-discovery")` for more information. diff --git a/vignettes/signal-discovery.Rmd b/vignettes/signal-discovery.Rmd new file mode 100644 index 00000000..e69de29b From a8b81bb3c72e500a13fb8c15b7aa217722745e8d Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 12:26:29 -0500 Subject: [PATCH 05/19] versioned data blurb and vignette stub --- vignettes/epidatr.Rmd | 23 +++++++++++++++++++++++ vignettes/versioned-data.Rmd | 0 2 files changed, 23 insertions(+) create mode 100644 vignettes/versioned-data.Rmd diff --git a/vignettes/epidatr.Rmd b/vignettes/epidatr.Rmd index ceddab59..79cc2e0d 100644 --- a/vignettes/epidatr.Rmd +++ b/vignettes/epidatr.Rmd @@ -146,6 +146,29 @@ epidata <- pub_covidcast( knitr::kable(head(epidata)) ``` +## Getting versioned data + +The Epidata API stores a historical record of all data, including corrections and updates, which is particularly useful for accurately backtesting forecasting models. To fetch versioned data + +We can also request data for a single location at a time, via the `geo_values` argument. 
+ +```{r, eval = FALSE} +# Obtain the smoothed covid-like illness (CLI) signal from the COVID-19 +# Trends and Impact survey for Pennsylvania as it was on 2021-06-01 +pub_covidcast( + source = "fb-survey", + signals = "smoothed_cli", + geo_type = "state", + time_type = "day", + geo_values = "pa", + time_values = epirange(20210105, 20210410), + as_of = "2021-06-01" +) +``` + +See `vignette("versioned-data")` for more information and more ways to specify versioned data. + + ## Plotting Because the output data is in a standard `tibble` format, we can easily plot diff --git a/vignettes/versioned-data.Rmd b/vignettes/versioned-data.Rmd new file mode 100644 index 00000000..e69de29b From f857d67f756b02e3d30bd54f82d74006ad281d6b Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 12:45:26 -0500 Subject: [PATCH 06/19] add geo info and format avail_endpoint --- vignettes/epidatr.Rmd | 18 ++++++++++++++++-- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/vignettes/epidatr.Rmd b/vignettes/epidatr.Rmd index 79cc2e0d..94314143 100644 --- a/vignettes/epidatr.Rmd +++ b/vignettes/epidatr.Rmd @@ -186,7 +186,16 @@ ggplot(epidata, aes(x = time_value, y = value)) + ) ``` -## Signal discovery +## Finding locations of interest + +Most data is only available for the US. Select endpoints report other countries at the national and/or regional levels. Endpoint descriptions explicitly state when they cover non-US locations. + +For endpoints that report US data, see the +[geographic coding documentation](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html) +for available geographic levels. + + +## Finding data sources and signals of interest Above we used data from [Delphi’s symptom surveys](https://delphi.cmu.edu/covid19/ctis/), but the Epidata API includes numerous data streams: medical claims data, cases and deaths, mobility, and many others. 
This can make it a challenge to find the data stream that you are most interested in. @@ -194,8 +203,13 @@ The Epidata documentation lists all the data sources and signals available throu You can also use the `avail_endpoints()` function to get a table of endpoint functions: -```{r} +```{r, eval = FALSE} avail_endpoints() ``` +```{r, echo = FALSE} +invisible(capture.output(endpts <- avail_endpoints())) +knitr::kable(endpts) +``` + See `vignette("signal-discovery")` for more information. From cf2ed737089f49960eb8d2d154b9ffe1796fcd9b Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 12:51:19 -0500 Subject: [PATCH 07/19] list international endpoints --- vignettes/epidatr.Rmd | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/vignettes/epidatr.Rmd b/vignettes/epidatr.Rmd index 94314143..6fa7815b 100644 --- a/vignettes/epidatr.Rmd +++ b/vignettes/epidatr.Rmd @@ -195,6 +195,19 @@ For endpoints that report US data, see the for available geographic levels. +### International data + +International data is available via + +- `pub_dengue_nowcast` (North and South America) +- `pub_ecdc_ili` (Europe) +- `pub_kcdc_ili` (Korea) +- `pub_nidss_dengue` (Taiwan) +- `pub_nidss_flu` (Taiwan) +- `pub_paho_dengue` (North and South America) +- `pvt_dengue_sensors` (North and South America) + + ## Finding data sources and signals of interest Above we used data from [Delphi’s symptom surveys](https://delphi.cmu.edu/covid19/ctis/), but the Epidata API includes numerous data streams: medical claims data, cases and deaths, mobility, and many others. This can make it a challenge to find the data stream that you are most interested in. 
From 86502a37d58460466dcbea20116605b101e610bc Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 13:23:04 -0500 Subject: [PATCH 08/19] rename and reflow text --- vignettes/epidatr.Rmd | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/vignettes/epidatr.Rmd b/vignettes/epidatr.Rmd index 6fa7815b..c17caa15 100644 --- a/vignettes/epidatr.Rmd +++ b/vignettes/epidatr.Rmd @@ -1,8 +1,8 @@ --- -title: "Delphi Epidata R API Client" +title: "Get started with `epidatr`" output: rmarkdown::html_vignette vignette: > - %\VignetteIndexEntry{Delphi Epidata R API Client} + %\VignetteIndexEntry{Get started with `epidatr`} %\VignetteEngine{knitr::rmarkdown} %\VignetteDepends{ggplot2} \usepackage[utf8]{inputenc} @@ -210,9 +210,14 @@ International data is available via ## Finding data sources and signals of interest -Above we used data from [Delphi’s symptom surveys](https://delphi.cmu.edu/covid19/ctis/), but the Epidata API includes numerous data streams: medical claims data, cases and deaths, mobility, and many others. This can make it a challenge to find the data stream that you are most interested in. +Above we used data from [Delphi’s symptom surveys](https://delphi.cmu.edu/covid19/ctis/), +but the Epidata API includes numerous data streams: medical claims data, cases +and deaths, mobility, and many others. This can make it a challenge to find +the data stream that you are most interested in. -The Epidata documentation lists all the data sources and signals available through the API for [COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) and for [other diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters). 
+The Epidata documentation lists all the data sources and signals available +through the API for [COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) +and for [other diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters). You can also use the `avail_endpoints()` function to get a table of endpoint functions: From ade3ea249af722484714be2bfa69cde6c255e006 Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 13:50:08 -0500 Subject: [PATCH 09/19] flesh out signal discovery vignette --- vignettes/signal-discovery.Rmd | 360 +++++++++++++++++++++++++++++++++ 1 file changed, 360 insertions(+) diff --git a/vignettes/signal-discovery.Rmd b/vignettes/signal-discovery.Rmd index e69de29b..6a3bd628 100644 --- a/vignettes/signal-discovery.Rmd +++ b/vignettes/signal-discovery.Rmd @@ -0,0 +1,360 @@ +--- +title: "Finding data sources and signals of interest" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Finding data sources and signals of interest} + %\VignetteEngine{knitr::rmarkdown} + \usepackage[utf8]{inputenc} +--- + +```{r, echo = FALSE, message = FALSE} +knitr::opts_chunk$set(collapse = TRUE, comment = "#>") +options(tibble.print_min = 4L, tibble.print_max = 4L, max.print = 4L) +library(epidatr) +library(dplyr) +``` + +The Epidata API includes numerous data streams -- medical claims data, cases and deaths, mobility, and many others -- covering different geographic regions. This can make it a challenge to find the data stream that you are most interested in. + +Example queries with all the endpoint functions available in this package are +given [below](#example-queries). 
+ + +## Using the documentation + +The Epidata documentation lists all the data sources and signals available +through the API for +[COVID-19](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_signals.html) and +for [other diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html#source-specific-parameters). +The site also includes a search tool if you have a keyword (e.g. "Taiwan") in mind. + + +## Interactive tooling + +We provide a couple `epidatr` functions to help find data sources and signals. + +The `avail_endpoints()` function lists endpoints, each of which, except for +COVIDcast, corresponds to a single data source. `avail_endpoints()` outputs a +`tibble` of endpoints and brief descriptions, which explicitly state when they +cover non-US locations: + +```{r, eval = FALSE} +avail_endpoints() +``` + +```{r, echo = FALSE} +invisible(capture.output(endpts <- avail_endpoints())) +knitr::kable(endpts) +``` + +The `covidcast_epidata()` function lets you look more in-depth at the data +sources available through the COVIDcast endpoint. The function describes +all available data sources and signals: + +```{r} +covid_sources <- covidcast_epidata() +head(covid_sources$sources, n = 2) +``` + +Each source is included as an entry in the `covid_sources$sources` list, associated +with a `tibble` describing included signals. + +If you use an editor that supports tab completion, such as RStudio, type +`covid_sources$sources$` and wait for the tab completion popup. You will be able to +browse the list of data sources. + +```{r} +covid_sources$signals +``` + +If you use an editor that supports tab completion, type +`covid_sources$signals$` and wait for the tab completion popup. You will be +able to type the name of signals and have the autocomplete feature select +them from the list for you. In the tab-completion popup, signal names are +prefixed with the name of the data source for filtering convenience. 
+ +_Note_ that some signal names have dashes in them, so to access them +we rely on the backtick operator: + +```{r} +covid_sources$signals$`fb-survey:smoothed_cli` +``` + +These signal objects can be used directly to fetch data, without requiring us to use +the `pub_covidcast()` function. Simply use the `$call` attribute of the object: + +```{r} +covid_sources$signals$`fb-survey:smoothed_cli`$call("state", "pa", epirange(20210405, 20210410)) +``` + + +## Example Queries + +### COVIDcast Main Endpoint + +API docs: + +County geo_values are [FIPS codes](https://en.wikipedia.org/wiki/List_of_United_States_FIPS_codes_by_county) and are discussed in the API docs [here](https://cmu-delphi.github.io/delphi-epidata/api/covidcast_geography.html). The example below is for Orange County, California. + +```{r} +pub_covidcast( + source = "fb-survey", + signals = "smoothed_accept_covid_vaccine", + geo_type = "county", + time_type = "day", + time_values = epirange(20201221, 20201225), + geo_values = "06059" +) +``` + +The `covidcast` endpoint supports `*` in its time and geo fields: + +```{r} +pub_covidcast( + source = "fb-survey", + signals = "smoothed_accept_covid_vaccine", + geo_type = "county", + time_type = "day", + time_values = epirange(20201221, 20201225), + geo_values = "*" +) +``` + +### Other Covid Endpoints + +#### COVID-19 Hospitalization: Facility Lookup + +API docs: + +```{r, eval = FALSE} +pub_covid_hosp_facility_lookup(city = "southlake") +pub_covid_hosp_facility_lookup(state = "WY") +# A non-example (there is no city called New York in Wyoming) +pub_covid_hosp_facility_lookup(state = "WY", city = "New York") +``` + +#### COVID-19 Hospitalization by Facility + +API docs: + +```{r, eval = FALSE} +pub_covid_hosp_facility( + hospital_pks = "100075", + collection_weeks = epirange(20200101, 20200501) +) +``` + +#### COVID-19 Hospitalization by State + +API docs: + +```{r, eval = FALSE} +pub_covid_hosp_state_timeseries(states = "MA", dates = "20200510") +``` + +### Flu 
Endpoints + +#### Delphi's ILINet forecasts + +API docs: + +```{r, eval = FALSE} +del <- pub_delphi(system = "ec", epiweek = 201501) +names(del[[1L]]$forecast) +``` + +#### FluSurv hospitalization data + +API docs: + +```{r, eval = FALSE} +pub_flusurv(locations = "ca", epiweeks = 202001) +``` + +#### Fluview data + +API docs: + +```{r, eval = FALSE} +pub_fluview(regions = "nat", epiweeks = epirange(201201, 202001)) +``` + +#### Fluview virological data from clinical labs + +API docs: + +```{r, eval = FALSE} +pub_fluview_clinical(regions = "nat", epiweeks = epirange(201601, 201701)) +``` + +#### Fluview metadata + +API docs: + +```{r, eval = FALSE} +pub_fluview_meta() +``` + +#### Google Flu Trends data + +API docs: + +```{r, eval = FALSE} +pub_gft(locations = "hhs1", epiweeks = epirange(201201, 202001)) +``` + +#### ECDC ILI + +API docs: + +```{r, eval = FALSE} +pub_ecdc_ili(regions = "Armenia", epiweeks = 201840) +``` + +#### KCDC ILI + +API docs: + +```{r, eval = FALSE} +pub_kcdc_ili(regions = "ROK", epiweeks = 200436) +``` + +#### NIDSS Flu + +API docs: + +```{r, eval = FALSE} +pub_nidss_flu(regions = "taipei", epiweeks = epirange(200901, 201301)) +``` + +#### ILI Nearby Nowcast + +API docs: + +```{r, eval = FALSE} +pub_nowcast(locations = "ca", epiweeks = epirange(202201, 202319)) +``` + +### Dengue Endpoints + +#### Delphi's Dengue Nowcast + +API docs: + +```{r, eval = FALSE} +pub_dengue_nowcast(locations = "pr", epiweeks = epirange(201401, 202301)) +``` + +#### NIDSS dengue + +API docs: + +```{r, eval = FALSE} +pub_nidss_dengue(locations = "taipei", epiweeks = epirange(200301, 201301)) +``` + +### PAHO Dengue + +API docs: + +```{r, eval=FALSE} +pub_paho_dengue(regions = "ca", epiweeks = epirange(200201, 202319)) +``` + +### Other Endpoints + +#### Wikipedia Access + +API docs: + +```{r, eval = FALSE} +pub_wiki(language = "en", articles = "influenza", epiweeks = epirange(202001, 202319)) +``` + +### Private methods + +These require private access keys to use 
(separate from the Delphi Epidata API key). +To actually run these locally, you will need to store these secrets in your `.Renviron` file, or set them as environment variables. + +#### CDC + +API docs: + +```{r, eval=FALSE} +pvt_cdc(auth = Sys.getenv("SECRET_API_AUTH_CDC"), epiweeks = epirange(202003, 202304), locations = "ma") +``` + +#### Dengue Digital Surveillance Sensors + +API docs: + +```{r, eval=FALSE} +pvt_dengue_sensors( + auth = Sys.getenv("SECRET_API_AUTH_SENSORS"), + names = "ght", + locations = "ag", + epiweeks = epirange(201404, 202004) +) +``` + +#### Google Health Trends + +API docs: + +```{r, eval=FALSE} +pvt_ght( + auth = Sys.getenv("SECRET_API_AUTH_GHT"), + epiweeks = epirange(199301, 202304), + locations = "ma", + query = "how to get over the flu" +) +``` + +#### NoroSTAT metadata + +API docs: + +```{r, eval=FALSE} +pvt_meta_norostat(auth = Sys.getenv("SECRET_API_AUTH_NOROSTAT")) +``` + +#### NoroSTAT data + +API docs: + +```{r, eval=FALSE} +pvt_norostat(auth = Sys.getenv("SECRET_API_AUTH_NOROSTAT"), locations = "1", epiweeks = 201233) +``` + +#### Quidel Influenza testing + +API docs: + +```{r, eval=FALSE} +pvt_quidel(auth = Sys.getenv("SECRET_API_AUTH_QUIDEL"), locations = "hhs1", epiweeks = epirange(200301, 202105)) +``` + +#### Sensors + +API docs: + +```{r, eval=FALSE} +pvt_sensors( + auth = Sys.getenv("SECRET_API_AUTH_SENSORS"), + names = "sar3", + locations = "nat", + epiweeks = epirange(200301, 202105) +) +``` + +#### Twitter + +API docs: + +```{r, eval=FALSE} +pvt_twitter( + auth = Sys.getenv("SECRET_API_AUTH_TWITTER"), + locations = "nat", + epiweeks = epirange(200301, 202105) +) +``` From 06adbcd71835fc85cb059e8f57e1e88174906a5a Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 13:54:05 -0500 Subject: [PATCH 10/19] clean up versioned data wording --- vignettes/epidatr.Rmd | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/vignettes/epidatr.Rmd 
b/vignettes/epidatr.Rmd index c17caa15..ae7f1c07 100644 --- a/vignettes/epidatr.Rmd +++ b/vignettes/epidatr.Rmd @@ -148,9 +148,10 @@ knitr::kable(head(epidata)) ## Getting versioned data -The Epidata API stores a historical record of all data, including corrections and updates, which is particularly useful for accurately backtesting forecasting models. To fetch versioned data - -We can also request data for a single location at a time, via the `geo_values` argument. +The Epidata API stores a historical record of all data, including corrections +and updates, which is particularly useful for accurately backtesting +forecasting models. To fetch versioned data, we can use the `as_of` +argument. ```{r, eval = FALSE} # Obtain the smoothed covid-like illness (CLI) signal from the COVID-19 @@ -166,7 +167,7 @@ pub_covidcast( ) ``` -See `vignette("versioned-data")` for more information and more ways to specify versioned data. +See `vignette("versioned-data")` for details and more ways to specify versioned data. ## Plotting From 3872109a3445e05bddd37856982d972fb08bd497 Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 13:59:59 -0500 Subject: [PATCH 11/19] pretty print tibbles in signal discovery vignette --- vignettes/signal-discovery.Rmd | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/vignettes/signal-discovery.Rmd b/vignettes/signal-discovery.Rmd index 6a3bd628..4df94ddc 100644 --- a/vignettes/signal-discovery.Rmd +++ b/vignettes/signal-discovery.Rmd @@ -29,6 +29,11 @@ for [other diseases](https://cmu-delphi.github.io/delphi-epidata/api/README.html The site also includes a search tool if you have a keyword (e.g. "Taiwan") in mind. +## Signal metadata + +...? + + ## Interactive tooling We provide a couple `epidatr` functions to help find data sources and signals. 
@@ -84,7 +89,10 @@ These signal objects can be used directly to fetch data, without requiring us to the `pub_covidcast()` function. Simply use the `$call` attribute of the object: ```{r} -covid_sources$signals$`fb-survey:smoothed_cli`$call("state", "pa", epirange(20210405, 20210410)) +epidata <- covid_sources$signals$`fb-survey:smoothed_cli`$call( + "state", "pa", epirange(20210405, 20210410) +) +knitr::kable(epidata) ``` From 06aed90cf7f0087ccd4f805e33fc5290f032c537 Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 14:17:07 -0500 Subject: [PATCH 12/19] metadata blurb for signal discovery --- vignettes/signal-discovery.Rmd | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/vignettes/signal-discovery.Rmd b/vignettes/signal-discovery.Rmd index 4df94ddc..8dbe2d35 100644 --- a/vignettes/signal-discovery.Rmd +++ b/vignettes/signal-discovery.Rmd @@ -31,8 +31,14 @@ The site also includes a search tool if you have a keyword (e.g. "Taiwan") in mi ## Signal metadata -...? +Some endpoints have partner metadata available that, depending on +the endpoint, provides information about the signals that are available, what +time ranges they are available for, and when they have been updated. 
+```{r, echo = FALSE} +invisible(capture.output(endpts <- avail_endpoints())) +filter(endpts, endsWith(Endpoint, "_meta()")) %>% knitr::kable() +``` ## Interactive tooling From 12a6e637f931ed0d1c4eb89972c956df3dc7df53 Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 14:20:43 -0500 Subject: [PATCH 13/19] silence us-only message --- vignettes/signal-discovery.Rmd | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/vignettes/signal-discovery.Rmd b/vignettes/signal-discovery.Rmd index 8dbe2d35..8cd7799c 100644 --- a/vignettes/signal-discovery.Rmd +++ b/vignettes/signal-discovery.Rmd @@ -36,7 +36,7 @@ the endpoint, provides information about the signals that are available, what time ranges they are available for, and when they have been updated. ```{r, echo = FALSE} -invisible(capture.output(endpts <- avail_endpoints())) +suppressMessages(invisible(capture.output(endpts <- avail_endpoints()))) filter(endpts, endsWith(Endpoint, "_meta()")) %>% knitr::kable() ``` @@ -54,7 +54,7 @@ avail_endpoints() ``` ```{r, echo = FALSE} -invisible(capture.output(endpts <- avail_endpoints())) +suppressMessages(invisible(capture.output(endpts <- avail_endpoints()))) knitr::kable(endpts) ``` From cab8caeb67043b77d2180e5e71ae338bcc28c0a9 Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Fri, 8 Dec 2023 14:31:50 -0500 Subject: [PATCH 14/19] versioned data header --- vignettes/versioned-data.Rmd | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/vignettes/versioned-data.Rmd b/vignettes/versioned-data.Rmd index e69de29b..048ba858 100644 --- a/vignettes/versioned-data.Rmd +++ b/vignettes/versioned-data.Rmd @@ -0,0 +1,15 @@ +--- +title: "Understanding and accessing versioned data" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{Understanding and accessing versioned data} + %\VignetteEngine{knitr::rmarkdown} + 
\usepackage[utf8]{inputenc} +--- + +```{r, echo = FALSE, message = FALSE} +knitr::opts_chunk$set(collapse = TRUE, comment = "#>") +options(tibble.print_min = 4L, tibble.print_max = 4L, max.print = 4L) +library(epidatr) +library(dplyr) +``` From b6b0a6fb4a3d7b9e87b48f380470294f297357a1 Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Mon, 11 Dec 2023 13:06:21 -0500 Subject: [PATCH 15/19] choropleth link and example --- vignettes/epidatr.Rmd | 53 ++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 50 insertions(+), 3 deletions(-) diff --git a/vignettes/epidatr.Rmd b/vignettes/epidatr.Rmd index ae7f1c07..ca9c6c16 100644 --- a/vignettes/epidatr.Rmd +++ b/vignettes/epidatr.Rmd @@ -1,8 +1,10 @@ --- -title: "Get started with `epidatr`" -output: rmarkdown::html_vignette +title: "Get started with epidatr" +output: + rmarkdown::html_vignette: + code_folding: show vignette: > - %\VignetteIndexEntry{Get started with `epidatr`} + %\VignetteIndexEntry{Get started with epidatr} %\VignetteEngine{knitr::rmarkdown} %\VignetteDepends{ggplot2} \usepackage[utf8]{inputenc} @@ -187,6 +189,51 @@ ggplot(epidata, aes(x = time_value, y = value)) + ) ``` +`ggplot2` can also be used to [create choropleths](https://r-graphics.org/recipe-miscgraph-choropleth). 
+ + +```{r class.source = "fold-hide", out.height="65%"} +library(maps) + +# Obtain the most up-to-date version of the smoothed covid-like illness (CLI) +# signal from the COVID-19 Trends and Impact survey for all states on a single day +cli_states <- pub_covidcast( + source = "fb-survey", + signals = "smoothed_cli", + geo_type = "state", + time_type = "day", + geo_values = "*", + time_values = 20210410 +) + +# Get a mapping of states to longitude/latitude coordinates +states_map <- map_data("state") + +# Convert state abbreviations into state names +cli_states <- mutate( + cli_states, state = ifelse( + geo_value == "dc", + "district of columbia", + state.name[match(geo_value, tolower(state.abb))] %>% tolower() + ) +) + +# Add coordinates for each state +cli_states <- left_join(states_map, cli_states, by = c("region" = "state")) + +# Plot +ggplot(cli_states, aes(x = long, y = lat, group = group, fill = value)) + + geom_polygon(colour = "black", linewidth = 0.2) + + coord_map("polyconic") + + labs( + title = "Smoothed CLI from Facebook Survey", + subtitle = "All states, 2021-04-10", + x = "Longitude", + y = "Latitude" + ) +``` + + ## Finding locations of interest Most data is only available for the US. Select endpoints report other countries at the national and/or regional levels. Endpoint descriptions explicitly state when they cover non-US locations. 
From 53d820112fa6af8c10408504d7093c2d6ad3ab2b Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Mon, 11 Dec 2023 13:56:13 -0500 Subject: [PATCH 16/19] add content to versioned data vignette --- vignettes/versioned-data.Rmd | 134 +++++++++++++++++++++++++++++++++++ 1 file changed, 134 insertions(+) diff --git a/vignettes/versioned-data.Rmd b/vignettes/versioned-data.Rmd index 048ba858..70523e6f 100644 --- a/vignettes/versioned-data.Rmd +++ b/vignettes/versioned-data.Rmd @@ -13,3 +13,137 @@ options(tibble.print_min = 4L, tibble.print_max = 4L, max.print = 4L) library(epidatr) library(dplyr) ``` + + +The Epidata API records not just each signal's estimate for a given location +on a given day, but also *when* that estimate was made, and all updates to that +estimate. + +For example, let's look at the [doctor visits +signal](https://cmu-delphi.github.io/delphi-epidata/api/covidcast-signals/doctor-visits.html) +from the [`covidcast` endpoint](https://cmu-delphi.github.io/delphi-epidata/api/covidcast.html), +which estimates the percentage of outpatient doctor visits that are +COVID-related. Consider a result row with `time_value` 2020-05-01 for +`geo_values = "pa"`. This is an estimate for Pennsylvania on +May 1, 2020. That estimate was *issued* on May 5, 2020, the delay being due to +the aggregation of data by our source and the time taken by the Epidata API to +ingest the data provided. Later, the estimate for May 1st could be updated, +perhaps because additional visit data from May 1st arrived at our source and was +reported to us. This constitutes a new *issue* of the data. + + +### Data known "as of" a specific date + +By default, endpoint functions fetch the most recent issue available. This +is the best option for users who simply want to graph the latest data or +construct dashboards. 
But if we are interested in knowing *when* data was +reported, we can request specific data versions using the `as_of`, `issues`, or +`lag` arguments. + +_Note_ that these are mutually exclusive; only one can be specified +at a time. Also, not all endpoints support all three parameters, so please +check the documentation for that specific endpoint. + +First, we can request the data that was available *as of* a specific date, using +the `as_of` argument: + + +```{r} +epidata <- pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-01", "2020-05-01"), + geo_type = "state", + geo_values = "pa", + as_of = "2020-05-07" +) +knitr::kable(epidata) +``` + +This shows that an estimate of about 2.3% was issued on May 7. If we don't +specify `as_of`, we get the most recent estimate available: + + +```{r} +epidata <- pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-01", "2020-05-01"), + geo_type = "state", + geo_values = "pa" +) +knitr::kable(epidata) +``` + +Note the substantial change in the estimate, from less than 3% to almost 6%, +reflecting new data that became available after May 7 about visits *occurring on* +May 1. This illustrates the importance of issue date tracking, particularly +for forecasting tasks. To backtest a forecasting model on past data, it is +important to use the data that would have been available *at the time* the model +was or would have been fit, not data that arrived much later. 
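The `as_of` behaviour described above can be pictured, very roughly, as "take the latest issue on or before the requested date". The sketch below is only an illustration of that idea in base R: the `issues` table is a toy with invented values (not API output), and `latest_as_of()` is a hypothetical helper, not an `epidatr` function. How the API resolves versions server-side is not shown here.

```r
# Toy revision history for one location and one time_value.
# All dates and values are invented for illustration; this is NOT API output.
issues <- data.frame(
  time_value = as.Date("2020-05-01"),
  issue = as.Date(c("2020-05-05", "2020-05-07", "2020-05-20")),
  value = c(1.9, 2.3, 5.9)
)

# Hypothetical helper (not part of epidatr): return the most recent row
# that had already been issued on or before `as_of`.
latest_as_of <- function(df, as_of) {
  known <- df[df$issue <= as.Date(as_of), ]
  known[which.max(as.numeric(known$issue)), ]
}

# What a forecaster would have seen on 2020-05-07 ...
snapshot <- latest_as_of(issues, "2020-05-07")
# ... versus the most recent version in the toy table.
latest <- latest_as_of(issues, "2020-12-31")
```

Under these assumptions, `snapshot` holds the row issued on 2020-05-07 and `latest` holds the later revision, which mirrors why a backtest should be fed the snapshot rather than the latest data.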
+ + +### Multiple issues of observations + +By using the `issues` argument, we can request all issues in a certain time +period: + +```{r} +epidata <- pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-01", "2020-05-01"), + geo_type = "state", + geo_values = "pa", + issues = epirange("2020-05-01", "2020-05-15") +) +knitr::kable(epidata) +``` + +This estimate was clearly updated many times as new data for May 1st arrived. + +Note that these results include only data issued or updated between +(inclusive) 2020-05-01 and 2020-05-15. If a value was first reported on +2020-04-15, and never updated, a query for issues between 2020-05-01 and +2020-05-15 will not include that value among its results. + + +### Observations issued with a specific lag + +Finally, we can use the `lag` argument to request only data reported with a +certain lag. For example, requesting a lag of 7 days fetches only data issued +exactly 7 days after the corresponding `time_value`: + +```{r} +epidata <- pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-01", "2020-05-07"), + geo_type = "state", + geo_values = "pa", + lag = 7 +) +knitr::kable(epidata) +``` + +Note that though this query requested all values between 2020-05-01 and +2020-05-07, May 3rd and May 4th were *not* included in the results set. 
This is +because the query will only include a result for May 3rd if a value were issued +on May 10th (a 7-day lag), but in fact the value was not updated on that day: + +```{r} +epidata <- pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-03", "2020-05-03"), + geo_type = "state", + geo_values = "pa", + issues = epirange("2020-05-09", "2020-05-15") +) +knitr::kable(epidata) +``` From 7b6b9bd14deaf68eb33dad7a1502ea2f1a35a219 Mon Sep 17 00:00:00 2001 From: nmdefries Date: Mon, 11 Dec 2023 18:57:44 +0000 Subject: [PATCH 17/19] style: styler (GHA) --- vignettes/epidatr.Rmd | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/vignettes/epidatr.Rmd b/vignettes/epidatr.Rmd index ca9c6c16..33c560cf 100644 --- a/vignettes/epidatr.Rmd +++ b/vignettes/epidatr.Rmd @@ -211,7 +211,8 @@ states_map <- map_data("state") # Convert state abbreviations into state names cli_states <- mutate( - cli_states, state = ifelse( + cli_states, + state = ifelse( geo_value == "dc", "district of columbia", state.name[match(geo_value, tolower(state.abb))] %>% tolower() From 433b748f488a9e1d757e064fc0c26e86945162d9 Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Mon, 11 Dec 2023 16:44:00 -0500 Subject: [PATCH 18/19] suggest `maps` for vignettes --- DESCRIPTION | 1 + 1 file changed, 1 insertion(+) diff --git a/DESCRIPTION b/DESCRIPTION index 3e02fafe..d05d4186 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -44,6 +44,7 @@ Suggests: dplyr, ggplot2, knitr, + maps, rmarkdown, rlang, testthat (>= 3.1.5), From 04c57b72875df70bc9594e1f952aa871e08eb21a Mon Sep 17 00:00:00 2001 From: Nat DeFries <42820733+nmdefries@users.noreply.github.com> Date: Mon, 11 Dec 2023 16:53:27 -0500 Subject: [PATCH 19/19] add example showing use of list of dates with `issues` --- DESCRIPTION | 1 + vignettes/epidatr.Rmd | 2 +- vignettes/versioned-data.Rmd | 14 ++++++++++++++ 3 files changed, 
16 insertions(+), 1 deletion(-) diff --git a/DESCRIPTION b/DESCRIPTION index d05d4186..3e1fc47e 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -45,6 +45,7 @@ Suggests: ggplot2, knitr, maps, + mapproj, rmarkdown, rlang, testthat (>= 3.1.5), diff --git a/vignettes/epidatr.Rmd b/vignettes/epidatr.Rmd index 33c560cf..d96a2fdf 100644 --- a/vignettes/epidatr.Rmd +++ b/vignettes/epidatr.Rmd @@ -192,7 +192,7 @@ ggplot(epidata, aes(x = time_value, y = value)) + `ggplot2` can also be used to [create choropleths](https://r-graphics.org/recipe-miscgraph-choropleth). -```{r class.source = "fold-hide", out.height="65%"} +```{r, class.source = "fold-hide", out.height="65%"} library(maps) # Obtain the most up-to-date version of the smoothed covid-like illness (CLI) diff --git a/vignettes/versioned-data.Rmd b/vignettes/versioned-data.Rmd index 70523e6f..27271c6c 100644 --- a/vignettes/versioned-data.Rmd +++ b/vignettes/versioned-data.Rmd @@ -110,6 +110,20 @@ Note that these results include only data issued or updated between 2020-04-15, and never updated, a query for issues between 2020-05-01 and 2020-05-15 will not include that value among its results. +The `issues` parameter also accepts a list of dates. + +```{r, eval = FALSE} +pub_covidcast( + source = "doctor-visits", + signals = "smoothed_adj_cli", + time_type = "day", + time_values = epirange("2020-05-01", "2020-05-01"), + geo_type = "state", + geo_values = "pa", + issues = c("2020-05-07", "2020-05-09", "2020-05-15") +) +``` + ### Observations issued with a specific lag
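
As a rough illustration of the `lag` semantics covered in this vignette, the sketch below computes the lag as the number of days between `issue` and `time_value` and keeps only rows with a lag of exactly 7. It uses base R and a toy table with invented values (not API output); the column names `time_value`, `issue`, and `value` are assumed to match what the endpoint returns.

```r
# Toy table of issues; all dates and values are invented for illustration.
issues <- data.frame(
  time_value = as.Date(c("2020-05-01", "2020-05-01", "2020-05-03", "2020-05-03")),
  issue = as.Date(c("2020-05-05", "2020-05-08", "2020-05-09", "2020-05-11")),
  value = c(1.9, 2.4, 2.0, 2.2)
)

# Lag in days between the observation date and the date it was issued.
issues$lag <- as.integer(issues$issue - issues$time_value)

# Keep only rows issued exactly 7 days after their time_value.
lag7 <- issues[issues$lag == 7, ]
```

In this toy table May 3rd was issued with lags of 6 and 8 days but never exactly 7, so it drops out of `lag7`, which is the same effect the vignette describes for days that were not updated on the matching date.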