Add checks & test for additional_metadata format in epi_df #182

mgyliu opened this issue Aug 2, 2022 · 1 comment
mgyliu commented Aug 2, 2022

Action items

  • Update the examples in as_epi_df so that additional_metadata is always a list.
  • Add an example in as_epi_df with multiple other_keys.

Related to issue cmu-delphi/epipredict#114

The problem

other_keys is stored differently in an epi_df depending on whether additional_metadata is passed as a vector or as a list. The epi_df constructor expects additional_metadata to be a list, but passing a vector still "works" (i.e., it raises no error) and produces malformed metadata. An example of each case is shown below. The examples are taken from the as_epi_df reference.

library(dplyr)
library(recipes)
library(epiprocess)

Example 1

ex1_input <- tibble::tibble(
  geo_value = rep(c("ca", "fl", "pa"), each = 2),
  county_code = c("06059","06061","06067",
                  "12111","12113","12117"),
  another_key = 1:6, # <- I added this additional key 
  time_value = rep(seq(as.Date("2020-06-01"), as.Date("2020-06-03"),
                       by = "day"), length.out = length(geo_value)),
  value = 1:length(geo_value) + 0.01 * rnorm(length(geo_value))) %>% 
  tsibble::as_tsibble(
    index = time_value, 
    key = c(geo_value, county_code, another_key))

# The `other_keys` metadata (`"county_code"` and `"another_key"` in this case)
# is automatically inferred from the `tsibble`'s `key`:
ex1 <- as_epi_df(x = ex1_input, geo_type = "state", time_type = "day", as_of = "2020-06-03")
attr(ex1,"metadata")

Output:

$geo_type
[1] "state"

$time_type
[1] "day"

$as_of
[1] "2020-06-03"

$other_keys
[1] "county_code" "another_key"

Example 2 (this is ex3 in the as_epi_df documentation, so we keep its names)

ex3_input <- jhu_csse_county_level_subset %>%
  dplyr::filter(time_value > "2021-12-01", state_name == "Massachusetts") %>%
  dplyr::slice_tail(n = 6) %>% 
  tsibble::as_tsibble() %>% # needed to add the additional metadata
  dplyr::mutate(state = rep("MA",6)) %>%
  dplyr::mutate(pol = rep(c("blue", "swing", "swing"), each = 2)) # extra key

# Note: additional_metadata is a vector here, not a list
ex3 <- ex3_input %>% as_epi_df(additional_metadata = c(other_keys = c("state", "pol")))
attr(ex3,"metadata")

Output:

$geo_type
[1] "county"

$time_type
[1] "day"

$as_of
[1] "2022-08-02 15:31:30 PDT"

$other_keys1
[1] "state"

$other_keys2
[1] "pol"
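
The split into other_keys1 and other_keys2 matches how base R's c() names the elements of a flattened vector, so it is presumably what the constructor ends up storing when it receives a vector. The sketch below uses base R only and calls no epiprocess functions. check_additional_metadata is a hypothetical helper written for illustration; it is not part of epiprocess and is not necessarily the check that was eventually added.

```r
# c() flattens its arguments into one atomic vector and builds element
# names from the outer name plus a position suffix.
as_vector <- c(other_keys = c("state", "pol"))
names(as_vector)    # "other_keys1" "other_keys2"
is.list(as_vector)  # FALSE

# list() keeps the character vector intact as a single named element.
as_list <- list(other_keys = c("state", "pol"))
names(as_list)      # "other_keys"
as_list$other_keys  # "state" "pol"

# Hypothetical guard (an assumption, not epiprocess code): fail early
# when additional_metadata is not a list.
check_additional_metadata <- function(additional_metadata) {
  if (!is.list(additional_metadata)) {
    stop("`additional_metadata` must be a list, not a ",
         class(additional_metadata)[1], " vector.", call. = FALSE)
  }
  invisible(additional_metadata)
}

check_additional_metadata(as_list)  # passes silently

# The vector form is rejected; capture the message instead of aborting.
vector_error <- tryCatch(
  check_additional_metadata(as_vector),
  error = function(e) conditionMessage(e)
)
```

Running a guard like this at the top of the constructor would turn the silent mis-storage in Example 2 into an immediate error.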

mgyliu self-assigned this Aug 2, 2022
mgyliu added the bug and documentation labels Aug 2, 2022


mgyliu commented Aug 5, 2022

Closed via PR #183.

mgyliu closed this as completed Aug 5, 2022