Adjust `other_keys` or decay to tibble when selecting, renaming columns #192

brookslogan · 2022-08-08T16:10:50Z

Taken from discussion on #185.

``` r
## pak::pkg_install("cmu-delphi/epiprocess@9259796dafa159c8139f1f2eef619e1405c778ed")
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(epiprocess)
#> 
#> Attaching package: 'epiprocess'
#> The following object is masked from 'package:stats':
#> 
#>     filter
ex1_input <- tibble::tibble(
  geo_value = rep(c("ca", "fl", "pa"), each = 3),
  county_code = c("06059","06061","06067",
                  "12111","12113","12117",
                  "42101", "42103","42105"),
  time_value = rep(seq(as.Date("2020-06-01"), as.Date("2020-06-03"),
                       by = "day"), length.out = length(geo_value)),
  value = 1:length(geo_value) + 0.01 * rnorm(length(geo_value))
) %>%
  tsibble::as_tsibble(index = time_value, key = c(geo_value, county_code))
ex1 <- as_epi_df(x = ex1_input, geo_type = "state", time_type = "day", as_of = "2020-06-03")
# NOTE: cols are re-ordered so 3rd column is `county_code`
# ex1[-3] # (still an `epi_df`, good(? see unique key discussion))
ex1[-3] %>% attr("metadata") %>% .$other_keys
#> [1] "county_code"
# (bad)
```

Created on 2022-08-08 by the reprex package (v2.0.1)

Unlike `geo_value` and `time_value`, a key column listed in `other_keys` can potentially be dropped while still leaving a sensible `epi_df`. That `epi_df` should not have `other_keys` that refer to nonexistent columns, though: any key column that has been selected out of the table data should also be removed from `other_keys`. A more difficult question is what to do if removing the key column takes us from a valid unique key to a nonunique one. We have other checks in `[` designed to decay to tibble in those cases, but we are lax about enforcing a unique key elsewhere, including in `epi_df` construction.
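
To make the two concerns above concrete, here is a minimal base-R sketch on a plain data frame with a separate metadata list. `prune_other_keys()` and `has_unique_key()` are hypothetical names made up for illustration, not epiprocess functions, and the sketch assumes the key consists of `geo_value`, the `other_keys` columns, and `time_value`.

```r
# Hypothetical helpers for illustration; they are not part of epiprocess.

# Keep only the `other_keys` entries that still name a column of `x`.
prune_other_keys <- function(x, metadata) {
  metadata$other_keys <- metadata$other_keys[metadata$other_keys %in% names(x)]
  metadata
}

# TRUE if (geo_value, other_keys, time_value) identifies every row uniquely.
has_unique_key <- function(x, metadata) {
  key_cols <- c("geo_value", metadata$other_keys, "time_value")
  anyDuplicated(x[key_cols]) == 0L
}

toy <- data.frame(
  geo_value = rep(c("ca", "fl"), each = 2),
  county_code = c("06059", "06061", "12111", "12113"),
  time_value = as.Date("2020-06-01"),
  value = 1:4
)
metadata <- list(geo_type = "state", time_type = "day", other_keys = "county_code")

dropped <- toy[names(toy) != "county_code"]  # select the key column out
pruned <- prune_other_keys(dropped, metadata)

pruned$other_keys                # no longer mentions `county_code`
has_unique_key(toy, metadata)    # unique while `county_code` is part of the key
has_unique_key(dropped, pruned)  # not unique once it is gone
```

Pruning alone fixes the dangling reference, but the last call shows why it may not be enough: the remaining key can stop being unique, which is the case where decaying to tibble comes into question.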

Additionally, we should consider adding `names<-` and `dimnames<-` implementations in order to update the `other_keys` metadata when `other_keys` columns are renamed, or to decay to tibble when `geo_value` or `time_value` is renamed.
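
A rough sketch of what such a `names<-` method could look like, written against a toy S3 class rather than `epi_df` itself. The class `toy_epi` and the constructor `new_toy_epi()` are hypothetical, and the sketch decays to a plain `data.frame` instead of a tibble only to stay within base R.

```r
# Hypothetical toy class for illustration; this is not epiprocess code.
new_toy_epi <- function(x, other_keys = character()) {
  attr(x, "metadata") <- list(other_keys = other_keys)
  class(x) <- c("toy_epi", "data.frame")
  x
}

`names<-.toy_epi` <- function(x, value) {
  old_names <- names(x)
  metadata <- attr(x, "metadata")

  # Rename on a plain data.frame copy so this method is not re-entered.
  result <- x
  class(result) <- "data.frame"
  names(result) <- value

  # Renaming away geo_value or time_value: drop the metadata and decay.
  if (!all(c("geo_value", "time_value") %in% value)) {
    attr(result, "metadata") <- NULL
    return(result)
  }

  # Otherwise translate each other_keys entry to its new name by position.
  metadata$other_keys <- value[match(metadata$other_keys, old_names)]
  attr(result, "metadata") <- metadata
  class(result) <- class(x)
  result
}

ex <- new_toy_epi(
  data.frame(
    geo_value = "ca",
    county_code = "06059",
    time_value = as.Date("2020-06-01"),
    value = 1
  ),
  other_keys = "county_code"
)

renamed <- ex
names(renamed)[2] <- "fips"
attr(renamed, "metadata")$other_keys  # follows the rename

decayed <- ex
names(decayed)[1] <- "state"
inherits(decayed, "toy_epi")          # class dropped after renaming geo_value
```

This only covers renames of existing columns; duplicated names or a `value` of a different length would need their own checks.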

Marking as P2, since any induced bugs will probably trigger errors rather than silently incorrect behavior, and I don't know whether this is causing common issues downstream in epipredict.


brookslogan added the bug (Something isn't working) and P2 (low priority) labels Aug 8, 2022

brookslogan mentioned this issue Aug 8, 2022

Abort or decay to tibble if epi_df column duplication or renaming invalidates key #194

Closed

rachlobay mentioned this issue Aug 8, 2022

Create a rename() function that is more specialized for an epi_df #195

Open

rachlobay self-assigned this Aug 8, 2022

rachlobay mentioned this issue Aug 8, 2022

Addressed issues 192-194 #196

Merged

rachlobay linked a pull request Aug 9, 2022 that will close this issue

Addressed issues 192-194 #196

Merged

rachlobay closed this as completed in #196 Aug 12, 2022
