
Demonstrate time aggregation in vignette #24

Closed
jacobbien opened this issue Nov 9, 2021 · 5 comments

@jacobbien
Contributor

jacobbien commented Nov 9, 2021

In the first half of the "aggregation.Rmd" vignette, we should include some examples of how to perform time aggregation. In particular, the approach to demo is casting to tsibble on-the-fly and then leveraging the handy functions there.

A particularly useful example would involve adding aggregation to the epiweek level:

library(tsibble)
dat %>%
  as_tsibble(index = time_value, key = geo_value) %>%
  index_by(Epiweek = ~ epiweek(.)) %>%
  summarize(num_cases = sum(Cases))

Here the epiweek function would have to be defined (perhaps based on the MMWRweek R package).

(For more context on this issue, see #7)
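
A minimal sketch of how the epiweek aggregation above might be made runnable, under stated assumptions. The helper `epiweek_start()` and the toy `dat` are hypothetical and not part of any package. The helper uses `MMWRweek()` and `MMWRweek2Date()` from the MMWRweek package and returns the Sunday that starts each epiweek rather than the bare week number, so that weeks from different years do not collide. The sketch also adds `group_by_key()`. As far as I understand tsibble's semantics, `index_by()` followed by `summarize()` would otherwise pool all locations together.

```r
library(dplyr)
library(tsibble)
library(MMWRweek)

# Hypothetical helper (not part of any package): map each date to the Sunday
# that starts its MMWR epiweek, so weeks stay distinct across year boundaries.
epiweek_start <- function(dates) {
  wk <- MMWRweek(as.Date(dates))
  MMWRweek2Date(wk$MMWRyear, wk$MMWRweek, MMWRday = rep(1, nrow(wk)))
}

# Toy data (made up): two locations, 14 consecutive days starting on a Sunday.
dat <- tibble(
  geo_value  = rep(c("ca", "ny"), each = 14),
  time_value = rep(seq(as.Date("2021-11-07"), by = "day", length.out = 14), times = 2),
  Cases      = rep(1:14, times = 2)
)

weekly <- dat %>%
  as_tsibble(index = time_value, key = geo_value) %>%
  group_by_key() %>%                           # keep one series per location
  index_by(Epiweek = ~ epiweek_start(.)) %>%   # regroup the index by epiweek
  summarize(num_cases = sum(Cases))

weekly
```

Returning a Date from the helper keeps the new index ordered and regular (one step per 7 days), which is what tsibble expects of an index.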

@ryantibs
Member

Thanks @jacobbien. Wondering what @earowang thinks?

We're currently thinking that casting to tsibble on-the-fly (which, as Jacob mentions, we'll demo in the first half of this vignette) is a good strategy for accessing all the tsibble utilities without having to worry up front about compatibility in all cases. That is, it puts the onus on the user to define the index variable carefully when they want to use tsibble utilities, and not on us (package designers). More broadly, it seems like how the index variable gets updated across various sliding and pivoting operations could get complicated.

Do you see a downside to this approach, or have different perspectives on its pros & cons?

@earowang

Coercing to tsibble as needed definitely works.

  1. Do index = time_value and key = geo_value hold all the time? If so, defining as_tsibble.epi_tibble(x, ...) would mean users don't need to pass these parameters when coercing.

dat %>%
  as_tsibble() %>% # less learning
  index_by()

  2. {lubridate} also provides epiweek(), although I'm not sure whether it refers to the same epi weeks.
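
The first suggestion could look roughly like the sketch below. It is an illustration only: the `epi_tibble` class is mocked by hand here, and the method body is an assumption about how such defaults might be wired up, not the package's actual implementation. The user can still override `key` (for example with several column names) or `index`.

```r
library(dplyr)
library(tsibble)

# Assumed method for a hypothetical "epi_tibble" S3 class, with defaults
# index = time_value and key = geo_value (both given as column names).
as_tsibble.epi_tibble <- function(x, key = "geo_value", index = "time_value", ...) {
  # Drop the custom class first so the call below dispatches to tsibble's
  # data frame method instead of recursing into this one.
  class(x) <- setdiff(class(x), "epi_tibble")
  as_tsibble(x, key = all_of(key), index = !!index, ...)
}

# Mock epi_tibble, built by hand for illustration only.
dat <- tibble(
  geo_value  = rep(c("ca", "ny"), each = 3),
  time_value = rep(as.Date("2021-11-07") + 0:2, times = 2),
  Cases      = 1:6
)
class(dat) <- c("epi_tibble", class(dat))

ts <- dat %>% as_tsibble()  # no key/index arguments needed

index_var(ts)
key_vars(ts)
```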

@qpmnguyen
Contributor

@ryantibs @jacobbien happy to take up writing the vignette for this issue if it's available.

@ryantibs
Member

ryantibs commented Jan 30, 2022

@qpmnguyen Thanks! Please go for it.

I like Earo's idea of defining as_tsibble.epi_df() with the defaults being index = time_value and key = geo_value. But the user can override this and set a key based on multiple variables, if they want.

I also think you could consider demonstrating tsibble's functionality for detecting and filling gaps in the time series (either with NAs or with LOCF). Thanks again for volunteering.
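
A small sketch of the gap handling mentioned above, using made-up data with one missing day. It relies on tsibble's `has_gaps()`, `count_gaps()`, and `fill_gaps()`, plus `tidyr::fill()` for LOCF. The column names and values are assumptions chosen for illustration.

```r
library(dplyr)
library(tsibble)

# Toy series (made up) with one missing day, 2021-11-09, for a single location.
dat <- tibble(
  geo_value  = "ca",
  time_value = as.Date(c("2021-11-07", "2021-11-08", "2021-11-10")),
  Cases      = c(5, 7, 9)
)

ts <- as_tsibble(dat, index = time_value, key = geo_value)

has_gaps(ts)    # one row per key, with a logical .gaps column
count_gaps(ts)  # start, end, and length of each gap

# Make the gap explicit: the missing day is inserted with Cases = NA.
filled_na <- fill_gaps(ts)

# LOCF: carry the last observed value forward within each location.
filled_locf <- ts %>%
  fill_gaps() %>%
  group_by_key() %>%
  tidyr::fill(Cases, .direction = "down") %>%
  ungroup()
```

Grouping by key before the LOCF step keeps values from one location from leaking into another when the data has several.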

@ryantibs
Member

ryantibs commented Feb 9, 2022

Closed by #37 #38.

ryantibs closed this as completed Feb 9, 2022