more flexible specification of columns identifying units of analysis for epi_signal #12

elray1 · 2021-10-20T13:16:33Z

The main example I'm imagining supporting is flusurv hospitalizations, which report hospitalization rates for each combination of location and age group. For example, a user might want to be able to apply sliding functions to each series corresponding to one combination of these factors. I think supporting this could happen with two changes:

Allow users to specify something like unit_ids when creating an epi_signal. This should be a character vector of column names in the data set, and would be stored in the metadata. This could default to "geo_value" to match current behaviors. A user working with flusurv data might specify unit_ids = c("geo_value", "age_group").
Maybe rename slide_by_geo to slide_by_unit? And in that function change the group_by to group by the id columns stored in the metadata.

The text was updated successfully, but these errors were encountered:

ryantibs · 2021-10-20T13:21:50Z

Yep, great idea, again @elray1. Thanks! Please check out my reply on #10, and see if you agree with where I ended up, then I can try to address the current suggestion in a refactor as well.

One note I forgot to mention on #10, applies equally-well here. I think making the structure more flexible as in #10 and the current issue is going to force us to forgo inheriting from tsibble, see #7. At least in the most general case (with multiple columns representing signals and units). I think tsibble has a single key and index. But I haven't read about it in detail. Maybe you could check out tsibble and see if you agree that it'd be too rigid? Would be good to know that we'd be forgoing that if we do indeed refactor as suggested here and on #10.

elray1 · 2021-10-20T13:30:22Z

It looks like the key in tsibble can support multiple columns, so I think it should work ok. Maybe we should use the key terminology to match what's already done in tsibble rather than the unit terminology i suggested in this issue.

ryantibs · 2021-10-20T13:55:04Z

Great! Thanks for checking that. So it looks the index column in tsibble must remain a single column (time_value for us) but I guess that's not a big deal.

So tsibble offers the flexibility we need then. And then it looks like then a single big refactor can be used to address #12, #10, and #7 all at once.

One minor question before I attempt anything, which I should probably ask on #7, but just for convenience let me ask here. Should we keep the column naming scheme index and key to be exactly as in tsibble? Meaning, no time_value and geo_value named columns?

elray1 · 2021-10-20T14:39:16Z

Yeah, I guess I have a weak preference to generally match what's being done for tsibble and avoid introducing new terms.

jacobbien · 2021-10-20T15:50:36Z

This sounds good to me. So the function would then be named slide_by_key?

- As per the discussion on #7, I ditched the `value` column in an `epi_signal` object completely - Aside from `geo_value` and `time_value`, the other columns are arbitrary and can be treated as measured variables - This required refactoring `pct_change()`, `estimate_deriv()`, and `sliced_cor()` so that the user can specify the variables they want to act on - In doing so, I used tidy evaluation (so the user just passes variable names directly), and extended this to the `by` arg of `sliced_cor()` as well - This will dovetail nicely into allow arbitrary groupings later on (for sliding by more than just geo, for example, as per #12) - The other thing I did here (probably shouldn't have lumped it into the same commit though ...) was to broaden the allowed `time_type`s to include a new "day-time" type, coded as a `POSIXct` object (time on a given day, measured to the second) - I expanded `slide_by_geo()` so that the user can specify the definition of a time step; so, for example, one can slide over the last 7 hours (if `time_type` was "day-time"), or the last 7 minutes, or whatever - Rewrote a lot of the documentation, updated the vignettes

brookslogan · 2021-10-26T17:15:00Z

Another type of "unit" to consider: forecasting tasks. E.g., in the AR example, ahead should be another part of the key; if we tried to use slide_by_geo on rbound results for multiple aheads, I would expect it to break / give invalid results.

- `epi_signal` is now `epi_tibble`. As suggested by Logan on #14, it might not be a good idea to call it use "signal, since it can contain more than one signal column - Got rid of the grouping done internally by `slide_by_geo()`. Instead allow the user to do their own grouping ahead of time - This jells with the discussion in #12, and it seems more in line with standard tidyverse-style data wrangling - As a result, renamed the sliding function to `epi_slide()` - And had to define S3 methods for grouping and joining for our `epi_tibble` objects, so that the class and metadata wouldn't be dropped by these actions - Update documentation and vignettes

ryantibs · 2021-10-28T15:01:19Z

Just a heads up that I'm going to close this with #14.

This was referenced Oct 21, 2021

Should epi_signal inherit from tsibble? #7

Closed

Big redesign #14

Merged

ryantibs closed this as completed Oct 28, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

more flexible specification of columns identifying units of analysis for epi_signal #12

more flexible specification of columns identifying units of analysis for epi_signal #12

elray1 commented Oct 20, 2021

ryantibs commented Oct 20, 2021

elray1 commented Oct 20, 2021 •

edited

Loading

ryantibs commented Oct 20, 2021

elray1 commented Oct 20, 2021

jacobbien commented Oct 20, 2021

brookslogan commented Oct 26, 2021

ryantibs commented Oct 28, 2021

more flexible specification of columns identifying units of analysis for epi_signal #12

more flexible specification of columns identifying units of analysis for epi_signal #12

Comments

elray1 commented Oct 20, 2021

ryantibs commented Oct 20, 2021

elray1 commented Oct 20, 2021 • edited Loading

ryantibs commented Oct 20, 2021

elray1 commented Oct 20, 2021

jacobbien commented Oct 20, 2021

brookslogan commented Oct 26, 2021

ryantibs commented Oct 28, 2021

elray1 commented Oct 20, 2021 •

edited

Loading