-
Notifications
You must be signed in to change notification settings - Fork 8
more flexible specification of columns identifying units of analysis for epi_signal #12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Yep, great idea, again @elray1. Thanks! Please check out my reply on #10, and see if you agree with where I ended up, then I can try to address the current suggestion in a refactor as well. One note I forgot to mention on #10, applies equally-well here. I think making the structure more flexible as in #10 and the current issue is going to force us to forgo inheriting from |
It looks like the |
Great! Thanks for checking that. So it looks the So One minor question before I attempt anything, which I should probably ask on #7, but just for convenience let me ask here. Should we keep the column naming scheme |
Yeah, I guess I have a weak preference to generally match what's being done for tsibble and avoid introducing new terms. |
This sounds good to me. So the function would then be named |
- As per the discussion on #7, I ditched the `value` column in an `epi_signal` object completely - Aside from `geo_value` and `time_value`, the other columns are arbitrary and can be treated as measured variables - This required refactoring `pct_change()`, `estimate_deriv()`, and `sliced_cor()` so that the user can specify the variables they want to act on - In doing so, I used tidy evaluation (so the user just passes variable names directly), and extended this to the `by` arg of `sliced_cor()` as well - This will dovetail nicely into allow arbitrary groupings later on (for sliding by more than just geo, for example, as per #12) - The other thing I did here (probably shouldn't have lumped it into the same commit though ...) was to broaden the allowed `time_type`s to include a new "day-time" type, coded as a `POSIXct` object (time on a given day, measured to the second) - I expanded `slide_by_geo()` so that the user can specify the definition of a time step; so, for example, one can slide over the last 7 hours (if `time_type` was "day-time"), or the last 7 minutes, or whatever - Rewrote a lot of the documentation, updated the vignettes
Another type of "unit" to consider: forecasting tasks. E.g., in the AR example, |
- `epi_signal` is now `epi_tibble`. As suggested by Logan on #14, it might not be a good idea to call it use "signal, since it can contain more than one signal column - Got rid of the grouping done internally by `slide_by_geo()`. Instead allow the user to do their own grouping ahead of time - This jells with the discussion in #12, and it seems more in line with standard tidyverse-style data wrangling - As a result, renamed the sliding function to `epi_slide()` - And had to define S3 methods for grouping and joining for our `epi_tibble` objects, so that the class and metadata wouldn't be dropped by these actions - Update documentation and vignettes
Just a heads up that I'm going to close this with #14. |
The main example I'm imagining supporting is flusurv hospitalizations, which report hospitalization rates for each combination of location and age group. For example, a user might want to be able to apply sliding functions to each series corresponding to one combination of these factors. I think supporting this could happen with two changes:
unit_ids
when creating anepi_signal
. This should be a character vector of column names in the data set, and would be stored in the metadata. This could default to"geo_value"
to match current behaviors. A user working with flusurv data might specifyunit_ids = c("geo_value", "age_group")
.slide_by_geo
toslide_by_unit
? And in that function change thegroup_by
to group by the id columns stored in the metadata.The text was updated successfully, but these errors were encountered: