Big redesign #14

ryantibs · 2021-10-21T01:06:02Z

Attempting to address many issues at once: #7 #8 #10 #12.

- As per the discussion on #8, from now on, an `epi_signal` object represents a single snapshot of a data set - Modify `as.epi_signal()` to reflect this change - Update documentation and vignettes accordingly

ryantibs · 2021-10-25T04:00:39Z

It's worth noting that I decided to forgo the tsibble inheritance as in #8. I started making the switch this morning, but then decided this may not be a good idea (and it'd slow me down a bit right now). I explain in this comment.

- As per the discussion on #7, I ditched the `value` column in an `epi_signal` object completely - Aside from `geo_value` and `time_value`, the other columns are arbitrary and can be treated as measured variables - This required refactoring `pct_change()`, `estimate_deriv()`, and `sliced_cor()` so that the user can specify the variables they want to act on - In doing so, I used tidy evaluation (so the user just passes variable names directly), and extended this to the `by` arg of `sliced_cor()` as well - This will dovetail nicely into allow arbitrary groupings later on (for sliding by more than just geo, for example, as per #12) - The other thing I did here (probably shouldn't have lumped it into the same commit though ...) was to broaden the allowed `time_type`s to include a new "day-time" type, coded as a `POSIXct` object (time on a given day, measured to the second) - I expanded `slide_by_geo()` so that the user can specify the definition of a time step; so, for example, one can slide over the last 7 hours (if `time_type` was "day-time"), or the last 7 minutes, or whatever - Rewrote a lot of the documentation, updated the vignettes

- This was basically "for free" implementation-wise; just needed to update needed the documentation to reflect this - Rename to `grouped_cor()` from `sliced_cor()`; update all uses of the word "slice" in documentation and vignettes accordingly

ryantibs · 2021-10-26T14:05:36Z

Just a note to myself or anybody else reading this: the last 2 commits were just me being silly with rebasing. I was trying to fix some typos in the commit messages, but clearly I need more practice with git rebase.

- `epi_signal` is now `epi_tibble`. As suggested by Logan on #14, it might not be a good idea to call it use "signal, since it can contain more than one signal column - Got rid of the grouping done internally by `slide_by_geo()`. Instead allow the user to do their own grouping ahead of time - This jells with the discussion in #12, and it seems more in line with standard tidyverse-style data wrangling - As a result, renamed the sliding function to `epi_slide()` - And had to define S3 methods for grouping and joining for our `epi_tibble` objects, so that the class and metadata wouldn't be dropped by these actions - Update documentation and vignettes

- Create scaffolding for new vignettes on time and geo aggregation and on issue dates and `epi_archive` objects - My notes on these vignettes also contains ideas about what to implement - While I'm here, get rid of all join methods for the `epi_tibble` class. This was flawed because it's not clear how to handle the metadata - Rewrite the correlations vignette accordingly - Introduce an `ungroup()` method for the `epi_tibble` class, and fix a few small things to do with class and metadata assignment

ryantibs · 2021-10-27T03:12:05Z

@brookslogan @elray1 @dajmcdon @jacobbien @capnrefsmmat I'm about done with this now.

The sequence of commits on this PR is not as clean as it should have been (too many big commits, and my rebasing attempts only clogged the presentation above). Oops, sorry ... next time I will remember to do it better.

So because the commits are too big and may be unwieldy to scan, here's what I did as a summary.

Renaming

Renamed a bunch of things (with explanation below as to why).
- epi_signal --> epi_tibble
- slide_by_geo() --> epi_slide()
- sliced_cor() --> group_cor()

Handling of signal variables, groupings

Removed the value from an epi_tibble object, allowing for an arbitrary number of signal columns, as discussed on Think about how to handle multiple signals #10. As raised there, given a change to allowing multiple signal columns, it would make sense to move away from "signal" in the name of the object, hence the epi_tibble name.
Note that there is currently no special designation as to something being a signal column or not. This is my current preference is to make the data structure very lightweight: just geo_value, time_value, and then everything else. "Everything else" could be something you want to group on, like age_group, or something you want to do computations on, like covid_cases. But all functionality downstream allows you to flexibly handle this in a way that I think is easy.
Refactored all functions pct_change(), estimate_deriv(), grouped_cor(), so that you specify the variable(s) you would like to be operating on (since there is no one designated signal column).
Accordingly, grouped_cor() now takes a single epi_tibble object, and allows you to correlate any two specified columns.
I removed groupings from within the slide function, and hence renamed it to epi_slide(). It now works by having the user call group_by() first, as in:
```
 x %>% 
  group_by(geo_value, age_group) %>%
  epi_slide(...)
```
This should address more flexible specification of columns identifying units of analysis for epi_signal #12. Of course, you can do the same thing with pct_change() and estimate_deriv().
In order to pull this off, I had to write S3 methods for group_by() and ungroup() for the epi_tibble (since these would otherwise drop the class and attributes). This was a great idea that Alex had. Except the first time I wrote it I got some infinite recursion that ended up with C stack errors and it took me a while to figure out how to get past this. So, thank you, Alex.
For grouped_cor(), I left the grouping inside, because there's some annoying stuff that would need to be done each time if it were pulled out. (Namely, regardless of how you group for the correlations, you always want to group by geo value before you compute lags or leads, and then ungroup, to prep for the whatever grouping you want to do for the correlation). Actually, now, you can do arbitrary groupings with the by argument (not only time or geo) in grouped_cor().
Note that epi_slide() (and the functions pct_change(), estimate_deriv() based on it) return an epi_tibble object; but grouped_cor() does not. I think this makes sense; the latter wouldn't necessarily have to be an epi_tibble, depending on how you grouped, you could have gotten rid of time of space variables completely (or both, if you didn't group at all).

Issues

As discussed on Distinguish snapshot of signal vs. archive of signal, and as-of vs. issue #8, I removed the issue column from the epi_tibble object. This now is in the metadata. It generally represents the latest issue among the issues of all individual observations.
I wrote an outline for a new vignette on issues and archive objects that describes what I want us to implement for the epi_archive object and the functionality as_of() that we provide to derive an epi_tibble from it, and functionality to slide over it.
I want us to proceed in a way that's very simple on the backend implementation at the moment (for example, an epi_archive is just an epi_signal with an issue column) but just gets the interface right for as_of(), epi_slide(), and so on. That can be on the "next" PR.
Then Logan can implement some fancy backend stuff that's much faster on the "next next" PR.

Tsibble

I decided not to go inheriting from tsibble at the moment, as I explained in [this comment[(https://github.com/Should epi_signal inherit from tsibble? #7#issuecomment-950352984) towards the bottom of Should epi_signal inherit from tsibble? #7.
However, I do think we should just cast to tsibble on-the-fly and use a lot of the cool functionality there, as we need it. I created an outline for a vignette to demo this.
In some future PR, we can also implement some analogous geo aggregation utilities as Jacob described on Should epi_signal inherit from tsibble? #7, and then add to this vignette the complementary workflow in the geo dimension.

ryantibs · 2021-10-27T03:15:18Z

Well, I'm a bit of an idiot. I just closed this PR, by clicking "close with comment" instead of "comment". Luckily I could open it back up easily!

Anyway, I am planning on merging this soon. I've rewritten all the documentation and vignettes and checked that it all works.

I'll give you guys a little bit of time to chime in if you'd like ... but since nobody is really using this package yet, and we're still in the early stages, I feel like merging without an official review process is still OK. Probably soon though, we should implement some tests, and do things more properly.

- Changing what we highlight about what this function does; also, moving "cor" to the front of the name - While I'm here, I snuck in a fix to the `NA` problem raised by Logan on #17. Now `pct_change()` and `estimate_deriv()` take as an argument `na_rm` with default value `TRUE`

ryantibs · 2021-10-28T15:15:29Z

I just snuck in a few more fixes. And, I got the thumbs up on slack from a bunch of you. So I'm going to go ahead and merge this, which will close #7 #8 #10 #12 #17.

Move issue from column to metadata

1b001cd

- As per the discussion on #8, from now on, an `epi_signal` object represents a single snapshot of a data set - Modify `as.epi_signal()` to reflect this change - Update documentation and vignettes accordingly

ryantibs mentioned this pull request Oct 25, 2021

Think about how to handle multiple signals #10

Closed

ryantibs force-pushed the big-redesign branch from f1f5967 to 7e93071 Compare October 26, 2021 13:59

Go back to new_col_xyz arg names

78527ae

ryantibs force-pushed the big-redesign branch from 7e93071 to 78527ae Compare October 26, 2021 14:01

ryantibs mentioned this pull request Oct 27, 2021

Add data wrangling example #3

Closed

ryantibs closed this Oct 27, 2021

ryantibs reopened this Oct 27, 2021

ryantibs marked this pull request as ready for review October 27, 2021 03:15

This was referenced Oct 27, 2021

Change treatment of NAs and odd n in pct_change #17

Closed

Add data wrangling functionality #1

Closed

This was referenced Oct 28, 2021

more flexible specification of columns identifying units of analysis for epi_signal #12

Closed

Distinguish snapshot of signal vs. archive of signal, and as-of vs. issue #8

Closed

Should epi_signal inherit from tsibble? #7

Closed

ryantibs merged commit d8fe095 into main Oct 28, 2021

elray1 mentioned this pull request Nov 11, 2021

validation for correct grouping structure #26

Closed

brookslogan mentioned this pull request Nov 15, 2021

Consider tidyselect in epi_slide() #18

Closed

ryantibs deleted the big-redesign branch January 5, 2022 19:31

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Big redesign #14

Big redesign #14

ryantibs commented Oct 21, 2021

ryantibs commented Oct 25, 2021

ryantibs commented Oct 26, 2021 •

edited

Loading

ryantibs commented Oct 27, 2021

ryantibs commented Oct 27, 2021

ryantibs commented Oct 28, 2021

Big redesign #14

Big redesign #14

Conversation

ryantibs commented Oct 21, 2021

ryantibs commented Oct 25, 2021

ryantibs commented Oct 26, 2021 • edited Loading

ryantibs commented Oct 27, 2021

Renaming

Handling of signal variables, groupings

Issues

Tsibble

ryantibs commented Oct 27, 2021

ryantibs commented Oct 28, 2021

ryantibs commented Oct 26, 2021 •

edited

Loading