Skip to content

Allow epix_slide to access version history if desired #259

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 24 commits into from
Mar 9, 2023

Conversation

nmdefries
Copy link
Contributor

@nmdefries nmdefries commented Jan 6, 2023

Implement all_versions = TRUE options for epix_slide and as_of with the goal of supporting revision-aware models.

Closes #49.

@nmdefries nmdefries force-pushed the ndefries/epix-slide-versions branch from 024bcf4 to c6a7141 Compare January 23, 2023 20:53
@nmdefries nmdefries changed the base branch from dev to lcb/grouped_epi_archive January 23, 2023 20:54
@nmdefries nmdefries changed the title [wip] epix slide over all versions Allow epix_slide to access version history if desired Jan 23, 2023
@nmdefries nmdefries marked this pull request as ready for review January 23, 2023 21:08
@dajmcdon
Copy link
Contributor

I don't really have any business reviewing the details of the code changes here. But please add some tests to illustrate/confirm the desired performance.

When `x` is a grouped `epi_archive`, it must be cloned and ungrouped so
basic checks on `max_version` can be done. The cloning step is expensive
and is unnecessary when the call is invalid or when the archive is not
grouped. Separate the fn calls to simplify logic and avoid the
sometimes-unecessary `clone`.
@brookslogan
Copy link
Contributor

Status:

  • Nat & I are discussing some design decisions about how things should work and might want to get a wider group's feelings.

In the meantime:

  • Nat, you should add yourself as at least as a contributor in the DESCRIPTION file.
  • suggestion: epix_truncate_versions code duplication looks avoidable, e.g., by having the grouped_epi_archive method do
result = x$clone()
result$ungrouped <- epix_truncate_versions(result$ungrouped, .......)
return (result)

and maybe there's even a way to eliminate some awkward cloning by doing the right structure of whether the S3 method calls the R6 method or vice versa, not sure. But maybe we don't need to be too elegant here because we hope to get rid of the R6 stuff altogether sometime

  • Thanks for the .group_key fixup; wish we had a more different name for the key of the archive (as well as for the non-version non-time parts of the key --- maybe we could call this the "epikey"). Please share any ideas here. I think dplyr does call this the group key, unfortunate.

@nmdefries
Copy link
Contributor Author

Thanks for the epix_truncate_versions_after.grouped_epi_archive suggestion, that looks a lot better! (Had to use the ungroup()/group_by() here because of the function context but used the general approach).

@nmdefries
Copy link
Contributor Author

wish we had a more different name for the key of the archive (as well as for the non-version non-time parts of the key --- maybe we could call this the "epikey")

You're proposing:

  • key = version, time value, and epikey
  • epikey = geo value, and other user-specified fields (maybe age group, race, etc)

?

epikey fits in well with the general naming schema. It's fairly evocative (e.g. a normal epidemiological grouping might be geo value + age group, which we could well imagine in this context), and much much shorter than the current "non-time, non-version keys" phrase that the package docs frequently use. Seems convenient.

Also tweak the `grouped_epi_archive` variant to avoid regrouping, taking
advantage of the availability of `private$ungrouped`.
Before: `epix_slide` with `all_versions=TRUE` provides the user computation an
  `epi_archive` with its `DT` set to a tibble, but is expected to be a
  data.table by `epi_archive` operations.

After: the `$DT`s are now set to `data.table`s.

Plus: add (nonthorough) tests that
* the `$DT` class & key are properly set.
* we can use `data.table`-dependent `epi_archive` methods in the user
  computations.
@brookslogan
Copy link
Contributor

brookslogan commented Feb 24, 2023

Hi Nat, sorry for the long delay. I've pushed a couple of changes:

  • Added R6 analogues for these methods, to keep things consistent until we can overhaul and remove R6 altogether (and maybe data.table as well).
  • Fixed $DT in the computation inputs being a tibble rather than a data.table.

I think we're still missing:

  • something in @examples with all_versions=TRUE; maybe the test I added with f2 could be a starting point. A more common/intuitive use case would actually have f2 be a nowcaster (rather than evaluating a particular lag as a potential nowcaster). However, one/both of these might be too much code for @examples.
  • a test for behavior with group_by(geo_value) --- I've started on this locally.
  • returning a grouped tibble out of grouped archives [Not possible how I was thinking. We already return grouped tibbles from nontrivially-grouped archives. However, we can't get the same for trivially-grouped archives, as grouped_dfs automatically decay if they have zero grouping variables.]
  • a vignette showing this off --- but let's handle this in another Issue/PR; I want to see if we can combine it with some other vignette ideas.
  • [NEWS.md entry]

lcbrooks added 6 commits March 8, 2023 14:36
* Fix partial rename of variable `as_of_tbl` -> `as_of_df` in `epix_slide`
  implementation.
* Fix some bad links:
  * Fix transposition of link text and destination
  * Fix link text containing formatting, which is not allowed
* Re-`document()`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants