Skip to content

Pass ref time value and group key to epix_slide for tidy computations #317

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
Show all changes
41 commits
Select commit Hold shift + click to select a range
8ee084f
reference group, ref time by changing quo env
nmdefries May 5, 2023
267c441
set special arg names in func definition
nmdefries May 15, 2023
a3b31bd
import env
nmdefries May 15, 2023
d0fc5b0
describe dots var access
nmdefries May 15, 2023
d0374a1
basic way to calculate ref_time_value in epi_slide
nmdefries May 18, 2023
f2f3436
support using ref_time_value in epi_slide computations
nmdefries May 18, 2023
d1b5d72
drop .real, clean up names
nmdefries May 19, 2023
a95b2a8
import bind_rows
nmdefries May 19, 2023
05c311f
tests
nmdefries May 19, 2023
cd4c382
add empty rows for all group var combos
nmdefries May 23, 2023
51b13a6
add dot prefix to dots arg names
nmdefries May 23, 2023
8bfb4d8
add support for and test sliding on ungrouped epi_dfs
nmdefries May 23, 2023
5564c0d
speed up slow filter and select steps
nmdefries May 24, 2023
22b9b69
tests expect epi_df output
nmdefries May 24, 2023
0933801
In `epix_slide` tidyeval, use custom data mask rather than quo env
lcbrooks Jun 1, 2023
4410a39
Merge branch 'ndefries/epix-slide-pass-reftimevalue-without-env' into…
nmdefries Jun 5, 2023
e644624
import env
nmdefries Jun 5, 2023
feb65b3
some tests expect a tibble
nmdefries Jun 5, 2023
620648e
add placeholders for .x, .group_key, and .rtv
nmdefries Jun 5, 2023
e9580d3
test helper function behavior
nmdefries Jun 5, 2023
b7d7c53
test reftimevalue access on ungrouped epidf
nmdefries Jun 6, 2023
4aa59cb
Merge branch 'ndefries/epix-slide-dots-rtv' into ndefries/epi-slide-rtv
nmdefries Jun 6, 2023
e98e03d
expect output date col as date not double
nmdefries Jun 6, 2023
fed6f11
filter result by .real when all_rows == TRUE
nmdefries Jun 6, 2023
9ccfe91
keep .real col from being accessible from user f computation
nmdefries Jun 6, 2023
307fb6e
use custom data mask rather than quo env to match `epix_slide` tidyev…
nmdefries Jun 7, 2023
0b48ea4
test helper function behavior
nmdefries Jun 7, 2023
9db1165
rename inrange... for phony helper dates
nmdefries Jun 7, 2023
db00432
ci: set DELPHI_EPIDATA_KEY, from a GitHub Secret
lcbrooks Jun 9, 2023
f79bc08
ci: annotate actions explaining differences from usethis output
lcbrooks Jun 9, 2023
aeb9a25
cleanup: add renv-related entries to .Rbuildignore
lcbrooks Jun 9, 2023
d8a95c7
Merge pull request #313 from cmu-delphi/ndefries/epix-slide-pass-reft…
brookslogan Jun 9, 2023
1d26905
ci: note more edits in comments for GitHub Actions
lcbrooks Jun 9, 2023
572f6e6
Merge pull request #328 from cmu-delphi/lcb/use-api-key
brookslogan Jun 9, 2023
a4c19f7
Install `.x` etc. into data masks in same way pronouns are
brookslogan Jun 15, 2023
512f08a
Simplify&flatten epi_slide data mask implementation
brookslogan Jun 16, 2023
d22b88b
Describe our pronoun-like objects in `epi_slide` `...` docs
brookslogan Jun 16, 2023
39f4d2d
Merge pull request #318 from cmu-delphi/ndefries/epi-slide-rtv
brookslogan Jun 16, 2023
e7c4b8d
Rework NEWS.md about `.ref_time_value` and almost-pronouns available
brookslogan Jun 16, 2023
6039646
Merge remote-tracking branch 'upstream/dev' into ndefries/epix-slide-…
brookslogan Jun 16, 2023
e047a4a
Fix grammar in `epix_slide` `...` docs
brookslogan Jun 16, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions NAMESPACE
Original file line number Diff line number Diff line change
Expand Up @@ -96,6 +96,7 @@ importFrom(rlang,caller_env)
importFrom(rlang,check_dots_empty0)
importFrom(rlang,enquo)
importFrom(rlang,enquos)
importFrom(rlang,env)
importFrom(rlang,eval_tidy)
importFrom(rlang,f_env)
importFrom(rlang,f_rhs)
Expand All @@ -110,6 +111,7 @@ importFrom(rlang,pairlist2)
importFrom(rlang,quo_get_env)
importFrom(rlang,quo_get_expr)
importFrom(rlang,quo_is_missing)
importFrom(rlang,quo_set_env)
importFrom(rlang,sym)
importFrom(rlang,syms)
importFrom(stats,cor)
Expand Down
6 changes: 5 additions & 1 deletion R/grouped_epi_archive.R
Original file line number Diff line number Diff line change
Expand Up @@ -187,6 +187,7 @@ grouped_epi_archive =
#' details.
#' @importFrom data.table key address
#' @importFrom rlang !! !!! enquo quo_is_missing enquos is_quosure sym syms
#' quo_set_env env
slide = function(f, ..., before, ref_time_values,
time_step, new_col_name = "slide_value",
as_list_col = FALSE, names_sep = "_",
Expand Down Expand Up @@ -437,7 +438,10 @@ grouped_epi_archive =
}

quo = quos[[1]]
f = function(x, quo, ...) rlang::eval_tidy(quo, x)
f = function(.x, .group_key, .ref_time_value, quo, ...) {
quo = quo_set_env(quo, env())
rlang::eval_tidy(quo, .x)
}
Copy link
Contributor

@brookslogan brookslogan Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: ([maybe] shared by the old code [or maybe not... keep getting tricked by eval_tidy default env for exprs when using quosures]) env() uses the function execution environment [by default], which handles making .x, .group_key, and .ref_time_value available, but also:

  • probably makes quo available
  • messes with some helper functions:
helper = function(archive_haystack, time_value_needle) {
  archive_haystack %>% epix_slide(has_needle = time_value_needle %in% time_value, before = 365000L)
}
helper(archive_cases_dv_subset, as.Date("2021-01-01"))
  • probably doesn't work if there were another package, e.g., epipredict, that used slides but relied on packages not in epiprocess imports nor attached in the user's GlobalEnv. (The GlobalEnv thing saves us in some other cases, but seems like an artifact that suggests that R didn't have proper package import namespacing at one point and relied on everything being attached via Depends:.)

I'll push an update for this.

Copy link
Contributor Author

@nmdefries nmdefries Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

helper functions

I can see that the helper function example works with your updates and exits with an error with the old code, but I don't understand the difference in scope of the two approaches.

This line

data_mask = rlang::new_data_mask(data_env, top = data_parent_env)
says that only objects included in all environments (inclusive) between top and bottom will be accessible to the quosure -- right? So why is time_value_needle findable?

And how is the helper function example

helper = function(archive_haystack, time_value_needle) {
  archive_haystack %>% epix_slide(has_needle = time_value_needle %in% time_value, before = 365000L)
}
helper(archive_cases_dv_subset, as.Date("2021-01-01"))

different from

archive_haystack <- archive_cases_dv_subset
time_value_needle <- as.Date("2021-01-01")
archive_haystack %>% epix_slide(has_needle = time_value_needle %in% time_value, before = 365000L)

? The issue is more than the fact that we're passing a variable rather than a "raw", unassigned value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: In various places in this package we use :: to use some (but not all) external functions. Why is that? Like here, we're using rlang::new_data_mask but there aren't any conflicting function names in epiprocess or other imported packages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. We haven't really standardized this yet, and it is probably more of a mix because at some point, in new code, I started to try to favor :: in our internal code (but favor no :: in examples) based on previous discussion, but never spent the time and effort to refactor to any standard.

Copy link
Contributor

@brookslogan brookslogan Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, didn't see your first question. eval_tidy makes both the data mask and the quosure environment available to the computation. It looks like rlang probably combines these into a single environment. [In the old approach, the data mask didn't have any of the stuff, and the environment was something like an env() with no bindings -> our f's execution environment -> slide execution environment -> archive internal environment -> epiprocess pkg namespace -> epiprocess imports -> stuff leading to the global namespace for backward compatibility reasons with old old R stuff, which saved us in a lot of cases. In the newer approaches, we're putting stuff in the data mask part and leaving the env alone, which lets it get stuff from the helper execution environment. Maybe we could have gotten similar results by modifying the first approach to not use the default parent env in the env() call, but instead setting its parent to the quosure's original quo_get_env() and manually adding bindings for .x etc. like you did in even older versions.]

library(rlang)
# put in a bunch of duplicate names and tricks to see what eval_tidy actually grabs:
data_env_parent = env(a = 15)
data_env = env(data_env_parent, a = 10, .data = list(a = 8), .env = list(b = 10000))
## data_env = env(data_env_parent, a = 10, .env = list(b = 10000))
data_mask = new_data_mask(bottom = data_env, top = data_env_parent)
data_mask$.custom_pronoun <- as_data_pronoun(fizz = "buzz")
#> Error in as_data_pronoun(fizz = "buzz"): unused argument (fizz = "buzz")
class(data_mask)
#> [1] "environment"
data_mask %>% names()
#> Error in data_mask %>% names(): could not find function "%>%"
class(data_mask$.env) # not sure if this is a real or a placeholder or something else
#> [1] "rlang_ctxt_pronoun"
# ?eval_tidy tells us that it will add the .data and .env pronouns, so maybe
# it's something else?
data_mask %>% env_parent(n = 1L) %>% names()
#> Error in data_mask %>% env_parent(n = 1L) %>% names(): could not find function "%>%"
data_mask %>% env_parent(n = 2L) %>% names()
#> Error in data_mask %>% env_parent(n = 2L) %>% names(): could not find function "%>%"
data_mask %>% env_parent(n = 3L) %>% env_label()
#> Error in data_mask %>% env_parent(n = 3L) %>% env_label(): could not find function "%>%"
# 
get_quo = function(x) {
  .data = list(a = 80000)
  quo(paste(a, .data$a, .data$.data$a, x, .env$x))
}
my_quo = get_quo(42)
# ?new_data_mask tells us that the tidy eval engine will change the parent of
# `top`. So I'd guess that `eval_tidy` is constructing the following environment
# chain: data_mask (holds pronouns etc.) -> data_env -> data_env_parent -> quo_get_env(my_quo) -> our current environment -> our current environment's ancestors
eval_tidy(my_quo, data = data_mask)
#> [1] "10 8  42 42"
# Oops, I forgot to install `.data` pronoun. ?as_data_mask tells us that
# `as_data_mask` automatically installs `.data`, but `new_data_mask` does not.
# Need to check on my suggested edits to make sure we install it properly.

Created on 2023-06-15 by the reprex package (v2.0.1)

Also, I don't think we can use as_data_pronoun for .ref_time_value, it's going to mess up the type, and I don't see what the benefit is anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, looking at this again, I think I overcomplicated it with the top & bottom. I think maybe we should be "installing" our pronoun-like objects into the data mask in the same way as regular pronouns. And regardless, we may need to document differences between the actual-pronoun .data vs. the non-pronoun .x. I'll test this out.

Question: what do you think about disallowing . in the formula interface? We're faced with . meaning something different in formulas vs. tidyeval, or meaning something different between slides and mutate/etc. We can't make these all consistent, but we can decide on whether the inconsistency is between two different non-error behaviors, one thing working & the other forbidden, or both forbidden.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we wanted . to be available in the formula interface so that it matches standard formula behavior, i.e. with access to the first arg via ., .x, and ..1. (In reality, I'm not sure . will be used much.)

. by itself doesn't mean anything in tidy expressions, right? It's just a prefix to some pronoun/var names. Unless there's a subtlety I'm missing, it doesn't seem inconsistent or confusing to me.

new_col = sym(names(rlang::quos_auto_name(quos)))

x = purrr::map_dfr(ref_time_values, function(ref_time_value) {
Expand Down
11 changes: 7 additions & 4 deletions R/methods-epi_archive.R
Original file line number Diff line number Diff line change
Expand Up @@ -662,12 +662,15 @@ group_by.epi_archive = function(.data, ..., .add=FALSE, .drop=dplyr::group_by_dr
#' by a one-row tibble containing the values of the grouping variables for
#' the associated group; followed by a Date containing the reference time
#' value to use; followed by any number of named arguments. If a formula,
#' `f` can operate directly on columns accessed via `.x$var` or `.`, as in
#' `~ mean (.x$var)` to compute a mean of a column `var` for each
#' `f` can operate directly on columns accessed via `.x$var` or `.$var`, as
#' in `~ mean(.x$var)` to compute a mean of a column `var` for each
#' `ref_time_value`-group combination. The group key can be accessed via
#' `.y` or `.group_key`, and the reference time value can be accessed via
#' `.z` or `.ref_time_value`. If `f` is missing, then `...` will specify
#' the computation.
#' `.z` or `.ref_time_value`. If `f` is missing, then `...` will specify the
#' computation. In this case, the computation can reference columns via the
#' column name directly or via `.x$var`, as in `mean(var)` or `mean
#' (.x$var)`. The group key can be accessed via `.group_key`, and the
#' reference time value can be accessed via `.ref_time_value`.
#' @param ... Additional arguments to pass to the function or formula specified
#' via `f`. Alternatively, if `f` is missing, then `...` is interpreted as an
#' expression for tidy evaluation. See details of [`epi_slide`].
Expand Down
10 changes: 6 additions & 4 deletions man/epix_slide.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

98 changes: 78 additions & 20 deletions tests/testthat/test-epix_slide.R
Original file line number Diff line number Diff line change
Expand Up @@ -361,53 +361,111 @@ test_that("epix_slide alerts if the provided f doesn't take enough args", {
class = "check_sufficient_f_args__f_needs_min_args_before_dots")
})

test_that("epix_slide computation can use ref_time_value", {
# Formula
xx1 <- xx %>%
group_by(.data$geo_value) %>%
epix_slide(f = ~ .ref_time_value,
before = 2)

test_that("epix_slide computation via formula can use ref_time_value", {
xx_ref <- tibble(geo_value = rep("x",4),
time_value = c(4,5,6,7),
slide_value = c(4,5,6,7)
) %>%
as_epi_df(as_of = 4) %>% # Also a bug (issue #213)
group_by(geo_value)

expect_identical(xx1,xx_ref)
xx1 <- xx %>%
group_by(.data$geo_value) %>%
epix_slide(f = ~ .ref_time_value,
before = 2)

expect_identical(xx1, xx_ref)

xx2 <- xx %>%
group_by(.data$geo_value) %>%
epix_slide(f = ~ .z,
before = 2)

expect_identical(xx2,xx_ref)
expect_identical(xx2, xx_ref)

xx3 <- xx %>%
group_by(.data$geo_value) %>%
epix_slide(f = ~ ..3,
before = 2)

expect_identical(xx3,xx_ref)
expect_identical(xx3, xx_ref)
})

# Function
xx4 <- xx %>%
test_that("epix_slide computation via function can use ref_time_value", {
xx_ref <- tibble(geo_value = rep("x",4),
time_value = c(4,5,6,7),
slide_value = c(4,5,6,7)
) %>%
as_epi_df(as_of = 4) %>% # Also a bug (issue #213)
group_by(geo_value)

xx1 <- xx %>%
group_by(.data$geo_value) %>%
epix_slide(f = function(x, g, t) t,
before = 2)

expect_identical(xx4,xx_ref)
expect_identical(xx1, xx_ref)
})

test_that("epix_slide computation via dots can use ref_time_value and group", {
# ref_time_value
xx_ref <- tibble(geo_value = rep("x",4),
time_value = c(4,5,6,7),
slide_value = c(4,5,6,7)
) %>%
as_epi_df(as_of = 4) %>% # Also a bug (issue #213)
group_by(geo_value)

xx1 <- xx %>%
group_by(.data$geo_value) %>%
epix_slide(before = 2,
slide_value = .ref_time_value)

# Dots
expect_error(xx %>%
expect_identical(xx1, xx_ref)

xx2 <- xx %>%
group_by(.data$geo_value) %>%
epix_slide(before = 2,
slide_value = ref_time_value),
"object 'ref_time_value' not found")
expect_error(xx %>%
slide_value = .env$.ref_time_value)

expect_identical(xx2, xx_ref)

# group_key
xx_ref <- tibble(geo_value = rep("x",4),
time_value = c(4,5,6,7),
slide_value = "x"
) %>%
as_epi_df(as_of = 4) %>% # Also a bug (issue #213)
group_by(geo_value)

# Use group_key column
xx3 <- xx %>%
group_by(.data$geo_value) %>%
epix_slide(before = 2,
slide_value = .env$ref_time_value),
"object 'ref_time_value' not found")
slide_value = .group_key$geo_value)

expect_identical(xx3, xx_ref)

# Use entire group_key object
expect_error(
xx %>%
group_by(.data$geo_value) %>%
epix_slide(before = 2,
slide_value = nrow(.group_key)),
NA
)
})

test_that("epix_slide computation via dots outputs the same result using col names and the data var", {
xx_ref <- xx %>%
group_by(.data$geo_value) %>%
epix_slide(before = 2,
sum_binary = sum(time_value))

xx1 <- xx %>%
group_by(.data$geo_value) %>%
epix_slide(before = 2,
sum_binary = sum(.x$time_value))

expect_identical(xx1, xx_ref)
})