Skip to content

Pass ref time value and group key to epix_slide for tidy computations #317

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

nmdefries
Copy link
Contributor

@nmdefries nmdefries commented May 15, 2023

Addresses #170 for epix_slide for tidy computation via dots.

Change the quosure's environment from where it was defined to a child of f's environment, which lets the tidy computation use any of f's args by name. Set the arg names to mirror the descriptive names allowed in formula computations. If we want to exactly mirror formula vars (.x, ., .y, .z) those can be set in the call to env().

@nmdefries nmdefries requested review from brookslogan and removed request for brookslogan May 15, 2023 22:45
nmdefries and others added 3 commits May 24, 2023 11:59
- Try to keep expected function visible even when using `epix_slide()` within a
  helper function or other packages.
- Make `.{x,group_key,ref_time_value}` inaccessible via `.data` and `.env`, only
  directly accessible.
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[See edge case + commit to address below.] Please:

  • Sanity check my changes [to interface (see changed test) & implementation].
  • If they look okay, please add a test based on the helper function example.
  • Now or in a separate Issue: we should probably provide some .x, .group_key, and .ref_time_value placeholders to avoid downstream packages from getting warnings about using undefined things in checks. I think that's what {dplyr,rlang}::.data is; I don't think this package member is actually directly used in tidy eval, just more "convenient fiction" for CHECKs. [?.data confirms this, in the "Where does '.data' live?:" section.]

f = function(.x, .group_key, .ref_time_value, quo, ...) {
quo = quo_set_env(quo, env())
rlang::eval_tidy(quo, .x)
}
Copy link
Contributor

@brookslogan brookslogan Jun 1, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue: ([maybe] shared by the old code [or maybe not... keep getting tricked by eval_tidy default env for exprs when using quosures]) env() uses the function execution environment [by default], which handles making .x, .group_key, and .ref_time_value available, but also:

  • probably makes quo available
  • messes with some helper functions:
helper = function(archive_haystack, time_value_needle) {
  archive_haystack %>% epix_slide(has_needle = time_value_needle %in% time_value, before = 365000L)
}
helper(archive_cases_dv_subset, as.Date("2021-01-01"))
  • probably doesn't work if there were another package, e.g., epipredict, that used slides but relied on packages not in epiprocess imports nor attached in the user's GlobalEnv. (The GlobalEnv thing saves us in some other cases, but seems like an artifact that suggests that R didn't have proper package import namespacing at one point and relied on everything being attached via Depends:.)

I'll push an update for this.

Copy link
Contributor Author

@nmdefries nmdefries Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

helper functions

I can see that the helper function example works with your updates and exits with an error with the old code, but I don't understand the difference in scope of the two approaches.

This line

data_mask = rlang::new_data_mask(data_env, top = data_parent_env)
says that only objects included in all environments (inclusive) between top and bottom will be accessible to the quosure -- right? So why is time_value_needle findable?

And how is the helper function example

helper = function(archive_haystack, time_value_needle) {
  archive_haystack %>% epix_slide(has_needle = time_value_needle %in% time_value, before = 365000L)
}
helper(archive_cases_dv_subset, as.Date("2021-01-01"))

different from

archive_haystack <- archive_cases_dv_subset
time_value_needle <- as.Date("2021-01-01")
archive_haystack %>% epix_slide(has_needle = time_value_needle %in% time_value, before = 365000L)

? The issue is more than the fact that we're passing a variable rather than a "raw", unassigned value?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question: In various places in this package we use :: to use some (but not all) external functions. Why is that? Like here, we're using rlang::new_data_mask but there aren't any conflicting function names in epiprocess or other imported packages.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. We haven't really standardized this yet, and it is probably more of a mix because at some point, in new code, I started to try to favor :: in our internal code (but favor no :: in examples) based on previous discussion, but never spent the time and effort to refactor to any standard.

Copy link
Contributor

@brookslogan brookslogan Jun 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, didn't see your first question. eval_tidy makes both the data mask and the quosure environment available to the computation. It looks like rlang probably combines these into a single environment. [In the old approach, the data mask didn't have any of the stuff, and the environment was something like an env() with no bindings -> our f's execution environment -> slide execution environment -> archive internal environment -> epiprocess pkg namespace -> epiprocess imports -> stuff leading to the global namespace for backward compatibility reasons with old old R stuff, which saved us in a lot of cases. In the newer approaches, we're putting stuff in the data mask part and leaving the env alone, which lets it get stuff from the helper execution environment. Maybe we could have gotten similar results by modifying the first approach to not use the default parent env in the env() call, but instead setting its parent to the quosure's original quo_get_env() and manually adding bindings for .x etc. like you did in even older versions.]

library(rlang)
# put in a bunch of duplicate names and tricks to see what eval_tidy actually grabs:
data_env_parent = env(a = 15)
data_env = env(data_env_parent, a = 10, .data = list(a = 8), .env = list(b = 10000))
## data_env = env(data_env_parent, a = 10, .env = list(b = 10000))
data_mask = new_data_mask(bottom = data_env, top = data_env_parent)
data_mask$.custom_pronoun <- as_data_pronoun(fizz = "buzz")
#> Error in as_data_pronoun(fizz = "buzz"): unused argument (fizz = "buzz")
class(data_mask)
#> [1] "environment"
data_mask %>% names()
#> Error in data_mask %>% names(): could not find function "%>%"
class(data_mask$.env) # not sure if this is a real or a placeholder or something else
#> [1] "rlang_ctxt_pronoun"
# ?eval_tidy tells us that it will add the .data and .env pronouns, so maybe
# it's something else?
data_mask %>% env_parent(n = 1L) %>% names()
#> Error in data_mask %>% env_parent(n = 1L) %>% names(): could not find function "%>%"
data_mask %>% env_parent(n = 2L) %>% names()
#> Error in data_mask %>% env_parent(n = 2L) %>% names(): could not find function "%>%"
data_mask %>% env_parent(n = 3L) %>% env_label()
#> Error in data_mask %>% env_parent(n = 3L) %>% env_label(): could not find function "%>%"
# 
get_quo = function(x) {
  .data = list(a = 80000)
  quo(paste(a, .data$a, .data$.data$a, x, .env$x))
}
my_quo = get_quo(42)
# ?new_data_mask tells us that the tidy eval engine will change the parent of
# `top`. So I'd guess that `eval_tidy` is constructing the following environment
# chain: data_mask (holds pronouns etc.) -> data_env -> data_env_parent -> quo_get_env(my_quo) -> our current environment -> our current environment's ancestors
eval_tidy(my_quo, data = data_mask)
#> [1] "10 8  42 42"
# Oops, I forgot to install `.data` pronoun. ?as_data_mask tells us that
# `as_data_mask` automatically installs `.data`, but `new_data_mask` does not.
# Need to check on my suggested edits to make sure we install it properly.

Created on 2023-06-15 by the reprex package (v2.0.1)

Also, I don't think we can use as_data_pronoun for .ref_time_value, it's going to mess up the type, and I don't see what the benefit is anyway.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, looking at this again, I think I overcomplicated it with the top & bottom. I think maybe we should be "installing" our pronoun-like objects into the data mask in the same way as regular pronouns. And regardless, we may need to document differences between the actual-pronoun .data vs. the non-pronoun .x. I'll test this out.

Question: what do you think about disallowing . in the formula interface? We're faced with . meaning something different in formulas vs. tidyeval, or meaning something different between slides and mutate/etc. We can't make these all consistent, but we can decide on whether the inconsistency is between two different non-error behaviors, one thing working & the other forbidden, or both forbidden.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we wanted . to be available in the formula interface so that it matches standard formula behavior, i.e. with access to the first arg via ., .x, and ..1. (In reality, I'm not sure . will be used much.)

. by itself doesn't mean anything in tidy expressions, right? It's just a prefix to some pronoun/var names. Unless there's a subtlety I'm missing, it doesn't seem inconsistent or confusing to me.

@nmdefries nmdefries requested a review from brookslogan June 5, 2023 21:34
nmdefries and others added 19 commits June 6, 2023 12:03
so developers can use (untracked) renv's at the repo level without as many git
and CHECK complaints.
…imevalue-without-env

Pass `ref_time_value` to `epix_slide` for functions and formulas
Set DELPHI_EPIDATA_KEY, from a GitHub Secret
and add some unrelated comments regarding time value counting details.

Since we're not potentially dealing with R6 objects, we can just us
`as_data_mask(.x)`. We can also flatten the data mask by installing the `.x`,
`.group_key` and `.ref_time_value` in the same way as that pronouns are
installed (except keeping their original classes, not converting to pronouns),
rather than using another level in the environment chain for the data mask.
Allow `epi_slide` to access `ref_time_value`
Copy link
Contributor

@brookslogan brookslogan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Going to go ahead and merge this with a tweaked approach to the data masking approach. I also threw in .x as a not-quite-pronoun in the mask; I'm not sure if this is a design change or a bugfix to my first data mask attempt. Going to go ahead and merge and we can discuss any design issues / further bugfixes in another Issue.

@brookslogan
Copy link
Contributor

brookslogan commented Jun 16, 2023

I've failed the GitHub Actions [trigger, because I didn't update the PR base & GitHub didn't do it automatically], so we're not getting checks here. Unfortunate.

@brookslogan brookslogan merged commit a2c9aa1 into ndefries/epix-slide-pass-reftimevalue-without-env Jun 16, 2023
@brookslogan brookslogan deleted the ndefries/epix-slide-dots-rtv branch June 16, 2023 20:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants