Skip to content

Make slide function broadcasting controllable. #227

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
brookslogan opened this issue Oct 6, 2022 · 2 comments
Closed

Make slide function broadcasting controllable. #227

brookslogan opened this issue Oct 6, 2022 · 2 comments
Labels
enhancement New feature or request op-semantics Operational semantics; many potentially breaking changes here P1 medium priority

Comments

@brookslogan
Copy link
Contributor

brookslogan commented Oct 6, 2022

Work on #64 reveals some further incompatibilities between epi_slide and epix_slide regarding broadcasting, and it's unclear what the default should be. In previous discussions, we decided that it should be simple enough for the user to postprocess the current broadcasting approach to something not-broadcasting, but I'm hoping that we can achieve better clarity by providing a parameter and/or splitting into multiple functions and/or providing helper functions.

epi_slide (this adds columns and maintains row count like mutate):

  • Broadcasting is required to maintain row count. (Behavior without a group_by might be surprising.)
  • To get summarize-like behavior, one should do a (still-grouped) slice(1L) and drop certain columns (typically expected to be the non-grouping key cols); to get this to an epi_df, depending on if geo_value was in the grouping vars, one may need to add a geo_value="us" column and construct the epi_df.

epix_slide (this throws away non-grouping columns like summarize and does broadcasting differently): we may want a few behaviors:

  • non-broadcasting --- closest to summarize [TODO check if we need this behavior if f is doing an epi_cor etc., where output doesn't have an nrow matching the broadcast target nrow. + there will be work here trying to decide what the appropriate class and key of the slide output should be given various f outputs.]
  • broadcast to non-grouping key var values in each group --- current
  • broadcast to non-grouping key var values from the entire archive --- at some point this seemed like it'd be needed to forecast new locations promptly, but I'm not sure it's actually needed, and I'm not sure it's a good idea; it's nonreproducible in that added locations in later versions can affect results for previous versions
  • to get mutate-like behavior: broadcast, form an epi_archive, and epix_merge back with the original archive --- not sure this is well-formed as iteration is across time_value and version simultaneously [should we take the slide result, duplicate its time_value as version, form an epi_archive, and epix_merge; or take the slide result, rename time_value to version, form a generalized archive that doesn't have a time_value column, and merge it (where its rows are duplicated for each time_value in the existing epi_archive with corresponding common key variables, or using a symbolic merge that does this for every possible time_value, not just those in the exisitng epi_archive)?]
@brookslogan brookslogan added enhancement New feature or request P1 medium priority op-semantics Operational semantics; many potentially breaking changes here labels Oct 6, 2022
@brookslogan
Copy link
Contributor Author

One mechanism for this might be #295.

@brookslogan
Copy link
Contributor Author

We've moved to no broadcasting at all for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request op-semantics Operational semantics; many potentially breaking changes here P1 medium priority
Projects
None yet
Development

No branches or pull requests

1 participant