Make slide function broadcasting controllable. #227

brookslogan · 2022-10-06T17:57:36Z

Work on #64 reveals some further incompatibilities between epi_slide and epix_slide regarding broadcasting, and it's unclear what the default should be. In previous discussions, we decided that it should be simple enough for the user to postprocess the current broadcasting approach to something not-broadcasting, but I'm hoping that we can achieve better clarity by providing a parameter and/or splitting into multiple functions and/or providing helper functions.

epi_slide (this adds columns and maintains row count like mutate):

Broadcasting is required to maintain row count. (Behavior without a group_by might be surprising.)
To get summarize-like behavior, one should do a (still-grouped) slice(1L) and drop certain columns (typically expected to be the non-grouping key cols); to get this to an epi_df, depending on if geo_value was in the grouping vars, one may need to add a geo_value="us" column and construct the epi_df.

epix_slide (this throws away non-grouping columns like summarize and does broadcasting differently): we may want a few behaviors:

non-broadcasting --- closest to summarize [TODO check if we need this behavior if f is doing an epi_cor etc., where output doesn't have an nrow matching the broadcast target nrow. + there will be work here trying to decide what the appropriate class and key of the slide output should be given various f outputs.]
broadcast to non-grouping key var values in each group --- current
broadcast to non-grouping key var values from the entire archive --- at some point this seemed like it'd be needed to forecast new locations promptly, but I'm not sure it's actually needed, and I'm not sure it's a good idea; it's nonreproducible in that added locations in later versions can affect results for previous versions
to get mutate-like behavior: broadcast, form an epi_archive, and epix_merge back with the original archive --- not sure this is well-formed as iteration is across time_value and version simultaneously [should we take the slide result, duplicate its time_value as version, form an epi_archive, and epix_merge; or take the slide result, rename time_value to version, form a generalized archive that doesn't have a time_value column, and merge it (where its rows are duplicated for each time_value in the existing epi_archive with corresponding common key variables, or using a symbolic merge that does this for every possible time_value, not just those in the exisitng epi_archive)?]

The text was updated successfully, but these errors were encountered:

brookslogan · 2023-04-21T15:55:05Z

One mechanism for this might be #295.

brookslogan · 2024-10-15T09:30:46Z

We've moved to no broadcasting at all for now.

brookslogan added enhancement New feature or request P1 medium priority op-semantics Operational semantics; many potentially breaking changes here labels Oct 6, 2022

brookslogan mentioned this issue Oct 21, 2022

Check/test all_rows, improve docs #147

Closed

3 tasks

brookslogan closed this as completed Oct 15, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make slide function broadcasting controllable. #227

Make slide function broadcasting controllable. #227

brookslogan commented Oct 6, 2022 •

edited

Loading

brookslogan commented Apr 21, 2023

brookslogan commented Oct 15, 2024

Make slide function broadcasting controllable. #227

Make slide function broadcasting controllable. #227

Comments

brookslogan commented Oct 6, 2022 • edited Loading

brookslogan commented Apr 21, 2023

brookslogan commented Oct 15, 2024

brookslogan commented Oct 6, 2022 •

edited

Loading