`epi_df` structure #482

dajmcdon · 2024-07-16T22:40:26Z

Possibly worth adding some sorting to as_epi_df. Some ideas,

Columns are always time_value, geo_value, additional keys, other columns
Rows always vary time_value fastest, then geo_value, etc.

For the second, perhaps add a dplyr::arrange.epi_df() method.

The text was updated successfully, but these errors were encountered:

brookslogan · 2024-07-17T18:30:30Z

On point 1., I polled some preferences regarding data-fetching argument ordering which might be relevant.

On point 2., I'm a little confused. I think arrange already works properly on epi_dfs; is it not working for you, or are you saying it should be forbidden? Related: what do you think about these approaches?:

A. epi_dfs should always have rows sorted in this manner (so we should also make sure to maintain this order after any manipulations); arrange.epi_df() should output an explanatory error message.
B. epi_dfs should be sorted upon construction (maybe only by default, controlled by an arrange arg to as_epi_df), e.g., by calling arrange() at some point, but user should be able to later re-arrange() as they'd like
C. Users should be able to choose either behavior by using a constructor argument (e.g., keep_arranged).

dajmcdon · 2024-07-17T18:52:50Z

Point 2.

Hmmm, yes, arrange() works. I guess I was imagining a function (possibly different from arrange() that would rearrange to a standard ordering if requested. We don't even have this in our saved data, for example, compare epiprocess::jhu_csse_daily_subset to epipredict::case_death_rate_subset.

dshemetov · 2024-07-20T00:59:32Z

RE point 1, we looked into similar polling when deciding for argument orders for epidatr and came to: location > time > other. Sorting by geo value and then time value seems most performant to me too, since group_by(geo_value) is common for time series operations. Do we not already do this in the constructor? Seems like something we should do. (I guess I'm assuming tibbles maintain some metadata about how they were sorted last that they can leverage to speedup groupby or other operations; not sure if this is actually the case.)

RE 2, I'm not really seeing the benefit of a "default" arrange function over a generic one that can accept different orders as arguments. Using existing arrange to sort at construction sounds good.

dajmcdon added the enhancement New feature or request label Jul 16, 2024

dajmcdon mentioned this issue Jul 25, 2024

275 step epi slide cmu-delphi/epipredict#364

Merged

4 tasks

dajmcdon mentioned this issue Aug 13, 2024

Arrange cannonical #511

Merged

3 tasks

brookslogan closed this as completed in #511 Aug 21, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`epi_df` structure #482

`epi_df` structure #482

dajmcdon commented Jul 16, 2024

brookslogan commented Jul 17, 2024

dajmcdon commented Jul 17, 2024

dshemetov commented Jul 20, 2024 •

edited

Loading

epi_df structure #482

epi_df structure #482

Comments

dajmcdon commented Jul 16, 2024

brookslogan commented Jul 17, 2024

dajmcdon commented Jul 17, 2024

dshemetov commented Jul 20, 2024 • edited Loading

`epi_df` structure #482

`epi_df` structure #482

dshemetov commented Jul 20, 2024 •

edited

Loading