Create geo-aggregation extension of tsibble and demonstrate its use in aggregation vignette #1

jacobbien · 2021-11-09T20:14:12Z

The basic idea here is to do for geography what tsibble does for time; I elaborate on this below. A key outcome of this issue would be to demonstrate geographic aggregation functionality in the second half of the aggregate.Rmd vignette. But unlike cmu-delphi/epiprocess#24, where the only task was to write the demo, here one would first need to develop the geographic aggregation functionality itself. The approach I sketch below would involve writing a new, separate R package that inherits from tsibble.


jacobbien · 2021-11-09T20:16:08Z

For the full discussion that this issue is based on, see cmu-delphi/epiprocess#7

jacobbien · 2021-11-09T20:37:00Z

Here is some context for what we are interested in achieving. The idea sketched out here would be to create a class that inherits from tsibble, perhaps called tsibble_us, which enforces the key to be one of national, state, hrr, county, msa. Performing spatial aggregation is a common yet non-trivial task, so building in the ability to do it easily could be very useful.

Consider this example from the tsibble documentation showing how index_by() + summarize() can be used to aggregate (in time) to a coarser index:

tourism %>%
  index_by(Year = ~ year(.)) %>%
  group_by(Region, State) %>%
  summarise(Total = sum(Trips))

If one wanted to aggregate to the Year-Region level instead, couldn't one just remove State from the group_by? The problem is: what if some states are missing? The sum would still be computed, just over fewer states, with no warning. The advantage of tsibble_us is that it would know if some states are missing (and likewise, if you try to aggregate counties to the state level, it would know if there are missing counties).
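To make the concern concrete, here is a minimal base-R sketch on made-up data (the trips table, its values, and the assumption that region "West" should contain exactly CA, OR, and WA are all invented for illustration; this is not the tourism dataset). It shows that a plain sum over whatever states happen to be present returns a number without complaint, and that one has to count contributors by hand to notice the gap.

```r
# Toy data (invented): region "West" is assumed to contain CA, OR, WA.
trips <- data.frame(
  Year   = c(2020, 2020, 2020, 2021, 2021),
  Region = "West",
  State  = c("CA", "OR", "WA", "CA", "OR"),  # WA is absent in 2021
  Trips  = c(10, 5, 7, 11, 6)
)

# Naive aggregation to the Year-Region level: the 2021 total silently omits WA.
totals <- aggregate(Trips ~ Year + Region, data = trips, FUN = sum)

# Counting the states behind each total is what exposes the problem.
n_states <- aggregate(
  trips["State"],
  by  = trips[c("Year", "Region")],
  FUN = function(s) length(unique(s))
)
names(n_states)[names(n_states) == "State"] <- "n_states"

result <- merge(totals, n_states)
print(result)
```

Nothing in the totals table alone distinguishes the complete 2020 row from the incomplete 2021 row, which is the gap a geography-aware class would close.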

This is directly analogous to how tsibble worries a lot (so that the user doesn't have to!) about missing time values. In "Handle implicit missingness with tsibble", Earo Wang describes four functions for handling missing time values: has_gaps, scan_gaps, count_gaps, and fill_gaps. Our class tsibble_us could do the same for missing locations. Imagine four corresponding functions (let's call a missing location a "hole" for now, for lack of a better term):

has_holes - are there missing locations? (E.g., if the key is at the county level, are all US counties there?) One could also have an argument that specifies a limited scope: e.g., if the scope is defined as CA, then %>% has_holes(scope = state("CA")) would only check whether any CA counties are missing from the data object.
scan_holes - which counties are missing?
count_holes - how many counties are missing?
fill_holes - this function could at minimum create NA time series for the missing counties. This means that if we aggregate from the county to the state level, it will be apparent that CA was missing some counties. One could also imagine cases where filling in 0s makes sense. In fact, tsibble_us could even offer imputation based on the geo hierarchy (e.g., fill in with the average of all counties within the state) or based on spatial proximity.
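As a rough sketch of what these four functions might do, here is a base-R version operating on a plain data frame rather than a real tsibble_us object. Everything in it is an assumption made for illustration: the geo_ref lookup table, the made-up county labels, the expected_counties helper, and a scope argument that takes a state abbreviation directly instead of state("CA").

```r
# Hypothetical reference table of which counties belong to which state.
# A real package would ship a complete list; these labels are made up.
geo_ref <- data.frame(
  County = c("CA_1", "CA_2", "OR_1"),
  State  = c("CA", "CA", "OR")
)

dat <- data.frame(
  Date   = as.Date(c("2021-11-01", "2021-11-02", "2021-11-01", "2021-11-02")),
  County = c("CA_1", "CA_1", "OR_1", "OR_1"),
  Cases  = c(12, 15, 3, 4)
)

# scope = NULL means every county in geo_ref; otherwise state abbreviation(s).
expected_counties <- function(scope = NULL) {
  if (is.null(scope)) geo_ref$County else geo_ref$County[geo_ref$State %in% scope]
}

scan_holes  <- function(x, scope = NULL) setdiff(expected_counties(scope), unique(x$County))
count_holes <- function(x, scope = NULL) length(scan_holes(x, scope))
has_holes   <- function(x, scope = NULL) count_holes(x, scope) > 0

# Minimal fill: one NA row per (missing county, observed date).
fill_holes <- function(x, scope = NULL) {
  holes <- scan_holes(x, scope)
  if (length(holes) == 0) return(x)
  filler <- expand.grid(Date = unique(x$Date), County = holes,
                        stringsAsFactors = FALSE)
  filler$Cases <- NA_real_
  rbind(x, filler[names(x)])
}

print(scan_holes(dat))               # counties absent from dat
print(has_holes(dat, scope = "OR"))  # restrict the check to one state
filled <- fill_holes(dat)
```

Filling with NA rather than 0 is the conservative choice here: any later sum or mean over the state becomes NA unless the caller explicitly opts into na.rm = TRUE, so the missing county cannot go unnoticed.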

tsibble_us could also have population size information for each geo value, so that weighted averages could be easy to do in aggregation.
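A small illustration of why the population information matters, using base R's weighted.mean (the county labels, rates, and populations below are invented): when counties differ in size, the unweighted mean of their per-100k rates is not the rate of the combined area.

```r
# Invented example: two counties in one state, rates per 100k.
counties <- data.frame(
  County     = c("CA_1", "CA_2"),
  Rate       = c(10, 30),
  Population = c(300000, 100000)
)

unweighted <- mean(counties$Rate)
weighted   <- weighted.mean(counties$Rate, w = counties$Population)

print(c(unweighted = unweighted, weighted = weighted))
```

The weighted value agrees with recomputing the rate from pooled counts (30 + 30 cases over 400,000 people), which is what one would want a state-level incidence rate to mean.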

An example: Suppose dat is a tsibble_us object with daily county-level cases (three columns: Date, the index; County, the geo-key; and Cases, a column holding the incidence rate per 100k). We could aggregate to state-level epiweek data with something like the following:

dat %>%
  fill_gaps() %>% 
  fill_holes() %>%
  index_by(Epiweek = ~ epiweek(.)) %>%
  geokey_by(State = ~ state(.)) %>%
  summarise(Aggregated_cases = population_weighted_mean(Cases))

Here geokey_by, state, and population_weighted_mean would be tsibble_us functions that rely on the information the package contains about US geography.
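Since none of those functions exist yet, the following base-R sketch only approximates what the pipeline above would compute. The geo_ref table, county labels, populations, and case values are invented. As a stand-in for epiweek(), weeks are labeled by the Sunday on which they start (epiweeks run Sunday to Saturday), not by epiweek number. The fill_gaps and fill_holes steps are omitted because the toy data is complete.

```r
# Hypothetical geography table that a tsibble_us-like package might carry.
geo_ref <- data.frame(
  County     = c("CA_1", "CA_2", "OR_1"),
  State      = c("CA", "CA", "OR"),
  Population = c(300000, 100000, 50000)
)

# Daily county-level incidence rates (per 100k); 2021-11-07 is a Sunday.
dat <- data.frame(
  Date   = rep(as.Date("2021-11-07") + 0:1, times = 3),
  County = rep(c("CA_1", "CA_2", "OR_1"), each = 2),
  Cases  = c(10, 20, 30, 40, 5, 7)
)

# Role of geokey_by(State = ~ state(.)): attach each county's state and population.
x <- merge(dat, geo_ref, by = "County")

# Role of index_by(Epiweek = ~ epiweek(.)): map each date to its week's Sunday.
x$WeekStart <- x$Date - as.integer(format(x$Date, "%w"))

# Role of summarise(population_weighted_mean(Cases)) within each State-week group.
groups <- split(x, list(x$State, x$WeekStart), drop = TRUE)
agg <- do.call(rbind, lapply(groups, function(g) {
  data.frame(
    State            = g$State[1],
    WeekStart        = g$WeekStart[1],
    Aggregated_cases = weighted.mean(g$Cases, w = g$Population)
  )
}))
rownames(agg) <- NULL

print(agg)
```

The amount of manual bookkeeping (the join, the week arithmetic, the split and recombine) is a fair measure of what the proposed verbs would hide from the user.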

Obviously, instead of tsibble_us, one could write a more general tsibble_geo, and then location-specific classes could inherit from it: tsibble_us, tsibble_europe, tsibble_world, etc.

qpmnguyen · 2022-02-09T17:51:45Z

@ryantibs @jacobbien I'm happy to take this task if it's still open. Also happy to help with any issues I've created with my previous PR as well.

ryantibs · 2022-02-09T18:04:02Z

@qpmnguyen Sounds good! Let's discuss on slack what the best approach is, because there is a lot of functionality already written (for our indicators pipeline) for geo aggregation stuff in Python, which is in the covidcast_indicators repo. Can you please post a message on #epi-tooling channel recapping what the basic issue here is, and what are some proposed plans of attack, and tag all the "usual" folks for discussion?

There's also some smaller issues I'm about to open if that slack discussion takes a while to resolve on what's the best strategy. I'll point you to this when I open them.

Re your time aggregation PR: I'm just finishing going through it now, should be able to merge it soon.

ryantibs · 2022-02-09T18:59:03Z

As a follow-up, I only opened up one tiny issue cmu-delphi/epiprocess#39, the other one I managed to figure out and fix already. Your PR is merged. Thanks again!

qpmnguyen · 2022-02-09T19:29:34Z

Sounds good! I'll take a look at the indicators repo and draft up some discussion points in Slack.

ryantibs · 2022-04-15T13:41:43Z

@qpmnguyen Do we have a repo yet for gtsibble or whatever we're calling it? If so, can you transfer this issue over to there (do you have permissions?)

qpmnguyen · 2022-04-15T15:46:05Z

@ryantibs I just transferred the issue! I have a branch working locally on my fork and will initiate a pull request once more things have been added.

jacobbien changed the title ~~Add geo-aggregation functionality~~ Create geo-aggregation extension of tsibble and demonstrate its use in aggregation vignette Nov 9, 2021

ryantibs assigned qpmnguyen Feb 24, 2022

dshemetov added the question Further information is requested label Mar 9, 2022

qpmnguyen transferred this issue from cmu-delphi/epiprocess Apr 15, 2022
