
Consider supporting a national geo level for all indicators #199


Closed
krivard opened this issue Aug 10, 2020 · 19 comments
Labels
Engineering (Used to filter issues when synching with Asana); good first issue; needs-coordination (This work should be assigned to a coordinator and split up into several subtasks); Triage (Nominate for inclusion in the next release)
Milestone

Comments

@krivard
Contributor

krivard commented Aug 10, 2020

This would be useful for generating nationwide time series plots, and also for completing the nesting doll of scales for viz.

The easiest way to do this is probably to add it to the geo utility shared by all the Python indicators, and roll out the national level as we convert indicators to use the package. The fb-package branch already has an implementation of it for R.

@krivard krivard added the Triage Nominate for inclusion in the next release label Aug 10, 2020
@krivard
Contributor Author

krivard commented Sep 17, 2020

Depends on #215

@RoniRos
Member

RoniRos commented Oct 5, 2020

National would be very useful, and while implementing it we should also implement an HHS Regions level.

@dshemetov
Contributor

dshemetov commented Oct 5, 2020 via email

@RoniRos
Member

RoniRos commented Oct 5, 2020

HHS Regions: awesome!

National: So far we have been focusing on US locations only. I expect this will continue to be the case for the next ~6 months or so. At some point, we may well want to expand internationally. So we should not put any effort into creating codes for other countries, but at the same time we shouldn't do anything that will make it harder to expand internationally later.

@krivard
Contributor Author

krivard commented Oct 5, 2020

National should use the standard two-character country codes (https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2)

see also cmu-delphi/delphi-epidata#207

@dshemetov
Contributor

dshemetov commented Oct 5, 2020

I am not familiar enough with other countries' county-level geocodes to say how the mapping from county to country code should work in general.

In the meantime though, in #217, I have implemented a very simple aggregation to the "us" national level only as a stopgap, which works by summing records with a FIPS or a ZIP code.

Extending to international locations may require larger changes, such as keeping the national ISO code in a separate column from the finer geocode (like JHU does, for example).
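As a rough illustration of the stopgap described above, here is a minimal sketch of summing FIPS-keyed count records into a single "us" row per date. It is not the code from #217; the function name `aggregate_to_nation`, the record layout, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict

def aggregate_to_nation(records, nation_code="us"):
    """Sum count-valued (date, fips, count) records into (date, nation_code) totals.

    Hypothetical helper: the real geo utility may use a different layout.
    """
    totals = defaultdict(float)
    for date, _fips, count in records:
        # Every FIPS record contributes to the same national bucket for its date.
        totals[(date, nation_code)] += count
    return dict(totals)

# Made-up sample records: (date, county FIPS, count).
records = [
    ("2020-10-05", "42003", 12.0),
    ("2020-10-05", "06037", 30.0),
    ("2020-10-06", "42003", 7.0),
]
national = aggregate_to_nation(records)
```

Keeping the nation code as its own key, separate from the finer geocode, is also roughly what the separate-column layout mentioned above would look like.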

@RoniRos
Member

RoniRos commented Oct 6, 2020

Many countries have other types of divisions, e.g. provinces, cantons, etc. I wouldn't worry about it now.

@dshemetov When you discuss the simple aggregation by "summing records" above, do you mean aggregation of sample elements? Or weighted averaging of signal values? The important thing is to properly account for all FIPS codes in the country, including those for which an estimate wasn't produced, a sample was too small (or even empty), etc. I assume this was already done in aggregating counties to states. Why not create the national signal directly from the states' signal? Or better yet, from the HHS Regions signals? These are clean hierarchies: every county belongs to exactly one state or territory, every state and territory belongs to exactly one HHS Region, and the 10 HHS Regions comprise the national territory.
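To make the hierarchy argument concrete, here is a small sketch showing that, for count data, rolling up county to state to HHS Region to nation gives the same total as summing counties directly. The `roll_up` helper and the sample counts are hypothetical, and `STATE_TO_HHS` is only an illustrative subset (state FIPS 36, 42, 06 for New York, Pennsylvania, California), not a complete crosswalk.

```python
from collections import Counter

# Illustrative subset only: state FIPS prefix -> HHS Region number.
STATE_TO_HHS = {"36": 2, "42": 3, "06": 9}

def roll_up(counts, parent_of):
    """Sum counts into parent units; parent_of maps a child code to its parent."""
    out = Counter()
    for code, n in counts.items():
        out[parent_of(code)] += n
    return dict(out)

# Made-up county counts keyed by 5-digit FIPS; the first two digits are the state.
county = {"42003": 12, "42101": 20, "06037": 30, "36061": 25}

state = roll_up(county, lambda fips: fips[:2])
hhs = roll_up(state, lambda st: STATE_TO_HHS[st])
nation = roll_up(hhs, lambda region: "us")

# Summing straight from counties gives the same national count.
direct = roll_up(county, lambda fips: "us")
```

The two routes agree only because each child has exactly one parent and the values are additive counts; rates or averages would need weights carried through each level.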

@dshemetov
Contributor

dshemetov commented Oct 6, 2020

@RoniRos Sounds good, simple for now.

By aggregation, I mean just the aggregation of sample elements. There is no detailed accounting of edge-case FIPS codes in the part of the code I touched (excepting the transformation to megacounties, when sample sizes are below threshold); the main edge case I have thought about is nan-handling, which at the moment are zero-filled. Are there other cases I should be aware of?

I defaulted to working with the finest level geocode in the transformations to simplify the crosswalks transformation graph. I think it's mostly a personal conceptual preference for providing a star graph instead of requiring the user to do a chain of transformations (e.g. FIPS to state to HRR to nation). Since we don't support arbitrary crosswalks between geocodes, my thought was that it would be easier for the utility user to know that FIPS -> * is always available instead of hunting for the correct chain of transformations.
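The star-graph idea could be sketched as a single lookup table keyed by target geo level, where every entry maps directly from a county FIPS code. This is only an assumption about the shape of such an interface: `FROM_FIPS`, `convert`, and the partial `STATE_TO_HHS` table are hypothetical names, not the geo utility's actual API.

```python
# Illustrative subset only: state FIPS prefix -> HHS Region number.
STATE_TO_HHS = {"36": 2, "42": 3, "06": 9}

# One direct edge from FIPS to each supported target level (a star graph),
# so callers never have to chain intermediate conversions themselves.
FROM_FIPS = {
    "state": lambda fips: fips[:2],
    "hhs": lambda fips: STATE_TO_HHS[fips[:2]],
    "nation": lambda fips: "us",
}

def convert(fips, target):
    """Map a 5-digit county FIPS code to the requested geo level."""
    if target not in FROM_FIPS:
        raise ValueError(f"unsupported target geo level: {target!r}")
    return FROM_FIPS[target](fips)

state_code = convert("42003", "state")
hhs_region = convert("42003", "hhs")
nation_code = convert("42003", "nation")
```

The trade-off is that each FIPS -> * edge must be maintained separately, in exchange for the user only ever needing one call.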

@RoniRos
Member

RoniRos commented Oct 6, 2020

the main edge case I have thought about is nan-handling, which at the moment are zero-filled

I am not sure what you mean by zero-filled. Presumably, the NaNs are 0/0, so they add zero to both numerator and denominator, right?

@dshemetov
Contributor

The geocoding aggregation in the utility treats every data field as a count and does weighted summing. I have not considered the effects of a denominator. Where do these come up?

@RoniRos
Member

RoniRos commented Oct 6, 2020

If these are counts, that's probably the numerator. The denominator is the sample size.
All is well.
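A small worked example of why this is fine: if counts (numerator) and sample sizes (denominator) are each summed before dividing, an empty cell contributes 0 to both and leaves the pooled proportion unchanged. Zero-filling a per-cell ratio and then averaging would not behave the same way. The function name and the numbers below are made up for illustration.

```python
def pooled_proportion(cells):
    """Pool (count, sample_size) pairs by summing each part, then dividing."""
    num = sum(n for n, _ in cells)
    den = sum(d for _, d in cells)
    return num / den if den > 0 else float("nan")

# Made-up cells: the middle one is empty, i.e. its own ratio would be 0/0.
cells = [(5, 100), (0, 0), (15, 300)]

pooled = pooled_proportion(cells)  # (5 + 0 + 15) / (100 + 0 + 300)

# For contrast: zero-fill each cell's ratio, then take an unweighted mean.
ratios = [n / d if d > 0 else 0.0 for n, d in cells]
naive = sum(ratios) / len(ratios)
```

Here the pooled value stays at the rate seen in the two non-empty cells, while the zero-filled mean is pulled down by the empty one.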

@krivard
Contributor Author

krivard commented Oct 6, 2020 via email

@dshemetov
Contributor

dshemetov commented Oct 6, 2020

Unassigned and Out of State records are given a FIPS code of XX000, so they should be summed with the rest of the FIPS records.
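A brief sketch of what that implies, assuming XX stands for the two-digit state FIPS prefix (so a Pennsylvania residual would be 42000). The helper `is_state_residual` and the counts are hypothetical.

```python
def is_state_residual(fips):
    """True for codes of the assumed XX000 form (state-level residual records)."""
    return len(fips) == 5 and fips.endswith("000")

# Made-up counts: two real counties plus one XX000 residual for the same state.
records = {"42003": 12, "42101": 20, "42000": 3}

# A county-level view would exclude the residual bucket...
county_only = {f: n for f, n in records.items() if not is_state_residual(f)}

# ...but a national sum over all FIPS records keeps it.
national_total = sum(records.values())
```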

@nickreich

I wanted to mention that this would be a super helpful feature for us at the COVID-19 Forecast Hub! We are currently not using covidcast data for jhu-csse because not all the backfill data is in place and there is no national signal. It would be great to have this resolved!

@RoniRos
Member

RoniRos commented Oct 22, 2020

Thanks for the input @nickreich , it is helpful to know. Hopefully this will happen soon.

@dshemetov
Contributor

I think this could be started by scoping out the work needed for a particular signal, such as JHU. @nickreich do you need a national signal for all countries or just the US?

@krivard krivard added the needs-coordination This work should be assigned to a coordinator and split up into several subtasks label Oct 22, 2020
@nickreich

Thanks @dshemetov . Just the US.

@krivard krivard added this to the December OKRs milestone Nov 12, 2020
@SumitDELPHI SumitDELPHI added the Engineering Used to filter issues when synching with Asana label Dec 6, 2020
This was referenced Dec 8, 2020
@capnrefsmmat
Contributor

capnrefsmmat commented Mar 17, 2021

FYI, nation support for doctor-visits is more important until CHNG comes back, because COVIDcast 2.0 shows the nation view of all indicators by default. It currently shows a big N/A for doctor-visits unless you click to see state-by-state numbers.

This is also true for Quidel.

@krivard
Contributor Author

krivard commented Mar 17, 2021

Reviving #616

@krivard krivard closed this as completed Aug 24, 2021
6 participants