NSSP data for HRRs is not summed properly #2129
Comments
Probably adjust some wording or add a warning in GeoMapper here too. The original instruction in geomap (covidcast-indicators/_delphi_utils_python/delphi_utils/geomap.py, lines 99 to 104 at commit 454ac56) was why I implemented custom weighting for msa and hhs but not hrr.
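The same fips->hrr weights can be correct for one kind of signal and wrong for another. Below is a minimal Python sketch that assumes each crosswalk weight is the share of a county's population falling in a given HRR, so each county's weights sum to 1. The crosswalk rows, fips codes, and helper names are hypothetical and are not the delphi_utils API. Under that assumption, a weighted sum correctly splits additive counts and conserves the total.

```python
from collections import defaultdict

# Hypothetical fips -> hrr crosswalk rows: (fips, hrr, weight).
# Assumption: weight = share of the county's population in that HRR,
# so the weights for each fips sum to 1.
CROSSWALK = [
    ("00001", "100", 1.0),
    ("00002", "100", 0.25),
    ("00002", "200", 0.75),
]


def weights_sum_to_one(crosswalk, tol=1e-9):
    """Check that every source fips is fully allocated across target HRRs."""
    totals = defaultdict(float)
    for fips, _hrr, weight in crosswalk:
        totals[fips] += weight
    return all(abs(total - 1.0) <= tol for total in totals.values())


def split_counts(counts_by_fips, crosswalk):
    """Allocate additive counts (e.g. visit counts) from fips to HRRs."""
    out = defaultdict(float)
    for fips, hrr, weight in crosswalk:
        if fips in counts_by_fips:
            out[hrr] += weight * counts_by_fips[fips]
    return dict(out)


counts = {"00001": 40.0, "00002": 80.0}
by_hrr = split_counts(counts, CROSSWALK)
print(by_hrr)  # {'100': 60.0, '200': 60.0}
```

Running the same weighted sum on percentages instead of counts would add up rates from different counties, which is the failure mode this issue describes.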
I just did another read-through of some GeoMapper code... It looks like
I'm pretty familiar with GeoMapper, having written much of the current core logic in #217 and #1960, so I figure I should provide some context. These functions make a lot of implicit assumptions about the data through the way they handle the missing-data edge case, and those assumptions are hard to convey succinctly (more on that below). I don't think the fips->hrr weights are wrong. I think the TL;DR of the problem here is that
We should definitely use

Context: ("Source geo" and "target geo" below refer to the geos being translated, so for the bug in question the source geos are fips and the target geos are hrrs.) NSSP values are all percentages of state ED visits, and reporting is very incomplete (i.e. not every county reports values). So (a) adding values across separate source geos doesn't make sense without some sort of population adjustment, and (b) we need a strategy for handling missing source geos. Since the target geo also needs to report a percentage, it has to be normalized after aggregating, and the denominator we pick forces a modeling choice:
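As we read the comments, the two denominators are: divide by the population of all source geos (implicitly treating non-reporting ones as 0%), or divide by the population of the reporting source geos only (implicitly assuming non-reporting geos resemble their neighbors). The minimal Python sketch below contrasts them for one target geo. The populations, percentages, and the `aggregate_percent` helper are hypothetical and are not GeoMapper code.

```python
# Hypothetical source geos mapping into one target geo:
# (population, percent of ED visits, or None if the county did not report).
SOURCES = [
    (1000, 10.0),
    (3000, 20.0),
    (6000, None),  # non-reporting county
]


def aggregate_percent(sources, renormalize):
    """Population-weighted percent for one target geo.

    renormalize=False: divide by the population of all source geos,
        which implicitly treats non-reporting geos as 0%.
    renormalize=True: divide by the population of reporting geos only,
        which implicitly assumes non-reporting geos look like the
        reporting ones.
    """
    reporting = [(pop, pct) for pop, pct in sources if pct is not None]
    if not reporting:
        return None  # nothing reported; leave the target value missing
    numerator = sum(pop * pct for pop, pct in reporting)
    if renormalize:
        denominator = sum(pop for pop, _pct in reporting)
    else:
        denominator = sum(pop for pop, _pct in sources)
    return numerator / denominator


print(aggregate_percent(SOURCES, renormalize=False))  # 7.0
print(aggregate_percent(SOURCES, renormalize=True))   # 17.5
```

With these numbers the zero-filling denominator gives 7.0% and the renormalized one gives 17.5%; the gap grows with the share of population living in non-reporting counties.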
David and I settled on the latter choice, since we figured that assuming missing geos are similar to nearby ones is more correct than assuming they are zero. He wrote the original
Reopening because this still needs some sort of "patching-like" action, either deleting the erroneous data or overwriting it with correct data, depending on the available options (offhand, I'm not familiar enough with the issuance and revision behavior of the source data, or with our backups of it).
todo:
The HRR-level NSSP data is computed with a basic sum instead of a weighted one, unlike the other geo types we aggregate ourselves. As a result, the scale is off and values can exceed 100% (see nssp:pct_ed_visits_influenza @hrr:366).
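To illustrate how a basic sum can push an HRR above 100%, here is a Python sketch with made-up county values (not the actual data behind hrr:366) that compares the plain sum with a population-weighted mean. Population weighting is an assumption drawn from the discussion above; an actual fix might also fold in the fips->hrr crosswalk weights, and the helper names are hypothetical.

```python
# Hypothetical counties in one HRR: (population, percent of ED visits).
COUNTIES = [
    (50_000, 40.0),
    (30_000, 35.0),
    (20_000, 30.0),
]


def basic_sum(counties):
    """The behavior described in this issue: percentages simply added up."""
    return sum(pct for _pop, pct in counties)


def weighted_mean(counties):
    """Population-weighted average, which stays on the 0-100 scale."""
    total_pop = sum(pop for pop, _pct in counties)
    return sum(pop * pct for pop, pct in counties) / total_pop


print(basic_sum(COUNTIES))      # 105.0
print(weighted_mean(COUNTIES))  # 36.5
```

Three counties reporting 30 to 40% each sum to 105%, while the weighted mean (36.5%) stays within the range of its inputs.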