NSSP data for HRRs is not summed properly #2129

melange396 · 2025-02-27T21:54:21Z

it is computed with a basic sum instead of a weighted one like the other geo types we aggregate ourselves. as a result the scale is off and can be >100% (see nssp:pct_ed_visits_influenza @hrr:366 )

The text was updated successfully, but these errors were encountered:

minhkhul · 2025-02-28T15:42:20Z

Probably adjust some wording/add warning in geomapper here too. This original geomap instruction was why I implemented custom weighting for msa and hhs but not hrr.

covidcast-indicators/_delphi_utils_python/delphi_utils/geomap.py

Lines 99 to 104 in 454ac56

    
               Example 2: to replace a geocode column with a new one, aggregating the data 
        
               with weights: 
        
               > gmpr = GeoMapper() 
        
               > df = gmpr.replace_geocode(df, "fips", "zip", 
        
                                           from_col="fips", new_col="geo_id", 
        
                                           date_col="timestamp", dropna=False)

melange396 · 2025-02-28T17:10:19Z

I just did another read-through of some GeoMapper code... It looks like .replace_geocode() can/should produce weighted sums for certain geo-level conversions (through a sneaky and convoluted path), but the precomputed fips-->hrr weights do not add up the way they should. Using your other approach that calls .aggregate_by_weighted_sum() w/ population should get around this.

dshemetov · 2025-02-28T20:14:03Z

I'm pretty familiar with GeoMapper, having written much of the current core logic in #217 and in #1960, so I figure I should provide some context. These functions contain a lot of implicit assumptions about the data, by the way they handle the missingness edge case, and these assumptions are hard to convey succinctly (more on that below). I don't think the fips->hrr weights are wrong. I think the TL;DR of the problem here is that

replace_geocode() is appropriate for aggregating unnormalized counts data and implicitly considers missing geos to be zero
aggregate_by_weighted_sum() is intended for normalized data and implicitly extrapolates to missing geos from present neighbor geos

We should definitely use aggregate_by_weighted_sum() in this code path. And I should improve the GeoMapper docstring. Maybe add a "GeoMapper by example" notebook too.

Context:

("Source geo" and "target geo" below refer to the geos being translated. So for the bug in question, source geos are fips and target geos are hrrs).

NSSP data is all percent of state ED visits and it's full of incomplete reporting (i.e. not every county reports values), so (a) adding values across separate source geos doesn't make sense without some sort of population adjustment, (b) we need a strategy to handle missingness in source geos. Since the target geo needs to also report a percentage, it has to be normalized after aggregating and we're forced into a modeling choice by our choice of denominator:

if you normalize by the total population of the target geo, but the numerator only added present source geos, then that translates to an implicit zero-fill assumption on the missing source geos
if you normalize by the population of only the present source geos, then that translates to an implicit extrapolation from the present geos

David and I settled on the latter choice, since we figured that assuming nearby missing geos are similar is more correct than assuming they are zero. He wrote the original aggregate_by_weighted_sum() for this purpose and I generalized and ported it to GeoMapper. In hindsight, and as these things tend to go, we spent all our time getting the functionality right and no time choosing a good function name, so we ended up with this too-generic-to-be-descriptive thing 🙃.

melange396 · 2025-03-12T23:05:48Z

Reopening because this still needs some sort of "patching-like" action, either deleting the erroneous data or overwriting it with correct data, depending on available options (im not familiar enough with the issuance and revision behavior of the source data nor with our backups of it off the top of my head).

minhkhul · 2025-03-19T17:19:23Z

todo:

delete old hrr data
- took ~1hr late last night
- DELETE epimetric_x FROM epimetric_x JOIN signal_dim USING (signal_key_id) JOIN geo_dim USING (geo_key_id) WHERE source='nssp' AND geo_type='hrr';
release covidcast indicators to the fix is applied for future runs.
- done in Release covidcast-indicators 0.3.62 #2134
patch old hrr data

melange396 added bug Something isn't working data quality Missing data, weird data, broken data labels Feb 27, 2025

aysim319 self-assigned this Feb 28, 2025

aysim319 linked a pull request Mar 3, 2025 that will close this issue

2129 fix hrr for nssp #2131

Merged

aysim319 mentioned this issue Mar 3, 2025

2129 fix hrr for nssp #2131

Merged

aysim319 closed this as completed in #2131 Mar 11, 2025

melange396 reopened this Mar 12, 2025

minhkhul closed this as completed Mar 24, 2025

aysim319 assigned minhkhul Mar 24, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

NSSP data for HRRs is not summed properly #2129

NSSP data for HRRs is not summed properly #2129

melange396 commented Feb 27, 2025

minhkhul commented Feb 28, 2025 •

edited

Loading

Uh oh!

melange396 commented Feb 28, 2025

Uh oh!

dshemetov commented Feb 28, 2025 •

edited

Loading

Uh oh!

melange396 commented Mar 12, 2025

Uh oh!

minhkhul commented Mar 19, 2025 •

edited

Loading

Uh oh!

NSSP data for HRRs is not summed properly #2129

NSSP data for HRRs is not summed properly #2129

Comments

melange396 commented Feb 27, 2025

minhkhul commented Feb 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

melange396 commented Feb 28, 2025

Uh oh!

dshemetov commented Feb 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

melange396 commented Mar 12, 2025

Uh oh!

minhkhul commented Mar 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

minhkhul commented Feb 28, 2025 •

edited

Loading

dshemetov commented Feb 28, 2025 •

edited

Loading

minhkhul commented Mar 19, 2025 •

edited

Loading