GeoMapper doubles the effective population in state-level prop signals #325

Closed
krivard opened this issue Oct 17, 2020 · 5 comments
Labels
bug Something isn't working data quality Missing data, weird data, broken data Engineering Used to filter issues when synching with Asana

Comments

@krivard
Contributor

krivard commented Oct 17, 2020

More information in this Slack thread, but the TL;DR is that JHU state-level prop signals are half the magnitude they should be, according to USAFacts and NYT. Since GeoMapper includes population for both state and FIPS, it may be summing both of them when computing state prop instead of just using one or the other.
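To make the hypothesis concrete, here is a minimal sketch of how a state-level row in the population table could double the denominator. It assumes that state-level FIPS codes end in "000", that prop means cases per 100,000 people, and that the state total is a plain sum over every FIPS row with the state's prefix. The table, the numbers, and the helper names are made up for illustration and are not GeoMapper's actual code.

```python
# Hypothetical population table keyed by 5-digit FIPS code.
# Codes ending in "000" stand for state-level rows; the rest are counties.
# All numbers are invented for illustration.
population = {
    "44000": 1000,  # state-level row, equal to the sum of the counties
    "44001": 400,
    "44003": 600,
}
state_cases = 50


def state_population(pop, state_prefix, include_state_row):
    """Sum population over the FIPS rows that belong to one state."""
    return sum(
        count
        for fips, count in pop.items()
        if fips.startswith(state_prefix)
        and (include_state_row or not fips.endswith("000"))
    )


def prop(cases, pop_total):
    """Cases per 100,000 people (assumed definition of a prop signal)."""
    return cases * 100_000 / pop_total


# Suspected bug: the state-level row is summed together with the counties.
buggy_pop = state_population(population, "44", include_state_row=True)
# Expected behavior: only county rows contribute to the state total.
fixed_pop = state_population(population, "44", include_state_row=False)

buggy_prop = prop(state_cases, buggy_pop)
fixed_prop = prop(state_cases, fixed_pop)
```

With these made-up numbers the buggy denominator is twice the county total, so the buggy prop value is half the expected one, which matches the symptom described above.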

Might be related to Unassigned handling.

@dshemetov
Contributor

Alright, so it's likely this change which switched us over to the new population source file. What's weird is that the previous source file included state FIPS in it as well, so I'm not sure why they're not being filtered out now. Spent a bit too long testing on main today, before realizing that the breaking changes are on deploy-jhu only 🤦.

@dshemetov
Contributor

Comparing the new population source file with the old one: across all non-Puerto Rico counties, the sum of the absolute differences is 0.0 (which just verifies that we're using the same 2019 estimates). The only other changes are that Puerto Rico FIPS now have population counts and that the FIPS code is no longer a float.
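For reference, a comparison along these lines can be sketched as follows. This is not the script that was actually run: the two tables, their values, and the helper `normalize_fips` are hypothetical, and it assumes Puerto Rico rows are the ones whose FIPS code starts with "72".

```python
def normalize_fips(code):
    """Turn a FIPS value stored as float, int, or str into a 5-character string."""
    return str(int(float(code))).zfill(5)


# Hypothetical excerpts of the old and new population tables.
# The old file stores FIPS as floats; the new one stores strings
# and adds a Puerto Rico row. Population values are invented.
old_population = {1001.0: 55000, 44007.0: 630000}
new_population = {"01001": 55000, "44007": 630000, "72001": 17000}

old_by_fips = {normalize_fips(k): v for k, v in old_population.items()}
new_by_fips = {normalize_fips(k): v for k, v in new_population.items()}

# Sum of absolute differences over FIPS codes present in both files,
# skipping Puerto Rico (assumed to be the "72" prefix).
shared = sorted(set(old_by_fips) & set(new_by_fips))
total_abs_diff = sum(
    abs(new_by_fips[fips] - old_by_fips[fips])
    for fips in shared
    if not fips.startswith("72")
)

# FIPS codes that only appear in the new file.
only_in_new = sorted(set(new_by_fips) - set(old_by_fips))
```

A total of 0 over the shared codes means the two files agree wherever they overlap, and `only_in_new` lists the rows that were added.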

@dshemetov
Contributor

When I set the state FIPS codes to population 0 in the source file, the overall signal shape returns to normal; however, there are still (state, day) pairs that differ.
image

For example, with local_data being the JHU values with the population-0 fix and local_data_pre being the same JHU values from before any GeoMapper changes:

>>> local_data[local_data["val"].div(local_data_pre["val"]) > 2.0].reset_index()["geo_id"].unique()
array(['ri', 'ut', 'nm', 'al', 'wy', 'id', 'de'], dtype=object)

These are all expected changes due to the inclusion of the counts in the Out of State and Unassigned categories.

However, I am still not sure why the new source file's state populations suddenly started being included in the calculations.

@dshemetov
Contributor

dshemetov commented Oct 18, 2020

Rhode Island confirmed_incidence_prop has the biggest signal change (remote is USAFacts and local is the fixed JHU signal):
image

This is because we are now ingesting Rhode Island's Unassigned data (cumulative), which is a non-monotonic data stream.

84090044,US,USA,840,90044.0,Unassigned,Rhode Island,US,0.0,0.0,"Unassigned, Rhode Island, US",0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,9,0,0,0,0,0,0,10,12,8,5,23,31,33,71,107,57,171,184,160,80,146,241,238,332,353,274,551,408,649,965,965,1551,756,902,790,1146,1030,861,1000,895,1006,1132,1198,1109,1160,1267,1065,1087,1104,1142,1086,1133,1414,1686,2011,2260,2470,2755,1158,1198,1046,1227,1430,1645,1885,2006,2162,1556,1553,1528,1504,1518,1518,1663,1496,1533,1564,1748,1857,1920,2041,2148,2254,2370,2370,2370,1565,1614,1679,1785,1870,1870,1870,1569,1640,1689,1745,1813,1813,1813,1514,1588,1661,1695,1716,1716,1716,1535,1584,1624,1712,1488,1488,1488,1488,1651,1701,1740,1809,1809,1809,1984,2085,1438,1509,1591,1591,1591,1702,1784,1449,1535,1611,1611,1611,1902,2112,1556,1706,1778,1778,1778,2002,2146,1621,1751,1878,1878,1878,2074,2193,1593,1704,1799,1799,1799,2036,2156,1662,1738,1889,1889,1889,2169,2239,1645,1780,1874,1874,1874,2140,2193,1728,1793,1893,1893,1893,1893,2242,1852,1958,2081,2081,2081,2306,2426,1984,2114,2246,2246,2246,2558,2670,2148,2282,2152,2152,2152,2395,2527,2055,2221,2383,2383,2383,2726,2903,2192,2461,2710,2710,2710,2710,3376,2418,2692,2945,2945
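
To show why a non-monotonic cumulative stream matters, here is a small sketch using a seven-value excerpt from the row above. It assumes incidence is obtained by taking day-over-day differences of the cumulative counts, which is an assumption about the pipeline rather than something confirmed in this thread.

```python
# Seven consecutive values copied from the Rhode Island "Unassigned" row above.
cumulative = [2260, 2470, 2755, 1158, 1198, 1046, 1227]

# Day-over-day differences (assumed way of deriving incidence).
incidence = [
    today - yesterday
    for yesterday, today in zip(cumulative, cumulative[1:])
]

# A cumulative series is monotonic when no difference is negative.
is_monotonic = all(delta >= 0 for delta in incidence)

# Positions in `cumulative` where the count dropped relative to the previous day.
negative_days = [i + 1 for i, delta in enumerate(incidence) if delta < 0]
```

In this excerpt the count falls from 2755 to 1158 and later from 1198 to 1046, so differencing yields negative incidence on those days.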

@krivard
Contributor Author

krivard commented Oct 20, 2020

> before realizing that the breaking changes are on deploy-jhu only 🤦.

oh noooo that was me -- I merged #301 a week ago but never propagated into main. Since it's a breaking change though, I'll hold off until we have a fix.

@nmdefries nmdefries added the data quality Missing data, weird data, broken data label Nov 10, 2020
@SumitDELPHI SumitDELPHI added the Engineering Used to filter issues when synching with Asana label Dec 6, 2020
@krivard krivard closed this as completed Dec 7, 2020