allow state-level FIPS codes for geo_type "county" #1134

Closed
wants to merge 1 commit into from

Conversation

melange396
Collaborator

@melange396 melange396 commented Apr 18, 2023

this change is intended to remedy the empty state visualization on the covidcast dashboard (ie, as seen on https://delphi.cmu.edu/covidcast/?date=20221115&region=PA) that was noticed today.

it should make it so state-level FIPS codes are also allowed as valid values for geo_type of "county" (ie, all of Pennsylvania is captured in FIPS code 42000 where individual counties can be found in the range of 42001-42999). it looks like "hhs" would work identically in place of "chng-fips" too, which kind of implies "hhs" and "county" may be redundant geo_types as stored in the epimetric tables.
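
A minimal sketch of the kind of check described above, assuming FIPS codes are five-digit strings and that a state-level code is a two-digit state prefix followed by "000". The helper names and the toy set of known codes are hypothetical and are not taken from delphi-epidata:

```python
def is_state_level_fips(code: str) -> bool:
    """True for codes like "42000": a two-digit state prefix followed by "000"."""
    return (
        len(code) == 5
        and code.isdigit()
        and code.endswith("000")
        and code[:2] != "00"
    )


def is_valid_county_value(code: str, known_counties: set) -> bool:
    """Accept known county codes and also state-level ("nn000") codes."""
    return code in known_counties or is_state_level_fips(code)


# Toy data for illustration only; a real list would come from a FIPS table.
known = {"42001", "42003", "42101"}
print(is_valid_county_value("42003", known))  # True
print(is_valid_county_value("42000", known))  # True
print(is_valid_county_value("4200", known))   # False
```

Note that this sketch does not verify that the two-digit prefix is a real state, so something like "99000" would also pass; a lookup against a state table would be needed for that.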

this is also largely untested

@melange396 melange396 requested a review from krivard April 18, 2023 18:04
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

0 Bugs (A)
0 Vulnerabilities (A)
0 Security Hotspots (A)
0 Code Smells (A)

No Coverage information
0.0% Duplication

@melange396 melange396 requested review from dshemetov and nmdefries and removed request for krivard April 18, 2023 18:39
@melange396
Collaborator Author

@nmdefries @dshemetov i thought this was a simple change from pointing to this file instead of this file -- that the first column is used as the mapping keys, and the newly referenced one includes the "nn000" state codes that the original one doesnt. this is not the case, as there appears to be something fancier done when ingesting those CSVs than what happens here. can you suggest a good way to accept state codes as valid county codes?

@dshemetov
Contributor

dshemetov commented Apr 18, 2023

I wrote a little about it here. I suggest switching to fips_state_table.csv for the fips levels instead.

Sidenote: the "chng-fips" file seems to be missing some fips codes found in the fips_pop file, probably because the mapping file didn't contain these geos.

>>> GeoMapper().get_geo_values("fips") - GeoMapper().get_geo_values("chng-fips")
{'18115', '35009', '29115', '48345', '31167', '16051', '56033', '40141', '47111', '51091', '51073', '20029', '08023', '17151', '20207', '29035', '48175', '40085', '19133', '29111', '20097', '19073', '16015', '20175', '20199', '38051', '16035', '30097', '05057', '38031', '19159', '32017', '31179', '22081', '13259', '54073', '05013', '13007', '46077', '13087', '08053', '31081', '17127', '50005', '08049', '48389', '17035', '31111', '08015', '16079', '40067', '48173', '31169', '48269', '51159', '51197', '31097', '51049', '48359', '26119', '19193', '08061', '51021', '29093', '38075', '17013', '48151', '19007', '38021', '56031', '08017', '20121', '29063', '38049', '48009', '27007', '30095', '08121', '05129', '50009', '46005', '38033', '46115', '19151', '30021', '17187', '56045', '47003', '13269', '40003', '08095', '29185', '17143', '20005', '16009', '29033', '38003', '08075', '48279', '20135', '13319', '20017', '29205', '54085', '19077', '20193', '27089', '19037', '38005', '38105', '51193', '38095', '37095', '31121', '48435', '13197', '29061', '40093', '31183', '49041', '29227', '48335', '16029', '06003', '29203', '13253', '13201', '48101', '20145', '31147', '27125', '38055', '21223', '27087', '20109', '40043', '46095', '21201', '20095', '31069', '49031', '38039', '46129', '46003', '26097', '48305', '28027', '16063', '06109', '20107', '46051', '05117', '29103', '38077', '48327', '17071', '48211', '13315', '45065', '51029', '38089', '30079', '13235', '13065', '31005', '48011', '20009', '46137', '48117', '20123', '20077', '54021', '08027', '06049', '21075', '29179', '21077', '47039', '26135', '47185', '55003', '08003', '13283', '17153', '47027', '46041', '48237', '51161', '48043', '20007', '05077', '16061', '31173', '46059', '17165', '20177', '45005', '20137', '41063', '46089', '21169', '51045', '31015', '20073', '40151', '48405', '20161', '30093', '46097', '31059', '54015', '20093', '29197', '53003', '31101', '30099', '26095', '46099', '56011', '38035', '05091', '35011', 
'05095', '20013', '20159', '41035', '20063', '49033', '31051', '38009', '35019', '20125', '31061', '31085', '46037', '48197', '13069', '50011', '55078', '31023', '30073', '30031', '31095', '35007', '31031', '28053', '29005', '29045', '13249', '19029', '16077', '30049', '48195', '20151', '01105', '13205', '13155', '16071', '19053', '19135', '29083', '28069', '48393', '51127', '46119', '31127', '13023', '31125', '21165', '40011', '26131', '48221', '48163', '48243', '30089', '28163', '20105', '30027', '48025', '48317', '20163', '46075', '31177', '35017', '05039', '48115', '26013', '22073', '13125', '13209', '48303', '35037', '08087', '31139', '32003', '48429', '20187', '54017', '13165', '13293', '16023', '40007', '48103', '38063', '38059', '31049', '32015', '31035', '12037', '31001', '48047', '16059', '20127', '08065', '08105', '08079', '05073', '20043', '08007', '30077', '40059', '27077', '08045', '46061', '31003', '20201', '30101', '42123', '19091', '47175', '08063', '08125', '38083', '20025', '31039', '13303', '20031', '48045', '30085', '32510', '48069', '20153', '20117', '29171', '48165', '18041', '48415', '29025', '26153', '28157', '27081', '54007', '48297', '47049', '27073', '31171', '27031', '30033', '35006', '13273', '08085', '46087', '16045', '48381', '31091', '16003', '46067', '19177', '20051', '48223', '13037', '48141', '20185', '48371', '04011', '48437', '38099', '20089', '48135', '32021', '48447', '29181', '20061', '31175', '46125', '13061', '21039', '31011', '20203', '46063', '30041', '20195', '42053', '54087', '38041', '21043', '29129', '20067', '38103', '20183', '40065', '31013', '48275', '38001', '08025', '46047', '05067', '19187', '48443', '38071', '51017', '30057', '46055', '48227', '13265', '28037', '48485', '31017', '20019', '27149', '49025', '29151', '21007', '27167', '37187', '31145', '21053', '19001', '40019', '29085', '40029', '54101', '46053', '19101', '13243', '29057', '51036', '46109', '30069', '48099', '19003', '19025', '31087', '30111', 
'20047', '20055', '27023', '46007', '41049', '48475', '56023', '48283', '20083', '29117', '01065', '21055', '48017', '21059', '28139', '31099', '46035', '31103', '42023', '13301', '56013', '17185', '30065', '38061', '17171', '38073', '08009', '28009', '20049', '40057', '51187', '54023', '26053', '31037', '08089', '13263', '45047', '19195', '17009', '49021', '48119', '48383', '46085', '32007', '48253', '32019', '20189', '48133', '08047', '48431', '28021', '19063', '05099', '20003', '48307', '53023', '38085', '35021', '31093', '49001', '21041', '05123', '20023', '32011', '48023', '51157', '31133', '48461', '04009', '48333', '56019', '48153', '08101', '16087', '18161', '42113', '13099', '48193', '08033', '49015', '51163', '28001', '41023', '01063', '27105', '41025', '19179', '48411', '32001', '17175', '51101', '21091', '48357', '30011', '13307', '31073', '54107', '16041', '35047', '40009', '31029', '21105', '17003', '40149', '17169', '47169', '41069', '47127', '29081', '48425', '27075', '08073', '31137', '05053', '49019', '31143', '56017', '48391', '13181', '16007', '38079', '38023', '30023', '28155', '29177', '38067', '20147', '12077', '38017', '30039', '30019', '31007', '31165', '08113', '31161', '17059', '05147', '17137', '30017', '48263', '31057', '48483', '08115', '31181', '13141', '20179', '13003', '46073', '48301', '30005', '21023', '48385', '46057', '21063', '32033', '30059', '38047', '08111', '27117', '18007', '46117', '05009', '30001', '38093', '54067', '29065', '40047', '20149', '28119', '46135', '46033', '17069', '55041', '38019', '49009', '35023', '21017', '27011', '54013', '40053', '13133', '46031', '26003', '41055', '20141', '12067', '27155', '41045', '38027', '51115', '37075', '49047', '46065', '48087', '41021', '48403', '54105', '20075', '13033', '20057', '26083', '05097', '13177', '13239', '46105', '48095', '47087', '21237', '56003', '48319', '48323', '30037', '45011', '47137', '48033', '48083', '31157', '06057', '30103', '46091', '16025', '40045', 
'48107', '55051', '48229', '47083', '35059', '13081', '37177', '21149', '18171', '13317', '48311', '46101', '31107', '30075', '08021', '48105', '13009', '20167', '21189', '46015', '17155', '19039', '40005', '46017', '17065', '28005', '27029', '19175', '48059', '21087', '46029', '19093', '29211', '29067', '36043', '20071', '30091', '48007', '12121', '19051', '30045', '27051', '08103', '54093', '06091', '21181', '08091', '48267', '31083', '49055', '48235', '49017', '55115', '47067', '20119', '21187', '38007', '22123', '48463', '48111', '20027', '20001', '20069', '31149', '48489', '30071', '08099', '46071', '31009', '13279', '48433', '30105', '31033', '30043', '46069', '28019', '40025', '20081', '38015', '29075', '16047', '38101', '20165', '31065', '13107', '13271', '46107', '48155', '40039', '22019', '27101', '13287', '06035', '16069', '46043', '19147', '13101', '21129', '08039', '46045', '22021', '29039', '40073', '36041', '17193', '31163', '47073', '05101', '48179', '53065', '54071', '08011', '19129', '20171', '53019', '56043', '48495', '56035', '20129', '55091', '38057', '41037', '31045', '40075', '05109', '27173', '46019', '38081', '22013', '27107', '31123', '48129', '46011', '22107', '29167', '38037', '46111', '28075', '05089', '29087', '40031', '30051', '31063', '41065', '27151', '31135', '48191', '20205', '31115', '31077', '42015', '35033', '20101', '40055', '19071', '46079', '16033', '46093', '19117', '20053', '13289', '29147', '13189', '18155', '48207', '46025', '31089', '08109', '29125', '21057', '27133', '54075', '55037', '29079', '08057', '48353', '48295', '13017', '20197', '38029', '48075', '40139', '48451', '20065', '48049', '19009', '38043', '42105', '48363', '30015', '13193', '48205', '48271', '16037', '56027', '31113', '46039', '30087', '16053', '30109', '21143', '53049', '20033', '47135', '38025', '19143', '48247', '48501', '47161', '38011', '31075', '46049', '40033', '55011', '46023', '21139', '46123', '48341', '20039', '51097', '49005', '48413', 
'38013', '20143', '19089', '56015', '38091', '28125', '08043', '29041', '29137', '48261', '41061', '31151', '30061', '37043', '30055', '48169', '19035', '31117', '31021', '08019', '38087', '19185', '53013', '17123', '48109', '13309', '01047', '54095', '53069', '31185', '46121', '31027', '32029', '47045', '17079', '22023', '48125', '20157', '51095', '40129', '48065', '38069', '32009', '38065', '13167', '48233', '48377', '48137', '48079', '31071', '16057', '17047', '30013', '27113', '22041', '31129', '22035', '48417', '27027', '30007', '17083', '46009', '29199', '48081', '40137', '30025', '48265', '29153', '47095', '26061', '51181', '31105', '28063', '32027', '08055', '35003', '48369', '19033', '19173', '28055', '46021', '19141', '27069', '30107', '13053', '50013', '05025', '46013', '21033', '05011', '31131', '38045', '20181', '38097', '48421'}

@krivard
Contributor

krivard commented Apr 18, 2023

> Sidenote: the "chng-fips" file seems to be missing some fips codes found in the fips_pop file

That is by design. CHNG wants to censor the counts for those fips, and instead report those counts in small groupings of those fips. Each group lies within a single state, and the membership in each group is intended to stay consistent across time. You can get the group codes if you do the opposite set operation:

>>> m.get_geo_values("chng-fips") - m.get_geo_values("fips")
{'42g03', '17000', '16g10', '13g17', '13g01', '16g04', '21g12', ...

lol though that 17000 is the exact bug we're trying to squash!

So is the fix to change GeoMapper to use fips_state_table.csv, release covidcast-indicators, then just rebuild delphi-epidata so that it grabs the new version of delphi-utils?

@melange396
Collaborator Author

the difference isnt in those csv files. this grabs the first column from each file, sorts them, and then diffs them. the first diff returns nothing, showing there is nothing in the pop file that isnt in the chng. the second diff returns nothing, indicating that the only things in chng that arent in pop are the "nn000" codes

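cd covidcast-indicators/_delphi_utils_python/delphi_utils/data/2020
# extract the first column of each file and sort it
cut -f1 -d, fips_pop.csv | sort > fips_pop.csv__c1sort
cut -f1 -d, fips_chng-fips_table.csv | sort > fips_chng-fips_table.csv__c1sort
# first diff: codes only in fips_pop.csv (prints nothing)
diff fips_chng-fips_table.csv__c1sort fips_pop.csv__c1sort | grep '^>'
# second diff: codes only in the chng-fips table, excluding the "nn000" codes (prints nothing)
diff fips_chng-fips_table.csv__c1sort fips_pop.csv__c1sort | grep '^<' | grep -v '000$'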

so i guess the crosswalking is doing something funky
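
The same first-column comparison can be sketched in Python. The rows below are toy stand-ins, not the contents of the real CSV files, and the column names are assumptions:

```python
import csv
import io

# Toy stand-ins for fips_pop.csv and fips_chng-fips_table.csv (assumed layout).
FIPS_POP = "fips,pop\n42001,100\n42003,200\n"
FIPS_CHNG = "fips,chng-fips\n42000,42000\n42001,42g01\n42003,42003\n"


def first_column(text: str) -> set:
    """Return the set of values in the first column, skipping the header row."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # skip the header
    return {row[0] for row in rows}


pop_codes = first_column(FIPS_POP)
chng_codes = first_column(FIPS_CHNG)

# Codes in the pop file that are missing from the chng table's first column.
only_in_pop = pop_codes - chng_codes
# Codes only in the chng table's first column, ignoring "nn000" state codes.
only_in_chng = {c for c in chng_codes - pop_codes if not c.endswith("000")}
print(only_in_pop, only_in_chng)  # both sets are empty for this toy data
```

Because only the first column is read, any group codes in the second column (such as the made-up "42g01" here) never show up in this comparison.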

@dshemetov
Contributor

@krivard Ah right, the censoring! Makes sense!

And yup, adjusting GeoMapper would be my suggestion.

@dshemetov
Contributor

dshemetov commented Apr 18, 2023

@melange396 huh, you're right. Very weird.

So the flow is:

get_geo_values("chng-fips") ->
self._geo_sets["chng-fips"] ->
self._load_geo_values("chng-fips") ->
self._crosswalks["fips"]["chng-fips"] ->
(access the "chng-fips" column of the dataframe and take set)

I see, so get_geo_values("chng-fips") only returns the counties on the right column of fips_chng-fips_table.csv, which censors the counties, like Katie mentioned.
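
A toy illustration of that behavior, assuming the crosswalk is a two-column table from "fips" to "chng-fips". The rows and the `geo_values` function are hypothetical and do not reproduce GeoMapper's implementation; as a simplification, the "fips" values here are taken from the left column of the same toy table:

```python
# Toy crosswalk rows: (fips, chng-fips). Values are made up for illustration.
CROSSWALK = [
    ("42000", "42000"),  # state-level code maps to itself
    ("42001", "42g01"),  # censored county, reported under a group code
    ("42003", "42g01"),  # another member of the same group
    ("42101", "42101"),  # uncensored county maps to itself
]


def geo_values(column: str) -> set:
    """Collect the distinct values of one crosswalk column."""
    index = {"fips": 0, "chng-fips": 1}[column]
    return {row[index] for row in CROSSWALK}


# Counties that disappear from the right column because they are censored.
print(sorted(geo_values("fips") - geo_values("chng-fips")))  # ['42001', '42003']
# Group codes that exist only in the right column.
print(sorted(geo_values("chng-fips") - geo_values("fips")))  # ['42g01']
```

Taking the set of the right column is what makes the censored counties vanish, even though every one of them still appears in the left column of the file.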

@krivard
Contributor

krivard commented Apr 18, 2023

Why weird?

We know the fips -> chng-fips mapping includes xx000 fips values

We know chng-fips values replace some fips with xxgyy group codes

Isn't that consistent with George's findings? or at least, consistent with the shell output he got. it's just that the differences are in the .csv files, but you have to look at all the columns, not just the first column

@dshemetov
Contributor

dshemetov commented Apr 18, 2023

Ah sorry that was my initial reaction to George's line diff not showing the same differences I got with GeoMapper().get_geo_values("fips") - GeoMapper().get_geo_values("chng-fips"). It makes sense now, since that diff is probably looking at column 1 of fips_chng-fips_table.csv, yea👍

@melange396
Collaborator Author

are you guys at all worried that changing the implementation of GeoMapper will have other unintended consequences?

melange396 added a commit to cmu-delphi/covidcast-indicators that referenced this pull request Apr 18, 2023
so that `delphi_utils.geomap.GeoMapper().get_geo_values('fips')` also includes (for example) "42000" representing all of Pennsylvania, in addition to FIPS codes in the 42001-42999 interval that represent individual counties.

see discussion at cmu-delphi/delphi-epidata#1134
@krivard
Contributor

krivard commented Apr 18, 2023

here's everywhere we're using get_geo_values()

  • hhs_hosp: states only; unaffected
  • changehc: upper bound on loc-date pairs; improvement
  • validator: validation; improvement (we can drop the manual megacounty handling there at our leisure)
  • geomap tests: will need repair
  • nchs: states only; unaffected

@dshemetov
Contributor

dshemetov commented Apr 18, 2023

Also searched the whole org for uses of that function and it's only used again in that server geo-validator code in _params.py, so we should be safe 👍

@melange396 melange396 closed this Apr 18, 2023
@melange396 melange396 deleted the state_fips_codes_as_counties branch April 18, 2023 21:20