Zips fips crosswalk #1512

Merged
merged 11 commits into from
Feb 4, 2022
8 changes: 4 additions & 4 deletions _delphi_utils_python/data_proc/geomap/README.md
@@ -17,11 +17,11 @@ You can see consistency checks and diffs with old sources in ./consistency_check

We support the following geocodes.

- The ZIP code and the FIPS code are the most granular geocodes we support.
- The [ZIP code](https://en.wikipedia.org/wiki/ZIP_Code) is a US postal code used by the USPS and the [FIPS code](https://en.wikipedia.org/wiki/FIPS_county_code) is an identifier for US counties and other associated territories. The ZIP code is a five-digit code (with leading zeros).
- The FIPS code is a five-digit code (with leading zeros), where the first two digits are a two-digit state code and the last three are a three-digit county code (see this [US Census Bureau page](https://www.census.gov/library/reference/code-lists/ansi.html) for detailed information).
- The Metropolitan Statistical Area (MSA) code refers to regions around cities (these are sometimes referred to as CBSA codes). More information on these can be found at the [US Census Bureau](https://www.census.gov/programs-surveys/metro-micro/about.html).
- We are reserving 10001-10099 for state codes of the form 100XX, where XX is the FIPS code for the state (the current smallest CBSA is 10100). If the CBSA codes change, it should be verified that these are not used.
- State codes are a series of equivalent identifiers for US states. They include the state name, the state number (state_id), and the state two-letter abbreviation (state_code). The state number is the state FIPS code. See [here](https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations) for more.
- The Hospital Referral Region (HRR) and the Hospital Service Area (HSA). More information [here](https://www.dartmouthatlas.org/covid-19/hrr-mapping/).
- The JHU signal contains its own geographic identifier, labeled the UID. Documentation is provided at [their repo](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data#uid-lookup-table-logic). Its FIPS codes depart in some special cases, so we produce manual changes listed below.
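
The code layouts described in this list can be sketched in a few lines of Python. This is only an illustration of the formats, not code from `delphi_utils`; the helper names `split_fips` and `state_pseudo_msa` are hypothetical.

```python
from typing import Tuple


def split_fips(fips) -> Tuple[str, str]:
    """Split a five-digit county FIPS code into (state, county) parts."""
    fips = str(fips).zfill(5)  # restore leading zeros lost to integer parsing
    if len(fips) != 5 or not fips.isdigit():
        raise ValueError(f"not a five-digit FIPS code: {fips!r}")
    return fips[:2], fips[2:]


def state_pseudo_msa(state_fips) -> str:
    """Build the reserved 100XX code, where XX is the two-digit state FIPS code."""
    state_fips = str(state_fips).zfill(2)
    if len(state_fips) != 2 or not state_fips.isdigit():
        raise ValueError(f"not a two-digit state FIPS code: {state_fips!r}")
    return "100" + state_fips


print(split_fips("06037"))     # ('06', '037')
print(split_fips(1001))        # ('01', '001')
print(state_pseudo_msa("06"))  # 10006
```

Keeping the codes as zero-padded strings avoids the leading-zero loss that occurs when they are parsed as integers.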
@@ -30,7 +30,7 @@ We support the following geocodes.

The source files are requested from a government URL when `geo_data_proc.py` is run (see the top of said script for the URLs). Below we describe the locations to find updated versions of the source files, if they are ever needed.

- ZIP -> FIPS (county) population tables available from [US Census](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#par_textimage_674173622). This file contains the population of the intersections between ZIP and FIPS regions, allowing the creation of a population-weighted transform between the two. 24 ZIPs had no population information associated with them, so we filled those values manually using information available on the [zipdatamaps website](https://www.zipdatamaps.com).
- ZIP -> HRR -> HSA crosswalk file comes from the 2018 version at the [Dartmouth Atlas Project](https://atlasdata.dartmouth.edu/static/supp_research_data).
- FIPS -> MSA crosswalk file comes from the September 2018 version of the delineation files at the [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html).
- State Code -> State ID -> State Name comes from the ANSI standard at the [US Census](https://www.census.gov/library/reference/code-lists/ansi.html#par_textimage_3). The first two digits of a FIPS code should match the state code here.
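
As a rough sketch of what a population-weighted transform between ZIP and FIPS looks like, the example below derives weights from intersection populations and uses them to redistribute a ZIP-level count to counties. The data, column names, and ZIP codes are made up for illustration and are not taken from the Census file or from `geo_data_proc.py`.

```python
import pandas as pd

# Toy ZIP/FIPS intersection populations (made-up numbers, hypothetical columns).
intersections = pd.DataFrame(
    {
        "zip": ["00001", "00001", "00002"],
        "fips": ["01001", "01003", "01003"],
        "pop": [300, 100, 600],
    }
)

# Weight of each county within a ZIP: intersection population / total ZIP population.
zip_totals = intersections.groupby("zip")["pop"].transform("sum")
crosswalk = intersections.assign(weight=intersections["pop"] / zip_totals)

# A ZIP-level count signal to redistribute to counties.
signal = pd.DataFrame({"zip": ["00001", "00002"], "value": [40.0, 10.0]})

fips_signal = (
    signal.merge(crosswalk[["zip", "fips", "weight"]], on="zip")
    .assign(value=lambda d: d["value"] * d["weight"])
    .groupby("fips", as_index=False)["value"]
    .sum()
)
print(fips_signal)
```

Because the weights within each ZIP sum to one, the total of the signal is preserved after the transform.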
@@ -60,6 +60,6 @@ The rest of the crosswalk tables are derived from the mappings above. We provide
- MSA tables from March 2020 [here](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html). This file seems to differ in a few fips codes from the source for the 02_20_uszip file which Jingjing constructed. There are at least 10 additional fips in 03_20_msa that are not in the uszip file, and one of the msa codes seems to be incorrect: 49020 (a google search confirms that it is incorrect in uszip and correct in the census data).
- MSA tables from 2019 [here](https://apps.bea.gov/regional/docs/msalist.cfm)

## Notes

- NAs in the coding are currently zero-filled.
21 changes: 21 additions & 0 deletions _delphi_utils_python/data_proc/geomap/geo_data_proc.py
@@ -181,6 +181,7 @@ def create_jhu_uid_fips_crosswalk():
]
)


jhu_df = pd.read_csv(JHU_FIPS_URL, dtype={"UID": str, "FIPS": str}).query("Country_Region == 'US'")
jhu_df = jhu_df.rename(columns={"UID": "jhu_uid", "FIPS": "fips"}).dropna(subset=["fips"])

@@ -336,6 +337,7 @@ def create_hhs_population_table():
state_pop = pd.read_csv(join(OUTPUT_DIR, STATE_POPULATION_OUT_FILENAME), dtype={"state_code": str, "hhs": int}, usecols=["state_code", "pop"])
state_hhs = pd.read_csv(join(OUTPUT_DIR, STATE_HHS_OUT_FILENAME), dtype=str)
hhs_pop = state_pop.merge(state_hhs, on="state_code").groupby("hhs", as_index=False).sum()

hhs_pop.sort_values("hhs").to_csv(join(OUTPUT_DIR, HHS_POPULATION_OUT_FILENAME), index=False)


@@ -363,6 +365,25 @@ def derive_zip_population_table():
df = census_pop.merge(fz_df, on="fips", how="left")
df["pop"] = df["pop"].multiply(df["weight"], axis=0)
df = df.drop(columns=["fips", "weight"]).groupby("zip").sum().dropna().reset_index()
# Fill in populations for the ZIPs listed in zip_pop_missing (Issue #0648).
# Append each ZIP only if it is still missing from df.

zip_pop_missing = pd.DataFrame(
    {
        "zip": ['57756', '57764', '57770', '57772', '57794', '99554', '99563', '99566',
                '99573', '99574', '99581', '99585', '99586', '99604', '99620', '99632',
                '99650', '99657', '99658', '99662', '99666', '99677', '99686', '99693'],
        "pop": [1126, 1923, 5271, 2048, 644, 677, 938, 192,
                1115, 2348, 762, 417, 605, 1093, 577, 813,
                568, 329, 329, 480, 189, 88, 4005, 248]
    }
)

# Compare against the column values: `x_zip not in df['zip']` would test the
# Series index labels instead and append ZIPs that are already present.
existing_zips = set(df['zip'])
for x_zip in zip_pop_missing['zip']:
    if x_zip not in existing_zips:
        df = pd.concat([df, zip_pop_missing[zip_pop_missing['zip'] == x_zip]],
                       ignore_index=True)

df["pop"] = df["pop"].astype(int)
df.sort_values("zip").to_csv(join(OUTPUT_DIR, ZIP_POPULATION_OUT_FILENAME), index=False)
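
A note on the membership check used when appending the missing ZIPs: in pandas, the `in` operator applied to a Series tests the index labels, not the stored values. With a default integer index, a ZIP string is therefore never found "in" the column, which would be consistent with ZIPs such as 99566 appearing twice in the `zip_pop.csv` diff. A minimal sketch with made-up data:

```python
import pandas as pd

# Toy table (made-up rows) with the default integer index 0, 1.
df = pd.DataFrame({"zip": ["57756", "99554"], "pop": [1126, 677]})

in_index = "57756" in df["zip"]             # tests the index labels 0 and 1
in_values = "57756" in df["zip"].values     # tests the stored values
also_values = df["zip"].eq("57756").any()   # equivalent vectorized check

print(in_index, in_values, bool(also_values))  # False True True
```

Checking against `.values`, a `set` of the column, or `Series.isin` gives the value-based test that the loop intends.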

24 changes: 24 additions & 0 deletions _delphi_utils_python/delphi_utils/data/2019/zip_pop.csv
@@ -19549,15 +19549,19 @@ zip,pop
57752,317
57754,4067
57755,119
57756,1126
57758,219
57759,585
57760,1395
57761,1360
57762,547
57763,266
57764,1923
57766,224
57767,167
57769,3915
57770,5271
57772,2048
57773,145
57775,251
57776,15
@@ -19572,6 +19576,7 @@ zip,pop
57791,213
57792,143
57793,2061
57794,644
57799,707
58001,54
58002,27
@@ -32756,29 +32761,37 @@ zip,pop
99551,677
99552,373
99553,1092
99554,677
99555,224
99556,2659
99557,784
99558,79
99559,8248
99561,451
99563,938
99564,88
99565,76
99566,183
99566,192
99567,9090
99568,295
99569,64
99571,170
99572,313
99573,1115
99573,1064
99574,2348
99574,2242
99575,113
99576,2640
99577,25433
99578,319
99579,106
99580,116
99581,762
99583,37
99585,417
99586,605
99586,577
99587,2220
99588,958
@@ -32787,6 +32800,7 @@ zip,pop
99591,107
99602,166
99603,10427
99604,1093
99605,223
99606,457
99607,233
@@ -32797,6 +32811,7 @@ zip,pop
99613,372
99614,690
99615,12347
99620,577
99621,781
99622,346
99624,86
@@ -32806,6 +32821,7 @@ zip,pop
99628,449
99630,206
99631,241
99632,813
99633,456
99634,382
99636,517
@@ -32820,18 +32836,23 @@ zip,pop
99647,42
99648,110
99649,78
99650,568
99651,65
99652,4506
99653,155
99654,63494
99655,723
99656,27
99657,329
99658,329
99659,422
99660,503
99661,1040
99662,480
99663,438
99664,5226
99665,77
99666,189
99667,79
99668,92
99669,15038
@@ -32840,6 +32861,7 @@ zip,pop
99672,3925
99674,1757
99676,1881
99677,88
99677,84
99678,903
99679,403
@@ -32850,11 +32872,13 @@ zip,pop
99684,725
99685,4438
99686,3824
99686,4005
99688,3358
99689,579
99690,302
99691,88
99692,159
99693,248
99693,236
99694,1840
99695,6