diff --git a/_delphi_utils_python/data_proc/geomap/README.md b/_delphi_utils_python/data_proc/geomap/README.md index 864946cb3..84fdbefb2 100644 --- a/_delphi_utils_python/data_proc/geomap/README.md +++ b/_delphi_utils_python/data_proc/geomap/README.md @@ -17,11 +17,11 @@ You can see consistency checks and diffs with old sources in ./consistency_check We support the following geocodes. -- The ZIP code and the FIPS code are the most granular geocodes we support. +- The ZIP code and the FIPS code are the most granular geocodes we support. - The [ZIP code](https://en.wikipedia.org/wiki/ZIP_Code) is a US postal code used by the USPS and the [FIPS code](https://en.wikipedia.org/wiki/FIPS_county_code) is an identifier for US counties and other associated territories. The ZIP code is five digit code (with leading zeros). - The FIPS code is a five digit code (with leading zeros), where the first two digits are a two-digit state code and the last three are a three-digit county code (see this [US Census Bureau page](https://www.census.gov/library/reference/code-lists/ansi.html) for detailed information). - The Metropolitan Statistical Area (MSA) code refers to regions around cities (these are sometimes referred to as CBSA codes). More information on these can be found at the [US Census Bureau](https://www.census.gov/programs-surveys/metro-micro/about.html). - - We are reserving 10001-10099 for states codes of the form 100XX where XX is the FIPS code for the state (the current smallest CBSA is 10100). In the case that the CBSA codes change then it should be verified that these are not used. + - We are reserving 10001-10099 for states codes of the form 100XX where XX is the FIPS code for the state (the current smallest CBSA is 10100). In the case that the CBSA codes change then it should be verified that these are not used. - State codes are a series of equivalent identifiers for US state. 
They include the state name, the state number (state_id), and the state two-letter abbreviation (state_code). The state number is the state FIPS code. See [here](https://en.wikipedia.org/wiki/List_of_U.S._state_and_territory_abbreviations) for more. - The Hospital Referral Region (HRR) and the Hospital Service Area (HSA). More information [here](https://www.dartmouthatlas.org/covid-19/hrr-mapping/). - The JHU signal contains its own geographic identifier, labeled the UID. Documentation is provided at [their repo](https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data#uid-lookup-table-logic). Its FIPS codes depart in some special cases, so we produce manual changes listed below. @@ -30,7 +30,7 @@ We support the following geocodes. The source files are requested from a government URL when `geo_data_proc.py` is run (see the top of said script for the URLs). Below we describe the locations to find updated versions of the source files, if they are ever needed. -- ZIP -> FIPS (county) population tables available from [US Census](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#par_textimage_674173622). This file contains the population of the intersections between ZIP and FIPS regions, allowing the creation of a population-weighted transform between the two. +- ZIP -> FIPS (county) population tables available from [US Census](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#par_textimage_674173622). This file contains the population of the intersections between ZIP and FIPS regions, allowing the creation of a population-weighted transform between the two. As of 4 February 2022, this source did not include population information for 24 ZIPs that appear in our indicators. We have added those values manually using information available from the [zipdatamaps website](https://www.zipdatamaps.com).
- ZIP -> HRR -> HSA crosswalk file comes from the 2018 version at the [Dartmouth Atlas Project](https://atlasdata.dartmouth.edu/static/supp_research_data). - FIPS -> MSA crosswalk file comes from the September 2018 version of the delineation files at the [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html). - State Code -> State ID -> State Name comes from the ANSI standard at the [US Census](https://www.census.gov/library/reference/code-lists/ansi.html#par_textimage_3). The first two digits of a FIPS codes should match the state code here. @@ -60,6 +60,6 @@ The rest of the crosswalk tables are derived from the mappings above. We provide - MSA tables from March 2020 [here](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html). This file seems to differ in a few fips codes from the source for the 02_20_uszip file which Jingjing constructed. There are at least 10 additional fips in 03_20_msa that are not in the uszip file, and one of the msa codes seems to be incorrect: 49020 (a google search confirms that it is incorrect in uszip and correct in the census data). - MSA tables from 2019 [here](https://apps.bea.gov/regional/docs/msalist.cfm) -## Notes +## Notes - The NAs in the coding currently zero-fills. diff --git a/_delphi_utils_python/data_proc/geomap/geo_data_proc.py b/_delphi_utils_python/data_proc/geomap/geo_data_proc.py index 8ca4057e1..45e4e4ee3 100755 --- a/_delphi_utils_python/data_proc/geomap/geo_data_proc.py +++ b/_delphi_utils_python/data_proc/geomap/geo_data_proc.py @@ -32,6 +32,7 @@ FIPS_POPULATION_URL = f"https://www2.census.gov/programs-surveys/popest/datasets/2010-{YEAR}/counties/totals/co-est{YEAR}-alldata.csv" FIPS_PUERTO_RICO_POPULATION_URL = "https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_county_rel_10.txt?" 
STATE_HHS_FILE = "hhs.txt" +ZIP_POP_MISSING_FILE = "zip_pop_filling.csv" # Out files FIPS_STATE_OUT_FILENAME = "fips_state_table.csv" @@ -181,6 +182,7 @@ def create_jhu_uid_fips_crosswalk(): ] ) + jhu_df = pd.read_csv(JHU_FIPS_URL, dtype={"UID": str, "FIPS": str}).query("Country_Region == 'US'") jhu_df = jhu_df.rename(columns={"UID": "jhu_uid", "FIPS": "fips"}).dropna(subset=["fips"]) @@ -336,6 +338,7 @@ def create_hhs_population_table(): state_pop = pd.read_csv(join(OUTPUT_DIR, STATE_POPULATION_OUT_FILENAME), dtype={"state_code": str, "hhs": int}, usecols=["state_code", "pop"]) state_hhs = pd.read_csv(join(OUTPUT_DIR, STATE_HHS_OUT_FILENAME), dtype=str) hhs_pop = state_pop.merge(state_hhs, on="state_code").groupby("hhs", as_index=False).sum() + hhs_pop.sort_values("hhs").to_csv(join(OUTPUT_DIR, HHS_POPULATION_OUT_FILENAME), index=False) @@ -363,6 +366,18 @@ def derive_zip_population_table(): df = census_pop.merge(fz_df, on="fips", how="left") df["pop"] = df["pop"].multiply(df["weight"], axis=0) df = df.drop(columns=["fips", "weight"]).groupby("zip").sum().dropna().reset_index() + + ## loading population of some zips - #Issue 0648 + zip_pop_missing = pd.read_csv( + ZIP_POP_MISSING_FILE,sep=",", + dtype={"zip":str,"pop":np.int32} + ) + ## checking whether each zip is still missing (comparing against the column values, since `in` on a Series tests the index), and concatenating if so + for x_zip in zip_pop_missing['zip']: + if x_zip not in df['zip'].values: + df = pd.concat([df, zip_pop_missing[zip_pop_missing['zip'] == x_zip]], + ignore_index=True) + df["pop"] = df["pop"].astype(int) df.sort_values("zip").to_csv(join(OUTPUT_DIR, ZIP_POPULATION_OUT_FILENAME), index=False) diff --git a/_delphi_utils_python/data_proc/geomap/zip_pop_filling.csv b/_delphi_utils_python/data_proc/geomap/zip_pop_filling.csv new file mode 100644 index 000000000..fef98c92e --- /dev/null +++ b/_delphi_utils_python/data_proc/geomap/zip_pop_filling.csv @@ -0,0 +1,25 @@ +zip,pop +57756,1126 +57764,1923 +57770,5271 +57772,2048 +57794,644 +99554,677 +99563,938 +99566,192 +99573,1115
+99574,2348 +99581,762 +99585,417 +99586,605 +99604,1093 +99620,577 +99632,813 +99650,568 +99657,329 +99658,616 +99662,480 +99666,189 +99677,88 +99686,4005 +99693,248 diff --git a/_delphi_utils_python/delphi_utils/data/2019/zip_pop.csv b/_delphi_utils_python/delphi_utils/data/2019/zip_pop.csv index 5c95ac758..4bb1b1e6c 100644 --- a/_delphi_utils_python/delphi_utils/data/2019/zip_pop.csv +++ b/_delphi_utils_python/delphi_utils/data/2019/zip_pop.csv @@ -19549,15 +19549,19 @@ zip,pop 57752,317 57754,4067 57755,119 +57756,1126 57758,219 57759,585 57760,1395 57761,1360 57762,547 57763,266 +57764,1923 57766,224 57767,167 57769,3915 +57770,5271 +57772,2048 57773,145 57775,251 57776,15 @@ -19572,6 +19576,7 @@ zip,pop 57791,213 57792,143 57793,2061 +57794,644 57799,707 58001,54 58002,27 @@ -32756,21 +32761,26 @@ zip,pop 99551,677 99552,373 99553,1092 +99554,677 99555,224 99556,2659 99557,784 99558,79 99559,8248 99561,451 +99563,938 99564,88 99565,76 99566,183 +99566,192 99567,9090 99568,295 99569,64 99571,170 99572,313 +99573,1115 99573,1064 +99574,2348 99574,2242 99575,113 99576,2640 @@ -32778,7 +32788,10 @@ zip,pop 99578,319 99579,106 99580,116 +99581,762 99583,37 +99585,417 +99586,605 99586,577 99587,2220 99588,958 @@ -32787,6 +32800,7 @@ zip,pop 99591,107 99602,166 99603,10427 +99604,1093 99605,223 99606,457 99607,233 @@ -32797,6 +32811,7 @@ zip,pop 99613,372 99614,690 99615,12347 +99620,577 99621,781 99622,346 99624,86 @@ -32806,6 +32821,7 @@ zip,pop 99628,449 99630,206 99631,241 +99632,813 99633,456 99634,382 99636,517 @@ -32820,18 +32836,23 @@ zip,pop 99647,42 99648,110 99649,78 +99650,568 99651,65 99652,4506 99653,155 99654,63494 99655,723 99656,27 +99657,329 +99658,616 99659,422 99660,503 99661,1040 +99662,480 99663,438 99664,5226 99665,77 +99666,189 99667,79 99668,92 99669,15038 @@ -32840,6 +32861,7 @@ zip,pop 99672,3925 99674,1757 99676,1881 +99677,88 99677,84 99678,903 99679,403 @@ -32850,11 +32872,13 @@ zip,pop 99684,725 99685,4438 99686,3824 +99686,4005 
99688,3358 99689,579 99690,302 99691,88 99692,159 +99693,248 99693,236 99694,1840 99695,6