Add geomap utilities, hosp integration #137
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,74 @@ | ||||||||||||||
# Geocoding data processing pipeline | ||||||||||||||
|
||||||||||||||
Authors: Jingjing Tang, James Sharpnack | ||||||||||||||
|
||||||||||||||
The data_proc/geomap directory contains original source data, processing scripts, and notes for processing from original source to crosswalk tables in the data directory for the delphi_utils package. | ||||||||||||||
|
||||||||||||||
## Usage | ||||||||||||||
|
||||||||||||||
Requires the source files listed below. | ||||||||||||||
|
||||||||||||||
Run the following to write the crosswalk files to the package data directory: | ||||||||||||||
|
||||||||||||||
``` | ||||||||||||||
$ python geo_data_proc.py | ||||||||||||||
``` | ||||||||||||||
This will build the following files: | ||||||||||||||
- fips_msa_cross.csv | ||||||||||||||
- zip_fips_cross.csv | ||||||||||||||
- state_codes.csv | ||||||||||||||
|
||||||||||||||
You can see consistency checks and diffs with old sources in ./consistency_checks.ipynb | ||||||||||||||
|
||||||||||||||
## Source files | ||||||||||||||
|
||||||||||||||
1. 03_20_MSAs.xls : [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html) | ||||||||||||||
2. 02_20_uszips.csv : Hand-edited file from Jingjing; we use only the fips and zip columns, and also extract the states from it | ||||||||||||||
|
||||||||||||||
## Todo 07/07/2020 | ||||||||||||||
|
||||||||||||||
- go through the trans files | ||||||||||||||
|
||||||||||||||
## Notes | ||||||||||||||
|
||||||||||||||
Some of the source files were constructed by hand, most notably 02_20_uszips.csv. | ||||||||||||||
|
||||||||||||||
The 02_20_uszips.csv file is based on the newest consensus data, including 5-digit zipcode, fips code, county name, state, population, HRR, and HSA (I downloaded the original file from https://simplemaps.com/data/us-zips). This file best matches the most recent (2020) situation in terms of population, but some matching problems remain. I manually checked and corrected those lines (~20) against zip-codes.com (https://www.zip-codes.com/zip-code/58439/zip-code-58439.asp). The mapping from 5-digit zipcode to HRR is based on the 2017 version of the file downloaded from https://atlasdata.dartmouth.edu/static/supp_research_data | ||||||||||||||
Review comment: How much of this construction history do we need to retain in this repository? Could some of this be replaced with shorter explanatory text (e.g. "We tried incorporating data from the cbsatocountycrosswalk.csv file at https://data.nber.org/data/cbsa-fips-county-crosswalk.html but it was worse" instead of the 4/15 and 4/19 entries)? The population files referenced in the 6/15 log are not included here; can we remove that log entry?
Reply: Dmitry will be cleaning this README up with his PR.
Reply: Noted.
|
||||||||||||||
|
||||||||||||||
transStateToHRR.csv and transfipsToHRR.csv are used to transform data from the state level or county level, respectively, to the HRR level. For example, if x is the row vector of covid cases for different states on 04/10/20, then x @ H = y, where H is the table provided in the corresponding csv file and y is the row vector of covid cases for different HRRs. | ||||||||||||||
|
||||||||||||||
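The x @ H = y transformation described above can be illustrated with a minimal sketch. The weights below are made-up toy values for 3 states and 2 HRRs, not the contents of transStateToHRR.csv, and the property that each row of H sums to 1 (so that totals are preserved) is an assumption rather than something stated in these notes.

```python
import numpy as np

# Hypothetical toy weight table: 3 states (rows) mapped to 2 HRRs (columns).
# Assumption: each row sums to 1, so every state's cases are fully
# distributed across HRRs and the overall total is preserved.
H = np.array([
    [1.0, 0.0],    # state A lies entirely in HRR 1
    [0.25, 0.75],  # state B is split between HRR 1 and HRR 2
    [0.0, 1.0],    # state C lies entirely in HRR 2
])

# Row vector of case counts per state (made-up numbers).
x = np.array([100.0, 40.0, 10.0])

# Row vector of case counts per HRR.
y = x @ H
print(y)  # [110.  40.]
```

With row-normalized weights, the sum of y equals the sum of x, which gives a quick sanity check on a real table.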
HRRs are represented by hrrnum. There are 306 hrrs in total. They are not named as consecutive numbers. | ||||||||||||||
|
||||||||||||||
-Jingjing | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
04/14/20: 'msa_id' and 'msa_name' are added according to the msa_list.csv that Aaron found from https://apps.bea.gov/regional/docs/msalist.cfm (2019) | ||||||||||||||
|
||||||||||||||
04/15/20: | ||||||||||||||
The newly added columns are based on cbsatocountycrosswalk.csv from https://data.nber.org/data/cbsa-fips-county-crosswalk.html | ||||||||||||||
- 'msa' : MSA ID | ||||||||||||||
- 'msaname': Name of the MSA | ||||||||||||||
- 'cbsa': CBSA ID | ||||||||||||||
- 'cbsaname': Name of the CBSA | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
04/19/20: | ||||||||||||||
Changed to msa_list.csv again. | ||||||||||||||
|
||||||||||||||
05/20/20: Updated msa_list.csv to include MSAs in Puerto Rico, using the delineations file from March 2020: https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html | ||||||||||||||
|
||||||||||||||
06/15/20: | ||||||||||||||
Added file co-est2019-annres.csv, which gives 2019 population estimates for each county by name | ||||||||||||||
|
||||||||||||||
Source: Annual Estimates of the Resident Population for Counties in the United States: April 1, 2010 to July 1, 2019 (CO-EST2019-ANNRES). U.S. Census Bureau, Population Division. Release Date: March 2020 | ||||||||||||||
Note: The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. All geographic boundaries for the 2019 population estimates are as of January 1, 2019. For population estimates methodology statements, see http://www.census.gov/programs-surveys/popest/technical-documentation/methodology.html. | ||||||||||||||
|
||||||||||||||
Note: The 6,222 people in Bedford city, Virginia, which was an independent city as of the 2010 Census, are not included in the April 1, 2010 Census enumerated population presented in the county estimates. In July 2013, the legal status of Bedford changed from a city to a town and it became dependent within (or part of) Bedford County, Virginia. This population of Bedford town is now included in the April 1, 2010 estimates base and all July 1 estimates for Bedford County. Because it is no longer an independent city, Bedford town is not listed in this table. As a result, the sum of the April 1, 2010 census values for Virginia counties and independent cities does not equal the 2010 Census count for Virginia, and the sum of April 1, 2010 census values for all counties and independent cities in the United States does not equal the 2010 Census count for the United States. Substantial geographic changes to counties can be found on the Census Bureau website at https://www.census.gov/programs-surveys/geography/technical-documentation/county-changes.html. | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
07/07/2020: | ||||||||||||||
Introduced the March 2020 MSA file; the source is the [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html). This file seems to differ in a few fips codes from the source for the 02_20_uszips.csv file, which Jingjing constructed. There are at least 10 additional fips in 03_20_MSAs.xls that are not in the uszips file, and one of the msa codes seems to be incorrect: 49020 (a Google search confirms that it is incorrect in the uszips file and correct in the census data). | ||||||||||||||
|
||||||||||||||
07/08/2020: | ||||||||||||||
We are reserving 00001-00099 for states codes of the form 100XX where XX is the fips code for the state. In the case that the CBSA codes change then it should be verified that these are not used. The current smallest CBSA is 10100. | ||||||||||||||
Review comment: There's a typo somewhere here; either we should reserve 10001-10099, or the codes should be of the form 000XX. I vote for the former, since the existing API validation code enforces MSAs to be in the range 10000-99999.
Reply: I was inconsistent here, and the code actually uses 000XX. Will move everything to the 100XX convention.
Reply: Noted.
|
||||||||||||||
|
||||||||||||||
-James |
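The 100XX convention agreed on in the review thread for the 07/08/2020 note can be sketched as follows. This is an illustration only: the function name `state_fips_to_reserved_code` is hypothetical and is not taken from geo_data_proc.py or delphi_utils, and the only facts used are the reserved range 10001-10099 and the smallest CBSA code of 10100 mentioned in the note.

```python
# Smallest CBSA code in use according to the 07/08/2020 note.
SMALLEST_CBSA = 10100

# The reserved range 10001-10099 must stay below every real CBSA code.
assert 10099 < SMALLEST_CBSA


def state_fips_to_reserved_code(state_fips):
    """Map a 2-digit state fips code (e.g. "06") to a reserved 100XX code.

    Hypothetical helper, shown only to illustrate the convention.
    """
    xx = int(state_fips)
    if not 1 <= xx <= 99:
        raise ValueError(f"state fips out of range: {state_fips!r}")
    return str(10000 + xx)


print(state_fips_to_reserved_code("06"))  # 10006
```

If the CBSA delineations change, re-checking the assertion against the new smallest CBSA code verifies that the reserved range is still unused.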
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
https://www.nrcs.usda.gov/wps/portal/nrcs/detail/pa/home/?cid=nrcs143_013710,, | ||
State,Before,After | ||
CO,8031,8001 | ||
WY,parts of 56029,56047 | ||
WY,56039,56047 | ||
MO,29510,29189 | ||
VA,51540,51003 | ||
VA,51560,51005 | ||
VA,51580,51005 | ||
VA,51790,51015 | ||
VA,51820,51015 | ||
VA,51515,51019 | ||
VA,51640,51035 | ||
VA,51570,51041 | ||
VA,51013,51059 | ||
VA,51510,51059 | ||
VA,51600,51059 | ||
VA,51610,51059 | ||
VA,51840,51069 | ||
VA,51595,51081 | ||
VA,51780,51083 | ||
VA,51690,51089 | ||
VA,51830,51095 | ||
VA,51750,51121 | ||
VA,51590,51143 | ||
VA,51670,51149 | ||
VA,51730,51149 | ||
VA,51683,51153 | ||
VA,51685,51153 | ||
VA,51770,51161 | ||
VA,51775,51161 | ||
VA,51530,51163 | ||
VA,51678,51163 | ||
VA,51660,51165 | ||
VA,51620,51175 | ||
VA,51630,51177 | ||
VA,51520,51191 | ||
VA,51720,51195 |