Add geomap utilities, hosp integration #137

Merged
merged 26 commits into from
Aug 18, 2020
Changes from 14 commits
Commits
6398f0a
added cross generation, load state and zip -> fips tested
jsharpna Jun 30, 2020
9318662
add GeoMapper, docstrings, tests
jsharpna Jul 1, 2020
a8f34b1
add zip->county mapping
jsharpna Jul 2, 2020
68fee88
added mega v0.1
jsharpna Jul 2, 2020
d9d0db0
debug gmpr use in hosp
jsharpna Jul 6, 2020
655dcf6
add gmpr to util init
jsharpna Jul 6, 2020
d9a1cc4
fix bug from emr
jsharpna Jul 7, 2020
1610ad1
add data processing script, source files, and derived files
jsharpna Jul 8, 2020
a114175
changed filenames in geomap
jsharpna Jul 8, 2020
873a3f7
Update README.md
jsharpna Jul 8, 2020
c473e8b
added consistency checks, rest of msa
jsharpna Jul 8, 2020
79c0cd8
debug msa rest of state
jsharpna Jul 8, 2020
c170fd2
Merge branch 'rf_geo' of https://github.com/cmu-delphi/covidcast-indi…
jsharpna Jul 8, 2020
647aca3
Merge branch 'main' into rf_geo
jsharpna Jul 8, 2020
2fb7f66
added jhu to fips converters
jsharpna Jul 14, 2020
fed6bf1
add jhu uid to county
jsharpna Jul 22, 2020
e080035
refactor jhu, add fips to zip utility
jsharpna Jul 30, 2020
48de2d4
finish hrr, debug jhu, needs stable comp
jsharpna Aug 4, 2020
885155d
finish hrr, debug jhu, needs stable comp
jsharpna Aug 4, 2020
a6aa219
compared msa versions
jsharpna Aug 5, 2020
206bcc1
mod compare rec
jsharpna Aug 5, 2020
8ec3b13
butcher compare_receiving script
jsharpna Aug 5, 2020
325f154
Merge branch 'main' into rf_geo
jsharpna Aug 5, 2020
9854433
2018 msa raw data
jsharpna Aug 6, 2020
78e145e
add mega functionality, struggling with 0.0 bug
jsharpna Aug 7, 2020
eee0acf
fix msa mega
jsharpna Aug 18, 2020
33,100 changes: 33,100 additions & 0 deletions _delphi_utils_python/data_proc/geomap/02_20_uszips.csv

Large diffs are not rendered by default.

Binary file not shown.
74 changes: 74 additions & 0 deletions _delphi_utils_python/data_proc/geomap/README.md
@@ -0,0 +1,74 @@
# Geocoding data processing pipeline

Authors: Jingjing Tang, James Sharpnack

The data_proc/geomap directory contains original source data, processing scripts, and notes for processing from original source to crosswalk tables in the data directory for the delphi_utils package.
Contributor

Suggested change
The data_proc/geomap directory contains original source data, processing scripts, and notes for processing from original source to crosswalk tables in the data directory for the delphi_utils package.
The `data_proc/geomap` directory contains original source data, processing scripts, and notes for generating the crosswalk tables in the data directory of the delphi_utils package.


## Usage

Requires the following source files below.

Run the following to write the cross files in the package data dir...
Comment on lines +9 to +11
Contributor

Suggested change
Requires the following source files below.
Run the following to write the cross files in the package data dir...
First, acquire the source files listed below.
Then run the following to write the crosswalk files in the package data dir...

```
$ python geo_data_proc.py
```
This will build the following files:
- fips_msa_cross.csv
- zip_fips_cross.csv
- state_codes.csv

You can see consistency checks and diffs with old sources in ./consistency_checks.ipynb
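
As a rough illustration of how a crosswalk file such as zip_fips_cross.csv might be consumed, here is a minimal sketch. The column names (`zip`, `fips`, `weight`), the helper names, and every row of data are assumptions made up for this example; the real file's schema may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a zip -> fips crosswalk. The column names and all
# values are assumptions for illustration, not the real file's contents.
CROSSWALK_CSV = """zip,fips,weight
15213,42003,1.0
00001,01001,0.25
00001,01003,0.75
"""


def load_crosswalk(text):
    """Return {zip: [(fips, weight), ...]} with codes kept as zero-padded strings."""
    table = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        zip_code = row["zip"].zfill(5)
        fips = row["fips"].zfill(5)
        table[zip_code].append((fips, float(row["weight"])))
    return table


def zip_to_fips(counts_by_zip, crosswalk):
    """Split each zip-level count across the counties that zip maps to."""
    out = defaultdict(float)
    for zip_code, count in counts_by_zip.items():
        for fips, weight in crosswalk.get(zip_code, []):
            out[fips] += count * weight
    return dict(out)


crosswalk = load_crosswalk(CROSSWALK_CSV)
county_counts = zip_to_fips({"15213": 10, "00001": 8}, crosswalk)
```

Keeping the codes as zero-padded strings avoids losing leading zeros, which matters for both zip codes and fips codes.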

## Source files

1. 03_20_MSAs.xls : [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html)
2. 02_20_uszips.csv : Hand-edited file from Jingjing; we use only the fips, zip encoding and also extract the states from it

## Todo 07/07/2020

- go through the trans files

## Notes

Some of the source files were constructed by hand, most notably 02_20_uszips.csv.

The 02_20_uszips.csv file is based on the newest consensus data and includes 5-digit zipcode, fips code, county name, state, population, HRR, and HSA. I downloaded the original file from https://simplemaps.com/data/us-zips. This file matches the most recent (2020) situation best in terms of population, but some matching problems remain. I manually checked and corrected those lines (~20) against zip-codes.com (https://www.zip-codes.com/zip-code/58439/zip-code-58439.asp). The mapping from 5-digit zipcode to HRR is based on the 2017 version of the file downloaded from https://atlasdata.dartmouth.edu/static/supp_research_data
Contributor

How much of this construction history do we need to retain in this repository?

Could some of this be replaced with shorter explanatory text (e.g. "We tried incorporating data from the cbsatocountycrosswalk.csv file at https://data.nber.org/data/cbsa-fips-county-crosswalk.html but it was worse" instead of the 4/15 and 4/19 entries)? The population files referenced in the 6/15 log are not included here; can we remove that log entry?

Contributor Author

Dmitry will be cleaning this README up with his PR.

Contributor

Noted.


transStateToHRR.csv and transfipsToHRR.csv are used to transform data from the state level or the county level, respectively, to the HRR level. For example, if x is the row vector of covid cases for the different states on 04/10/20, then x @ H = y, where H is the table provided in the corresponding csv file and y is the row vector of covid cases for the different HRRs.

HRRs are represented by hrrnum. There are 306 hrrs in total. They are not named as consecutive numbers.
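
A minimal sketch of the x @ H = y transform described above, using NumPy. The matrix, the case counts, and the HRR labels are all made up for illustration, and the assumption that each row of H sums to 1 (so that totals are preserved) is mine rather than something stated about the real csv files.

```python
import numpy as np

# Toy setup: 2 states and 3 HRRs. All numbers are made up for illustration.
# H[i, j] is the assumed share of state i's cases attributed to HRR j.
# Each row is assumed to sum to 1 so that the overall total is preserved.
H = np.array([
    [0.5, 0.50, 0.00],
    [0.0, 0.25, 0.75],
])

# Hypothetical hrrnum labels for the columns of H. Real hrrnum values are not
# consecutive, so results should be keyed by label rather than by position.
hrr_ids = [101, 205, 309]

x = np.array([100.0, 40.0])  # cases per state, as a row vector
y = x @ H                    # cases per HRR, as a row vector

cases_by_hrr = dict(zip(hrr_ids, y.tolist()))
```

With these toy numbers the state total (140) equals the HRR total, which is a quick sanity check that could be run on any such transform table.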

-Jingjing


04/14/20: 'msa_id' and 'msa_name' are added according to the msa_list.csv that Aaron found from https://apps.bea.gov/regional/docs/msalist.cfm (2019)

04/15/20:
The newly added columns are based on cbsatocountycrosswalk.csv from https://data.nber.org/data/cbsa-fips-county-crosswalk.html
- 'msa' : MSA ID
- 'msaname': Name of the MSA
- 'cbsa': CBSA ID
- 'cbsaname': Name of the CBSA


04/19/20:
Changed to msa_list.csv again.

05/20/20: Updated msa_list.csv to include MSAs in Puerto Rico, using the delineations file from March 2020: https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html

06/15/20:
Added file co-est2019-annres.csv, which gives 2019 population estimates for each county by name

Source: Annual Estimates of the Resident Population for Counties in the United States: April 1, 2010 to July 1, 2019 (CO-EST2019-ANNRES). U.S. Census Bureau, Population Division. Release Date: March 2020
Note: The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. All geographic boundaries for the 2019 population estimates are as of January 1, 2019. For population estimates methodology statements, see http://www.census.gov/programs-surveys/popest/technical-documentation/methodology.html.

Note: The 6,222 people in Bedford city, Virginia, which was an independent city as of the 2010 Census, are not included in the April 1, 2010 Census enumerated population presented in the county estimates. In July 2013, the legal status of Bedford changed from a city to a town and it became dependent within (or part of) Bedford County, Virginia. This population of Bedford town is now included in the April 1, 2010 estimates base and all July 1 estimates for Bedford County. Because it is no longer an independent city, Bedford town is not listed in this table. As a result, the sum of the April 1, 2010 census values for Virginia counties and independent cities does not equal the 2010 Census count for Virginia, and the sum of April 1, 2010 census values for all counties and independent cities in the United States does not equal the 2010 Census count for the United States. Substantial geographic changes to counties can be found on the Census Bureau website at https://www.census.gov/programs-surveys/geography/technical-documentation/county-changes.html.


07/07/2020:
Introduced the March 2020 MSA file (03_20_MSAs.xls); the source is [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html). This file seems to differ in a few fips codes from the source of the 02_20_uszips.csv file which Jingjing constructed. There are at least 10 additional fips in the MSA file that are not in the uszips file, and one of the msa codes seems to be incorrect: 49020 (a Google search confirms that it is incorrect in uszips and correct in the census data).

07/08/2020:
We are reserving 00001-00099 for states codes of the form 100XX where XX is the fips code for the state. In the case that the CBSA codes change then it should be verified that these are not used. The current smallest CBSA is 10100.
Contributor

There's a typo somewhere here; either we should reserve _1_0001-_1_0099, or the codes should be of the form _0_00XX. I vote for the former, since the existing API validation code enforces MSAs to be in the range 10000-99999.

Contributor Author

I was inconsistent here, and the code actually uses 000XX. Will move everything to the 100XX convention.

Contributor

Noted.
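
The 100XX convention discussed in this thread could be sketched as follows. The function name and the constants are hypothetical and not taken from the PR's code; the bounds come from the README note (smallest CBSA is 10100) and the review comment (API accepts MSAs in 10000-99999).

```python
# Bounds taken from the discussion above; the names are hypothetical.
SMALLEST_CBSA = 10100                 # smallest CBSA code per the README note
API_MSA_MIN, API_MSA_MAX = 10000, 99999  # range enforced by API validation


def state_fips_to_reserved_code(state_fips):
    """Map a 2-digit state fips code XX to the reserved code 100XX."""
    value = int(state_fips)
    if not 1 <= value <= 99:
        raise ValueError(f"not a 2-digit state fips code: {state_fips!r}")
    code = 10000 + value
    # The reserved block 10001-10099 must stay below every real CBSA code
    # and inside the range that the API accepts for MSAs.
    if not (API_MSA_MIN <= code <= API_MSA_MAX and code < SMALLEST_CBSA):
        raise ValueError(f"reserved code {code} collides with a valid range")
    return str(code)


pennsylvania = state_fips_to_reserved_code("42")
california = state_fips_to_reserved_code("06")
```

If the CBSA codes ever change, the `SMALLEST_CBSA` check is where a collision with the reserved block would surface.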


-James
38 changes: 38 additions & 0 deletions _delphi_utils_python/data_proc/geomap/USDA_2015_fips_changes.csv
@@ -0,0 +1,38 @@
https://www.nrcs.usda.gov/wps/portal/nrcs/detail/pa/home/?cid=nrcs143_013710,,
State,Before,After
CO,8031,8001
WY,parts of 56029,56047
WY,56039,56047
MO,29510,29189
VA,51540,51003
VA,51560,51005
VA,51580,51005
VA,51790,51015
VA,51820,51015
VA,51515,51019
VA,51640,51035
VA,51570,51041
VA,51013,51059
VA,51510,51059
VA,51600,51059
VA,51610,51059
VA,51840,51069
VA,51595,51081
VA,51780,51083
VA,51690,51089
VA,51830,51095
VA,51750,51121
VA,51590,51143
VA,51670,51149
VA,51730,51149
VA,51683,51153
VA,51685,51153
VA,51770,51161
VA,51775,51161
VA,51530,51163
VA,51678,51163
VA,51660,51165
VA,51620,51175
VA,51630,51177
VA,51520,51191
VA,51720,51195
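
One way a table like this might be applied is sketched below. The helper names are hypothetical, the embedded text is only a short excerpt of the rows above, and the choice to skip partial changes such as "parts of 56029" is an assumption rather than the PR's documented behavior.

```python
import csv

# Short excerpt of the file above, including the source-URL line and header.
USDA_CSV = """https://www.nrcs.usda.gov/wps/portal/nrcs/detail/pa/home/?cid=nrcs143_013710,,
State,Before,After
CO,8031,8001
WY,parts of 56029,56047
WY,56039,56047
VA,51540,51003
"""


def load_fips_changes(text):
    """Return {old_fips: new_fips} with codes zero-padded to 5 digits."""
    lines = text.splitlines()[1:]  # skip the source-URL line before the header
    changes = {}
    for row in csv.DictReader(lines):
        before = row["Before"].strip()
        after = row["After"].strip()
        if not before.isdigit():
            # Assumption: partial changes ("parts of 56029") cannot be
            # expressed as a one-to-one recode, so they are skipped here.
            continue
        changes[before.zfill(5)] = after.zfill(5)
    return changes


def update_fips(fips, changes):
    """Return the replacement code if one is listed, else the padded input."""
    fips = str(fips).zfill(5)
    return changes.get(fips, fips)


changes = load_fips_changes(USDA_CSV)
```

Zero-padding matters here because codes such as 8031 appear without their leading zero in the file.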