
Fixing JHU deployment #230


Merged
merged 173 commits into from
Aug 27, 2020
6398f0a
added cross generation, load state and zip -> fips tested
jsharpna Jun 30, 2020
9318662
add GeoMapper, docstrings, tests
jsharpna Jul 1, 2020
a8f34b1
add zip->county mapping
jsharpna Jul 2, 2020
68fee88
added mega v0.1
jsharpna Jul 2, 2020
d9d0db0
debug gmpr use in hosp
jsharpna Jul 6, 2020
655dcf6
add gmpr to util init
jsharpna Jul 6, 2020
d9a1cc4
fix bug from emr
jsharpna Jul 7, 2020
1610ad1
add data processing script, source files, and derived files
jsharpna Jul 8, 2020
a114175
changed filenames in geomap
jsharpna Jul 8, 2020
873a3f7
Update README.md
jsharpna Jul 8, 2020
c473e8b
added consistency checks, rest of msa
jsharpna Jul 8, 2020
79c0cd8
debug msa rest of state
jsharpna Jul 8, 2020
c170fd2
Merge branch 'rf_geo' of https://github.com/cmu-delphi/covidcast-indi…
jsharpna Jul 8, 2020
647aca3
Merge branch 'main' into rf_geo
jsharpna Jul 8, 2020
2fb7f66
added jhu to fips converters
jsharpna Jul 14, 2020
fed6bf1
add jhu uid to county
jsharpna Jul 22, 2020
36bf5c2
add code for cds testing
Jul 23, 2020
e0b0238
update url
Jul 23, 2020
0d126d7
fixed errors and updated readme
Jul 23, 2020
6434f17
update tests
Jul 23, 2020
46921ea
update details
Jul 23, 2020
145f5a5
fixed error with forward filling
Jul 23, 2020
a85d6c5
Update DETAILS.md
jingjtang Jul 23, 2020
891f87f
update code for cds
Jul 24, 2020
e715a44
delete old folder
Jul 24, 2020
8fd8d62
fixed errors in run
Jul 24, 2020
12857fc
rm test_export
Jul 24, 2020
c3f5ec0
rm export.py
Jul 24, 2020
f868e35
update unittests
Jul 24, 2020
373a5eb
fixed errors in filteration
Jul 24, 2020
9f68d33
add threshold to sample size
Jul 24, 2020
fd29cdc
add sample_size and std for pct_positive
Jul 24, 2020
78c77a0
fixed errors
Jul 24, 2020
63c217a
Update DETAILS.md
jingjtang Jul 24, 2020
6cccb07
Update DETAILS.md
jingjtang Jul 24, 2020
f324b81
fixed an error
Jul 24, 2020
8ab0c93
add gitignore in receiving
Jul 24, 2020
29b669b
delete receiving
Jul 24, 2020
097021f
add gitignore
Jul 24, 2020
bc53857
update mapping file due to the changes in the raw files
Jul 27, 2020
0ff6577
update pull.py according to the changes in the raw data
Jul 27, 2020
95171d3
update unittests due to the changes in the raw data
Jul 27, 2020
cd3fe16
add wip to all of the signals
Jul 27, 2020
c24204f
Fix test affected by backfill
eujing Jul 28, 2020
57bb1f6
change the signal names for pct_positive; ignore .gitignore when remo…
Jul 29, 2020
63a8cae
update signal names
Jul 29, 2020
e55e3a6
Update REVIEW.md
AlSaeed Jul 29, 2020
9c8e7e7
add description to tested
Jul 29, 2020
d58f08c
Set target python version
eujing Jul 30, 2020
e080035
refactor jhu, add fips to zip utility
jsharpna Jul 30, 2020
9e398ae
Update: added new use case in safegraph
vishakha1812 Aug 3, 2020
48de2d4
finish hrr, debug jhu, needs stable comp
jsharpna Aug 4, 2020
885155d
finish hrr, debug jhu, needs stable comp
jsharpna Aug 4, 2020
a6aa219
compared msa versions
jsharpna Aug 5, 2020
206bcc1
mod compare rec
jsharpna Aug 5, 2020
8ec3b13
butcher compare_receiving script
jsharpna Aug 5, 2020
325f154
Merge branch 'main' into rf_geo
jsharpna Aug 5, 2020
06bfbcd
change in params.json.template
vishakha1812 Aug 6, 2020
ba64ad4
update in params.json.template: wip_prefix to wip_signal
vishakha1812 Aug 6, 2020
9854433
2018 msa raw data
jsharpna Aug 6, 2020
b6d8b7b
Changed func name from epidata_signal() -- > public_signal(), signal_…
vishakha1812 Aug 7, 2020
bdd38dc
cleaner code,
vishakha1812 Aug 7, 2020
78e145e
add mega functionality, struggling with 0.0 bug
jsharpna Aug 7, 2020
cc92195
handling wip signal in python template
vishakha1812 Aug 7, 2020
2e580d1
Minor changes in process.py, updated test cases
vishakha1812 Aug 7, 2020
d3270c9
signal naming using wip_signal parameter
vishakha1812 Aug 9, 2020
0100dcd
handling signal names in ght
vishakha1812 Aug 10, 2020
e28d224
Supply missing params.json.template in tests
krivard Aug 10, 2020
4d3bc3e
Switch from raw Epidata client to COVIDcast client
krivard Aug 10, 2020
eb19aa4
Move signal name constants to their own file, referenced from both ru…
krivard Aug 10, 2020
ef88218
Merge pull request #204 from cmu-delphi/kmm/rename_signals
vishakha1812 Aug 10, 2020
fb97ec2
Corrected logic to comply with #113
krivard Aug 10, 2020
2634f00
Merge pull request #205 from cmu-delphi/kmm/rename_signals
vishakha1812 Aug 10, 2020
b61ad2e
fixed errors in positivity rate
Aug 10, 2020
9dca4e2
fixed errors in 7day avg
Aug 10, 2020
fe3efab
changes in compliance with #205
vishakha1812 Aug 10, 2020
b42fa14
Missing comma in setup.py
vishakha1812 Aug 10, 2020
b972968
Linter suggested changes, added HOME_DWELL signal computation
vishakha1812 Aug 10, 2020
2b88a7a
changes in compliance with #205
vishakha1812 Aug 11, 2020
24e8f6d
Merge pull request #167 from cmu-delphi/fix-cdc-covidnet-test
krivard Aug 11, 2020
7d9a59a
default prefix value
vishakha1812 Aug 11, 2020
9e637c9
Apply suggestions from code review
vishakha1812 Aug 11, 2020
c2643e5
Removed unnecessary comment
vishakha1812 Aug 11, 2020
eed8564
Merge pull request #195 from cmu-delphi/handle_wip_signals
krivard Aug 11, 2020
31d882b
Merge pull request #168 from cmu-delphi/rename_signals
krivard Aug 11, 2020
b9a86ef
Added new constants.py file
vishakha1812 Aug 12, 2020
36abeab
Updated use cases to handle wip signals
vishakha1812 Aug 12, 2020
fa2ec51
Updated setup.py
vishakha1812 Aug 12, 2020
a3ec382
updated params.json.template for tests
vishakha1812 Aug 12, 2020
ba832b9
Added test case for wip signal names logic
vishakha1812 Aug 12, 2020
5481ed9
Updated test cases to handle wip file naming
vishakha1812 Aug 12, 2020
ea1a892
New test files for checking output file names in receiving_test folder
vishakha1812 Aug 12, 2020
2fb5ce9
My message
vishakha1812 Aug 12, 2020
0ea004a
Updated: Misssing comma
vishakha1812 Aug 12, 2020
ac80f07
Fixed typo in constants.py
vishakha1812 Aug 12, 2020
25a3001
Updated constants.py
vishakha1812 Aug 12, 2020
145d229
Updated README.md
vishakha1812 Aug 12, 2020
9d8a497
Updated update_sensor.py
vishakha1812 Aug 12, 2020
ac3ba9b
updated params.json.template
vishakha1812 Aug 12, 2020
851e14d
updated tests/params.json.template
vishakha1812 Aug 12, 2020
66ae7ee
New test case file for handling wip signal
vishakha1812 Aug 12, 2020
0a41df5
Updated test_update_sensor.py
vishakha1812 Aug 12, 2020
248ec99
Update constants.py
vishakha1812 Aug 12, 2020
120bd1c
Update update_sensor.py
vishakha1812 Aug 12, 2020
69b2648
Update test_update_sensor.py
vishakha1812 Aug 12, 2020
dab2ad7
Update test_update_sensor.py
vishakha1812 Aug 12, 2020
d76e61b
Update update_sensor.py
vishakha1812 Aug 12, 2020
fc1b1e3
Update update_sensor.py
vishakha1812 Aug 12, 2020
3a9c286
add code
Aug 13, 2020
65ab6a1
Update DETAILS.md
jingjtang Aug 13, 2020
2d377ec
Merge pull request #207 from cmu-delphi/handle_wip_signal_cdc
krivard Aug 13, 2020
7f15f35
Set up initial google_health-deploy branch
korlaxxalrok Jul 24, 2020
2bf1936
Finalize production config for google_health deployment (#165)
korlaxxalrok Aug 6, 2020
6ae91bd
Use brace instead of bracket (#189)
korlaxxalrok Aug 6, 2020
a0b18b9
Add path parameter (#190)
korlaxxalrok Aug 6, 2020
5b13efa
Allow `""` for `start_date` to get "latest" data (#194)
korlaxxalrok Aug 7, 2020
99cb100
Set start date to empty string (#203)
korlaxxalrok Aug 10, 2020
ee99978
Add a cache set that this identical to production (#211)
korlaxxalrok Aug 11, 2020
a86c0f0
Sync cache
huisaddison Jun 2, 2020
bd57752
Remove end date from params
korlaxxalrok Jul 27, 2020
11349b5
Merge pull request #214 from cmu-delphi/deploy-google_health
krivard Aug 13, 2020
9747162
update output format, to YYYYDD
Aug 14, 2020
2e2a1c4
update output format, to YYYYDD
Aug 14, 2020
a082d25
update README and DETAILS and fix signal names for weekly reports
Aug 14, 2020
8317a3b
Update DETAILS.md
jingjtang Aug 14, 2020
f84eb19
Update DETAILS.md
jingjtang Aug 14, 2020
c7bbad9
Merge pull request #162 from cmu-delphi/run-cds
krivard Aug 14, 2020
0f0320d
Merge pull request #218 from cmu-delphi/nchs_mortality
krivard Aug 17, 2020
ae7637a
Added new constants.py
vishakha1812 Aug 17, 2020
2ab9dc0
Added new file to check for wip signals
vishakha1812 Aug 17, 2020
c53b886
Updated params.json.template
vishakha1812 Aug 17, 2020
ab79ab1
Updated run.py
vishakha1812 Aug 17, 2020
5c20e7e
Updated test_run.py
vishakha1812 Aug 17, 2020
400bed8
Merge pull request #212 from cmu-delphi/handle_wip_signal_ght
krivard Aug 18, 2020
eee0acf
fix msa mega
jsharpna Aug 18, 2020
73308d9
Merge pull request #137 from cmu-delphi/rf_geo
krivard Aug 18, 2020
b55ff8a
debug jhu, still msa issue
jsharpna Aug 19, 2020
88fa1e6
restore .gitignore
jsharpna Aug 19, 2020
87f5234
Merge pull request #226 from cmu-delphi/rf_geo
krivard Aug 19, 2020
c016072
Merge pull request #223 from cmu-delphi/handle_wip_signal_ccad
krivard Aug 19, 2020
814fe48
Add AWS credentials to vault (#220)
eujing Aug 11, 2020
69ff86b
Fixed jhu tests to mock s3 too
eujing Aug 10, 2020
d6eedf5
Merge pull request #229 from cmu-delphi/dev/rebase-jhu-deploy
krivard Aug 19, 2020
bf73c71
Merge branch 'deploy-jhu' into main
krivard Aug 19, 2020
9193908
Add missing whitespace
korlaxxalrok Aug 20, 2020
b6d838a
Remove trailing newlines
korlaxxalrok Aug 20, 2020
522df62
Add spaces after commas
korlaxxalrok Aug 20, 2020
05b9b0d
Add space after comma
korlaxxalrok Aug 20, 2020
5906ce2
Fix import order
korlaxxalrok Aug 20, 2020
9149aed
Fix jhu linter issues (#231)
korlaxxalrok Aug 20, 2020
085ffed
Added missing prefix r
vishakha1812 Aug 20, 2020
b11a8cb
Added missing prefix r
vishakha1812 Aug 20, 2020
9300b32
Merge branch 'main' into fix-jhu-linter-issues
vishakha1812 Aug 20, 2020
2d61a87
Removed unused stmt. and corrected import order
vishakha1812 Aug 21, 2020
3c15428
Removed unused argument
vishakha1812 Aug 21, 2020
13ad138
Merge pull request #232 from cmu-delphi/fix-jhu-linter-issues
krivard Aug 21, 2020
e750a3a
Moved cache files to a new directory 'data'
vishakha1812 Aug 24, 2020
1da2822
Update conftest.py
vishakha1812 Aug 24, 2020
fbc9eb4
Update constants.py.py
vishakha1812 Aug 24, 2020
ead2164
Update params.json.template
vishakha1812 Aug 24, 2020
4be0ecf
Update test cases
vishakha1812 Aug 24, 2020
c5d1d4d
Update run.py
vishakha1812 Aug 24, 2020
e8344a6
Update setup.py
vishakha1812 Aug 24, 2020
fe0e6c5
Merge branch 'main' into diff_ght
vishakha1812 Aug 24, 2020
e17f07d
Update params.json.template.py
vishakha1812 Aug 25, 2020
b4379dd
Update params.json.template
vishakha1812 Aug 25, 2020
9869b67
Update run.py
vishakha1812 Aug 25, 2020
9fd0521
add .gitignore
vishakha1812 Aug 25, 2020
ff6a57e
Added static files to tests, configured .gitignore to handle cache, u…
vishakha1812 Aug 26, 2020
846b804
add wip_signal to ansible
vishakha1812 Aug 27, 2020
eca83c8
Update .j2 files in Ansible
vishakha1812 Aug 27, 2020
8693505
Update .j2 files in Ansible
vishakha1812 Aug 27, 2020
049f0a5
Merge pull request #240 from cmu-delphi/diff_ght
krivard Aug 27, 2020
33,100 changes: 33,100 additions & 0 deletions _delphi_utils_python/data_proc/geomap/02_20_uszips.csv

Large diffs are not rendered by default.

Binary file not shown.
Binary file not shown.
Binary file not shown.
115 changes: 115 additions & 0 deletions _delphi_utils_python/data_proc/geomap/README.md
@@ -0,0 +1,115 @@
# Geocoding data processing pipeline

Authors: Jingjing Tang, James Sharpnack

The data_proc/geomap directory contains the original source data, the processing scripts, and notes on how the sources are turned into the crosswalk tables in the data directory of the delphi_utils package.

## Usage

Requires the source files listed below.

Run the following to write the crosswalk files to the package data directory:
```
$ python geo_data_proc.py
```
This builds the following files:
- fips_msa_cross.csv
- zip_fips_cross.csv
- state_codes.csv

You can see consistency checks and diffs with old sources in ./consistency_checks.ipynb
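
As a rough illustration of how a crosswalk table of this kind can be used, the sketch below splits zip-level counts across counties. The column names (`zip`, `fips`, `weight`), the weights, and the helper names are assumptions made for this example; the actual layout of zip_fips_cross.csv may differ.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt of a zip -> fips crosswalk. The column names and
# the weights are invented for illustration only.
CROSSWALK_CSV = """zip,fips,weight
15213,42003,1.0
58439,38021,0.75
58439,38045,0.25
"""

def load_crosswalk(text):
    """Return {zip: [(fips, weight), ...]} parsed from CSV text."""
    table = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        table[row["zip"]].append((row["fips"], float(row["weight"])))
    return dict(table)

def zip_to_fips(counts_by_zip, crosswalk):
    """Split each zip-level count across counties by crosswalk weight."""
    counts_by_fips = defaultdict(float)
    for zip_code, count in counts_by_zip.items():
        # Zip codes missing from the crosswalk are silently dropped here.
        for fips, weight in crosswalk.get(zip_code, []):
            counts_by_fips[fips] += count * weight
    return dict(counts_by_fips)

crosswalk = load_crosswalk(CROSSWALK_CSV)
result = zip_to_fips({"15213": 10, "58439": 8}, crosswalk)
print(result)
```

Codes are kept as strings so that leading zeros in zip and fips codes survive the round trip.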

## Source files

- 03_20_MSAs.xls : [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html)
- 02_20_uszips.csv : Hand-edited file from Jingjing; we use only the fips and zip columns, and also extract the states from it
- Crosswalk files from https://www.huduser.gov/portal/datasets/usps_crosswalk.html
- JHU crosswalk table: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data#uid-lookup-table-logic
- ZIP/County population: https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.html#par_textimage_674173622, https://www2.census.gov/geo/docs/maps-data/data/rel/zcta_county_rel_10.txt?#

## Todo

- make direct cross tables for fips -> hrr and zip -> msa / state
- use hud for zip -> fips?

## Notes

Some of the source files were constructed by hand, most notably 02_20_uszips.csv.

The 02_20_uszips.csv file is based on the newest census data and includes 5-digit zipcode, fips code, county name, state, population, HRR, and HSA. I downloaded the original file from https://simplemaps.com/data/us-zips. This file best matches the most recent (2020) situation in terms of population, but some matching problems remain. I manually checked and corrected those lines (~20) against zip-codes.com (https://www.zip-codes.com/zip-code/58439/zip-code-58439.asp). The mapping from 5-digit zipcode to HRR is based on the 2017 version of the file downloaded from https://atlasdata.dartmouth.edu/static/supp_research_data

transStateToHRR.csv and transfipsToHRR.csv are used to transform data from the state level or the county level, respectively, to HRRs. For example, if x is the row vector of covid cases for the different states on 04/10/20, then x @ H = y, where H is the table provided in the corresponding csv file and y is the row vector of covid cases for the different HRRs.
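
The x @ H = y transformation can be sketched in plain Python with a toy table. The entries of H below are invented, and the assumption that each row of H sums to 1 (so that total cases are preserved) is mine, not something stated about the real csv files.

```python
def row_times_matrix(x, H):
    """Multiply a row vector x (length n) by an n-by-m matrix H."""
    n_cols = len(H[0])
    return [sum(x[i] * H[i][j] for i in range(len(x))) for j in range(n_cols)]

# Toy example: 3 states mapped onto 2 HRRs (all values are made up).
x = [100.0, 40.0, 20.0]   # cases per state
H = [
    [1.0, 0.0],           # state 0 lies entirely in HRR 0
    [0.5, 0.5],           # state 1 is split evenly between both HRRs
    [0.0, 1.0],           # state 2 lies entirely in HRR 1
]
y = row_times_matrix(x, H)  # cases per HRR
print(y)  # [120.0, 40.0]
```

With rows that sum to 1, the total number of cases is the same before and after the transformation (160 in this toy case).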

HRRs are represented by hrrnum. There are 306 HRRs in total, and their numbers are not consecutive.

-Jingjing


04/14/20: 'msa_id' and 'msa_name' are added according to the msa_list.csv that Aaron found from https://apps.bea.gov/regional/docs/msalist.cfm (2019)

04/15/20:
The newly added columns are based on cbsatocountycrosswalk.csv from https://data.nber.org/data/cbsa-fips-county-crosswalk.html
- 'msa' : MSA ID
- 'msaname': Name of the MSA
- 'cbsa': CBSA ID
- 'cbsaname': Name of the CBSA


04/19/20:
Changed to msa_list.csv again.

05/20/20: Updated msa_list.csv to include MSAs in Puerto Rico, using the delineations file from March 2020: https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html

06/15/20:
Added file co-est2019-annres.csv, which gives 2019 population estimates for each county by name

Source: Annual Estimates of the Resident Population for Counties in the United States: April 1, 2010 to July 1, 2019 (CO-EST2019-ANNRES). U.S. Census Bureau, Population Division. Release Date: March 2020
Note: The estimates are based on the 2010 Census and reflect changes to the April 1, 2010 population due to the Count Question Resolution program and geographic program revisions. All geographic boundaries for the 2019 population estimates are as of January 1, 2019. For population estimates methodology statements, see http://www.census.gov/programs-surveys/popest/technical-documentation/methodology.html.

Note: The 6,222 people in Bedford city, Virginia, which was an independent city as of the 2010 Census, are not included in the April 1, 2010 Census enumerated population presented in the county estimates. In July 2013, the legal status of Bedford changed from a city to a town and it became dependent within (or part of) Bedford County, Virginia. This population of Bedford town is now included in the April 1, 2010 estimates base and all July 1 estimates for Bedford County. Because it is no longer an independent city, Bedford town is not listed in this table. As a result, the sum of the April 1, 2010 census values for Virginia counties and independent cities does not equal the 2010 Census count for Virginia, and the sum of April 1, 2010 census values for all counties and independent cities in the United States does not equal the 2010 Census count for the United States. Substantial geographic changes to counties can be found on the Census Bureau website at https://www.census.gov/programs-surveys/geography/technical-documentation/county-changes.html.


07/07/2020:
Introduced the March 2020 MSA file; the source is the [US Census Bureau](https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html). This file seems to differ in a few fips codes from the source for the 02_20_uszip file which Jingjing constructed. There are at least 10 additional fips in 03_20_msa that are not in the uszip file, and one of the msa codes seems to be incorrect: 49020 (a Google search confirms that it is incorrect in uszip and correct in the census data).

07/08/2020:
We are reserving 00001-00099 for state codes of the form 100XX, where XX is the fips code for the state. If the CBSA codes change, it should be verified that these codes are not used. The current smallest CBSA is 10100.
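
A minimal sketch of this convention, assuming codes are handled as zero-padded strings. The function names `state_mega_code` and `collides_with_cbsa` are hypothetical and do not come from the package.

```python
SMALLEST_CBSA = 10100  # smallest CBSA code noted above

def state_mega_code(state_fips):
    """Build a code of the form 100XX from a two-digit state fips code."""
    state = int(state_fips)
    if not 1 <= state <= 99:
        raise ValueError(f"state fips out of range: {state_fips!r}")
    return f"100{state:02d}"

def collides_with_cbsa(cbsa_codes):
    """Return the CBSA codes that fall inside the reserved 100XX block."""
    return sorted(c for c in cbsa_codes if 10001 <= int(c) <= 10099)

print(state_mega_code("42"))                   # 10042
print(collides_with_cbsa(["10100", "10180"]))  # []
```

Every 100XX code is numerically below `SMALLEST_CBSA`, so the check in `collides_with_cbsa` only reports something if a future CBSA list introduces codes below 10100.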

-James

07/22/2020:
- Introducing the COUNTY_ZIP and ZIP_COUNTY crosswalk files from https://www.huduser.gov/portal/datasets/usps_crosswalk.html
- Also the ZIP to HRR Crosswalk file (from 2018) from https://atlasdata.dartmouth.edu/static/supp_research_data
- Added the JHU crosswalk table and created a jhu_uid to fips crosswalk table: https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data#uid-lookup-table-logic

There are NaN fips in the JHU tables; to resolve this, we are moving over to using the JHU unique id.
We have to deal with the NaN fips by hand. They are:
```
748 US
887 Recovered, US
888 Dukes and Nantucket, Massachusetts, US
889 Kansas City, Missouri, US
890 Michigan Department of Corrections (MDOC), Mic...
891 Federal Correctional Institution (FCI), Michig...
892 Air Force, US Military, US
893 Army, US Military, US
894 Marine Corps, US Military, US
895 Navy, US Military, US
896 Unassigned, US Military, US
897 US Military, US
898 Inmates, Federal Bureau of Prisons, US
899 Staff, Federal Bureau of Prisons, US
900 Federal Bureau of Prisons, US
901 Bear River, Utah, US
902 Central Utah, Utah, US
903 Southeast Utah, Utah, US
904 Southwest Utah, Utah, US
905 TriCounty, Utah, US
906 Weber-Morgan, Utah, US
907 Veteran Hospitals, US
```
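
One way to handle such rows is a hand-written override table keyed by the JHU unique id. The sketch below is an assumption about how that could look, not the logic in geo_data.py: the key `UID_DUKES_AND_NANTUCKET` is a placeholder rather than a real UID, and the even split between the two counties is invented.

```python
import math

# Hypothetical hand-written overrides keyed by JHU UID. The key and the
# even split below are placeholders, not values from the real table.
MANUAL_UID_TO_FIPS = {
    "UID_DUKES_AND_NANTUCKET": [("25007", 0.5), ("25019", 0.5)],
}

def resolve_fips(uid, fips):
    """Return [(fips, weight)] for a JHU row, using overrides for NaN fips."""
    is_missing = fips is None or (isinstance(fips, float) and math.isnan(fips))
    if not is_missing:
        # A valid fips maps to itself with full weight, zero-padded to 5 digits.
        return [(f"{int(fips):05d}", 1.0)]
    # Rows without an override return an empty list and are effectively dropped.
    return MANUAL_UID_TO_FIPS.get(uid, [])

print(resolve_fips("any", 1001.0))
print(resolve_fips("UID_DUKES_AND_NANTUCKET", float("nan")))
```

Returning a list of weighted fips codes lets a single JHU row be assigned to several counties, which rows such as "Dukes and Nantucket" require.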
If you look at geo_data.py::

08/04/2020:
There are large changes in MSA between the 2018 version from bea.gov (msa_list.csv) and the new 2020 version from the Census Bureau (03_20_MSAs.xls).
Trying to use the 2018 version instead, from https://www.census.gov/geographies/reference-files/time-series/demo/metro-micro/delineation-files.html