Skip to content

Release covidcast-indicators 0.2.22 #1481

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 43 commits into from
Jan 20, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
43 commits
Select commit Hold shift + click to select a range
431d8c1
Initial add of community profile indicator
krivard Nov 11, 2021
ff3b7da
Changes to support all known backissues
krivard Nov 12, 2021
af692a6
Documentation
krivard Nov 12, 2021
07559f3
[dsew] Add suggested dynamic validator params
krivard Nov 19, 2021
ac57ecc
Fixes from code review
krivard Nov 29, 2021
e363c06
Actually fix rate aggregation this time
krivard Jan 10, 2022
691e3e3
remove whitespace
krivard Jan 10, 2022
4ac71f7
Merge pull request #1463 from cmu-delphi/bot/sync-prod-main
krivard Jan 12, 2022
ecd4e4c
initial add hospital admissions
nmdefries Jan 11, 2022
f93c2ed
simplify date keys logic
nmdefries Jan 13, 2022
2ad52eb
fix state to nation calc
nmdefries Jan 13, 2022
cfbb8a9
linting and tests
nmdefries Jan 13, 2022
691727e
automate tests
nmdefries Jan 13, 2022
3564a39
keep most recent publish date
nmdefries Jan 14, 2022
9e2ab62
fix rate weight calculation when multiple days present
nmdefries Jan 14, 2022
1546af6
add production params
nmdefries Jan 14, 2022
671ae9c
make weight normalization more robust to number states per day
nmdefries Jan 14, 2022
0090c01
prevent errors from missing hosp admissions before early jan 2021
nmdefries Jan 18, 2022
f07792a
Merge pull request #1466 from cmu-delphi/ndefries/community_profile_hosp
krivard Jan 18, 2022
d7cc1be
Merge branch 'main' into krivard/community_profile
nmdefries Jan 18, 2022
481d875
[cpr] fix bugs in export date handling
krivard Jan 18, 2022
7c41663
Merge branch 'krivard/community_profile' of github.com:cmu-delphi/cov…
krivard Jan 18, 2022
a58388c
Update Makefile for new ci
krivard Jan 18, 2022
6b83048
Mock covidcast/meta
qx-teo Jan 19, 2022
aa16b00
Remove hard-coding from get_geo_signal_combos
qx-teo Jan 19, 2022
22d0491
Fix lint
qx-teo Jan 19, 2022
dcd0acc
Swopping keys and values for source_signal_mappings dict
qx-teo Jan 19, 2022
2be28c5
Update package description
krivard Jan 19, 2022
03c6f45
Merge pull request #1363 from cmu-delphi/krivard/community_profile
krivard Jan 19, 2022
89c9cf7
Productionize!
korlaxxalrok Jan 19, 2022
e5aeefd
Merge pull request #1472 from cmu-delphi/productionize-dsew-cpr
korlaxxalrok Jan 19, 2022
875a0e1
Rename params file so it gets picked up by Jenkins
korlaxxalrok Jan 19, 2022
f20a6e0
Merge pull request #1474 from cmu-delphi/fix-productionize-dsew-cpr
korlaxxalrok Jan 19, 2022
7bbbb3c
update pipeline to support Wave 13
nmdefries Jan 19, 2022
6141756
Merge pull request #1470 from cmu-delphi/fix_covidcast_mock
krivard Jan 19, 2022
88476b1
Merge pull request #1471 from cmu-delphi/fix_geosigcombo_hardcoding
krivard Jan 19, 2022
e8dac87
Fix requests.get mock
qx-teo Jan 20, 2022
027769b
deduplicate microdata output fields
nmdefries Jan 20, 2022
6f89e0d
V17 actually reported as two fields
nmdefries Jan 20, 2022
6ef4968
Merge pull request #1477 from cmu-delphi/fix_datafetcher_mock
krivard Jan 20, 2022
521bf20
Merge pull request #1475 from cmu-delphi/survey-wave-13
krivard Jan 20, 2022
7fc264b
chore: bump delphi_utils to 0.2.10
Jan 20, 2022
3af5cdc
chore: bump covidcast-indicators to 0.2.22
Jan 20, 2022
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion .bumpversion.cfg
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.2.21
current_version = 0.2.22
commit = True
message = chore: bump covidcast-indicators to {new_version}
tag = False
2 changes: 1 addition & 1 deletion .github/workflows/python-ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@ jobs:
if: github.event.pull_request.draft == false
strategy:
matrix:
packages: [_delphi_utils_python, changehc, claims_hosp, combo_cases_and_deaths, doctor_visits, google_symptoms, hhs_hosp, hhs_facilities, jhu, nchs_mortality, nowcast, quidel, quidel_covidtest, safegraph_patterns, sir_complainsalot, usafacts]
packages: [_delphi_utils_python, changehc, claims_hosp, combo_cases_and_deaths, doctor_visits, dsew_community_profile, google_symptoms, hhs_hosp, hhs_facilities, jhu, nchs_mortality, nowcast, quidel, quidel_covidtest, safegraph_patterns, sir_complainsalot, usafacts]
defaults:
run:
working-directory: ${{ matrix.packages }}
Expand Down
2 changes: 1 addition & 1 deletion Jenkinsfile
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
- Keep in sync with '.github/workflows/python-ci.yml'.
- TODO: #527 Get this list automatically from python-ci.yml at runtime.
*/
def indicator_list = ["changehc", "claims_hosp", "facebook", "google_symptoms", "hhs_hosp", "jhu", "nchs_mortality", "quidel", "quidel_covidtest", "safegraph_patterns", "sir_complainsalot", "usafacts"]
def indicator_list = ["changehc", "claims_hosp", "facebook", "google_symptoms", "hhs_hosp", "jhu", "nchs_mortality", "quidel", "quidel_covidtest", "safegraph_patterns", "sir_complainsalot", "usafacts", "dsew_community_profile"]
def build_package = [:]
def deploy_staging = [:]
def deploy_production = [:]
Expand Down
2 changes: 1 addition & 1 deletion _delphi_utils_python/.bumpversion.cfg
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 0.2.9
current_version = 0.2.10
commit = True
message = chore: bump delphi_utils to {new_version}
tag = False
Expand Down
2 changes: 1 addition & 1 deletion _delphi_utils_python/delphi_utils/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -15,4 +15,4 @@
from .nancodes import Nans
from .weekday import Weekday

__version__ = "0.2.9"
__version__ = "0.2.10"
16 changes: 6 additions & 10 deletions _delphi_utils_python/delphi_utils/validator/datafetcher.py
Original file line number Diff line number Diff line change
Expand Up @@ -111,14 +111,9 @@ def get_geo_signal_combos(data_source):
Cross references based on combinations reported available by COVIDcast metadata.
"""
# Maps data_source name with what's in the API, lists used in case of multiple names
# pylint: disable=fixme
# TODO: Extract this mapping from meta response instead of hard-coding
# https://github.com/cmu-delphi/covidcast-indicators/issues/1457
source_signal_mappings = {
'indicator-combination': ['indicator-combination-cases-deaths'],
'quidel': ['quidel-covid-ag'],
'safegraph': ['safegraph-weekly']
}

source_signal_mappings = {i['source']:i['db_source'] for i in
requests.get("https://api.covidcast.cmu.edu/epidata/covidcast/meta").json()}
meta = covidcast.metadata()
source_meta = meta[meta['data_source'] == data_source]
# Need to convert np.records to tuples so they are hashable and can be used in sets and dicts.
Expand All @@ -130,8 +125,9 @@ def get_geo_signal_combos(data_source):
# True/False indicate if status is active, "unknown" means we should check
sig_combo_seen = dict()
for combo in geo_signal_combos:
if source_signal_mappings.get(data_source):
src_list = source_signal_mappings.get(data_source)
if data_source in source_signal_mappings.values():
src_list = [key for (key, value) in source_signal_mappings.items()
if value == data_source]
else:
src_list = [data_source]
for src in src_list:
Expand Down
2 changes: 1 addition & 1 deletion _delphi_utils_python/setup.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@

setup(
name="delphi_utils",
version="0.2.9",
version="0.2.10",
description="Shared Utility Functions for Indicators",
long_description=long_description,
long_description_content_type="text/markdown",
Expand Down
33 changes: 26 additions & 7 deletions _delphi_utils_python/tests/validator/test_datafetcher.py
Original file line number Diff line number Diff line change
Expand Up @@ -21,25 +21,44 @@ def test_make_date_filter(self):
assert not date_filter(FILENAME_REGEX.match("20200620_a_b.csv"))
assert not date_filter(FILENAME_REGEX.match("202006_a_b.csv"))

# pylint: disable=fixme
# TODO: mock out the advanced meta endpoint /covidcast/meta as well
# https://github.com/cmu-delphi/covidcast-indicators/issues/1456
# Solution from https://stackoverflow.com/questions/15753390/
#how-can-i-mock-requests-and-the-response
def mocked_requests_get(*args, **kwargs):
class MockResponse:
def __init__(self, json_data, status_code):
self.json_data = json_data
self.status_code = status_code

def json(self):
return self.json_data
if len(kwargs) == 0:
return MockResponse([{'source': 'chng', 'db_source': 'chng'},
{'source': 'covid-act-now', 'db_source': 'covid-act-now'}], 200)
elif kwargs["params"] == {'signal': 'chng:inactive'}:
return MockResponse([{"signals": [{"active": False}]}], 200)
else:
return MockResponse([{"signals": [{"active": True}]}], 200)
@mock.patch('requests.get', side_effect=mocked_requests_get)
@mock.patch("covidcast.metadata")
def test_get_geo_signal_combos(self, mock_metadata):
def test_get_geo_signal_combos(self, mock_metadata, mock_get):
"""Test that the geo signal combos are correctly pulled from the covidcast metadata."""
# Need to use actual data_source and signal names since we reference the API
# We let the chng signal "inactive" be an inactive signal
mock_metadata.return_value = pd.DataFrame({"data_source": ["chng", "chng", "chng",
"covid-act-now",
"covid-act-now",
"covid-act-now"],
"covid-act-now",
"chng"],
"signal": ["smoothed_outpatient_cli",
"smoothed_outpatient_covid",
"smoothed_outpatient_covid",
"pcr_specimen_positivity_rate",
"pcr_specimen_positivity_rate",
"pcr_specimen_total_tests"],
"pcr_specimen_total_tests",
"inactive"],
"geo_type": ["state", "state", "county",
"hrr", "msa", "msa"]
"hrr", "msa", "msa",
"state"]
})

assert set(get_geo_signal_combos("chng")) == set(
Expand Down
32 changes: 32 additions & 0 deletions ansible/templates/dsew_community_profile-params-prod.json.j2
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
{
"common": {
"export_dir": "/common/covidcast/receiving/dsew-cpr",
"log_filename": "/var/log/indicators/dsew_cpr.log"
},
"indicator": {
"input_cache": "./input_cache",
"reports": "new"
},
"validation": {
"common": {
"data_source": "dsew-cpr",
"span_length": 14,
"min_expected_lag": {"all": "5"},
"max_expected_lag": {"all": "9"},
"dry_run": true,
"suppressed_errors": []
},
"static": {
"minimum_sample_size": 0,
"missing_se_allowed": true,
"missing_sample_size_allowed": true
},
"dynamic": {
"ref_window_size": 7,
"smoothed_signals": [
"naats_total_7dav",
"naats_positivity_7dav"
]
}
}
}
1 change: 1 addition & 0 deletions ansible/templates/facebook-params-prod.json.j2
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@
"Survey of COVID-Like Illness - Wave 11": "fb-survey",
"Survey of COVID-Like Illness - Wave 12": "fb-survey",
"Survey of COVID-Like Illness - Wave 12 - Full Launch": "fb-survey",
"Survey of COVID-Like Illness - Wave 13": "fb-survey",
"Survey of COVID-Like Illness - Wave 4": "fb-survey",
"Survey of COVID-Like Illness - Wave 5": "fb-survey",
"Survey of COVID-Like Illness - Wave 6": "fb-survey",
Expand Down
35 changes: 34 additions & 1 deletion ansible/templates/sir_complainsalot-params-prod.json.j2
Original file line number Diff line number Diff line change
Expand Up @@ -90,7 +90,40 @@
["smoothed_dontneed_reason_not_high_risk", "hrr"], ["smoothed_wdontneed_reason_not_high_risk", "hrr"],
["smoothed_dontneed_reason_not_serious", "hrr"], ["smoothed_wdontneed_reason_not_serious", "hrr"],
["smoothed_dontneed_reason_other", "hrr"], ["smoothed_wdontneed_reason_other", "hrr"],
["smoothed_dontneed_reason_precautions", "hrr"], ["smoothed_wdontneed_reason_precautions", "hrr"]
["smoothed_dontneed_reason_precautions", "hrr"], ["smoothed_wdontneed_reason_precautions", "hrr"],
"smoothed_screening_tested_positive_14d", "smoothed_wscreening_tested_positive_14d",
"smoothed_travel_outside_state_7d", "smoothed_wtravel_outside_state_7d",
"smoothed_belief_vaccinated_mask_unnecessary", "smoothed_wbelief_vaccinated_mask_unnecessary",
"smoothed_belief_children_immune", "smoothed_wbelief_children_immune",
"smoothed_received_2_vaccine_doses", "smoothed_wreceived_2_vaccine_doses",
"smoothed_vaccine_barrier_eligible", "smoothed_wvaccine_barrier_eligible",
"smoothed_vaccine_barrier_no_appointments", "smoothed_wvaccine_barrier_no_appointments",
"smoothed_vaccine_barrier_appointment_time", "smoothed_wvaccine_barrier_appointment_time",
"smoothed_vaccine_barrier_technical_difficulties", "smoothed_wvaccine_barrier_technical_difficulties",
"smoothed_vaccine_barrier_document", "smoothed_wvaccine_barrier_document",
"smoothed_vaccine_barrier_technology_access", "smoothed_wvaccine_barrier_technology_access",
"smoothed_vaccine_barrier_travel", "smoothed_wvaccine_barrier_travel",
"smoothed_vaccine_barrier_language", "smoothed_wvaccine_barrier_language",
"smoothed_vaccine_barrier_childcare", "smoothed_wvaccine_barrier_childcare",
"smoothed_vaccine_barrier_time", "smoothed_wvaccine_barrier_time",
"smoothed_vaccine_barrier_type", "smoothed_wvaccine_barrier_type",
"smoothed_vaccine_barrier_none", "smoothed_wvaccine_barrier_none",
"smoothed_vaccine_barrier_appointment_location", "smoothed_wvaccine_barrier_appointment_location",
"smoothed_vaccine_barrier_other", "smoothed_wvaccine_barrier_other",
"smoothed_vaccine_barrier_eligible_has", "smoothed_wvaccine_barrier_eligible_has",
"smoothed_vaccine_barrier_no_appointments_has", "smoothed_wvaccine_barrier_no_appointments_has",
"smoothed_vaccine_barrier_appointment_time_has", "smoothed_wvaccine_barrier_appointment_time_has",
"smoothed_vaccine_barrier_technical_difficulties_has", "smoothed_wvaccine_barrier_technical_difficulties_has",
"smoothed_vaccine_barrier_document_has", "smoothed_wvaccine_barrier_document_has",
"smoothed_vaccine_barrier_technology_access_has", "smoothed_wvaccine_barrier_technology_access_has",
"smoothed_vaccine_barrier_travel_has", "smoothed_wvaccine_barrier_travel_has",
"smoothed_vaccine_barrier_language_has", "smoothed_wvaccine_barrier_language_has",
"smoothed_vaccine_barrier_childcare_has", "smoothed_wvaccine_barrier_childcare_has",
"smoothed_vaccine_barrier_time_has", "smoothed_wvaccine_barrier_time_has",
"smoothed_vaccine_barrier_type_has", "smoothed_wvaccine_barrier_type_has",
"smoothed_vaccine_barrier_none_has", "smoothed_wvaccine_barrier_none_has",
"smoothed_vaccine_barrier_appointment_location_has", "smoothed_wvaccine_barrier_appointment_location_has",
"smoothed_vaccine_barrier_other_has", "smoothed_wvaccine_barrier_other_has"
]
},
"quidel": {
Expand Down
22 changes: 22 additions & 0 deletions dsew_community_profile/.pylintrc
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@

[MESSAGES CONTROL]

disable=logging-format-interpolation,
too-many-locals,
too-many-arguments,
# Allow pytest functions to be part of a class.
no-self-use,
# Allow pytest classes to have one test.
too-few-public-methods

[BASIC]

# Allow arbitrarily short-named variables.
variable-rgx=[a-z_][a-z0-9_]*
argument-rgx=[a-z_][a-z0-9_]*
attr-rgx=[a-z_][a-z0-9_]*

[DESIGN]

# Don't complain about pytest "unused" arguments.
ignored-argument-names=(_.*|run_as_module)
133 changes: 133 additions & 0 deletions dsew_community_profile/DETAILS.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,133 @@
# Dataset layout

The Data Strategy and Execution Workgroup (DSEW) publishes a Community Profile
Report each weekday, comprising a pair of files: an Excel workbook (.xlsx) and a
PDF which shows select metrics from the workbook as time series charts and
choropleth maps. These files are listed as attachments on the healthdata.gov
site:

https://healthdata.gov/Health/COVID-19-Community-Profile-Report/gqxm-d9w9

Each Excel file attachment has a filename. The filename contains a date,
presumably the publish date. The attachment also has an alphanumeric
assetId. Both the filename and the assetId are required for downloading the
file. Whether this means that updated versions of a particular file may be
uploaded by DSEW at later times is not known. The attachment does not explicitly
list an upload timestamp. To be safe, we cache our downloads using both the
assetId and the filename.

# Workbook layout

Each Excel file is a workbook with multiple sheets. The exemplar file used in
writing this indicator is "Community Profile Report 20211102.xlsx". The sheets
include:

- User Notes: Instructions for using the workbook
- Overview: US National figures for the last 5 weeks, plus monthly peaks back to
April 2020
- Regions*: Figures for FEMA regions (double-checked: they match HHS regions
except that FEMA 2 does not include Palau while HHS 2 does)
- States*: Figures for US states and territories
- CBSAs*: Figures for US Census Block Statistical Areas
- Counties*: Figures for US counties
- Weekly Transmission Categories: Lists of high, substantial, and moderate
transmission states and territories
- National Peaks: Monthly national peaks back to April 2020
- National Historic: Daily national figures back to January 22 2020
- Data Notes: Source and methods information for all metrics
- Color Thresholds: Color-coding is used extensively in all sheets; these are
the keys

The starred sheets above have nearly-identical column layouts, and together
cover the county, MSA, state, and HHS geographical levels used in
covidcast. Rather than aggregate them ourselves and risk a mismatch, this
indicator lifts these geographical aggregations directly from the corresponding
sheets of the workbook.

GeoMapper _is_ used to generate national figures from
state, due to architectural differences between the starred sheets and the
Overview sheet. If we discover that our nation-level figures differ too much
from those listed in the Overview sheet, we can add dedicated parsing for the
Overview sheet and remove GeoMapper from this indicator altogether.

# Sheet layout

## Headers

Each starred sheet has two rows of headers. The first row uses merged cells to
group several columns together under a single "overheader". This overheader
often includes the reference period for that group of columns, such as:

- CASES/DEATHS: LAST WEEK (October 26-November 1)
- TESTING: LAST WEEK (October 24-30, Test Volume October 20-26)
- TESTING: PREVIOUS WEEK (October 17-23, Test Volume October 13-19)

Overheaders have changed periodically since the first report. For example, the
"TESTING: LAST WEEK" overheader above has also appeared as "VIRAL (RT-PCR) LAB
TESTING: LAST WEEK", with and without a separate reference date for Test
Volume. All known overheader forms are checked in test_pull.py.

The second row contains a header for each column. The headers uniquely identify
each column included in the sheet. Column headers include spaces, and typically
specify both the metric and the reference period over which it was calculated,
such as:

- Total NAATs - last 7 days (may be an underestimate due to delayed reporting)
- NAAT positivity rate - previous 7 days (may be an underestimate due to delayed
reporting)

Columns headers have also changed periodically since the first report. For
example, the "Total NAATs - last 7 days" header above has also appeared as
"Total RT-PCR diagnostic tests - last 7 days".

## Contents

Each starred sheet contains test positivity and total test volume figures for
two reference periods, "last [week]" and "previous [week]". In some reports, the
reference periods for test positivity and total test volume are the same; in
others, they are different, such that the report contains figures for four
distinct reference periods, two for each metric we extract.

# Time series conversions and parsing notes

## Reference date

The reference period in the overheader never includes the year. We guess the
reference year by picking the same year as the publish date (i.e., the date
extracted from the filename), and if the reference month is greater than the
publish month, subtract 1 from the reference year. This adequately covers the
December-January boundary.

We select as reference date the end date of the reference period for each
metric. Reference periods are always 7 days, so this indicator produces
seven-day averages. We divide the total testing volume by seven and leave the
test positivity alone.

## Geo ID

The Counties sheet lists FIPS codes numerically, such that FIPS with a leading
zero only have four digits. We fix this by zero-filling to five characters.

MSAs are a subset of CBSAs. We fix this by selecting only CBSAs with type
"Metropolitan".

Most of the starred sheets have the geo id as the first non-index column. The
Region sheet has no such column. We fix this by generating the HHS ids from the
index column instead.

## Combining multiple reports

Each report file generates two reference dates for each metric, up to four
reference dates total. Since it's not clear whether new versions of past files
are ever made available, the default mode (params.indicator.reports="new")
fetches any files that are not already in the input cache, then combines the
results into a single data frame before exporting. This will generate correct
behavior should (for instance) a previously-downloaded file get a new assetId.

For the initial run on an empty input cache, and for runs configured to process
a range of reports (using params.indicator.reports=YYYY-mm-dd--YYYY-mm-dd), this
indicator makes no distinction between figures that came from different
reports. That may not be what you want. If the covidcast issue date needs to
match the date on the report filename, then the indicator must instead be run
repeatedly, with equal start and end dates, keeping the output of each run
separate.
Loading