
Commit 3fffd54

Merge pull request #184 from cmu-delphi/refactor/combo_cases_and_deaths
Combo cases and deaths refactor
2 parents fae9e1d + 4e80fa5 commit 3fffd54


15 files changed: +349 -256 lines changed


combo_cases_and_deaths/README.md

Lines changed: 57 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,57 @@
# MODULE NAME

## Running the Indicator

The indicator is run by directly executing the Python module contained in this
directory. The safest way to do this is to create a virtual environment,
install the common DELPHI tools, and then install the module and its
dependencies. To do this, run the following code from this directory:

```
python -m venv env
source env/bin/activate
pip install ../_delphi_utils_python/.
pip install .
```

All of the user-changeable parameters are stored in `params.json`. To execute
the module and produce the output datasets (by default, in `receiving`), run
the following:

```
env/bin/python -m delphi_combo_cases_and_deaths
```

Once you are finished with the code, you can deactivate the virtual environment
and (optionally) remove the environment itself.

```
deactivate
rm -r env
```

## Testing the code

To do a static test of the code style, it is recommended to run **pylint** on
the module. To do this, run the following from the main module directory:

```
env/bin/pylint delphi_combo_cases_and_deaths
```

The most aggressive checks are turned off; only relatively important issues
should be raised and they should be manually checked (or better, fixed).

Unit tests are also included in the module. To execute these, run the following
command from this directory:

```
(cd tests && ../env/bin/pytest --cov=delphi_combo_cases_and_deaths --cov-report=term-missing)
```

The output will show the number of unit tests that passed and failed, along
with the percentage of code covered by the tests. None of the tests should
fail, and the number of code lines not covered by unit tests should be small and
should not include critical sub-routines.

combo_cases_and_deaths/REVIEW.md

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
## Code Review (Python)

A code review of this module should include a careful look at the code and the
output. To assist in the process, but certainly not in place of it, please
check the following items.

**Documentation**

- [ ] the README.md file template is filled out and currently accurate; it is
possible to load and test the code using only the instructions given
- [ ] minimal docstrings (one line describing what the function does) are
included for all functions; full docstrings describing the inputs and expected
outputs should be given for non-trivial functions

**Structure**

- [ ] code should use 4 spaces for indentation; other style decisions are
flexible, but be consistent within a module
- [ ] any required metadata files are checked into the repository and placed
within the directory `static`
- [ ] any intermediate files that are created and stored by the module should
be placed in the directory `cache`
- [ ] final expected output files to be uploaded to the API are placed in the
`receiving` directory; output files should not be committed to the repository
- [ ] all options and API keys are passed through the file `params.json`
- [ ] template parameter file (`params.json.template`) is checked into the
code; no personal (e.g., usernames) or private (e.g., API keys) information is
included in this template file

**Testing**

- [ ] module can be installed in a new virtual environment
- [ ] pylint with the default `.pylint` settings run over the module produces
minimal warnings; warnings that do exist have been confirmed as false positives
- [ ] reasonably high level of unit test coverage covering all of the main logic
of the code (e.g., missing coverage for raised errors that do not currently seem
possible to reach is okay; missing coverage for options that will be needed is
not)
- [ ] all unit tests run without errors

combo_cases_and_deaths/cache/.gitignore

Whitespace-only changes.
Lines changed: 13 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,13 @@
# -*- coding: utf-8 -*-
"""Module to combine the JHU and USA Facts indicators.

This file defines the functions that are made public by the module. As the
module is intended to be executed through the main method, these are primarily
for testing.
"""

from __future__ import absolute_import

from . import run

__version__ = "0.1.0"
Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,11 @@
# -*- coding: utf-8 -*-
"""Call the function run_module when executed.

This file indicates that calling the module (`python -m delphi_combo_cases_and_deaths`) will
call the function `run_module` found within the run.py file. There should be
no need to change this template.
"""

from .run import run_module  # pragma: no cover

run_module()  # pragma: no cover
Lines changed: 165 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,165 @@
# -*- coding: utf-8 -*-
"""Functions to call when running the module.

This module should contain a function called `run_module`, that is executed when
the module is run with `python -m delphi_combo_cases_and_deaths`.

This module produces a combined signal for jhu-csse and usa-facts. This signal
is only used for visualization. It sources Puerto Rico from jhu-csse and
everything else from usa-facts.

"""
from datetime import date, timedelta, datetime
from itertools import product
import re
import sys

import covidcast
import pandas as pd

from delphi_utils import read_params, create_export_csv


METRICS = [
    "confirmed",
    "deaths",
]
SMOOTH_TYPES = [
    "",
    "7dav",
]
SENSORS = [
    "incidence_num",
    "cumulative_num",
    "incidence_prop",
    "cumulative_prop",
]
GEO_RESOLUTIONS = [
    "county",
    "state",
    "msa",
    "hrr",
]

def check_not_none(data_frame, label, date_range):
    """Exit gracefully if a data frame we attempted to retrieve is empty"""
    if data_frame is None:
        print(f"{label} not available in range {date_range}")
        sys.exit(1)

def combine_usafacts_and_jhu(signal, geo, date_range):
    """
    Add rows for PR from JHU signals to USA-FACTS signals
    """
    usafacts_df = covidcast.signal("usa-facts", signal, date_range[0], date_range[1], geo)
    jhu_df = covidcast.signal("jhu-csse", signal, date_range[0], date_range[1], geo)
    check_not_none(usafacts_df, "USA-FACTS", date_range)
    check_not_none(jhu_df, "JHU", date_range)

    # State level
    if geo == 'state':
        combined_df = usafacts_df.append(jhu_df[jhu_df["geo_value"] == 'pr'])
    # County level
    elif geo == 'county':
        combined_df = usafacts_df.append(jhu_df[jhu_df["geo_value"] == '72000'])
    # For MSA and HRR level, they are the same
    else:
        combined_df = usafacts_df

    combined_df = combined_df.drop(["direction"], axis=1)
    combined_df = combined_df.rename({"time_value": "timestamp",
                                      "geo_value": "geo_id",
                                      "value": "val",
                                      "stderr": "se"},
                                     axis=1)
    return combined_df

def extend_raw_date_range(params, sensor_name):
    """A complete issue includes smoothed signals as well as all raw data
    that contributed to the smoothed values, so that it's possible to use
    the raw values in the API to reconstruct the smoothed signal at will.

    The smoother we're currently using incorporates the previous 7
    days of data, so we must extend the date range of the raw data
    backwards by 7 days.
    """
    if sensor_name.find("7dav") < 0:
        return [
            params['date_range'][0] - timedelta(days=7),
            params['date_range'][-1]
            ]
    return params['date_range']

def next_missing_day(source, signals):
    """Fetch the first day for which we want to generate new data."""
    meta_df = covidcast.metadata()
    meta_df = meta_df[meta_df["data_source"] == source]
    meta_df = meta_df[meta_df["signal"].isin(signals)]
    # min: use the max_time of the most lagged signal, in case they differ
    # +timedelta: the subsequent day is the first day of new data to generate
    day = min(meta_df["max_time"]) + timedelta(days=1)
    return day

def sensor_signal(metric, sensor, smoother):
    """Generate the signal name for a particular configuration"""
    if smoother == "7dav":
        sensor_name = "_".join([smoother, sensor])
    else:
        sensor_name = sensor
    signal = "_".join([metric, sensor_name])
    return sensor_name, signal

def run_module():
    """Produce a combined cases and deaths signal using data from JHU and USA Facts"""
    variants = [tuple((metric, geo_res)+sensor_signal(metric, sensor, smoother))
                for (metric, geo_res, sensor, smoother) in
                product(METRICS, GEO_RESOLUTIONS, SENSORS, SMOOTH_TYPES)]

    params = read_params()
    params['export_start_date'] = date(*params['export_start_date'])
    yesterday = date.today() - timedelta(days=1)
    if params['date_range'] == 'new':
        # only create combined file for the newest update
        # (usually for yesterday, but check just in case)
        params['date_range'] = [
            min(
                yesterday,
                next_missing_day(
                    params["source"],
                    set(signal[-1] for signal in variants)
                )
            ),
            yesterday
        ]
    elif params['date_range'] == 'all':
        # create combined files for all of the historical reports
        params['date_range'] = [params['export_start_date'], yesterday]
    else:
        pattern = re.compile(r'^\d{8}-\d{8}$')
        match_res = re.findall(pattern, params['date_range'])
        if len(match_res) == 0:
            raise ValueError(
                "Invalid date_range parameter. Please choose from (new, all, yyyymmdd-yyyymmdd).")
        try:
            date1 = datetime.strptime(params['date_range'][:8], '%Y%m%d').date()
        except ValueError:
            raise ValueError("Invalid date_range parameter. Please check the first date.")
        try:
            date2 = datetime.strptime(params['date_range'][-8:], '%Y%m%d').date()
        except ValueError:
            raise ValueError("Invalid date_range parameter. Please check the second date.")

        # Clamp the start of the range to the valid export start date
        if date1 < params['export_start_date']:
            date1 = params['export_start_date']
        params['date_range'] = [date1, date2]

    for metric, geo_res, sensor_name, signal in variants:
        create_export_csv(
            combine_usafacts_and_jhu(signal, geo_res, extend_raw_date_range(params, sensor_name)),
            export_dir=params['export_dir'],
            start_date=pd.to_datetime(params['export_start_date']),
            metric=metric,
            geo_res=geo_res,
            sensor=sensor_name,
        )
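
Note on the `'new'` date range: the branch in `run_module` starts at the day after the most lagged signal's `max_time` and caps that day at yesterday. A minimal sketch of that arithmetic, assuming a plain dict of made-up `max_time` values in place of the `covidcast.metadata()` result; the helper name `new_date_range` and the sample dates are hypothetical and not part of this commit:

```python
from datetime import date, timedelta


def new_date_range(max_times, yesterday):
    """Return [first day to generate, yesterday] (hypothetical helper).

    max_times maps signal name -> last day already available.
    """
    # min: the most lagged signal decides where new data has to start
    next_day = min(max_times.values()) + timedelta(days=1)
    # never start after yesterday, even if every signal is up to date
    return [min(yesterday, next_day), yesterday]


# Made-up values standing in for the metadata lookup.
max_times = {
    "confirmed_incidence_num": date(2020, 8, 10),
    "deaths_7dav_incidence_num": date(2020, 8, 8),
}
yesterday = date(2020, 8, 12)

lagging = new_date_range(max_times, yesterday)
up_to_date = new_date_range({"confirmed_incidence_num": yesterday}, yesterday)
print(lagging)
print(up_to_date)
```

With one signal lagging, the range starts the day after that signal's last available day; when everything is current, the range collapses to yesterday alone.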
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
{
  "static_file_dir": "./static",
  "export_dir": "./receiving",
  "cache_dir": "./cache",
  "export_start_date":[2020,4,1],
  "date_range":"new",
  "source":"indicator-combination"
}
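
Note on `date_range`: besides `new` and `all`, `run_module` accepts an explicit `yyyymmdd-yyyymmdd` string and moves its first date up to `export_start_date` when it falls earlier. A rough standalone sketch of that parsing, assuming a hypothetical helper `parse_date_range` (the commit does this inline and reports a separate error message for each date):

```python
import re
from datetime import date, datetime

DATE_RANGE_PATTERN = re.compile(r"^\d{8}-\d{8}$")


def parse_date_range(text, export_start_date):
    """Parse 'yyyymmdd-yyyymmdd' into [start, end] dates (hypothetical helper)."""
    if not DATE_RANGE_PATTERN.match(text):
        raise ValueError(
            "Invalid date_range parameter. "
            "Please choose from (new, all, yyyymmdd-yyyymmdd).")
    first, second = text.split("-")
    # strptime raises ValueError for impossible dates such as month 13
    start = datetime.strptime(first, "%Y%m%d").date()
    end = datetime.strptime(second, "%Y%m%d").date()
    # never export anything before the configured start date
    return [max(start, export_start_date), end]


export_start = date(2020, 4, 1)  # matches the template's [2020,4,1]
print(parse_date_range("20200301-20200415", export_start))
print(parse_date_range("20200501-20200510", export_start))
```

The first call starts before the export start date and is clamped to it; the second is returned unchanged.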
Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
*.csv

combo_cases_and_deaths/setup.py

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
from setuptools import setup
from setuptools import find_packages

required = [
    "pandas",
    "pytest",
    "pytest-cov",
    "pylint",
    "delphi-utils",
    "covidcast"
]

setup(
    name="delphi_combo_cases_and_deaths",
    version="0.1.0",
    description="A combined signal for cases and deaths using JHU for Puerto Rico and USA Facts everywhere else",
    author="Jingjing Tang, Kathryn Mazaitis",
    author_email="[email protected]",
    url="https://github.com/cmu-delphi/covidcast-indicators",
    install_requires=required,
    classifiers=[
        "Development Status :: 5 - Production/Stable",
        "Intended Audience :: Developers",
        "Programming Language :: Python :: 3.7",
    ],
    packages=find_packages(),
)

combo_cases_and_deaths/static/.gitignore

Whitespace-only changes.
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
{
  "static_file_dir": "../static",
  "export_dir": "./receiving",
  "cache_dir": "./cache"
}
Lines changed: 23 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,23 @@
from datetime import date
from itertools import product
import pytest

from delphi_combo_cases_and_deaths.run import extend_raw_date_range,sensor_signal,METRICS,SENSORS,SMOOTH_TYPES

def test_issue_dates():
    reference_dr = [date.today(),date.today()]
    params = {'date_range': reference_dr}
    n_changed = 0
    variants = [sensor_signal(metric, sensor, smoother) for
                metric, sensor, smoother in
                product(METRICS,SENSORS,SMOOTH_TYPES)]
    variants_changed = []
    for sensor_name,signal in variants:
        dr = extend_raw_date_range(params, sensor_name)
        if dr[0] != reference_dr[0]:
            n_changed += 1
            variants_changed.append(sensor_name)
    assert n_changed == len(variants) / 2, f"""Raw variants should post more days than smoothed.
    All variants: {variants}
    Date-extended variants: {variants_changed}
    """

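Note on what `test_issue_dates` checks: half of the metric, sensor and smoother combinations are raw (no `7dav` in the name), and only those should have their start date pushed back by 7 days. The sketch below restates `sensor_signal` and `extend_raw_date_range` in condensed form (not the commit's exact code) and uses an arbitrary fixed date instead of `date.today()`, so the counts can be inspected without installing the module:

```python
from datetime import date, timedelta
from itertools import product

METRICS = ["confirmed", "deaths"]
SMOOTH_TYPES = ["", "7dav"]
SENSORS = ["incidence_num", "cumulative_num", "incidence_prop", "cumulative_prop"]


def sensor_signal(metric, sensor, smoother):
    """Condensed restatement: prefix the sensor with the smoother, then the metric."""
    sensor_name = "_".join([smoother, sensor]) if smoother == "7dav" else sensor
    return sensor_name, "_".join([metric, sensor_name])


def extend_raw_date_range(params, sensor_name):
    """Condensed restatement: raw signals start 7 days earlier, smoothed ones do not."""
    if "7dav" not in sensor_name:
        return [params["date_range"][0] - timedelta(days=7), params["date_range"][-1]]
    return params["date_range"]


fixed_day = date(2020, 8, 12)  # arbitrary date, chosen only for reproducibility
params = {"date_range": [fixed_day, fixed_day]}
variants = [sensor_signal(metric, sensor, smoother)
            for metric, sensor, smoother in product(METRICS, SENSORS, SMOOTH_TYPES)]
extended = [signal for sensor_name, signal in variants
            if extend_raw_date_range(params, sensor_name)[0] != fixed_day]

print(len(variants), len(extended))
print(extended[0], extend_raw_date_range(params, "incidence_num"))
```

Pinning the date keeps the example reproducible; the test in the commit uses `date.today()`, which gives the same counts because only the 7-day offset matters.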