Commit 554d0cf

Merge pull request #344 from cmu-delphi/run-google-symptoms
Pipeline for the new google symptoms indicator
2 parents 92a750c + 5a1b9a4 commit 554d0cf

76 files changed: +12330 -0 lines changed

google_symptoms/.gitignore

Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
# You should hard commit a prototype for this file, but we
# want to avoid accidental adding of API tokens and other
# private data parameters
params.json

# Do not commit output files
receiving/*.csv
tests/receiving/*.csv

# Remove macOS files
.DS_Store

# virtual environment
dview/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
coverage.xml
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
.static_storage/
.media/
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

google_symptoms/.pylintrc

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
[DESIGN]

min-public-methods=1


[MESSAGES CONTROL]

disable=R0801, C0330, E1101, E0611, C0114, C0116, C0103, R0913, R0914, W0702, W0707

google_symptoms/DETAILS.md

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
# Google Symptoms

We import the normalized symptom search term popularity data from Google
Research's Open COVID-19 Data project and export the county-level and state-level
data as-is. We also aggregate the data to the MSA and HRR levels.

## Geographical Levels (`geo`)
* `county`: reported using zero-padded FIPS codes. The county-level data is derived
from `/subregions/state/2020_US_state_daily_symptoms_dataset.csv`.
* `msa`: reported using CBSA codes (consistent with all other COVIDcast sensors). The
MSA-level data is derived from county-level data using a population-weighted average.
* `hrr`: reported using HRR numbers (consistent with all other COVIDcast sensors). The
HRR-level data is derived from county-level data using a population-weighted average.
* `state`: reported using two-letter postal codes. The state-level data is derived from
`2020_US_daily_symptoms_dataset.csv`, which includes data for the District of Columbia.

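The population-weighted average used for the `msa` and `hrr` levels can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's implementation: the function name, the dictionary-based inputs, and every code and number in the usage lines are made up for the example.

```python
from collections import defaultdict


def population_weighted_average(county_values, county_population, county_to_region):
    """Aggregate county-level values to a coarser region (e.g. MSA or HRR).

    All three arguments are hypothetical dicts keyed by county code.
    Counties without a region or a population entry are skipped.
    """
    weighted_sum = defaultdict(float)
    total_weight = defaultdict(float)
    for county, value in county_values.items():
        region = county_to_region.get(county)
        population = county_population.get(county)
        if region is None or population is None:
            continue
        weighted_sum[region] += value * population
        total_weight[region] += population
    # Divide each region's weighted sum by the population that contributed to it.
    return {
        region: weighted_sum[region] / total_weight[region]
        for region in weighted_sum
        if total_weight[region] > 0
    }


# Made-up inputs: two counties in region "A", one county in region "B".
county_values = {"00001": 2.0, "00002": 4.0, "00003": 1.0}
county_population = {"00001": 300, "00002": 100, "00003": 50}
county_to_region = {"00001": "A", "00002": "A", "00003": "B"}

regional = population_weighted_average(county_values, county_population, county_to_region)
print(regional)
```

Weighting by population means a large county pulls the regional value toward its own value more strongly than a small one does; here region "A" lands closer to the value of its more populous county.
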
## Metrics, Level 1 (`m1`)
* `Anosmia`: Google search volume for anosmia-related searches
* `Ageusia`: Google search volume for ageusia-related searches

## Metrics, Level 2 (`m2`)
* `raw_search`: Google search volume reported as-is
* `smoothed_search`: Google search volume smoothed with a 7-day moving average

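A 7-day moving average of the kind behind `smoothed_search` can be sketched as below. This assumes a trailing window that produces no value until seven days of history exist; the module's own smoother is not shown in this part of the diff and may treat the first days differently. The function name and the sample numbers are hypothetical.

```python
def trailing_moving_average(values, k=7):
    """Return the trailing k-day mean for each day that has k days of history."""
    if k <= 0:
        raise ValueError("k must be positive")
    averages = []
    window_sum = 0.0
    for i, value in enumerate(values):
        window_sum += value
        if i >= k:
            # Drop the value that just fell out of the k-day window.
            window_sum -= values[i - k]
        if i >= k - 1:
            averages.append(window_sum / k)
    return averages


# Made-up daily values for eight consecutive days.
raw_search = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
smoothed_search = trailing_moving_average(raw_search, k=7)
print(smoothed_search)  # [4.0, 5.0]
```

Eight days of input yield two smoothed values, one for day 7 and one for day 8, because each output needs a full week behind it.
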
This data reflects the volume of Google searches mapped to symptoms such as anosmia
and ageusia. The resulting daily dataset shows, for each region, the relative frequency
of searches for each symptom. This signal is measured in arbitrary units that are normalized
for population and for the most popular symptom search term within a geographic region. Thus,
values are not comparable between geographic regions. Within a region, larger values
represent more symptom-related searches.

google_symptoms/README.md

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
# Google Symptoms
2+
3+
We import the normalized symptom search term popularity data from the Google
4+
Research's Open COVID-19 Data project and export the county-level and state-level
5+
data as-is. We also aggregate the data to the MSA and HRR levels. For detailed
6+
information see the files `DETAILS.md` contained in this directory.
7+
8+
## Running the Indicator
9+
10+
The indicator is run by directly executing the Python module contained in this
11+
directory. The safest way to do this is to create a virtual environment,
12+
installed the common DELPHI tools, and then install the module and its
13+
dependencies. To do this, run the following code from this directory:
14+
15+
```
16+
python -m venv env
17+
source env/bin/activate
18+
pip install ../_delphi_utils_python/.
19+
pip install .
20+
```
21+
22+
All of the user-changable parameters are stored in `params.json`. To execute the module
23+
and produce the output datasets (by default, in `receiving`), run the following.
24+
25+
```
26+
env/bin/python -m delphi_google_symptoms
27+
```
28+
29+
Once you are finished with the code, you can deactivate the virtual environment
30+
and (optionally) remove the environment itself.
31+
32+
```
33+
deactivate
34+
rm -r env
35+
```
36+
37+
## Testing the code
38+
39+
To do a static test of the code style, it is recommended to run **pylint** on
40+
the module. To do this, run the following from the main module directory:
41+
42+
```
43+
env/bin/pylint delphi_google_symptoms
44+
```
45+
46+
The most aggressive checks are turned off; only relatively important issues
47+
should be raised and they should be manually checked (or better, fixed).
48+
49+
Unit tests are also included in the module. To execute these, run the following
50+
command from this directory:
51+
52+
```
53+
(cd tests && ../env/bin/pytest --cov=delphi_google_symptoms --cov-report=term-missing)
54+
```
55+
56+
The output will show the number of unit tests that passed and failed, along
57+
with the percentage of code covered by the tests. None of the tests should
58+
fail and the code lines that are not covered by unit tests should be small and
59+
should not include critical sub-routines.
60+
61+
- Jenkins test #1

google_symptoms/REVIEW.md

Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
1+
## Code Review (Python)
2+
3+
A code review of this module should include a careful look at the code and the
4+
output. To assist in the process, but certainly not in replace of it, please
5+
check the following items.
6+
7+
**Documentation**
8+
9+
- [ ] the README.md file template is filled out and currently accurate; it is
10+
possible to load and test the code using only the instructions given
11+
- [ ] minimal docstrings (one line describing what the function does) are
12+
included for all functions; full docstrings describing the inputs and expected
13+
outputs should be given for non-trivial functions
14+
15+
**Structure**
16+
17+
- [ ] code should use 4 spaces for indentation; other style decisions are
18+
flexible, but be consistent within a module
19+
- [ ] any required metadata files are checked into the repository and placed
20+
within the directory `static`
21+
- [ ] any intermediate files that are created and stored by the module should
22+
be placed in the directory `cache`
23+
- [ ] final expected output files to be uploaded to the API are placed in the
24+
`receiving` directory; output files should not be committed to the respository
25+
- [ ] all options and API keys are passed through the file `params.json`
26+
- [ ] template parameter file (`params.json.template`) is checked into the
27+
code; no personal (i.e., usernames) or private (i.e., API keys) information is
28+
included in this template file
29+
30+
**Testing**
31+
32+
- [ ] module can be installed in a new virtual environment
33+
- [ ] pylint with the default `.pylint` settings run over the module produces
34+
minimal warnings; warnings that do exist have been confirmed as false positives
35+
- [ ] reasonably high level of unit test coverage covering all of the main logic
36+
of the code (e.g., missing coverage for raised errors that do not currently seem
37+
possible to reach are okay; missing coverage for options that will be needed are
38+
not)
39+
- [ ] all unit tests run without errors

google_symptoms/cache/.gitignore

Whitespace-only changes.
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
# -*- coding: utf-8 -*-
"""Module to pull and clean indicators from Google Research's Open
COVID-19 Data project.
This file defines the functions that are made public by the module. As the
module is intended to be executed through the main method, these are primarily
for testing.
"""

from __future__ import absolute_import

from . import pull
from . import run
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# -*- coding: utf-8 -*-
"""Call the function run_module when executed.

This file indicates that calling the module (`python -m delphi_google_symptoms`) will
call the function `run_module` found within the run.py file. There should be
no need to change this template.
"""

from .run import run_module  # pragma: no cover

run_module()  # pragma: no cover
Lines changed: 83 additions & 0 deletions
@@ -0,0 +1,83 @@
"""Registry for constants."""
from functools import partial
from datetime import timedelta

from .smooth import (
    identity,
    kday_moving_average,
)

# global constants
METRICS = ["Anosmia", "Ageusia"]
SMOOTHERS = ["raw", "smoothed"]
GEO_RESOLUTIONS = [
    "state",
    "county",
    "msa",
    "hrr"
]

seven_day_moving_average = partial(kday_moving_average, k=7)
SMOOTHERS_MAP = {  # smoother name -> (smoothing function, date adjustment)
    "raw": (identity, lambda d: d - timedelta(days=7)),
    "smoothed": (seven_day_moving_average, lambda d: d),
}

STATE_TO_ABBREV = {'Alabama': 'al',  # keys use underscores in place of spaces
                   'Alaska': 'ak',
                   # 'American Samoa': 'as',
                   'Arizona': 'az',
                   'Arkansas': 'ar',
                   'California': 'ca',
                   'Colorado': 'co',
                   'Connecticut': 'ct',
                   'Delaware': 'de',
                   # 'District of Columbia': 'dc',
                   'Florida': 'fl',
                   'Georgia': 'ga',
                   # 'Guam': 'gu',
                   'Hawaii': 'hi',
                   'Idaho': 'id',
                   'Illinois': 'il',
                   'Indiana': 'in',
                   'Iowa': 'ia',
                   'Kansas': 'ks',
                   'Kentucky': 'ky',
                   'Louisiana': 'la',
                   'Maine': 'me',
                   'Maryland': 'md',
                   'Massachusetts': 'ma',
                   'Michigan': 'mi',
                   'Minnesota': 'mn',
                   'Mississippi': 'ms',
                   'Missouri': 'mo',
                   'Montana': 'mt',
                   'Nebraska': 'ne',
                   'Nevada': 'nv',
                   'New_Hampshire': 'nh',
                   'New_Jersey': 'nj',
                   'New_Mexico': 'nm',
                   'New_York': 'ny',
                   'North_Carolina': 'nc',
                   'North_Dakota': 'nd',
                   # 'Northern Mariana Islands': 'mp',
                   'Ohio': 'oh',
                   'Oklahoma': 'ok',
                   'Oregon': 'or',
                   'Pennsylvania': 'pa',
                   # 'Puerto Rico': 'pr',
                   'Rhode_Island': 'ri',
                   'South_Carolina': 'sc',
                   'South_Dakota': 'sd',
                   'Tennessee': 'tn',
                   'Texas': 'tx',
                   'Utah': 'ut',
                   'Vermont': 'vt',
                   # 'Virgin Islands': 'vi',
                   'Virginia': 'va',
                   'Washington': 'wa',
                   'West_Virginia': 'wv',
                   'Wisconsin': 'wi',
                   'Wyoming': 'wy'}

DC_FIPS = "11001"  # FIPS code for the District of Columbia
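
Each `SMOOTHERS_MAP` entry pairs a smoothing function with a function that adjusts a date. How the rest of the module consumes these pairs is not shown in this part of the diff, so the following is only a sketch of unpacking and calling them. The `identity` and `kday_moving_average` definitions here are hypothetical stand-ins for the ones imported from `.smooth`, and the date and values are made up.

```python
from datetime import date, timedelta
from functools import partial


def identity(values):
    """Stand-in for smooth.identity (assumed to return its input unchanged)."""
    return values


def kday_moving_average(values, k):
    """Stand-in for smooth.kday_moving_average: trailing k-day mean (assumed behavior)."""
    return [sum(values[i - k + 1:i + 1]) / k for i in range(k - 1, len(values))]


seven_day_moving_average = partial(kday_moving_average, k=7)
SMOOTHERS_MAP = {
    "raw": (identity, lambda d: d - timedelta(days=7)),
    "smoothed": (seven_day_moving_average, lambda d: d),
}

values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
reference_day = date(2020, 9, 8)

results = {}
for name, (smooth, adjust_date) in SMOOTHERS_MAP.items():
    # Apply the smoother to the values and the date adjustment to the reference day.
    results[name] = (smooth(values), adjust_date(reference_day))

print(results["raw"][1])       # 2020-09-01
print(results["smoothed"][0])  # [4.0, 5.0]
```

Keeping both callables in one registry lets calling code loop over smoother names without branching on which one is active.
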
