pipeline for Safegraph patterns #225
@@ -0,0 +1,120 @@
# You should hard commit a prototype for this file, but we
# want to avoid accidental adding of API tokens and other
# private data parameters
params.json

# Do not commit output files
receiving/*.csv

# Remove macOS files
.DS_Store

# virtual environment
dview/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
coverage.xml
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
#  Usually these files are written by a python script from a template
#  before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
.static_storage/
.media/
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
@@ -0,0 +1,8 @@
[DESIGN]

min-public-methods=1


[MESSAGES CONTROL]

disable=R0801, C0330, E1101, E0611, C0114, C0116, C0103, R0913, R0914, W0702
@@ -0,0 +1,42 @@
# Patterns Dataset in Safegraph Mobility Data

We import Zip Code-level raw mobility indicators from the Safegraph **Weekly
Patterns** dataset, calculate functions of the raw data, and then aggregate
the data to the county, HRR, MSA, and state levels.

## Brand Information
Safegraph provides the daily number of visits to points of interest (POIs) in
the Weekly Patterns dataset, which is documented [here](https://docs.safegraph.com/docs/weekly-patterns).
Base information for POIs, such as location name, address, category, and brand
association, is provided in the **Places Schema** dataset, which is documented
[here](https://docs.safegraph.com/docs/places-schema). Safegraph does not update
its list of POIs frequently, but there is a versioning issue. The release
version can be found in `release-metadata` in the Weekly Patterns dataset, and
corresponding `brand_info.csv` files are provided in the Places Schema dataset.
To save storage space, we do not download the whole Places Schema dataset.
Instead, we only add each necessary `brand_info.csv` to `./statics`, with the
release version (YYYYMM) as a suffix.
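Looking up the brand file for a given release could work roughly as shown below. This is a minimal sketch, not the module's code. The helper name `brand_info_path` and the naming scheme `brand_info_<YYYYMM>.csv` are hypothetical; the text above only says the files carry a YYYYMM suffix.

```python
from pathlib import Path


def brand_info_path(release_version: str, statics_dir: str = "./statics") -> Path:
    """Return the brand_info file for a YYYYMM release version.

    The naming scheme brand_info_<YYYYMM>.csv is a hypothetical illustration.
    """
    # Validate the YYYYMM release version before building the file name.
    if (
        len(release_version) != 6
        or not release_version.isdigit()
        or not 1 <= int(release_version[4:]) <= 12
    ):
        raise ValueError(f"expected a YYYYMM release version, got {release_version!r}")
    path = Path(statics_dir) / f"brand_info_{release_version}.csv"
    if not path.exists():
        # Each new release's brand_info file must be added to statics by hand.
        raise FileNotFoundError(
            f"missing brand info for release {release_version}: {path}"
        )
    return path
```

Failing loudly on a missing file matches the manual workflow described above: a new release version shows up as an error, not as silently stale brand data.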

## Geographical Levels
* `county`: reported using zero-padded FIPS codes (consistent with the
other COVIDcast data)
* `msa`: reported using CBSA codes (consistent with all other COVIDcast sensors)
* `hrr`: reported using HRR number (consistent with all other COVIDcast sensors)
* `state`: reported using two-letter postal code
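The step from Zip Code to county can be pictured as a keyed sum. This is a minimal sketch that assumes a hypothetical one-to-one `zip_to_fips` crosswalk. A real crosswalk may split a Zip Code across several counties using population weights, and this sketch does not handle that. `aggregate_zip_to_county` is an illustrative name, not the module's API.

```python
from collections import defaultdict
from typing import Dict, Mapping


def aggregate_zip_to_county(
    zip_values: Mapping[str, float], zip_to_fips: Mapping[str, int]
) -> Dict[str, float]:
    """Sum Zip Code-level values into counties keyed by zero-padded FIPS codes.

    Assumes each Zip Code maps to exactly one county (hypothetical crosswalk).
    """
    totals: Dict[str, float] = defaultdict(float)
    for zip_code, value in zip_values.items():
        fips = zip_to_fips.get(zip_code)
        if fips is None:
            # Zip Codes missing from the crosswalk are dropped.
            continue
        # Zero-pad to 5 digits, e.g. 1069 -> "01069".
        totals[f"{fips:05d}"] += value
    return dict(totals)
```

Keying the result by the zero-padded string lets the county IDs match the reporting format listed above.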

## Metrics, Level 1 (`m1`)
* `bars_visit`: the number of visits to bars (places with NAICS code 722410)
* `restaurants_visit`: the number of visits to restaurants (places with NAICS
code 722511)
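Selecting POIs by NAICS code for the two Level 1 metrics could look like the sketch below. It assumes each POI record is a dict with hypothetical keys `naics_code` and `visits`; the actual column names in the joined Safegraph data may differ.

```python
from typing import Dict, Iterable, Mapping

# NAICS codes taken from the Level 1 metric definitions.
NAICS_BY_METRIC = {"bars_visit": 722410, "restaurants_visit": 722511}


def count_visits(pois: Iterable[Mapping]) -> Dict[str, int]:
    """Total visits per Level 1 metric, selecting POIs by NAICS code.

    The record keys 'naics_code' and 'visits' are hypothetical.
    """
    totals = {metric: 0 for metric in NAICS_BY_METRIC}
    for poi in pois:
        for metric, code in NAICS_BY_METRIC.items():
            if poi["naics_code"] == code:
                totals[metric] += poi["visits"]
    return totals
```

POIs with any other NAICS code, such as grocery stores, are ignored.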

## Metrics, Level 2 (`m2`)
* `num`: number of visits in a given week
* `prop`: `num` / population * 100,000 (note that the population here only
includes Zip Codes that contain POIs: if there are no POIs in a given Zip
Code, its population is not counted)
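The `prop` formula and its restricted denominator can be made concrete with the sketch below. It assumes Zip Code-level visits and populations arrive as plain dicts, and `num_and_prop` is a hypothetical helper name. Only Zip Codes present in `zip_visits`, meaning those with POIs, add to the population.

```python
from typing import Mapping, Tuple


def num_and_prop(
    zip_visits: Mapping[str, float], zip_population: Mapping[str, float]
) -> Tuple[float, float]:
    """Return (num, prop) for one region, where prop = num / population * 100,000.

    Only Zip Codes that appear in zip_visits (i.e. contain POIs) contribute
    to the population denominator.
    """
    num = sum(zip_visits.values())
    population = sum(zip_population.get(z, 0.0) for z in zip_visits)
    if population == 0:
        raise ValueError("no population for Zip Codes with POIs")
    # Multiply before dividing; mathematically the same as num / population * 100,000.
    return num, num * 100_000 / population
```

In this example, Zip Code `"c"` has no POIs, so its 50,000 residents are left out: 50 visits over a population of 50,000 give a `prop` of 100.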

## API Key

We access the Safegraph data using an AWS key-secret pair which is valid
until June 15, 2021. The AWS credentials have been issued under
@huisaddison's Safegraph Data Catalog account.
Review comment: Will this file basically be exported into the public documentation? If so, we should move this into an internal-only part of the codebase.

Reply: This file is not directly included in the public documentation, but personal tokens required for API access are encrypted before being committed to the repository, and the keys live on the delphi server, not in git.
@@ -0,0 +1,67 @@
# Patterns Dataset in Safegraph Mobility Data

We import raw mobility data from Safegraph Weekly Patterns, calculate some
statistics on it, and aggregate the data from the Zip Code level to the county,
HRR, MSA, and state levels. For detailed information, see the file `DETAILS.md`
contained in this directory.

## Running the Indicator

The indicator is run by directly executing the Python module contained in this
directory. The safest way to do this is to create a virtual environment,
install the common DELPHI tools, and then install the module and its
dependencies. To do this, run the following code from this directory:

```
python -m venv env
source env/bin/activate
pip install ../_delphi_utils_python/.
pip install .
```

You must also install the
[AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html).
Follow the OS-specific instructions to install this command line
interface, and verify that it is installed by running `which aws`.
If `aws` is not installed before the pipeline runs, the pipeline will raise
a `FileNotFoundError`.
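A pre-flight check can surface the missing-CLI error with a clearer message. Python's `subprocess` raises `FileNotFoundError` when the executable it is asked to launch does not exist. This sketch assumes the pipeline shells out to `aws` that way. `require_aws_cli` is a hypothetical helper; `shutil.which` is the standard-library equivalent of `which aws`.

```python
import shutil


def require_aws_cli() -> str:
    """Return the path to `aws`, or fail fast if it is not on PATH."""
    aws_path = shutil.which("aws")
    if aws_path is None:
        # Same exception type that subprocess raises for a missing executable.
        raise FileNotFoundError(
            "AWS CLI not found; install it and verify with `which aws`"
        )
    return aws_path
```

Calling this once at startup turns a failure deep in the download step into an immediate, self-explanatory error.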

All of the user-changeable parameters are stored in `params.json`. To execute
the module and produce the output datasets (by default, in `receiving`), run
the following:

```
env/bin/python -m delphi_safegraph_patterns
```
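Reading `params.json` with the documented default output directory could look like the sketch below. The key name `export_dir` and the helper `read_params` are hypothetical, since the actual parameter names are not listed here.

```python
import json
from typing import Any, Dict


def read_params(path: str = "params.json") -> Dict[str, Any]:
    """Load user-changeable parameters, defaulting the output directory.

    'export_dir' is a hypothetical key name used for illustration.
    """
    with open(path, encoding="utf-8") as f:
        params = json.load(f)
    # Fall back to the default output directory described above.
    params.setdefault("export_dir", "./receiving")
    return params
```

`setdefault` leaves any value the user sets in `params.json` untouched.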

Once you are finished with the code, you can deactivate the virtual environment
and (optionally) remove the environment itself.

```
deactivate
rm -r env
```

## Testing the code

To do a static test of the code style, it is recommended to run **pylint** on
the module. To do this, run the following from the main module directory:

```
env/bin/pylint delphi_safegraph_patterns
```

The most aggressive checks are turned off; only relatively important issues
should be raised, and they should be manually checked (or better, fixed).

Unit tests are also included in the module. To execute these, run the following
command from this directory:

```
(cd tests && ../env/bin/pytest --cov=delphi_safegraph_patterns --cov-report=term-missing)
```

The output will show the number of unit tests that passed and failed, along
with the percentage of code covered by the tests. None of the tests should
fail, and the number of lines not covered by unit tests should be small and
should not include critical sub-routines.
@@ -0,0 +1,39 @@
## Code Review (Python)

A code review of this module should include a careful look at the code and the
output. To assist in the process, but certainly not in place of it, please
check the following items.

**Documentation**

- [ ] the README.md file template is filled out and currently accurate; it is
possible to load and test the code using only the instructions given
- [ ] minimal docstrings (one line describing what the function does) are
included for all functions; full docstrings describing the inputs and expected
outputs should be given for non-trivial functions

**Structure**

- [ ] code should use 4 spaces for indentation; other style decisions are
flexible, but be consistent within a module
- [ ] any required metadata files are checked into the repository and placed
within the directory `static`
- [ ] any intermediate files that are created and stored by the module should
be placed in the directory `cache`
- [ ] final expected output files to be uploaded to the API are placed in the
`receiving` directory; output files should not be committed to the repository
- [ ] all options and API keys are passed through the file `params.json`
- [ ] template parameter file (`params.json.template`) is checked into the
code; no personal (e.g., usernames) or private (e.g., API keys) information is
included in this template file
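Two of the Structure items can be partly automated, as sketched below. The first helper flags template keys that look like credentials and carry a value. The second creates the git-ignored `params.json` from the template. Both are hypothetical: `sensitive_template_keys`, `init_params`, and the keyword heuristic in `SENSITIVE_HINTS` are illustrations, not part of the module.

```python
import json
import shutil
from pathlib import Path
from typing import List

# Hypothetical heuristic: substrings of key names that suggest credentials.
SENSITIVE_HINTS = ("key", "secret", "token", "password")


def sensitive_template_keys(template_path: str) -> List[str]:
    """List template keys that look like credentials but have a non-empty value."""
    with open(template_path, encoding="utf-8") as f:
        template = json.load(f)
    return sorted(
        name
        for name, value in template.items()
        if any(hint in name.lower() for hint in SENSITIVE_HINTS) and value
    )


def init_params(template_path: str, params_path: str) -> bool:
    """Copy the template to params.json if it is missing; return True if copied."""
    if Path(params_path).exists():
        return False  # never overwrite a local params.json
    shutil.copyfile(template_path, params_path)
    return True
```

A template that passes review should make `sensitive_template_keys` return an empty list. A name-based heuristic can miss secrets stored under unexpected keys, so a manual check is still needed.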

**Testing**

- [ ] module can be installed in a new virtual environment
- [ ] running pylint with the default `.pylint` settings over the module produces
minimal warnings; warnings that do exist have been confirmed as false positives
- [ ] reasonably high level of unit test coverage covering all of the main logic
of the code (e.g., missing coverage for raised errors that do not currently seem
possible to reach is okay; missing coverage for options that will be needed is
not)
- [ ] all unit tests run without errors
@@ -0,0 +1,11 @@
# -*- coding: utf-8 -*-
"""Module to process Safegraph mobility data.

This file defines the functions that are made public by the module. As the
module is intended to be executed through the main method, these are primarily
for testing.
"""

from __future__ import absolute_import

from . import process
@@ -0,0 +1,11 @@
# -*- coding: utf-8 -*-
"""Call the function run_module when executed.

This file indicates that calling the module (`python -m MODULE_NAME`) will
call the function `run_module` found within the run.py file. There should be
no need to change this template.
"""

from .run import run_module  # pragma: no cover

run_module()  # pragma: no cover
Reply: Thanks.

Review comment: Should it be "in a given week"? The way I read it is:
If "on", it would refer to a given week entirely.
If "in", it would refer to days within a given week.

Reply: It should be "in". Thanks.