
Commit 73a9063

Merge branch 'main' into automate-hospital-admission-patch

2 parents: 1666e0c + 3450dfc


66 files changed: +905 −546 lines

.bumpversion.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.3.55
+current_version = 0.3.56
 commit = True
 message = chore: bump covidcast-indicators to {new_version}
 tag = False

.github/CONTRIBUTING.md

Lines changed: 8 additions & 5 deletions
@@ -12,7 +12,9 @@ The production branch is configured to automatically deploy to our production en
 
 * everything else
 
-All other branches are development branches. We don't enforce a naming policy.
+All other branches are development branches. We don't enforce a naming policy, but it is recommended to prefix all branches you create with your name, username, or initials (e.g. `username/branch-name`).
+
+We don't forbid force-pushing, but please keep it to a minimum, and be careful when modifying a branch that others may be working on at the same time.
 
 ## Issues
 

@@ -29,7 +31,7 @@ So, how does one go about developing a pipeline for a new data source?
 ### tl;dr
 
 1. Create your new indicator branch from `main`.
-2. Build it using the appropriate template, following the guidelines in the included README.md and REVIEW.md files.
+2. Build it using the [indicator template](https://github.com/cmu-delphi/covidcast-indicators/tree/main/_template_python), following the guidelines in the included README.md, REVIEW.md, and INDICATOR_DEV_GUIDE.md files.
 3. Make some stuff!
 4. When your stuff works, push your development branch to remote, and open a PR against `main` for review.
 5. Once your PR has been merged, consult with a platform engineer for the remaining production setup needs. They will create a deployment workflow for your indicator including any necessary production parameters. Production secrets are encrypted in the Ansible vault. This workflow will be tested in staging by admins, who will consult you about any problems they encounter.
@@ -50,7 +52,7 @@ git checkout -b dev-my-feature-branch
 
 ### Creating your indicator
 
-Create a directory for your new indicator by making a copy of `_template_r` or `_template_python` depending on the programming language you intend to use. If using Python, add the name of the directory to the list found in `jobs > build > strategy > matrix > packages` in `.github/workflows/python-ci.yml`, which will enable automated checks for your indicator when you make PRs. The template copies of `README.md` and `REVIEW.md` include the minimum requirements for code structure, documentation, linting, testing, and method of configuration. Beyond that, we don't have any established restrictions on implementation; you can look at other existing indicators to see some examples of code layout, organization, and general approach.
+Create a directory for your new indicator by making a copy of `_template_python`. (We also make a `_template_r` available, but R should only be used as a last resort, due to complications using it in production.) Add the name of the directory to the list found in `jobs > build > strategy > matrix > packages` in `.github/workflows/python-ci.yml`, which will enable automated checks for your indicator when you make PRs. The template copies of `README.md` and `REVIEW.md` include the minimum requirements for code structure, documentation, linting, testing, and method of configuration. Beyond that, we don't have any established restrictions on implementation; you can look at other existing indicators to see some examples of code layout, organization, and general approach.
 
 * Consult your peers with questions! :handshake:
 

@@ -62,7 +64,7 @@ Once you have something that runs locally and passes tests you set up your remot
 git push -u origin dev-my-feature-branch
 ```
 
-You can then draft public API documentation for people who would fetch this
+You can then draft [public API documentation](https://cmu-delphi.github.io/delphi-epidata/) for people who would fetch this
 data from the API. Public API documentation is kept in the delphi-epidata
 repository, and there is a [template Markdown
 file](https://github.com/cmu-delphi/delphi-epidata/blob/main/docs/api/covidcast-signals/_source-template.md)
@@ -104,7 +106,8 @@ We use a branch-based git workflow coupled with [Jenkins](https://www.jenkins.io
 * Package - Tar and gzip the built environment.
 * Deploy - Trigger an Ansible playbook to place the built package onto the runtime host, place any necessary production configuration, and adjust the runtime environment (if necessary).
 
-There are several additional Jenkins-specific files that will need to be created for each indicator, as well as some configuration additions to the runtime host. It will be important to pair with a platform engineer to prepare the necessary production environment needs, test the workflow, validate on production, and ultimately sign off on a production release.
+There are several additional Jenkins-specific files that will need to be created for each indicator, as well as some configuration additions to the runtime host.
+It will be important to pair with a platform engineer to prepare the necessary production environment needs, test the workflow, validate on production, and ultimately sign off on a production release.
 
 ### Preparing container images of indicators
 
.github/workflows/publish-release.yml

Lines changed: 1 addition & 1 deletion
@@ -86,7 +86,7 @@ jobs:
       - name: Release
        run: |
          make release
-      - uses: actions/upload-artifact@v2
+      - uses: actions/upload-artifact@v4
        with:
          name: delphi_utils
          path: _delphi_utils_python/dist/*.tar.gz

.github/workflows/python-ci.yml

Lines changed: 1 addition & 1 deletion
@@ -51,7 +51,7 @@ jobs:
        with:
          python-version: 3.8
          cache: "pip"
-          cache-dependency-path: "setup.py"
+          cache-dependency-path: "pyproject.toml"
      - name: Install testing dependencies
        run: |
          python -m pip install --upgrade pip

Jenkinsfile

Lines changed: 4 additions & 0 deletions
@@ -18,6 +18,10 @@ def deploy_production = [:]
 
 pipeline {
     agent any
+    environment {
+        // Set the PATH variable to include the pyenv shims directory.
+        PATH = "/var/lib/jenkins/.pyenv/shims:${env.PATH}"
+    }
     stages {
         stage('Build dev/feature branch') {
             when {

_delphi_utils_python/.bumpversion.cfg

Lines changed: 2 additions & 2 deletions
@@ -1,9 +1,9 @@
 [bumpversion]
-current_version = 0.3.24
+current_version = 0.3.25
 commit = True
 message = chore: bump delphi_utils to {new_version}
 tag = False
 
-[bumpversion:file:setup.py]
+[bumpversion:file:pyproject.toml]
 
 [bumpversion:file:delphi_utils/__init__.py]

_delphi_utils_python/DEVELOP.md

Lines changed: 1 addition & 1 deletion
@@ -9,7 +9,7 @@ To install the module in your default version of Python, run the
 following from this directory:
 
 ```
-pip install .
+pip install -e '.[dev]'
 ```
 
 As described in each of the indicator code directories, you will want to install

_delphi_utils_python/Makefile

Lines changed: 5 additions & 4 deletions
@@ -6,12 +6,12 @@ venv:
 install: venv
 	. env/bin/activate; \
 	pip install wheel ; \
-	pip install -e .
+	pip install -e '.[dev]'
 
 install-ci: venv
 	. env/bin/activate; \
-	pip install wheel ; \
-	pip install .
+	pip install 'build[virtualenv]' pylint pytest pydocstyle wheel twine ; \
+	pip install '.[dev]'
 
 lint:
 	. env/bin/activate; pylint delphi_utils --rcfile=../pyproject.toml

@@ -30,4 +30,5 @@ clean:
 
 release: lint test
 	. env/bin/activate ; \
-	python setup.py sdist bdist_wheel
+	pip install 'build[virtualenv]' ; \
+	python -m build --sdist --wheel

_delphi_utils_python/README.md

Lines changed: 10 additions & 1 deletion
@@ -22,13 +22,22 @@ Source code can be found here:
 
 ## Logger Usage
 
+To make our structured logging as useful as it can be, particularly within the context of how we use logs in Elastic, the `event` argument (typically the first unnamed arg) should be a static string (to make filtering easier), and each dynamic/varying value should be specified in an individual meaningfully- and consistently-named argument to the logger call (for use in filtering, thresholding, grouping, visualization, etc).
+
+### Commonly used argument names:
+- data_source
+- geo_type
+- signal
+- issue_date
+- filename
+
 Single-thread usage.
 
 ```py
 from delphi_utils.logger import get_structured_logger
 
 logger = get_structured_logger('my_logger')
-logger.info('Hello, world!')
+logger.info('Hello', name='World')
 ```
 
 Multi-thread usage.
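
Below is a minimal sketch of the convention this README change describes (a static `event` string plus individually named dynamic fields); the field values are hypothetical, drawn from the commonly used argument names:

```py
from delphi_utils.logger import get_structured_logger

logger = get_structured_logger("my_logger")

# The first argument is a static event string; each varying value gets
# its own consistently named argument for filtering/grouping in Elastic.
logger.info(
    "Exported signal file",
    data_source="example-source",  # hypothetical values, for illustration
    geo_type="county",
    signal="confirmed_incidence_num",
    issue_date="20240115",
    filename="20240115_county_confirmed_incidence_num.csv",
)
```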

_delphi_utils_python/delphi_utils/__init__.py

Lines changed: 7 additions & 8 deletions
@@ -4,15 +4,14 @@
 from __future__ import absolute_import
 
 from .archive import ArchiveDiffer, GitArchiveDiffer, S3ArchiveDiffer
-from .export import create_export_csv
-from .utils import read_params
-
-from .slack_notifier import SlackNotifier
-from .logger import get_structured_logger
+from .export import create_backup_csv, create_export_csv
 from .geomap import GeoMapper
-from .smooth import Smoother
-from .signal import add_prefix
+from .logger import get_structured_logger
 from .nancodes import Nans
+from .signal import add_prefix
+from .slack_notifier import SlackNotifier
+from .smooth import Smoother
+from .utils import read_params
 from .weekday import Weekday
 
-__version__ = "0.3.24"
+__version__ = "0.3.25"

_delphi_utils_python/delphi_utils/export.py

Lines changed: 76 additions & 5 deletions
@@ -1,16 +1,18 @@
 """Export data in the format expected by the Delphi API."""
 # -*- coding: utf-8 -*-
+import gzip
+import logging
 from datetime import datetime
-from os.path import join
+from os.path import getsize, join
 from typing import Optional
-import logging
 
-from epiweeks import Week
 import numpy as np
 import pandas as pd
+from epiweeks import Week
 
 from .nancodes import Nans
 
+
 def filter_contradicting_missing_codes(df, sensor, metric, date, logger=None):
     """Find values with contradictory missingness codes, filter them, and log."""
     columns = ["val", "se", "sample_size"]

@@ -22,8 +24,10 @@ def filter_contradicting_missing_codes(df, sensor, metric, date, logger=None):
     for mask in masks:
         if not logger is None and df.loc[mask].size > 0:
             logger.info(
-                "Filtering contradictory missing code in " +
-                "{0}_{1}_{2}.".format(sensor, metric, date.strftime(format="%Y-%m-%d"))
+                "Filtering contradictory missing code",
+                sensor=sensor,
+                metric=metric,
+                date=date.strftime(format="%Y-%m-%d"),
             )
             df = df.loc[~mask]
         elif logger is None and df.loc[mask].size > 0:

@@ -130,3 +134,70 @@ def create_export_csv(
     export_df = export_df.sort_values(by="geo_id")
     export_df.to_csv(export_file, index=False, na_rep="NA")
     return dates
+
+
+def create_backup_csv(
+    df: pd.DataFrame,
+    backup_dir: str,
+    custom_run: bool,
+    issue: Optional[str] = None,
+    geo_res: Optional[str] = None,
+    sensor: Optional[str] = None,
+    metric: Optional[str] = None,
+    logger: Optional[logging.Logger] = None,
+):
+    """Save data for use as a backup.
+
+    This function is meant to save raw data fetched from data sources.
+    Therefore, it avoids manipulating the data as much as possible to
+    preserve the input.
+
+    When only required arguments are passed, data will be saved to a file of
+    the format `<export_dir>/<today's date as YYYYMMDD>.csv`. Optional arguments
+    should be passed if the source data is fetched from different tables or
+    in batches by signal, geo, etc.
+
+    Parameters
+    ----------
+    df: pd.DataFrame
+        Columns: geo_id, timestamp, val, se, sample_size
+    backup_dir: str
+        Backup directory
+    custom_run: bool
+        Flag indicating if the current run is a patch, or other run where
+        backups aren't needed. If so, don't save any data to disk
+    issue: Optional[str]
+        The date the data was fetched, in YYYYMMDD format. Defaults to "today"
+        if not provided
+    geo_res: Optional[str]
+        Geographic resolution of the data
+    sensor: Optional[str]
+        Sensor that has been calculated (cumulative_counts vs new_counts)
+    metric: Optional[str]
+        Metric we are considering, if any.
+    logger: Optional[logging.Logger]
+        Pass a logger object here to log information about name and size of the backup file.
+
+    Returns
+    ---------
+    dates: pd.Series[datetime]
+        Series of dates for which CSV files were exported.
+    """
+    if not custom_run:
+        # Label the file with today's date (the date the data was fetched).
+        if not issue:
+            issue = datetime.today().strftime("%Y%m%d")
+
+        backup_filename = [issue, geo_res, metric, sensor]
+        backup_filename = "_".join(filter(None, backup_filename)) + ".csv.gz"
+        backup_file = join(backup_dir, backup_filename)
+
+        with gzip.open(backup_file, "wt", newline="") as f:
+            df.to_csv(f, index=False, na_rep="NA")
+
+        if logger:
+            logger.info(
+                "Backup file created",
+                backup_file=backup_file,
+                backup_size=getsize(backup_file),
+            )
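
Below is a minimal usage sketch of the new `create_backup_csv`; the DataFrame contents, directory, and argument values are hypothetical:

```py
import os

import pandas as pd

from delphi_utils import create_backup_csv, get_structured_logger

logger = get_structured_logger("backup_example")
raw = pd.DataFrame({
    "geo_id": ["42003"],
    "timestamp": ["2024-01-15"],
    "val": [1.0],
    "se": [0.1],
    "sample_size": [100.0],
})

backup_dir = "./raw_backups"  # hypothetical location; must already exist
os.makedirs(backup_dir, exist_ok=True)

# Writes ./raw_backups/<issue>_county_new_counts.csv.gz (omitted optional
# name parts are skipped); with custom_run=True (e.g. a patch run),
# nothing is written at all.
create_backup_csv(
    raw,
    backup_dir,
    custom_run=False,
    geo_res="county",
    sensor="new_counts",
    logger=logger,
)
```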

_delphi_utils_python/delphi_utils/flash_eval/eval_day.py

Lines changed: 1 addition & 2 deletions
@@ -153,8 +153,7 @@ def output(evd_ranking, day, lag, signal, logger):
             p_text += f"\t{start_link}|*{index}*, {'{:.2f}'.format(value)}>\n"
         else:
             break
-    name = f"Signal: {signal} Lag: {lag}"
-    logger.info(name, payload=p_text)
+    logger.info("FLaSH: worth inspecting", signal=signal, lag=lag, payload=p_text)
 
 
 def evd_ranking_fn(ts_streams, EVD_max, EVD_min):

_delphi_utils_python/delphi_utils/geomap.py

Lines changed: 4 additions & 2 deletions
@@ -443,7 +443,7 @@ def add_population_column(
     ---------
     data: pd.DataFrame
         The dataframe with a FIPS code column.
-    geocode_type: {"fips", "zip"}
+    geocode_type:
        The type of the geocode contained in geocode_col.
     geocode_col: str, default None
        The name of the column containing the geocodes. If None, uses the geocode_type

@@ -671,8 +671,10 @@ def aggregate_by_weighted_sum(
        to a from_geo, e.g. "wastewater collection site").
     to_geo: str
        The column name of the geocode to aggregate to.
-    sensor: str
+    sensor_col: str
        The column name of the sensor to aggregate.
+    time_col: str
+        The column name of the timestamp to aggregate over.
     population_column: str
        The column name of the population to weight the sensor by.

_delphi_utils_python/delphi_utils/logger.py

Lines changed: 5 additions & 5 deletions
@@ -1,11 +1,11 @@
 """Structured logger utility for creating JSON logs.
 
-See the delphi_utils README.md for usage examples.
+To make our structured logging as useful as it can be, particularly within the context of how we use logs in Elastic,
+the `event` argument (typically the first unnamed arg) should be a static string (to make filtering easier),
+and each dynamic/varying value should be specified in an individual meaningfully- and consistently-named argument
+to the logger call (for use in filtering, thresholding, grouping, visualization, etc)
 
-The Delphi group uses two ~identical versions of this file.
-Try to keep them in sync with edits, for sanity.
-https://github.com/cmu-delphi/covidcast-indicators/blob/main/_delphi_utils_python/delphi_utils/logger.py
-https://github.com/cmu-delphi/delphi-epidata/blob/dev/src/common/logger.py
+See the delphi_utils README.md for usage examples.
 """
 
 import contextlib

_delphi_utils_python/delphi_utils/runner.py

Lines changed: 14 additions & 5 deletions
@@ -51,6 +51,7 @@ def run_indicator_pipeline(indicator_fn: Callable[[Params], None],
 
     #Get version and indicator name for startup
     ind_name = indicator_fn.__module__.replace(".run", "")
+
     #Check for version.cfg in indicator directory
     if os.path.exists("version.cfg"):
         with open("version.cfg") as ver_file:

@@ -59,9 +60,15 @@ def run_indicator_pipeline(indicator_fn: Callable[[Params], None],
                 if "current_version" in line:
                     current_version = str.strip(line)
                     current_version = current_version.replace("current_version = ", "")
-        #Logging - Starting Indicator
-        logger.info(f"Started {ind_name} with covidcast-indicators version {current_version}")
-    else: logger.info(f"Started {ind_name} without version.cfg")
+        logger.info(
+            "Started a covidcast-indicator",
+            indicator_name=ind_name,
+            current_version=current_version,
+        )
+    else:
+        logger.info(
+            "Started a covidcast-indicator without version.cfg", indicator_name=ind_name
+        )
 
     indicator_fn(params)
     validator = validator_fn(params)

@@ -77,8 +84,10 @@ def run_indicator_pipeline(indicator_fn: Callable[[Params], None],
                 break
             time.sleep(1)
         else:
-            logger.error(f"Flash step timed out ({timer} s), terminating",
-                         elapsed_time_in_seconds = round(time.time() - start, 2))
+            logger.error(
+                "Flash step timed out, terminating",
+                elapsed_time_in_seconds=round(time.time() - start, 2),
+            )
             t1.terminate()
             t1.join()
     if validator:
