Skip to content

Commit f42f045

Browse files
authored
Merge branch 'main' into utils-lint
2 parents 9f7ff21 + a471262 commit f42f045

File tree

571 files changed

+726
-120634
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

571 files changed

+726
-120634
lines changed

.github/CONTRIBUTING.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -61,7 +61,7 @@ git checkout -b dev-my-feature-branch
6161

6262
### Creating your indicator
6363

64-
Create a directory for your new indicator by making a copy of `_template_r` or `_template_python` depending on the programming language you intend to use. The template copies of `README.md` and `REVIEW.md` include the minimum requirements for code structure, documentation, linting, testing, and method of configuration. Beyond that, we don't have any established restrictions on implementation; you can look at other existing indicators see some examples of code layout, organization, and general approach.
64+
Create a directory for your new indicator by making a copy of `_template_r` or `_template_python` depending on the programming language you intend to use. If using Python, add the name of the directory to the list found in `jobs > build > strategy > matrix > packages` in `.github/workflows/python-ci.yml`, which will enable automated checks for your indicator when you make PRs. The template copies of `README.md` and `REVIEW.md` include the minimum requirements for code structure, documentation, linting, testing, and method of configuration. Beyond that, we don't have any established restrictions on implementation; you can look at other existing indicators see some examples of code layout, organization, and general approach.
6565

6666
- Consult your peers with questions! :handshake:
6767

.github/pull_request_template.md

Lines changed: 9 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,9 @@
1+
### Description
2+
Type of change (bug fix, new feature, etc), brief description, and motivation for these changes.
3+
4+
### Changelog
5+
- change1
6+
- change2
7+
8+
### Fixes
9+
- Fixes #(issue)

.github/workflows/python-ci.yml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -7,13 +7,13 @@ on:
77
push:
88
branches: [ main ]
99
pull_request:
10-
types: [ review_requested, ready_for_review ]
10+
types: [ opened, synchronize, reopened, ready_for_review ]
1111
branches: [ main ]
1212

1313
jobs:
1414
build:
15-
1615
runs-on: ubuntu-20.04
16+
if: github.event.pull_request.draft == false
1717
strategy:
1818
matrix:
1919
#packages: [_delphi_utils_python, cdc_covidnet, claims_hosp, combo_cases_and_deaths, google_symptoms, jhu, nchs_mortality, quidel, quidel_covidtest, safegraph, safegraph_patterns, usafacts]

_delphi_utils_python/.pylintrc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@ disable=logging-format-interpolation,
1414
# Allow arbitrarily short-named variables.
1515
variable-rgx=[a-z_][a-z0-9_]*
1616
argument-rgx=[a-z_][a-z0-9_]*
17+
attr-rgx=[a-z_][a-z0-9_]*
1718

1819
[DESIGN]
1920

_delphi_utils_python/delphi_utils/smooth.py

Lines changed: 67 additions & 53 deletions
Original file line numberDiff line numberDiff line change
@@ -1,4 +1,5 @@
1-
"""
1+
"""Smoother utility.
2+
23
This file contains the smoothing utility functions. We have a number of
34
possible smoothers to choose from: windowed average, local weighted regression,
45
and a causal Savitzky-Golay filter.
@@ -10,14 +11,16 @@
1011
docstrings for details.
1112
"""
1213

14+
from typing import Union
1315
import warnings
1416

1517
import numpy as np
1618
import pandas as pd
1719

1820

1921
class Smoother: # pylint: disable=too-many-instance-attributes
20-
"""
22+
"""Smoother class.
23+
2124
This is the smoothing utility class. This class holds the parameter settings for its smoother
2225
methods and provides reasonable defaults. Basic usage can be found in the examples below.
2326
@@ -36,9 +39,13 @@ class Smoother: # pylint: disable=too-many-instance-attributes
3639
Descriptions of the smoothers are available in the doc strings. Full mathematical
3740
details are in: https://github.com/cmu-delphi/covidcast-modeling/ in the folder
3841
'indicator_smoother'.
42+
poly_fit_degree: int
43+
A parameter for the 'savgol' smoother which sets the degree of the polynomial fit.
3944
window_length: int
40-
The length of the averaging window for 'savgol' and 'moving_average'.
41-
This value is in the units of the data, which tends to be days.
45+
The length of the fitting window for 'savgol' and the averaging window 'moving_average'.
46+
This value is in the units provided by the data, which are likely to be days for Delphi.
47+
Note that if window_length is smaller than the length of the signal, then only the
48+
imputation method is run on the signal.
4249
gaussian_bandwidth: float or None
4350
If float, all regression is done with Gaussian weights whose variance is
4451
half the gaussian_bandwidth. If None, performs unweighted regression. (Applies
@@ -60,14 +67,19 @@ class Smoother: # pylint: disable=too-many-instance-attributes
6067
The smallest value to allow in a signal. If None, there is no smallest value.
6168
Currently only implemented for 'left_gauss_linear'. This should probably not be in the scope
6269
of the smoothing utility.
63-
poly_fit_degree: int
64-
A parameter for the 'savgol' smoother which sets the degree of the polynomial fit.
70+
boundary_method: {'shortened_window', 'identity', 'nan'}
71+
Determines how the 'savgol' method handles smoothing at the (left) boundary, where the past
72+
data length is shorter than the window_length parameter. If 'shortened_window', it uses the
73+
maximum window available; at the very edge (generally up to poly_fit_degree) it keeps the
74+
same value as the raw signal. If 'identity', it just keeps the raw signal. If 'nan', it
75+
writes nans. For the other smoothing methods, 'moving_average' writes nans and
76+
'left_gauss_linear' uses a shortened window.
6577
6678
Methods
6779
----------
6880
smooth: np.ndarray or pd.Series
6981
Takes a 1D signal and returns a smoothed version.
70-
The input and the output have the same lengthand type.
82+
The input and the output have the same length and type.
7183
7284
Example Usage
7385
-------------
@@ -108,6 +120,7 @@ def __init__(
108120
minval=None,
109121
boundary_method="shortened_window",
110122
):
123+
"""See class docstring."""
111124
self.smoother_name = smoother_name
112125
self.poly_fit_degree = poly_fit_degree
113126
self.window_length = window_length
@@ -118,35 +131,38 @@ def __init__(
118131

119132
valid_smoothers = {"savgol", "left_gauss_linear", "moving_average", "identity"}
120133
valid_impute_methods = {"savgol", "zeros", None}
134+
valid_boundary_methods = {"shortened_window", "identity", "nan"}
121135
if self.smoother_name not in valid_smoothers:
122-
raise ValueError("Invalid smoother name given.")
136+
raise ValueError("Invalid smoother_name given.")
123137
if self.impute_method not in valid_impute_methods:
124-
raise ValueError("Invalid impute method given.")
138+
raise ValueError("Invalid impute_method given.")
139+
if self.boundary_method not in valid_boundary_methods:
140+
raise ValueError("Invalid boundary_method given.")
125141

126142
if smoother_name == "savgol":
143+
# The polynomial fitting is done on a past window of size window_length
144+
# including the current day value.
127145
self.coeffs = self.savgol_coeffs(-self.window_length + 1, 0)
128146
else:
129147
self.coeffs = None
130148

131-
def smooth(self, signal):
132-
"""
133-
The major workhorse smoothing function. Can use one of three smoothing
134-
methods, as specified by the class variable smoother_name.
149+
def smooth(self, signal: Union[np.ndarray, pd.Series]) -> Union[np.ndarray, pd.Series]:
150+
"""Apply a smoother to a signal.
151+
152+
The major workhorse smoothing function. Imputes the nans and then applies
153+
a smoother to the signal.
135154
136155
Parameters
137156
----------
138157
signal: np.ndarray or pd.Series
139158
A 1D signal to be smoothed.
140159
160+
Returns
161+
----------
141162
signal_smoothed: np.ndarray or pd.Series
142163
A smoothed 1D signal. Returns an array of the same type and length as
143164
the input.
144165
"""
145-
if len(signal) < self.window_length:
146-
raise ValueError(
147-
"The window_length must be smaller than the length of the signal."
148-
)
149-
150166
is_pandas_series = isinstance(signal, pd.Series)
151167
signal = signal.to_numpy() if is_pandas_series else signal
152168

@@ -158,14 +174,16 @@ def smooth(self, signal):
158174
signal_smoothed = self.left_gauss_linear_smoother(signal)
159175
elif self.smoother_name == "moving_average":
160176
signal_smoothed = self.moving_average_smoother(signal)
161-
elif self.smoother_name == "identity":
162-
signal_smoothed = signal
177+
else:
178+
signal_smoothed = signal.copy()
163179

164-
return signal_smoothed if not is_pandas_series else pd.Series(signal_smoothed)
180+
signal_smoothed = signal_smoothed if not is_pandas_series else pd.Series(signal_smoothed)
181+
return signal_smoothed
165182

166183
def impute(self, signal):
167-
"""
168-
Imputes the nan values in the signal.
184+
"""Impute the nan values in the signal.
185+
186+
See the class docstring for an explanation of the impute methods.
169187
170188
Parameters
171189
----------
@@ -191,8 +209,7 @@ def impute(self, signal):
191209
return imputed_signal
192210

193211
def moving_average_smoother(self, signal):
194-
"""
195-
Computes a moving average on the signal.
212+
"""Compute a moving average on the signal.
196213
197214
Parameters
198215
----------
@@ -219,11 +236,10 @@ def moving_average_smoother(self, signal):
219236
return signal_smoothed
220237

221238
def left_gauss_linear_smoother(self, signal):
222-
"""
223-
DEPRECATED: This method is available to help sanity check the 'savgol' method.
224-
It is a little slow, so use 'savgol' with poly_fit_degree=1 instead.
239+
"""Smooth the y-values using a local linear regression with Gaussian weights.
225240
226-
Smooth the y-values using a local linear regression with Gaussian weights.
241+
DEPRECATED: This method is available to help sanity check the 'savgol' method.
242+
Use 'savgol' with poly_fit_degree=1 and the appropriate gaussian_bandwidth instead.
227243
228244
At each time t, we use the data from times 1, ..., t-dt, weighted
229245
using the Gaussian kernel, to produce the estimate at time t.
@@ -264,7 +280,8 @@ def left_gauss_linear_smoother(self, signal):
264280
return signal_smoothed
265281

266282
def savgol_predict(self, signal):
267-
"""
283+
"""Predict a single value using the savgol method.
284+
268285
Fits a polynomial through the values given by the signal and returns the value
269286
of the polynomial at the right-most signal-value. More precisely, fits a polynomial
270287
f(t) of degree poly_fit_degree through the points signal[-n], signal[-n+1] ..., signal[-1],
@@ -280,14 +297,15 @@ def savgol_predict(self, signal):
280297
predicted_value: float
281298
The anticipated value that comes after the end of the signal based on a polynomial fit.
282299
"""
300+
# Add one
283301
coeffs = self.savgol_coeffs(-len(signal) + 1, 0)
284302
predicted_value = signal @ coeffs
285303
return predicted_value
286304

287305
def savgol_coeffs(self, nl, nr):
288-
"""
289-
Solves for the Savitzky-Golay coefficients. The coefficients c_i
290-
give a filter so that
306+
"""Solve for the Savitzky-Golay coefficients.
307+
308+
The coefficients c_i give a filter so that
291309
y = sum_{i=-{n_l}}^{n_r} c_i x_i
292310
is the value at 0 (thus the constant term) of the polynomial fit
293311
through the points {x_i}. The coefficients are c_i are calculated as
@@ -299,9 +317,9 @@ def savgol_coeffs(self, nl, nr):
299317
Parameters
300318
----------
301319
nl: int
302-
The left window bound for the polynomial fit.
320+
The left window bound for the polynomial fit, inclusive.
303321
nr: int
304-
The right window bound for the polynomial fit.
322+
The right window bound for the polynomial fit, inclusive.
305323
poly_fit_degree: int
306324
The degree of the polynomial to be fit.
307325
gaussian_bandwidth: float or None
@@ -337,14 +355,10 @@ def savgol_coeffs(self, nl, nr):
337355
return coeffs
338356

339357
def savgol_smoother(self, signal):
340-
"""
341-
Returns a specific type of convolution of the 1D signal with the 1D signal
342-
coeffs, respecting boundary effects. That is, the output y is
343-
signal_smoothed_i = signal_i
344-
signal_smoothed_i = sum_{j=0}^n coeffs_j signal_{i+j}, if i >= len(coeffs) - 1
345-
In words, entries close to the left boundary are not smoothed, the window does
346-
not proceed over the right boundary, and the rest of the values are regular
347-
convolution.
358+
"""Smooth signal with the savgol smoother.
359+
360+
Returns a convolution of the 1D signal with the Savitzky-Golay coefficients, respecting
361+
boundary effects. For an explanation of boundary effects methods, see the class docstring.
348362
349363
Parameters
350364
----------
@@ -356,15 +370,14 @@ def savgol_smoother(self, signal):
356370
signal_smoothed: np.ndarray
357371
A smoothed 1D signal of same length as signal.
358372
"""
359-
360-
# reverse because np.convolve reverses the second argument
373+
# Reverse because np.convolve reverses the second argument
361374
temp_reversed_coeffs = np.array(list(reversed(self.coeffs)))
362375

363-
# does the majority of the smoothing, with the calculated coefficients
376+
# Smooth the part of the signal away from the boundary first
364377
signal_padded = np.append(np.nan * np.ones(len(self.coeffs) - 1), signal)
365378
signal_smoothed = np.convolve(signal_padded, temp_reversed_coeffs, mode="valid")
366379

367-
# this section handles the smoothing behavior at the (left) boundary:
380+
# This section handles the smoothing behavior at the (left) boundary:
368381
# - shortened_window (default) applies savgol with a smaller window to do the fit
369382
# - identity keeps the original signal (doesn't smooth)
370383
# - nan writes nans
@@ -373,22 +386,23 @@ def savgol_smoother(self, signal):
373386
if ix == 0:
374387
signal_smoothed[ix] = signal[ix]
375388
else:
389+
# At the very edge, the design matrix is often singular, in which case
390+
# we just fall back to the raw signal
376391
try:
377-
signal_smoothed[ix] = self.savgol_predict(signal[: (ix + 1)])
378-
except np.linalg.LinAlgError: # for small ix, the design matrix is singular
392+
signal_smoothed[ix] = self.savgol_predict(signal[:ix+1])
393+
except np.linalg.LinAlgError:
379394
signal_smoothed[ix] = signal[ix]
380395
return signal_smoothed
381396
elif self.boundary_method == "identity":
382-
for ix in range(len(self.coeffs)):
397+
for ix in range(min(len(self.coeffs), len(signal))):
383398
signal_smoothed[ix] = signal[ix]
384399
return signal_smoothed
385400
elif self.boundary_method == "nan":
386401
return signal_smoothed
387-
else:
388-
raise ValueError("Unknown boundary method.")
389402

390403
def savgol_impute(self, signal):
391-
"""
404+
"""Impute the nan values in signal using savgol.
405+
392406
This method fills the nan values in the signal with an M-degree polynomial fit
393407
on a rolling window of the immediate past up to window_length data points.
394408

_delphi_utils_python/tests/test_signal.py

Lines changed: 9 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,8 @@
11
"""Tests for delphi_utils.signal."""
22
from unittest.mock import patch
3-
import pandas as pd
43

4+
import pandas as pd
5+
import pytest
56
from delphi_utils.signal import add_prefix, public_signal
67

78
# Constants for mocking out the call to `covidcast.metadata` within `public_signal()`.
@@ -14,12 +15,17 @@ class TestSignal:
1415
def test_add_prefix_to_all(self):
1516
"""Tests that `add_prefix()` derives work-in-progress names for all input signals."""
1617
assert add_prefix(["sig1", "sig3"], True, prefix="wip_") == ["wip_sig1", "wip_sig3"]
17-
18+
1819
def test_add_prefix_to_specified(self):
1920
"""Tests that `add_prefix()` derives work-in-progress names for specified signals."""
2021
assert add_prefix(["sig1", "sig2", "sig3"], ["sig2"], prefix="wip_") ==\
2122
["sig1", "wip_sig2", "sig3"]
22-
23+
24+
def test_invalid_prefix_input(self):
25+
"""Tests that `add_prefix()` raises a ValueError when invalid input is given."""
26+
with pytest.raises(ValueError):
27+
add_prefix(None, None)
28+
2329
@patch("covidcast.metadata")
2430
def test_add_prefix_to_non_public(self, metadata):
2531
"""Tests that `add_prefix()` derives work-in-progress names for non-public signals."""

_delphi_utils_python/tests/test_smooth.py

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -167,11 +167,11 @@ def test_impute(self):
167167
with pytest.raises(ValueError):
168168
imputed_signal = smoother.savgol_impute(signal)
169169

170-
# test window_length > len(signal)
170+
# test window_length > len(signal) and boundary_method="identity"
171171
signal = np.arange(20)
172172
smoother = Smoother(smoother_name="savgol", boundary_method="identity", window_length=30)
173-
with pytest.raises(ValueError):
174-
smoothed_signal = smoother.smooth(signal)
173+
smoothed_signal = smoother.smooth(signal)
174+
assert np.allclose(signal, smoothed_signal)
175175

176176
# test the boundary methods
177177
signal = np.arange(20)

_template_python/.pylintrc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@ disable=logging-format-interpolation,
1414
# Allow arbitrarily short-named variables.
1515
variable-rgx=[a-z_][a-z0-9_]*
1616
argument-rgx=[a-z_][a-z0-9_]*
17+
attr-rgx=[a-z_][a-z0-9_]*
1718

1819
[DESIGN]
1920

0 commit comments

Comments
 (0)