Skip to content

Add the smoothing utility #176

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Nov 3, 2020
Merged

Add the smoothing utility #176

merged 6 commits into from
Nov 3, 2020

Conversation

dshemetov
Copy link
Contributor

@dshemetov dshemetov commented Jul 31, 2020

This PR adds the smoothing utility and its tests as part of #171. The utility has not replaced the previous smoother implemetations in the rest of the code base, as I received the suggestion to do those in separate PRs.

@dshemetov dshemetov requested a review from krivard July 31, 2020 23:59
@dshemetov dshemetov marked this pull request as draft August 1, 2020 00:04
@dshemetov
Copy link
Contributor Author

Just to confirm: @krivard I recall you suggested I start by integrating into jhu? I can start that work next.

@dshemetov
Copy link
Contributor Author

I think this utility can contain its default values in its implementation in smooth.py, so I'm removing the dependence on config.py. If someone disagrees with that decision, please let me know, I'm very open to suggestions to all of my code 😃

@dshemetov dshemetov changed the title Adding the smoothing utility Add the smoothing utility Aug 1, 2020
Copy link
Contributor

@krivard krivard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pointed out some dangling TODOs but no substantive problems. We should probably get @statsmaths to do a technical review as a cross-check.

* implements the existing smoothers found in the indicators (window average, gaussian local linear regression)
* adds the more general Savitzky-Golay filter as 'savgol' with gaussian weighting
* adds imputation options
* adds tests to match
@nmdefries nmdefries self-requested a review October 16, 2020 20:36
@krivard krivard removed the request for review from taylor-arnold October 16, 2020 21:15
methods and provides reasonable defaults. Basic usage can be found in the examples below.

The smoother function takes numpy arrays as input, expecting the values to come from a
regularly-spaced time grid. NANs are ok, as long as the array does not begin with a NAN. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you expecting to see time series beginning with NaN? If so, it might be worth explicitly trimming them off rather than worrying about how they might cause problems.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I have an error if the series begins with a NaN. I wanted the smooth methods to preserve the length of the array, expecting the user to insert the output back into a dataframe with a fixed index.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Line 415. Though now I think it makes sense to pull that check out to __init__.

* move smoother first value nan check to impute, so it's less buried
* clean documentation in a few functions
* remove @classmethod from savgol_coeffs
* fix breaking tests
Comment on lines 283 to 284
@classmethod
def savgol_coeffs(cls, nl, nr, poly_fit_degree, gaussian_bandwidth=100):
Copy link
Contributor

@nmdefries nmdefries Oct 22, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see that this function is ever used as a class method (although I'm not familiar with this functionality). If you end up removing the @classmethod, this method can use class attributes for poly_fit_degree and gaussian_bandwidth.

The gaussian_bandwidth default here is different than the default in __init__().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching the different default. I think the @classmethod is left over from when I was testing this class and wanted to call this function without instantiating an object. At this point, it's probably unlikely anyone else will call it independently, so we can probably remove that.

Comment on lines 329 to 332
basis_vector = np.zeros(window_length)
coeffs = np.zeros(window_length)
for i in range(window_length):
basis_vector = np.zeros(window_length)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defined basis_vector twice; first is unused.

Copy link
Contributor

@nmdefries nmdefries left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Nice documentation -- appreciate the example uses!

A couple tests are failing:

FAILED test_smooth.py::TestSmoothers::test_left_gauss_linear_smoother - ValueError: The window_length must be smaller than the length of the signal.
FAILED test_smooth.py::TestSmoothers::test_causal_savgol_smoother - ValueError: The window_length must be smaller than the length of the signal.

@dshemetov
Copy link
Contributor Author

@nmdefries incorporated your suggestions, thanks!

@krivard
Copy link
Contributor

krivard commented Oct 28, 2020

I'm still showing the following failing test; do you have changes you haven't pushed up to github yet? It looks like there was conversation after your last commit.

self = <test_smooth.TestSmoothers object at 0x7fe665caeb50>

    def test_impute(self):
        # test the nan imputer
        signal = np.array([i if i % 3 else np.nan for i in range(1, 40)])
        assert np.allclose(Smoother(impute_method=None).impute(signal), signal, equal_nan=True)

        # test the zeros imputer
        signal = np.array([i if i % 3 else np.nan for i in range(1, 40)])
        assert np.allclose(
            Smoother(impute_method="zeros").impute(signal),
            np.array([i if i % 3 else 0.0 for i in range(1, 40)])
        )

        # make a signal with periodic nans to test the imputer
        signal = np.array([i if i % 3 else np.nan for i in range(1, 40)])
        # test that the non-nan values are unchanged
        not_nans_ixs = np.bitwise_xor(np.isnan(signal, where=True), np.full(len(signal), True))
        smoothed_signal = Smoother().savgol_impute(signal)
        assert np.allclose(signal[not_nans_ixs], smoothed_signal[not_nans_ixs])
        # test that the imputer is close to the true line
>       assert np.allclose(range(1, 40), smoothed_signal, atol=0.5)
E       assert False
E        +  where False = <function allclose at 0x7fe67033dee0>(range(1, 40), array([ 1.        ,  2.        ,  2.40533001,  4.        ,  5.        ,\n        6.47948675,  7.        ,  8.        , ...      , 32.        , 33.01427941, 34.        , 35.        ,\n       36.00357408, 37.        , 38.        , 38.99659138]), atol=0.5)
E        +    where <function allclose at 0x7fe67033dee0> = np.allclose
E        +    and   range(1, 40) = range(1, 40)

test_smooth.py:133: AssertionError

@dshemetov
Copy link
Contributor Author

dshemetov commented Oct 28, 2020

This is really strange... I have that test, it's passing on my end, and I have pushed everything. What else could be wrong?

@dshemetov
Copy link
Contributor Author

The commit hash I'm on locally is the same as the one here. Stumped.

* converts to numpy arrays under the hood and converts back after
* updated docstrings and tests to match
@dshemetov
Copy link
Contributor Author

While this is blocked, I added an interface for handling pd.Series. This should let indicators use the transform method and not worry about numpy conversions.

@krivard
Copy link
Contributor

krivard commented Nov 3, 2020

What version of python are you using? Prod is 3.8.2; my local is 3.8.5.

@dshemetov
Copy link
Contributor Author

I'm using 3.8.5.

@krivard
Copy link
Contributor

krivard commented Nov 3, 2020

Dmitry and I debugged this offline. Reproduced test failure on a fresh clone and fresh python environment on my machine; reproduced passing condition on a fresh clone and fresh python environment (two different kinds!) on his with matching Python version; reproduced passing condition on a fresh clone and fresh python environment on prod with indicator's Python. Prod wins; I give up. 👿 👻 👺

@krivard krivard merged commit 92a750c into main Nov 3, 2020
@krivard krivard deleted the dev-smoothing branch February 9, 2021 18:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants