-
Notifications
You must be signed in to change notification settings - Fork 16
Add the smoothing utility #176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Just to confirm: @krivard I recall you suggested I start by integrating into |
I think this utility can contain its default values in its implementation in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pointed out some dangling TODOs but no substantive problems. We should probably get @statsmaths to do a technical review as a cross-check.
* implements the existing smoothers found in the indicators (window average, gaussian local linear regression) * adds the more general Savitzky-Golay filter as 'savgol' with gaussian weighting * adds imputation options * adds tests to match
aab94b9
to
3912f4c
Compare
methods and provides reasonable defaults. Basic usage can be found in the examples below. | ||
|
||
The smoother function takes numpy arrays as input, expecting the values to come from a | ||
regularly-spaced time grid. NANs are ok, as long as the array does not begin with a NAN. The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you expecting to see time series beginning with NaN? If so, it might be worth explicitly trimming them off rather than worrying about how they might cause problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have an error if the series begins with a NaN. I wanted the smooth methods to preserve the length of the array, expecting the user to insert the output back into a dataframe with a fixed index.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Line 415. Though now I think it makes sense to pull that check out to __init__
.
* move smoother first value nan check to impute, so it's less buried * clean documentation in a few functions * remove @classmethod from savgol_coeffs * fix breaking tests
@classmethod | ||
def savgol_coeffs(cls, nl, nr, poly_fit_degree, gaussian_bandwidth=100): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't see that this function is ever used as a class method (although I'm not familiar with this functionality). If you end up removing the @classmethod
, this method can use class attributes for poly_fit_degree
and gaussian_bandwidth
.
The gaussian_bandwidth
default here is different than the default in __init__()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching the different default. I think the @classmethod
is left over from when I was testing this class and wanted to call this function without instantiating an object. At this point, it's probably unlikely anyone else will call it independently, so we can probably remove that.
basis_vector = np.zeros(window_length) | ||
coeffs = np.zeros(window_length) | ||
for i in range(window_length): | ||
basis_vector = np.zeros(window_length) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Defined basis_vector twice; first is unused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me. Nice documentation -- appreciate the example uses!
A couple tests are failing:
FAILED test_smooth.py::TestSmoothers::test_left_gauss_linear_smoother - ValueError: The window_length must be smaller than the length of the signal.
FAILED test_smooth.py::TestSmoothers::test_causal_savgol_smoother - ValueError: The window_length must be smaller than the length of the signal.
@nmdefries incorporated your suggestions, thanks! |
I'm still showing the following failing test; do you have changes you haven't pushed up to github yet? It looks like there was conversation after your last commit.
|
This is really strange... I have that test, it's passing on my end, and I have pushed everything. What else could be wrong? |
The commit hash I'm on locally is the same as the one here. Stumped. |
* converts to numpy arrays under the hood and converts back after * updated docstrings and tests to match
While this is blocked, I added an interface for handling pd.Series. This should let indicators use the transform method and not worry about numpy conversions. |
What version of python are you using? Prod is 3.8.2; my local is 3.8.5. |
I'm using 3.8.5. |
Dmitry and I debugged this offline. Reproduced test failure on a fresh clone and fresh python environment on my machine; reproduced passing condition on a fresh clone and fresh python environment (two different kinds!) on his with matching Python version; reproduced passing condition on a fresh clone and fresh python environment on prod with |
This PR adds the smoothing utility and its tests as part of #171. The utility has not replaced the previous smoother implemetations in the rest of the code base, as I received the suggestion to do those in separate PRs.