-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix #12373: rolling functions raise ValueError on float32 data #12376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -275,6 +275,11 @@ def test_how_compat(self): | |||
getattr(s, t)(freq='D', **kwargs), op)(how=how) | |||
assert_series_equal(result, expected) | |||
|
|||
# GH #12373 : rolling functions error on float32 data | |||
def test_ensure_type_float64(self): | |||
series_float32 = pd.Series(np.arange(5, dtype=np.float32)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
run thru lots of dtypes here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here is a list of types: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html . Probably can skip the complex and bool types.
Should we also add tests for the other functions like min()
, std()
, median()
, etc.?
@BranYang can you rebase / update. |
@jreback Sure, working on this and will update soon. |
Still working on this. Lots of tests to handle |
@jreback Just added bunch of tests to test window behavior on different dtypes. Not very sure is this a good idea because all these tests together takes too long to finish. What do you think about these added tests? Should we also tests |
@BranYang yeah don't need to add for FULL tests (e.g. the moments consistency), these are very time consuming. Good idea about how to do it, but need to inherit from a simpler function (that MomentConsistency) is a monster. |
@BranYang if you can update would be great. your fix is good, just need to reduce the testing time (e.g. can add a relatively simple set of tests, and refactor a bit later). This caused slow tests to go from 20m to 40m, not a good thing) |
@jreback Yes I am working on write a new set of tests to test rolling results on each dtype. |
@jreback Please review the tests this time. I have to comment out tests for import pandas as pd
import numpy as np
d = pd.Series(np.arange(10), dtype='M8[ns]')
d.rolling(window=2).count() # raise a TypeError Do we need count() work for datetime64/timedelta/category dtypes? |
Yeah these should all work, but I see they don't work in 0.17.1, so can prob ignore for now. so instead of always casting to
|
@jreback Not very sure if my change for And - I find that when constructing DataFrame by specifying two dtypes it will raise error: import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(10).reshape((5, 2)), dtype='M8[ns, UTC]')
# TypeError: data type not understood
df2 = pd.DataFrame(np.arange(10).reshape((5, 2)), dtype='category')
# NotImplementedError: > 1 ndim Categorical are not supported at this time So I just do not test this two cases ( Dataframe * ('M8[ns, UTC]', 'category')). |
revert your change for non numeric dtypes |
@BranYang we have a plethora of these issues already listed here: #8659 (IIRC the non-numeric handling is the first). Your tests look good now. Just remove the datetimelike/categorical stuff. FYI, you can't use a numpy constructor for pandas extension dtypes, but the pandas objects accept these happily.
|
@jreback From my point of view, since Series accept array_d1 = np.arange(5)
array_d2 = np.arange(10).reshape((5, 2))
series = pd.Series(array_d1, dtype='datetime64[ns, UTC]') # this works
df = pd.DataFrame(array_d2, dtype='datetime64[ns, UTC]') # doesn't work |
that last is a bug, can you make a separate issue. for now, let's exclude this from this PR, just need to finish this up. |
@@ -494,8 +497,31 @@ def count(self): | |||
obj = self._convert_freq() | |||
window = self._get_window() | |||
window = min(window, len(obj)) if not self.center else window | |||
|
|||
# GH #12373: rolling functions raise ValueError on float32 data | |||
# enables count for timedelta/datatime64 dtypes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove all of this for now
@jreback Please review. |
def setUp(self): | ||
self._create_data() | ||
|
||
def _cast_result(self, result, from_dtype, to_dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't need this as you are overriding this in TestDatetimelikes
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do so because I see that assertRaises
doesn't captures the error in setUp(), in which we construct the data and if the dtype is specified to 'M8[ns, UTC]'
will raise error there.
What's your suggestion to capture this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
override create_data_types
in Testdatetimelike. setUp
should NEVER raise, the point is its a known starting point. Really try to make this as simple as possible. I am happy for you to skip ALL Datetimelike test ATM.
@jreback ping |
roll = d.rolling(window=self.window) | ||
result = f(roll) | ||
if res_dtype: | ||
result = self._cast_result(result, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls take all casting out. The expected should already have the expectation of casts built in. If something is later changed, then it will obvious as the expecations are listed. They may be wrong, but that's ok for now (when I mean wrong they are not the 'correct' dtype as everything is getting casted ATM). This should be a dead simple tests. We don't want to keep expanding this, really make this much much simpler.
pls rebase on master, make fixes asap. |
@jreback rebased. please cancel the previous build because it will fail for bad code style.. |
looks good. ping when green. |
@jreback ping |
thanks @BranYang ok I added jreback@b4cd033 basically the test weren't covering datetimelikes correctly as count was implemented (but not tested), while the other functions actually (sometimes) worked, incorrectly. This generates a complete set of tests (very fast). xref to #12535 which need to update the tests to account for nan values (they work) in count (so can test for datetimelikes with nans). merging shortly. |
thanks @BranYang |
git diff upstream/master | flake8 --diff