Fix #12373: rolling functions raise ValueError on float32 data #12376

BranYang · 2016-02-18T04:52:09Z

closes BUG: rolling functions raise ValueError on float32 data #12373
tests added
passes git diff upstream/master | flake8 --diff
whatsnew entry added
add test for various other dtypes

jreback · 2016-02-18T04:53:56Z

pandas/tests/test_window.py

@@ -275,6 +275,11 @@ def test_how_compat(self):
                            getattr(s, t)(freq='D', **kwargs), op)(how=how)
                        assert_series_equal(result, expected)

+    # GH #12373 : rolling functions error on float32 data
+    def test_ensure_type_float64(self):
+        series_float32 = pd.Series(np.arange(5, dtype=np.float32))


run thru lots of dtypes here

here is a list of types: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html . Probably can skip the complex and bool types.

Should we also add tests for the other functions like min(), std(), median(), etc.?

jreback · 2016-02-22T02:05:08Z

@BranYang can you rebase / update.

BranYang · 2016-02-22T02:55:58Z

@jreback Sure, working on this and will update soon.

BranYang · 2016-02-23T08:00:27Z

Still working on this. Lots of tests to handle

BranYang · 2016-02-26T17:03:57Z

@jreback Just added bunch of tests to test window behavior on different dtypes.

Not very sure is this a good idea because all these tests together takes too long to finish.

What do you think about these added tests? Should we also tests M8[ns],m8[ns],datetime64[ns, UTC],category like this or should I just write several new cases?

jreback · 2016-02-27T15:03:12Z

@BranYang yeah don't need to add for FULL tests (e.g. the moments consistency), these are very time consuming. Good idea about how to do it, but need to inherit from a simpler function (that MomentConsistency) is a monster.

jreback · 2016-02-29T12:38:51Z

@BranYang if you can update would be great. your fix is good, just need to reduce the testing time (e.g. can add a relatively simple set of tests, and refactor a bit later). This caused slow tests to go from 20m to 40m, not a good thing)

BranYang · 2016-03-01T01:31:00Z

@jreback Yes I am working on write a new set of tests to test rolling results on each dtype.

jreback · 2016-03-01T12:49:04Z

thanks @BranYang ok, your fix is fine. lmk about putting in simpler tests (for now), can do a refactor later, but going to need this for 0.18.0 (hopefully going to cut a 0.18.0rc2 at the end of the week).

#12481 is the ultimate way to go, but a little more tricky so going to hold off on that a bit.

BranYang · 2016-03-01T18:06:46Z

@jreback Please review the tests this time.

I have to comment out tests for M8[ns],m8[ns],datetime64[ns, UTC] because call count on them will raise a type error:

import pandas as pd
import numpy as np
d = pd.Series(np.arange(10), dtype='M8[ns]')
d.rolling(window=2).count() # raise a TypeError

Do we need count() work for datetime64/timedelta/category dtypes?

jreback · 2016-03-01T19:34:39Z

Yeah these should all work, but I see they don't work in 0.17.1, so can prob ignore for now.

so instead of always casting to float, check for com.needs_i8_conversion() if so, then do this. (we will need some more sophisticated versions of this (as some of these ops need to cast back), but let's at least have count working. (this is sort of easy so you could add this in if you want).

In [4]: d
Out[4]: 
0   1970-01-01 00:00:00.000000000
1   1970-01-01 00:00:00.000000001
2   1970-01-01 00:00:00.000000002
3   1970-01-01 00:00:00.000000003
4   1970-01-01 00:00:00.000000004
5   1970-01-01 00:00:00.000000005
6   1970-01-01 00:00:00.000000006
7   1970-01-01 00:00:00.000000007
8   1970-01-01 00:00:00.000000008
9   1970-01-01 00:00:00.000000009
dtype: datetime64[ns]

In [5]: Series(d.view('i8').astype('float64')).rolling(window=2).count()
Out[5]: 
0    1.0
1    2.0
2    2.0
3    2.0
4    2.0
5    2.0
6    2.0
7    2.0
8    2.0
9    2.0
dtype: float64

BranYang · 2016-03-02T09:19:47Z

@jreback Not very sure if my change for count is the best way to make it work. Waiting for comments.

And - I find that when constructing DataFrame by specifying two dtypes it will raise error:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(10).reshape((5, 2)), dtype='M8[ns, UTC]')
# TypeError: data type not understood
df2 = pd.DataFrame(np.arange(10).reshape((5, 2)), dtype='category')
# NotImplementedError: > 1 ndim Categorical are not supported at this time

So I just do not test this two cases ( Dataframe * ('M8[ns, UTC]', 'category')).

jreback · 2016-03-02T11:07:14Z

revert your change for non numeric dtypes
this is another issue and will require some care

jreback · 2016-03-02T12:57:19Z

@BranYang we have a plethora of these issues already listed here: #8659 (IIRC the non-numeric handling is the first). Your tests look good now. Just remove the datetimelike/categorical stuff.

FYI, you can't use a numpy constructor for pandas extension dtypes, but the pandas objects accept these happily.

In [4]: Series(np.arange(5),dtype='datetime64[ns, UTC]')
Out[4]: 
0             1970-01-01 00:00:00+00:00
1   1970-01-01 00:00:00.000000001+00:00
2   1970-01-01 00:00:00.000000002+00:00
3   1970-01-01 00:00:00.000000003+00:00
4   1970-01-01 00:00:00.000000004+00:00
dtype: datetime64[ns, UTC]

BranYang · 2016-03-02T14:27:19Z

@jreback From my point of view, since Series accept dtype='datetime64[ns, UTC]', why can't DataFrame accecpt it as well?

array_d1 = np.arange(5)
array_d2 = np.arange(10).reshape((5, 2))
series = pd.Series(array_d1, dtype='datetime64[ns, UTC]')  # this works
df = pd.DataFrame(array_d2, dtype='datetime64[ns, UTC]') # doesn't work

jreback · 2016-03-02T14:29:18Z

that last is a bug, can you make a separate issue.

for now, let's exclude this from this PR, just need to finish this up.

jreback · 2016-03-02T14:29:34Z

pandas/core/window.py

@@ -494,8 +497,31 @@ def count(self):
        obj = self._convert_freq()
        window = self._get_window()
        window = min(window, len(obj)) if not self.center else window
+
+        # GH #12373: rolling functions raise ValueError on float32 data
+        # enables count for timedelta/datatime64 dtypes


remove all of this for now

BranYang · 2016-03-02T16:13:01Z

@jreback Please review.

jreback · 2016-03-03T02:21:57Z

pandas/tests/test_window.py

+    def setUp(self):
+        self._create_data()
+
+    def _cast_result(self, result, from_dtype, to_dtype):


shouldn't need this as you are overriding this in TestDatetimelikes.

I do so because I see that assertRaises doesn't captures the error in setUp(), in which we construct the data and if the dtype is specified to 'M8[ns, UTC]' will raise error there.

What's your suggestion to capture this?

override create_data_types in Testdatetimelike. setUp should NEVER raise, the point is its a known starting point. Really try to make this as simple as possible. I am happy for you to skip ALL Datetimelike test ATM.

BranYang · 2016-03-04T01:18:26Z

@jreback ping

jreback · 2016-03-05T15:21:43Z

pandas/tests/test_window.py

+            roll = d.rolling(window=self.window)
+            result = f(roll)
+            if res_dtype:
+                result = self._cast_result(result,


pls take all casting out. The expected should already have the expectation of casts built in. If something is later changed, then it will obvious as the expecations are listed. They may be wrong, but that's ok for now (when I mean wrong they are not the 'correct' dtype as everything is getting casted ATM). This should be a dead simple tests. We don't want to keep expanding this, really make this much much simpler.

jreback · 2016-03-05T15:22:06Z

pls rebase on master, make fixes asap.

BranYang · 2016-03-05T17:08:39Z

@jreback rebased. please cancel the previous build because it will fail for bad code style..

jreback · 2016-03-05T18:50:01Z

looks good. ping when green.

closes pandas-dev#12373 closes pandas-dev#12376

BranYang · 2016-03-05T23:47:28Z

@jreback ping

jreback · 2016-03-06T00:07:09Z

thanks @BranYang

ok I added jreback@b4cd033

basically the test weren't covering datetimelikes correctly as count was implemented (but not tested), while the other functions actually (sometimes) worked, incorrectly. This generates a complete set of tests (very fast). xref to #12535 which need to update the tests to account for nan values (they work) in count (so can test for datetimelikes with nans).

merging shortly.

jreback · 2016-03-06T00:46:01Z

thanks @BranYang

jreback reviewed Feb 18, 2016
View reviewed changes

jreback added Bug Stats Dtype Conversions Unexpected or buggy dtype conversions labels Feb 18, 2016

jreback added this to the 0.18.0 milestone Feb 18, 2016

jreback mentioned this pull request Feb 27, 2016

BUG: support fused types in roll_min/max #12373 #12481

Closed

jreback reviewed Mar 2, 2016
View reviewed changes

jreback reviewed Mar 3, 2016
View reviewed changes

BranYang added 3 commits March 5, 2016 23:17

Fix #12373: rolling functions raise ValueError on float32 data

73f8823

reverse changes for count, refactor datatimelike tests

9a17990

skip TestDatetimeLikeDtype tests.

d20adfa

jreback reviewed Mar 5, 2016
View reviewed changes

BranYang added 2 commits March 6, 2016 00:24

remove type cast in tests

d26a680

style change

9241d76

jreback pushed a commit to jreback/pandas that referenced this pull request Mar 5, 2016

BUG: rolling functions raise ValueError on float32 data

254cedb

closes pandas-dev#12373 closes pandas-dev#12376

jreback mentioned this pull request Mar 5, 2016

TST: test_window/Datetimelike adjustments #12535

Closed

jreback closed this in 3d70be7 Mar 6, 2016

BranYang deleted the cast branch March 6, 2016 01:11

Uh oh!

Fix #12373: rolling functions raise ValueError on float32 data #12376

Fix #12373: rolling functions raise ValueError on float32 data #12376

Uh oh!

Conversation

BranYang commented Feb 18, 2016

Uh oh!

jreback Feb 18, 2016

Choose a reason for hiding this comment

Uh oh!

jennolsen84 Feb 18, 2016

Choose a reason for hiding this comment

Uh oh!

jreback commented Feb 22, 2016

Uh oh!

BranYang commented Feb 22, 2016

Uh oh!

BranYang commented Feb 23, 2016

Uh oh!

BranYang commented Feb 26, 2016

Uh oh!

jreback commented Feb 27, 2016

Uh oh!

jreback commented Feb 29, 2016

Uh oh!

BranYang commented Mar 1, 2016

Uh oh!

jreback commented Mar 1, 2016

Uh oh!

BranYang commented Mar 1, 2016

Uh oh!

jreback commented Mar 1, 2016

Uh oh!

BranYang commented Mar 2, 2016

Uh oh!

jreback commented Mar 2, 2016

Uh oh!

jreback commented Mar 2, 2016

Uh oh!

BranYang commented Mar 2, 2016

Uh oh!

jreback commented Mar 2, 2016

Uh oh!

jreback Mar 2, 2016

Choose a reason for hiding this comment

Uh oh!

BranYang commented Mar 2, 2016

Uh oh!

jreback Mar 3, 2016

Choose a reason for hiding this comment

Uh oh!

BranYang Mar 3, 2016

Choose a reason for hiding this comment

Uh oh!

jreback Mar 3, 2016

Choose a reason for hiding this comment

Uh oh!

BranYang commented Mar 4, 2016

Uh oh!

jreback Mar 5, 2016

Choose a reason for hiding this comment

Uh oh!

jreback commented Mar 5, 2016

Uh oh!

BranYang commented Mar 5, 2016

Uh oh!

jreback commented Mar 5, 2016

Uh oh!

BranYang commented Mar 5, 2016

Uh oh!

jreback commented Mar 6, 2016

Uh oh!

jreback commented Mar 6, 2016

Uh oh!

Uh oh!