Skip to content

Fix #12373: rolling functions raise ValueError on float32 data #12376

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

Conversation

BranYang
Copy link
Contributor

@@ -275,6 +275,11 @@ def test_how_compat(self):
getattr(s, t)(freq='D', **kwargs), op)(how=how)
assert_series_equal(result, expected)

# GH #12373 : rolling functions error on float32 data
def test_ensure_type_float64(self):
series_float32 = pd.Series(np.arange(5, dtype=np.float32))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run thru lots of dtypes here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here is a list of types: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.types.html . Probably can skip the complex and bool types.

Should we also add tests for the other functions like min(), std(), median(), etc.?

@jreback jreback added Bug Stats Dtype Conversions Unexpected or buggy dtype conversions labels Feb 18, 2016
@jreback jreback added this to the 0.18.0 milestone Feb 18, 2016
@jreback
Copy link
Contributor

jreback commented Feb 22, 2016

@BranYang can you rebase / update.

@BranYang
Copy link
Contributor Author

@jreback Sure, working on this and will update soon.

@BranYang
Copy link
Contributor Author

Still working on this. Lots of tests to handle

@BranYang
Copy link
Contributor Author

@jreback Just added bunch of tests to test window behavior on different dtypes.

Not very sure is this a good idea because all these tests together takes too long to finish.

What do you think about these added tests? Should we also tests M8[ns],m8[ns],datetime64[ns, UTC],category like this or should I just write several new cases?

@jreback
Copy link
Contributor

jreback commented Feb 27, 2016

@BranYang yeah don't need to add for FULL tests (e.g. the moments consistency), these are very time consuming. Good idea about how to do it, but need to inherit from a simpler function (that MomentConsistency) is a monster.

@jreback
Copy link
Contributor

jreback commented Feb 29, 2016

@BranYang if you can update would be great. your fix is good, just need to reduce the testing time (e.g. can add a relatively simple set of tests, and refactor a bit later). This caused slow tests to go from 20m to 40m, not a good thing)

@BranYang
Copy link
Contributor Author

BranYang commented Mar 1, 2016

@jreback Yes I am working on write a new set of tests to test rolling results on each dtype.

@jreback
Copy link
Contributor

jreback commented Mar 1, 2016

thanks @BranYang ok, your fix is fine. lmk about putting in simpler tests (for now), can do a refactor later, but going to need this for 0.18.0 (hopefully going to cut a 0.18.0rc2 at the end of the week).

#12481 is the ultimate way to go, but a little more tricky so going to hold off on that a bit.

@BranYang
Copy link
Contributor Author

BranYang commented Mar 1, 2016

@jreback Please review the tests this time.

I have to comment out tests for M8[ns],m8[ns],datetime64[ns, UTC] because call count on them will raise a type error:

import pandas as pd
import numpy as np
d = pd.Series(np.arange(10), dtype='M8[ns]')
d.rolling(window=2).count() # raise a TypeError

Do we need count() work for datetime64/timedelta/category dtypes?

@jreback
Copy link
Contributor

jreback commented Mar 1, 2016

Yeah these should all work, but I see they don't work in 0.17.1, so can prob ignore for now.

so instead of always casting to float, check for com.needs_i8_conversion() if so, then do this. (we will need some more sophisticated versions of this (as some of these ops need to cast back), but let's at least have count working. (this is sort of easy so you could add this in if you want).

In [4]: d
Out[4]: 
0   1970-01-01 00:00:00.000000000
1   1970-01-01 00:00:00.000000001
2   1970-01-01 00:00:00.000000002
3   1970-01-01 00:00:00.000000003
4   1970-01-01 00:00:00.000000004
5   1970-01-01 00:00:00.000000005
6   1970-01-01 00:00:00.000000006
7   1970-01-01 00:00:00.000000007
8   1970-01-01 00:00:00.000000008
9   1970-01-01 00:00:00.000000009
dtype: datetime64[ns]

In [5]: Series(d.view('i8').astype('float64')).rolling(window=2).count()
Out[5]: 
0    1.0
1    2.0
2    2.0
3    2.0
4    2.0
5    2.0
6    2.0
7    2.0
8    2.0
9    2.0
dtype: float64

@BranYang
Copy link
Contributor Author

BranYang commented Mar 2, 2016

@jreback Not very sure if my change for count is the best way to make it work. Waiting for comments.

And - I find that when constructing DataFrame by specifying two dtypes it will raise error:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.arange(10).reshape((5, 2)), dtype='M8[ns, UTC]')
# TypeError: data type not understood
df2 = pd.DataFrame(np.arange(10).reshape((5, 2)), dtype='category')
# NotImplementedError: > 1 ndim Categorical are not supported at this time

So I just do not test this two cases ( Dataframe * ('M8[ns, UTC]', 'category')).

@jreback
Copy link
Contributor

jreback commented Mar 2, 2016

revert your change for non numeric dtypes
this is another issue and will require some care

@jreback
Copy link
Contributor

jreback commented Mar 2, 2016

@BranYang we have a plethora of these issues already listed here: #8659 (IIRC the non-numeric handling is the first). Your tests look good now. Just remove the datetimelike/categorical stuff.

FYI, you can't use a numpy constructor for pandas extension dtypes, but the pandas objects accept these happily.

In [4]: Series(np.arange(5),dtype='datetime64[ns, UTC]')
Out[4]: 
0             1970-01-01 00:00:00+00:00
1   1970-01-01 00:00:00.000000001+00:00
2   1970-01-01 00:00:00.000000002+00:00
3   1970-01-01 00:00:00.000000003+00:00
4   1970-01-01 00:00:00.000000004+00:00
dtype: datetime64[ns, UTC]

@BranYang
Copy link
Contributor Author

BranYang commented Mar 2, 2016

@jreback From my point of view, since Series accept dtype='datetime64[ns, UTC]', why can't DataFrame accecpt it as well?

array_d1 = np.arange(5)
array_d2 = np.arange(10).reshape((5, 2))
series = pd.Series(array_d1, dtype='datetime64[ns, UTC]')  # this works
df = pd.DataFrame(array_d2, dtype='datetime64[ns, UTC]') # doesn't work

@jreback
Copy link
Contributor

jreback commented Mar 2, 2016

that last is a bug, can you make a separate issue.

for now, let's exclude this from this PR, just need to finish this up.

@@ -494,8 +497,31 @@ def count(self):
obj = self._convert_freq()
window = self._get_window()
window = min(window, len(obj)) if not self.center else window

# GH #12373: rolling functions raise ValueError on float32 data
# enables count for timedelta/datatime64 dtypes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove all of this for now

@BranYang
Copy link
Contributor Author

BranYang commented Mar 2, 2016

@jreback Please review.

def setUp(self):
self._create_data()

def _cast_result(self, result, from_dtype, to_dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't need this as you are overriding this in TestDatetimelikes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do so because I see that assertRaises doesn't captures the error in setUp(), in which we construct the data and if the dtype is specified to 'M8[ns, UTC]' will raise error there.

What's your suggestion to capture this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

override create_data_types in Testdatetimelike. setUp should NEVER raise, the point is its a known starting point. Really try to make this as simple as possible. I am happy for you to skip ALL Datetimelike test ATM.

@BranYang
Copy link
Contributor Author

BranYang commented Mar 4, 2016

@jreback ping

roll = d.rolling(window=self.window)
result = f(roll)
if res_dtype:
result = self._cast_result(result,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls take all casting out. The expected should already have the expectation of casts built in. If something is later changed, then it will obvious as the expecations are listed. They may be wrong, but that's ok for now (when I mean wrong they are not the 'correct' dtype as everything is getting casted ATM). This should be a dead simple tests. We don't want to keep expanding this, really make this much much simpler.

@jreback
Copy link
Contributor

jreback commented Mar 5, 2016

pls rebase on master, make fixes asap.

@BranYang
Copy link
Contributor Author

BranYang commented Mar 5, 2016

@jreback rebased. please cancel the previous build because it will fail for bad code style..

@jreback
Copy link
Contributor

jreback commented Mar 5, 2016

looks good. ping when green.

jreback pushed a commit to jreback/pandas that referenced this pull request Mar 5, 2016
@BranYang
Copy link
Contributor Author

BranYang commented Mar 5, 2016

@jreback ping

@jreback
Copy link
Contributor

jreback commented Mar 6, 2016

thanks @BranYang

ok I added jreback@b4cd033

basically the test weren't covering datetimelikes correctly as count was implemented (but not tested), while the other functions actually (sometimes) worked, incorrectly. This generates a complete set of tests (very fast). xref to #12535 which need to update the tests to account for nan values (they work) in count (so can test for datetimelikes with nans).

merging shortly.

@jreback jreback closed this in 3d70be7 Mar 6, 2016
@jreback
Copy link
Contributor

jreback commented Mar 6, 2016

thanks @BranYang

@BranYang BranYang deleted the cast branch March 6, 2016 01:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: rolling functions raise ValueError on float32 data
3 participants