API: boolean dtype upsets cumsum #4170

Closed
hayd opened this issue Jul 9, 2013 · 18 comments · Fixed by #4440
Labels
API Design Bug Dtype Conversions Unexpected or buggy dtype conversions Numeric Operations Arithmetic, Comparison, and Logical operations

@hayd
Contributor

hayd commented Jul 9, 2013

cumsum seems to require skipna=False on a boolean Series; otherwise it raises. (I haven't investigated which other methods are affected, though cumprod is.)

In [10]: b = pd.Series([False, False, False, True, True, False, False])

In [11]: b
Out[11]:
0    False
1    False
2    False
3     True
4     True
5    False
6    False
dtype: bool

In [12]: b.cumsum()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-f3f684a93525> in <module>()
----> 1 b.cumsum()

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/series.pyc in cumsum(self, axis, dtype, out, skipna)
   1626
   1627         if do_mask:
-> 1628             np.putmask(result, mask, pa.NA)
   1629
   1630         return Series(result, index=self.index)

ValueError: cannot convert float NaN to integer
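
The message in this traceback is the one Python raises when a NaN is converted to an integer, which fits the masking step trying to write NaN into an integer result. A minimal standard-library sketch (Python 3, no pandas or NumPy involved; the variable names are illustrative) reproduces the same text:

```python
nan = float("nan")

try:
    int(nan)  # an integer has no representation for NaN
except ValueError as exc:
    message = str(exc)

print(message)
```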

In [13]: b.cumsum(skipna=False)
Out[13]:
0    0
1    0
2    0
3    1
4    2
5    2
6    2
dtype: int64

If it has NaNs, or you cast to int or object, it works as expected:

In [21]: b.astype(int).cumsum()
In [22]: b.astype(object).cumsum()  # False at the beginning is expected
In [23]: b.astype(int).astype(object).cumsum()
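
The leading False in the object case can be illustrated without pandas. This is a rough analogy using itertools.accumulate from the standard library (an assumption for illustration, not how pandas implements cumsum): the first element passes through unchanged, and every later element is the result of an addition, so it becomes an int.

```python
from itertools import accumulate

flags = [False, False, False, True, True, False, False]

# The first item is emitted as-is (still a bool); each later item
# is the result of an addition, and bool + bool gives an int.
running = list(accumulate(flags))

print(running)
```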

Also, if you try to insert a NaN, it neither works nor raises (!):

In [31]: b.loc[0] = np.nan

In [32]: b
Out[32]:
0     True
1    False
2    False
3     True
4     True
5    False
6    False
dtype: bool
@cpcloud
Member

cpcloud commented Aug 2, 2013

silently converts nan to True, i guess that's to preserve dtypes...is it the convention to treat nan as "truthy"?

@hayd
Contributor Author

hayd commented Aug 2, 2013

seems the opposite of what you would expect...

In [1]: if np.nan: print "truthy"
truthy

@hayd
Contributor Author

hayd commented Aug 2, 2013

This is the way python's nan works, so I think it's not numpy's fault:

In [2]: bool(float('nan'))
Out[2]: True
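
For reference, a short standard-library sketch (Python 3 syntax, illustrative variable names) of how NaN behaves under truth testing: among floats only zero is falsy, so NaN counts as true even though it is not equal to anything, itself included. math.isnan is the explicit check.

```python
import math

nan = float("nan")

is_truthy = bool(nan)         # any nonzero float is truthy, and NaN is not zero
equals_itself = (nan == nan)  # NaN compares unequal to everything, itself included
detected = math.isnan(nan)    # explicit test for NaN

print(is_truthy, equals_itself, detected)
```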

strange..

@jreback
Contributor

jreback commented Aug 2, 2013

if there are nan in the series, this should be object type and should skip the nan, can you put a test in for this? (I am -1 for converting nan to True btw)....nothing should do this explicitly I think
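
A pure-Python sketch of the skipping behaviour described here, assuming a plain list where NaN (or None) marks a missing entry. The helpers `is_missing` and `cumsum_skipna` are hypothetical names for illustration, not pandas functions, and this is not the fix that went into pandas: missing positions stay NaN in the output and do not contribute to the running total.

```python
import math


def is_missing(x):
    """Return True for None or a float NaN (hypothetical helper)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def cumsum_skipna(values):
    """Running total that skips missing entries (hypothetical helper)."""
    total = 0
    out = []
    for v in values:
        if is_missing(v):
            out.append(float("nan"))  # missing stays missing in the output
        else:
            total += v                # bools add as 0 or 1
            out.append(total)
    return out


result = cumsum_skipna([float("nan"), False, True, True])
print(result)
```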

@hayd
Contributor Author

hayd commented Aug 2, 2013

[31] should raise like it does when you try to insert NaN into int, which gives:

ValueError: cannot convert float NaN to integer

(perhaps that should be upcast though...)

@jreback
Contributor

jreback commented Aug 2, 2013

can you post an example with nan in it?

@hayd
Contributor Author

hayd commented Aug 2, 2013

Do you mean for int:

In [70]: s = pd.Series([1, 2])

In [71]: s.loc[0] = np.nan
ValueError: cannot convert float NaN to integer

and for bool

In [76]: s = pd.Series([False, True])

In [77]: s.loc[0] = np.nan

In [78]: s
Out[78]: 
0    True
1    True
dtype: bool

@jreback
Contributor

jreback commented Aug 2, 2013

both of these need to wait for #3482 as this is very tricky to fix in the current implementation. in essence you need in-place dtype changing of a numpy array where the actual itemsize of the dtype changes...(some cases this works, but others it doesn't)
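
To illustrate why the itemsize matters, here is a loose analogy using the standard-library array module as a stand-in for a NumPy buffer (an assumption for illustration; NumPy's dtypes and memory handling differ in detail). One-byte items cannot hold eight-byte floats in the same memory, so the conversion has to build a new, larger buffer.

```python
from array import array

flags = array("b", [0, 1, 1])                    # signed char: 1 byte per item
as_float = array("d", [float(x) for x in flags])  # double: 8 bytes per item, new buffer

flags_bytes = len(flags) * flags.itemsize
float_bytes = len(as_float) * as_float.itemsize

print(flags.itemsize, as_float.itemsize)
print(flags_bytes, float_bytes)
```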

@hayd
Contributor Author

hayd commented Aug 2, 2013

Is that nearly mergeable? :)

@jreback
Contributor

jreback commented Aug 2, 2013

yes
just waiting for wesm to have a look
soon

@cpcloud
Member

cpcloud commented Aug 2, 2013

(not) casting on assignment tho is separate from this. this just fixes the 2 methods

@jreback is it even possible (without resorting to trickery) to do an in-place update of an ndarray's dtype??

i can put in a test for bool/object series with nan

@jreback
Contributor

jreback commented Aug 2, 2013

ok

maybe wait for #3482

@cpcloud
Member

cpcloud commented Aug 2, 2013

for sure

@hayd
Contributor Author

hayd commented Aug 2, 2013

@cpcloud do you want me to make a separate issue for that? I tagged it cheekily on the end... my bad

@cpcloud
Member

cpcloud commented Aug 2, 2013

@hayd 😄 no problem! that would be great.

@cpcloud
Member

cpcloud commented Aug 3, 2013

fwiw i think this particular issue cumsum/cumprod can be fixed before #3482...the conversion issue is separate....since i'm checking for bool dtype here...unless i'm missing something

@jreback
Contributor

jreback commented Aug 3, 2013

yes that is right

I was talking about in place dtype conversions

@wesm
Member

wesm commented Aug 5, 2013

I'm not sure that in-place dtype conversions are really the answer. I guess we just need to tackle the NA problem one of these days. I have some ideas but it's a big project and I personally won't have the resources for it for some time.

4 participants