API: boolean dtype upsets cumsum #4170

hayd · 2013-07-09T13:33:48Z

cumsum on a boolean Series seems to require skipna=False, otherwise it raises, as shown here. (I have not investigated which other methods are affected, though cumprod is.)

In [10]: b = pd.Series([False, False, False, True, True, False, False])

In [11]: b
Out[11]:
0    False
1    False
2    False
3     True
4     True
5    False
6    False
dtype: bool

In [12]: b.cumsum()
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-12-f3f684a93525> in <module>()
----> 1 b.cumsum()

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pandas/core/series.pyc in cumsum(self, axis, dtype, out, skipna)
   1626
   1627         if do_mask:
-> 1628             np.putmask(result, mask, pa.NA)
   1629
   1630         return Series(result, index=self.index)

ValueError: cannot convert float NaN to integer

In [13]: b.cumsum(skipna=False)
Out[13]:
0    0
1    0
2    0
3    1
4    2
5    2
6    2
dtype: int64
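
The error message above is the same one plain Python gives when asked to turn NaN into an int, which suggests the masking step is trying to write NaN into an integer result. A minimal illustration, assuming Python 3 and no pandas (the helper name `try_int` is made up for this sketch):

```python
nan = float("nan")


def try_int(value):
    """Return int(value), or the error text if the conversion fails."""
    try:
        return int(value)
    except ValueError as exc:
        return str(exc)


message = try_int(nan)
print(message)
```

Integers have no bit pattern reserved for "missing", so the conversion has nothing sensible to produce.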

If it has NaNs, or you cast to int or object, it works as expected:

In [21]: b.astype(int).cumsum()
In [22]: b.astype(object).cumsum()  # False at the beginning is expected
In [23]: b.astype(int).astype(object).cumsum()

Also, if you try to insert a NaN, it neither works nor raises (!):

In [31]: b.loc[0] = np.nan

In [32]: b
Out[32]:
0     True
1    False
2    False
3     True
4     True
5    False
6    False
dtype: bool


cpcloud · 2013-08-02T04:18:25Z

silently converts nan to True, i guess that's to preserve dtypes...is it the convention to treat nan as "truthy"?

hayd · 2013-08-02T09:25:37Z

seems the opposite of what you would expect...

In [1]: if np.nan: print "truthy"
truthy

hayd · 2013-08-02T09:29:30Z

This is the way python's nan works, so I think it's not numpy's fault:

In [2]: bool(float('nan'))
Out[2]: True

strange..
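
One way to make sense of it: truthiness for floats means "not equal to zero", and NaN compares unequal to everything, zero included. A small sketch in Python 3 syntax (the variable names are just for illustration), which also shows NumPy following the same rule when casting to bool:

```python
import numpy as np

nan = float("nan")

nan_is_truthy = bool(nan)      # NaN is non-zero, so it is truthy
nan_equals_zero = nan == 0.0   # NaN never compares equal to anything

# Casting a float array to bool applies the same "non-zero" test.
as_bool = np.array([np.nan, 0.0, 1.0]).astype(bool).tolist()

print(nan_is_truthy, nan_equals_zero, as_bool)
```

So the silent NaN to True conversion in a bool Series looks like a plain cast rather than a deliberate convention.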

jreback · 2013-08-02T12:18:40Z

if there are nan in the series, this should be object dtype and should skip the nan. can you put a test in for this? (I am -1 on converting nan to True btw)....nothing should do this explicitly I think

hayd · 2013-08-02T12:52:47Z

[31] should raise like it does when you try to insert NaN into int, which gives:

ValueError: cannot convert float NaN to integer

(perhaps that should be upcast though...)

jreback · 2013-08-02T12:59:51Z

can you post an example with nan in it?

hayd · 2013-08-02T13:09:12Z

Do you mean for int:

In [70]: s = pd.Series([1, 2])

In [71]: s.loc[0] = np.nan
ValueError: cannot convert float NaN to integer

and for bool

In [76]: s = pd.Series([False, True])

In [77]: s.loc[0] = np.nan

In [78]: s
Out[78]: 
0    True
1    True
dtype: bool
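
Until assignment upcasts on its own, one workaround is to upcast explicitly before inserting the NaN. This is only a sketch, assuming a pandas version that has `Series.isna`; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# Upcast first so the Series has a dtype that can represent NaN.
ints = pd.Series([1, 2]).astype("float64")
ints.loc[0] = np.nan

bools = pd.Series([False, True]).astype(object)
bools.loc[0] = np.nan

print(ints.isna().tolist())
print(bools.isna().tolist())
print(bools.dtype)
```

The cost is that the bool Series is no longer bool dtype, which is the trade-off being discussed here.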

jreback · 2013-08-02T13:57:46Z

both of these need to wait for #3482, as this is very tricky to fix in the current implementation. in essence you need in-place dtype changing of a numpy array where the actual itemsize of the dtype changes...(in some cases this works, but in others it doesn't)
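
A rough NumPy-only illustration of the itemsize point, as I read it (not taken from #3482): reinterpreting a buffer in place works when the item size stays the same, but going from bool to float64 needs a new, larger buffer.

```python
import numpy as np

flags = np.array([False, True, True])

# Same itemsize (1 byte): a view reinterprets the existing buffer in place.
as_int8 = flags.view(np.int8)

# Different itemsize (1 byte -> 8 bytes): astype has to allocate a new array.
as_float = flags.astype(np.float64)

print(flags.dtype.itemsize, as_int8.dtype.itemsize, as_float.dtype.itemsize)
print(np.shares_memory(flags, as_int8), np.shares_memory(flags, as_float))
```

Anything holding a reference to the original array keeps seeing the old bool data after the `astype` call, which is why a true in-place upcast is awkward.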

hayd · 2013-08-02T14:10:59Z

Is that nearly mergeable? :)

jreback · 2013-08-02T14:18:02Z

yes
just waiting for wesm to have a look
soon

cpcloud · 2013-08-02T14:33:06Z

(not) casting on assignment tho is separate from this. this just fixes the 2 methods

@jreback is it even possible (without resorting to trickery) to do an in-place update of an ndarray's dtype??

i can put in a test for bool/object series with nan

jreback · 2013-08-02T14:45:33Z

ok

maybe wait for #3482

cpcloud · 2013-08-02T14:49:20Z

for sure

hayd · 2013-08-02T14:52:11Z

@cpcloud do you want me to make a separate issue for that? I tagged it cheekily on the end... my bad

cpcloud · 2013-08-02T14:52:56Z

@hayd 😄 no problem! that would be great.

cpcloud · 2013-08-03T00:14:02Z

fwiw i think this particular issue cumsum/cumprod can be fixed before #3482...the conversion issue is separate....since i'm checking for bool dtype here...unless i'm missing something
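
For illustration only, a bool-dtype check along those lines could look like the sketch below. This is not the patch in #4440; `bool_safe_cumsum` is a hypothetical helper written against plain NumPy.

```python
import numpy as np


def bool_safe_cumsum(values):
    """Hypothetical helper: cumulative sum that tolerates bool and NaN input."""
    arr = np.asarray(values)
    if arr.dtype == np.bool_:
        # A bool array cannot hold NaN, so there is nothing to mask.
        return arr.astype(np.int64).cumsum()
    arr = arr.astype(np.float64)
    mask = np.isnan(arr)
    # Treat NaN as 0 while summing, then restore NaN at those positions.
    result = np.where(mask, 0.0, arr).cumsum()
    result[mask] = np.nan
    return result


from_bools = bool_safe_cumsum([False, False, False, True, True, False, False])
from_floats = bool_safe_cumsum([1.0, np.nan, 2.0])
print(from_bools.tolist())
print(from_floats.tolist())
```

The bool branch never touches NaN, so it sidesteps the integer conversion error without needing any in-place dtype change.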

jreback · 2013-08-03T00:15:45Z

yes that is right

I was talking about in place dtype conversions

wesm · 2013-08-05T20:49:27Z

I'm not sure that in-place dtype conversions are really the answer. I guess we just need to tackle the NA problem one of these days. I have some ideas but it's a big project and I personally won't have the resources for it for some time.

cpcloud mentioned this issue Aug 2, 2013

BUG: allow cumprod and cumsum to work with bool dtypes #4440

Merged

ghost assigned cpcloud Aug 2, 2013

hayd closed this as completed in #4440 Aug 7, 2013

hayd mentioned this issue Aug 7, 2013

BUG: changing series dtype inplace #4463

Closed

wesm unassigned cpcloud Oct 12, 2016
