Now boolean operators work with NDFrames? #4633

cpcloud · 2013-08-22T01:38:33Z

Awkward side effect of the new and improved Series probably, I typed something wrong and was shocked to not get the usual numpy ValueError barf:

In [57]: a, b = Series(rand(10) > 0.5), Series(rand(10) > 0.5)

In [58]: a
Out[58]:
0     True
1     True
2    False
3    False
4     True
5    False
6    False
7     True
8     True
9     True
dtype: bool

In [59]: b
Out[59]:
0     True
1     True
2     True
3     True
4    False
5     True
6    False
7     True
8     True
9     True
dtype: bool

In [60]: a and b # what?! no way!!
Out[60]:
0     True
1     True
2     True
3     True
4    False
5     True
6    False
7     True
8     True
9     True
dtype: bool

In [61]: a & b # whew
Out[61]:
0     True
1     True
2    False
3    False
4    False
5    False
6    False
7     True
8     True
9     True
dtype: bool

The text was updated successfully, but these errors were encountered:

jreback · 2013-08-22T01:53:41Z

In [11]: dfa, dfb = DataFrame(randn(10,2)),DataFrame(randn(10,2))

In [13]: dfa
Out[13]: 
          0         1
0  1.032736 -0.981451
1  0.154600  1.458225
2  0.688022  0.793732
3 -1.171007 -0.643027
4 -0.559731 -0.019199
5  1.079594 -0.549702
6 -1.803250  1.319622
7  0.602654  0.550604
8 -0.908063 -1.304829
9  0.103192  1.542427

In [14]: dfb
Out[14]: 
          0         1
0 -0.445965 -1.055636
1  0.929273  0.160086
2  0.302718 -0.653348
3 -1.227953 -1.627101
4  1.832395 -0.036561
5  0.806457  1.730636
6  1.805333 -0.299774
7  0.283562 -0.263815
8  0.749068  0.589955
9  0.523333 -2.085325

In [15]: dfa and dfb
Out[15]: 
          0         1
0 -0.445965 -1.055636
1  0.929273  0.160086
2  0.302718 -0.653348
3 -1.227953 -1.627101
4  1.832395 -0.036561
5  0.806457  1.730636
6  1.805333 -0.299774
7  0.283562 -0.263815
8  0.749068  0.589955
9  0.523333 -2.085325

hayd · 2013-08-22T11:56:05Z

This seems like a good thing. Now truthy-ness of Series/DataFrames is consistent, empty Series are Falsey.

In [6]: s1 = pd.Series([1])

In [7]: s_ = pd.Series()

In [8]: s_ or s
Out[8]: 
0    1
dtype: int64

Not sure I can use this anywhere, but I'll try.

miketkelly · 2013-08-22T22:30:53Z

Consistency aside, this seems likely to lead to lots of subtle bugs when people use df[a or b] when they mean df[a | b].

jtratner · 2013-08-23T00:32:05Z

Along the lines of @mtkni's comment : is it a good thing that Series behaves more like a Python container vs a numpy container?

hayd · 2013-08-23T00:40:02Z

Ew:

In [1]: bool(np.array([]))
Out[1]: False

In [2]: bool(np.array([1]))
Out[2]: True

In [3]: bool(np.array([1, 2]))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-5ba97928842c> in <module>()
----> 1 bool(np.array([1, 2]))

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

That said, you're probably right that and/or should raise. :(

kjordahl · 2013-08-23T02:59:21Z

You also get fun things like this:

>>> a = Series(rand(10))
>>> bool(a == 0)
True
>>> bool(a == 1)
True

I think it should raise as well. It's doing numpy-like comparisons and broadcasting, not throwing an exception can allow subtle bugs to go undetected.

Possibly related: #1069 (old but apparently reverted), #4576, #4537

jreback · 2013-08-23T03:05:59Z

this like this I think are easily caught

I had to add a couple of coercion method which can handle these cases

eg bool , int etc

kjordahl · 2013-08-23T03:12:18Z

What's the use case for allowing bool(s) to return True for any series with nonzero length? It has never been the case in previous releases, since Series inherited from numpy.ndarray. This is new behavior, and IMO it does not add anything useful, only a source of bugs.

jreback · 2013-08-23T03:17:37Z

oh I agree

however we have a couple of tests that would fail otherwise so need to fix those

iIIRC there was a couple of places this is relied upon (which is just wrong)

@cpcloud pls mark this and we can take a look

kjordahl · 2013-08-23T03:35:32Z

OK, thanks.

It looks like raising ValueError for truth tests on DataFrames (which had been introduced in PR #1073) was reverted by PR #3696. I would argue that should be restored as well.

jreback · 2013-08-23T03:45:48Z

yeh these should all be consistent
I'll take alook tomorrow

jreback · 2013-08-23T12:19:22Z

In [17]: s = Series(randn(4))

In [18]: df = DataFrame(randn(10,2))

In [19]: s_empty = Series()

In [20]: df_empty = DataFrame()

This is from 0.12

In [21]: bool(s)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [22]: bool(s_empty)
Out[22]: False

In [23]: bool(df)
Out[23]: True

In [24]: bool(df_empty)
Out[24]: False

master

In [5]: bool(s)
Out[5]: True

In [6]: bool(s_empty)
Out[6]: False

In [7]: bool(df)
Out[7]: True

In [8]: bool(df_empty)
Out[8]: False

These are the same in 0.12 and master, e.g. a 1-element Series is special cased
to evaluate to return the boolean of a single value

In [9]: bool(Series([True]))
Out[9]: True

In [10]: bool(Series([False]))
Out[10]: False

I am going to argue that 0.12 is inconsistent and master is correct

However, one could have a point of view that ALL of these should raise

any thoughts?

jreback · 2013-08-23T12:26:39Z

@wesm ?

kjordahl · 2013-08-23T12:46:09Z

pandas 0.11.0:

>>> bool(s)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

>>> bool(s_empty)
False

>>> bool(df)
ValueError: Cannot call bool() on DataFrame.

>>> bool(df_empty)
ValueError: Cannot call bool() on DataFrame.

To me, it looks like this is the right behavior (with the possible exception of bool(df_empty)).

hayd · 2013-08-23T12:53:48Z

I'm strongly of the opinion it should be all or nothing. The numpy behaviour is stupid.

jreback · 2013-08-23T17:04:40Z

Changing to this (for all NDFrame, and eliminating the special case from Series)

    def __nonzero__(self):
        raise ValueError("cannot use __nonzero__ in NDframe")
    __bool__ = __nonzero__

causes very few failures, (aside from the ones I am purposely testing), really just a couple of test cases that are not formed well

e.g. things like these, which are simply incorrect usages....will change the PR and eveyone can look

s1 = Series.....
s2 = Series....

assert s1 == s2

if s1:
    .....

jreback · 2013-08-23T17:48:00Z

this is a revert to #1073/#1069

now a call to nonzero raises TypeError ALWAYS

The following is the behavior

In [17]: s = Series(randn(4))

In [18]: df = DataFrame(randn(10,2))

In [19]: s_empty = Series()

In [20]: df_empty = DataFrame()

In [5]: bool(s)
Out[5]: TypeError: cannot use __nonzero__ in NDframe
  try using empty to evaluate whether the object is empty  or .all()/.any() for elementwise comparisons

In [6]: bool(s_empty)
Out[6]: TypeError: cannot use __nonzero__ in NDframe
  try using empty to evaluate whether the object is empty  or .all()/.any() for elementwise comparisons

In [7]: bool(df)
Out[7]: TypeError: cannot use __nonzero__ in NDframe
  try using empty to evaluate whether the object is empty  or .all()/.any() for elementwise comparisons

In [8]: bool(df_empty)
Out[8]: TypeError: cannot use __nonzero__ in NDframe
  try using empty to evaluate whether the object is empty  or .all()/.any() for elementwise comparisons

And prevents these fun ones (same for Series/Panel)

In [4]: df1 = DataFrame(np.ones((4,4)))

In [5]: df2 = DataFrame(np.zeros((4,4)))

In [6]: df1 and df2
TypeError: cannot use __nonzero__ in NDframe
  try using empty to evaluate whether the object is empty  or .all()/.any() for elementwise comparisons

In [7]: df1 or df2
TypeError: cannot use __nonzero__ in NDframe
  try using empty to evaluate whether the object is empty  or .all()/.any() for elementwise comparisons

In [8]: not df1
TypeError: cannot use __nonzero__ in NDframe
  try using empty to evaluate whether the object is empty  or .all()/.any() for elementwise comparisons

In [9]: def f():
   ...:     if df1:
   ...:         print("this is cool")
   ...:         

In [10]: f()
TypeError: cannot use __nonzero__ in NDframe
  try using empty to evaluate whether the object is empty  or .all()/.any() for elementwise comparisons

jreback · 2013-08-23T17:48:24Z

anyone think this should be a ValueError (like numpy)?

jtratner · 2013-08-23T19:21:17Z

I think you should change the error message - the numpy error message is
clearer. (and most users aren't going to understand why bool(...) or if
... ends up with something about nonzero) Whether it should be
TypeError or ValueError: I think it makes sense to follow numpy's lead
here, if only for compatibility - no strong preference for me.

jreback · 2013-08-23T19:51:10Z

@jtratner
do you want to suggest something?

numpy

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

make the same? I also want to mention empty?

jreback · 2013-08-23T19:57:20Z

revised to be the same as numpy

jreback · 2013-08-24T00:44:47Z

see revised PR for updates on the error messages

jreback mentioned this issue Aug 23, 2013

API: GH4633, bool(obj) behavior, raise on __nonzero__ always #4657

Merged

jreback closed this as completed in #4657 Aug 31, 2013

armaganthis3 mentioned this issue Apr 25, 2014

Adjust Bool evaluation for empty DataFrames to match PEP8 #6964

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Now boolean operators work with NDFrames? #4633

Now boolean operators work with NDFrames? #4633

cpcloud commented Aug 22, 2013

jreback commented Aug 22, 2013

hayd commented Aug 22, 2013

miketkelly commented Aug 22, 2013

jtratner commented Aug 23, 2013

hayd commented Aug 23, 2013

kjordahl commented Aug 23, 2013

jreback commented Aug 23, 2013

kjordahl commented Aug 23, 2013

jreback commented Aug 23, 2013

kjordahl commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 23, 2013

kjordahl commented Aug 23, 2013

hayd commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 23, 2013

jtratner commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 24, 2013

Now boolean operators work with NDFrames? #4633

Now boolean operators work with NDFrames? #4633

Comments

cpcloud commented Aug 22, 2013

jreback commented Aug 22, 2013

hayd commented Aug 22, 2013

miketkelly commented Aug 22, 2013

jtratner commented Aug 23, 2013

hayd commented Aug 23, 2013

kjordahl commented Aug 23, 2013

jreback commented Aug 23, 2013

kjordahl commented Aug 23, 2013

jreback commented Aug 23, 2013

kjordahl commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 23, 2013

kjordahl commented Aug 23, 2013

hayd commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 23, 2013

jtratner commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 23, 2013

jreback commented Aug 24, 2013