
Commit 5148e90

Merge pull request #4657 from jreback/bool
API: GH4633, bool(obj) behavior, raise on __nonzero__ always
2 parents 03aa067 + e06d7a8 commit 5148e90


13 files changed

+200
-31
lines changed


doc/source/10min.rst

+17-1
Original file line numberDiff line numberDiff line change
@@ -269,7 +269,6 @@ A ``where`` operation for getting.
269269
270270
df[df > 0]
271271
272-
273272
Setting
274273
~~~~~~~
275274

@@ -708,3 +707,20 @@ Reading from an excel file
708707
:suppress:
709708
710709
os.remove('foo.xlsx')
710+
711+
Gotchas
712+
-------
713+
714+
If you are trying an operation and you see an exception like:
715+
716+
.. code-block:: python
717+
718+
>>> if pd.Series([False, True, False]):
719+
print("I was true")
720+
Traceback
721+
...
722+
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
723+
724+
See :ref:`Comparisons<basics.compare>` for an explanation and what to do.
725+
726+
See :ref:`Gotchas<gotchas>` as well.

doc/source/basics.rst

+48-2
@@ -8,7 +8,7 @@
88
from pandas import *
99
randn = np.random.randn
1010
np.set_printoptions(precision=4, suppress=True)
11-
from pandas.compat import lrange
11+
from pandas.compat import lrange
1212
1313
==============================
1414
Essential Basic Functionality
@@ -198,16 +198,62 @@ replace NaN with some other value using ``fillna`` if you wish).
198198
199199
Flexible Comparisons
200200
~~~~~~~~~~~~~~~~~~~~
201+
202+
.. _basics.compare:
203+
201204
Starting in v0.8, pandas introduced binary comparison methods eq, ne, lt, gt,
202205
le, and ge to Series and DataFrame whose behavior is analogous to the binary
203206
arithmetic operations described above:
204207

205208
.. ipython:: python
206209
207210
df.gt(df2)
208-
209211
df2.ne(df)
210212
213+
These operations produce a pandas object of the same type as the left-hand-side input
214+
that is of dtype ``bool``. These ``boolean`` objects can be used in indexing operations,
215+
see :ref:`here<indexing.boolean>`
216+
217+
Furthermore, you can apply the reduction functions: ``any()`` and ``all()`` to provide a
218+
way to summarize these results.
219+
220+
.. ipython:: python
221+
222+
(df>0).all()
223+
(df>0).any()
224+
225+
Finally you can test if a pandas object is empty, via the ``empty`` property.
226+
227+
.. ipython:: python
228+
229+
df.empty
230+
DataFrame(columns=list('ABC')).empty
231+
232+
.. warning::
233+
234+
You might be tempted to do the following:
235+
236+
.. code-block:: python
237+
238+
>>> if df:
239+
...
240+
241+
Or
242+
243+
.. code-block:: python
244+
245+
>>> df and df2
246+
247+
Both of these will raise, because you are asking for a single truth value from an object that holds multiple values.
248+
249+
.. code-block:: python
250+
251+
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
252+
253+
254+
See :ref:`gotchas<gotchas.truth>` for a more detailed discussion.
255+
256+
211257
Combining overlapping data sets
212258
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
213259

doc/source/gotchas.rst

+53-1
@@ -15,6 +15,58 @@
1515
Caveats and Gotchas
1616
*******************
1717

18+
Using If/Truth Statements with Pandas
19+
-------------------------------------
20+
21+
.. _gotchas.truth:
22+
23+
Pandas follows the numpy convention of raising an error when you try to convert a pandas object to a ``bool``.
24+
This happens in an ``if`` statement or when using the boolean operations ``and``, ``or``, or ``not``. It is not clear
25+
what the result of
26+
27+
.. code-block:: python
28+
29+
>>> if Series([False, True, False]):
30+
...
31+
32+
should be. Should it be ``True`` because it's not zero-length? ``False`` because there are ``False`` values?
33+
It is unclear, so instead, pandas raises a ``ValueError``:
34+
35+
.. code-block:: python
36+
37+
>>> if pd.Series([False, True, False]):
38+
print("I was true")
39+
Traceback
40+
...
41+
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
42+
43+
44+
If you see that, you need to explicitly choose what you want to do with it (e.g., use ``any()``, ``all()`` or ``empty``).
45+
Or, you might want to check whether the pandas object is ``None``:
46+
47+
.. code-block:: python
48+
49+
>>> if pd.Series([False, True, False]) is not None:
50+
print("I was not None")
51+
I was not None
52+
53+
Bitwise boolean
54+
~~~~~~~~~~~~~~~
55+
56+
Elementwise comparison operators like ``==`` and ``!=`` will return a boolean ``Series``,
57+
which is almost always what you want anyway.
58+
59+
.. code-block:: python
60+
61+
>>> s = pd.Series(range(5))
62+
>>> s == 4
63+
0 False
64+
1 False
65+
2 False
66+
3 False
67+
4 True
68+
dtype: bool
69+
1870
``NaN``, Integer ``NA`` values and ``NA`` type promotions
1971
---------------------------------------------------------
2072

@@ -428,7 +480,7 @@ parse HTML tables in the top-level pandas io function ``read_html``.
428480
lxml will work correctly:
429481

430482
.. code-block:: sh
431-
483+
432484
# remove the included version
433485
conda remove lxml
434486

doc/source/release.rst

+3
@@ -134,6 +134,9 @@ pandas 0.13
134134
now returns a ``MultiIndex`` rather than an ``Index``. (:issue:`4039`)
135135

136136
- Infer and downcast dtype if ``downcast='infer'`` is passed to ``fillna/ffill/bfill`` (:issue:`4604`)
137+
- Factored out excel_value_to_python_value from ExcelFile::_parse_excel (:issue:`4589`)
138+
- ``__nonzero__`` for all NDFrame objects will now raise a ``ValueError``; this reverts to the earlier (:issue:`1073`, :issue:`4633`)
139+
behavior.
137140

138141
**Internal Refactoring**
139142

doc/source/v0.13.0.txt

+12
@@ -121,6 +121,18 @@ API changes
121121
index.set_names(["bob", "cranberry"], inplace=True)
122122

123123
- Infer and downcast dtype if ``downcast='infer'`` is passed to ``fillna/ffill/bfill`` (:issue:`4604`)
124+
- ``__nonzero__`` for all NDFrame objects will now raise a ``ValueError``; this reverts to the earlier (:issue:`1073`, :issue:`4633`)
125+
behavior.
126+
127+
This prevents behaviors like the following (all of which will now raise ``ValueError``):
128+
129+
.. code-block:: python
130+
131+
if df:
132+
....
133+
134+
df1 and df2
135+
s1 and s2
124136

125137
Enhancements
126138
~~~~~~~~~~~~

pandas/core/generic.py

+2-1
@@ -531,7 +531,8 @@ def empty(self):
531531
return not all(len(self._get_axis(a)) > 0 for a in self._AXIS_ORDERS)
532532

533533
def __nonzero__(self):
534-
return not self.empty
534+
raise ValueError("The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().")
535+
535536
__bool__ = __nonzero__
536537

537538
#----------------------------------------------------------------------
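
Note on the `__nonzero__` change above: the following is a minimal, self-contained sketch of the resulting behavior in plain Python. The `Container` class and the `truthiness_error` helper are hypothetical stand-ins, not pandas code. They borrow only the names `empty`, `any()` and `all()` and the error message shown in the diff, and assume nothing else about the pandas implementation.

```python
class Container:
    """Hypothetical stand-in for an NDFrame-like object (not pandas code)."""

    def __init__(self, values):
        self.values = list(values)

    @property
    def empty(self):
        # True when the container holds no elements at all
        return len(self.values) == 0

    def any(self):
        # True if at least one element is truthy
        return any(self.values)

    def all(self):
        # True if every element is truthy
        return all(self.values)

    def __bool__(self):
        # Mirrors the new behavior: truth-testing always raises
        raise ValueError(
            "The truth value of an array is ambiguous. "
            "Use a.empty, a.any() or a.all()."
        )

    __nonzero__ = __bool__  # Python 2 spelling of the same hook


def truthiness_error(obj):
    """Return the message raised by bool(obj), or None if bool() succeeds."""
    try:
        bool(obj)
    except ValueError as exc:
        return str(exc)
    return None


c = Container([False, True, False])
message = truthiness_error(c)           # bool(c) raises, so this is a string
explicit = (c.empty, c.any(), c.all())  # the caller states which question is meant
```

The point of the design is that the caller must say which question is being asked (is it empty, is any element true, are all elements true) instead of relying on an implicit conversion.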

pandas/core/groupby.py

+14-1
@@ -2101,9 +2101,22 @@ def filter(self, func, dropna=True, *args, **kwargs):
21012101
else:
21022102
res = path(group)
21032103

2104-
if res:
2104+
def add_indexer():
21052105
indexers.append(self.obj.index.get_indexer(group.index))
21062106

2107+
# interpret the result of the filter
2108+
if isinstance(res,(bool,np.bool_)):
2109+
if res:
2110+
add_indexer()
2111+
else:
2112+
if getattr(res,'ndim',None) == 1:
2113+
if res.ravel()[0]:
2114+
add_indexer()
2115+
else:
2116+
2117+
# in theory you could do .all() on the boolean result ?
2118+
raise TypeError("the filter must return a boolean result")
2119+
21072120
if len(indexers) == 0:
21082121
filtered = self.obj.take([]) # because np.concatenate would fail
21092122
else:
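
Note on the `filter` change above: since indentation is lost in this view, here is a hedged sketch of how the new branches appear to nest. `keep_group` and `OneDim` are hypothetical names used only for illustration. The sketch checks for the built-in `bool` only, whereas the diff also accepts `np.bool_`, and it returns a flag where the diff calls `add_indexer()`.

```python
def keep_group(res):
    """Decide whether a group passes the filter, following the diff's branches."""
    if isinstance(res, bool):
        # plain scalar boolean: use it directly
        return res
    if getattr(res, "ndim", None) == 1:
        # 1-d array-like result: use its first element
        return bool(res.ravel()[0])
    # anything else is rejected instead of being truth-tested
    raise TypeError("the filter must return a boolean result")


class OneDim:
    """Hypothetical 1-d array-like exposing only ndim and ravel()."""

    ndim = 1

    def __init__(self, values):
        self._values = list(values)

    def ravel(self):
        return self._values


scalar_result = keep_group(True)
array_result = keep_group(OneDim([False, True]))
```

Under this reading, a filter function that returns a non-boolean, non-1-d object now fails loudly with a `TypeError` rather than being passed to `if res:`.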

pandas/core/series.py

-7
@@ -798,13 +798,6 @@ def __contains__(self, key):
798798
__long__ = _coerce_method(int)
799799
__int__ = _coerce_method(int)
800800

801-
def __nonzero__(self):
802-
# special case of a single element bool series degenerating to a scalar
803-
if self.dtype == np.bool_ and len(self) == 1:
804-
return bool(self.iloc[0])
805-
return not self.empty
806-
__bool__ = __nonzero__
807-
808801
# we are preserving name here
809802
def __getstate__(self):
810803
return dict(_data=self._data, name=self.name)

pandas/io/tests/test_pytables.py

+3-3
@@ -1593,19 +1593,19 @@ def test_table_values_dtypes_roundtrip(self):
15931593
with ensure_clean(self.path) as store:
15941594
df1 = DataFrame({'a': [1, 2, 3]}, dtype='f8')
15951595
store.append('df_f8', df1)
1596-
assert df1.dtypes == store['df_f8'].dtypes
1596+
assert_series_equal(df1.dtypes,store['df_f8'].dtypes)
15971597

15981598
df2 = DataFrame({'a': [1, 2, 3]}, dtype='i8')
15991599
store.append('df_i8', df2)
1600-
assert df2.dtypes == store['df_i8'].dtypes
1600+
assert_series_equal(df2.dtypes,store['df_i8'].dtypes)
16011601

16021602
# incompatible dtype
16031603
self.assertRaises(ValueError, store.append, 'df_i8', df1)
16041604

16051605
# check creation/storage/retrieval of float32 (a bit hacky to actually create them though)
16061606
df1 = DataFrame(np.array([[1],[2],[3]],dtype='f4'),columns = ['A'])
16071607
store.append('df_f4', df1)
1608-
assert df1.dtypes == store['df_f4'].dtypes
1608+
assert_series_equal(df1.dtypes,store['df_f4'].dtypes)
16091609
assert df1.dtypes[0] == 'float32'
16101610

16111611
# check with mixed dtypes

pandas/tests/test_frame.py

+1-4
@@ -10607,13 +10607,10 @@ def test_index_namedtuple(self):
1060710607
df = DataFrame([(1, 2), (3, 4)], index=index, columns=["A", "B"])
1060810608
self.assertEqual(df.ix[IndexType("foo", "bar")]["A"], 1)
1060910609

10610-
def test_bool_empty_nonzero(self):
10610+
def test_empty_nonzero(self):
1061110611
df = DataFrame([1, 2, 3])
10612-
self.assertTrue(bool(df))
1061310612
self.assertFalse(df.empty)
1061410613
df = DataFrame(index=['a', 'b'], columns=['c', 'd']).dropna()
10615-
self.assertFalse(bool(df))
10616-
self.assertFalse(bool(df.T))
1061710614
self.assertTrue(df.empty)
1061810615
self.assertTrue(df.T.empty)
1061910616

pandas/tests/test_generic.py

+46-4
@@ -73,7 +73,6 @@ def _construct(self, shape, value=None, **kwargs):
7373
arr = np.random.randn(*shape)
7474
return self._typ(arr,**kwargs)
7575

76-
7776
def _compare(self, result, expected):
7877
self._comparator(result,expected)
7978

@@ -82,14 +81,14 @@ def test_rename(self):
8281
# single axis
8382
for axis in self._axes():
8483
kwargs = { axis : list('ABCD') }
85-
o = self._construct(4,**kwargs)
84+
obj = self._construct(4,**kwargs)
8685

8786
# no values passed
8887
#self.assertRaises(Exception, o.rename(str.lower))
8988

9089
# rename a single axis
91-
result = o.rename(**{ axis : str.lower })
92-
expected = o.copy()
90+
result = obj.rename(**{ axis : str.lower })
91+
expected = obj.copy()
9392
setattr(expected,axis,list('abcd'))
9493
self._compare(result, expected)
9594

@@ -119,6 +118,41 @@ def test_get_numeric_data(self):
119118
self._compare(result, o)
120119

121120
# _get_numeric_data includes _get_bool_data, so can't test for non-inclusion
121+
def test_nonzero(self):
122+
123+
# GH 4633
124+
# look at the boolean/nonzero behavior for objects
125+
obj = self._construct(shape=4)
126+
self.assertRaises(ValueError, lambda : bool(obj == 0))
127+
self.assertRaises(ValueError, lambda : bool(obj == 1))
128+
self.assertRaises(ValueError, lambda : bool(obj))
129+
130+
obj = self._construct(shape=4,value=1)
131+
self.assertRaises(ValueError, lambda : bool(obj == 0))
132+
self.assertRaises(ValueError, lambda : bool(obj == 1))
133+
self.assertRaises(ValueError, lambda : bool(obj))
134+
135+
obj = self._construct(shape=4,value=np.nan)
136+
self.assertRaises(ValueError, lambda : bool(obj == 0))
137+
self.assertRaises(ValueError, lambda : bool(obj == 1))
138+
self.assertRaises(ValueError, lambda : bool(obj))
139+
140+
# empty
141+
obj = self._construct(shape=0)
142+
self.assertRaises(ValueError, lambda : bool(obj))
143+
144+
# invalid behaviors
145+
146+
obj1 = self._construct(shape=4,value=1)
147+
obj2 = self._construct(shape=4,value=1)
148+
149+
def f():
150+
if obj1:
151+
print("this works and shouldn't")
152+
self.assertRaises(ValueError, f)
153+
self.assertRaises(ValueError, lambda : obj1 and obj2)
154+
self.assertRaises(ValueError, lambda : obj1 or obj2)
155+
self.assertRaises(ValueError, lambda : not obj1)
122156

123157
class TestSeries(unittest.TestCase, Generic):
124158
_typ = Series
@@ -154,6 +188,14 @@ def test_get_numeric_data_preserve_dtype(self):
154188
expected = Series([],dtype='M8[ns]')
155189
self._compare(result, expected)
156190

191+
def test_nonzero_single_element(self):
192+
193+
s = Series([True])
194+
self.assertRaises(ValueError, lambda : bool(s))
195+
196+
s = Series([False])
197+
self.assertRaises(ValueError, lambda : bool(s))
198+
157199
class TestDataFrame(unittest.TestCase, Generic):
158200
_typ = DataFrame
159201
_comparator = lambda self, x, y: assert_frame_equal(x,y)

pandas/tests/test_series.py

-6
@@ -296,12 +296,6 @@ def test_scalar_conversion(self):
296296
self.assert_(int(Series([1.])) == 1)
297297
self.assert_(long(Series([1.])) == 1)
298298

299-
self.assert_(bool(Series([True])) == True)
300-
self.assert_(bool(Series([False])) == False)
301-
302-
self.assert_(bool(Series([True,True])) == True)
303-
self.assert_(bool(Series([False,True])) == True)
304-
305299
def test_astype(self):
306300
s = Series(np.random.randn(5),name='foo')
307301

pandas/tseries/tests/test_timeseries.py

+1-1
@@ -256,7 +256,7 @@ def test_indexing(self):
256256
df = DataFrame(randn(5,5),columns=['open','high','low','close','volume'],index=date_range('2012-01-02 18:01:00',periods=5,tz='US/Central',freq='s'))
257257
expected = df.loc[[df.index[2]]]
258258
result = df['2012-01-02 18:01:02']
259-
self.assert_(result == expected)
259+
assert_frame_equal(result,expected)
260260

261261
# this is a single date, so will raise
262262
self.assertRaises(KeyError, df.__getitem__, df.index[2],)
