Series.any() and .all() don't return bool values if dtype=object #12863

jdavies1618 · 2016-04-11T15:26:44Z

Code Sample, a copy-pastable example if possible

import pandas as pd
s = pd.Series(index=range(5), data=['a', 'b', 'c', 'd', 'e'], dtype=object)
print([s.any(), s.all()])

Expected Output

[True, True]

Actual output

['a', 'e']

output of `pd.show_versions()`

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Darwin
OS-release: 15.0.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: None
nose: 1.3.7
pip: 8.1.1
setuptools: None
Cython: None
numpy: 1.10.1
scipy: 0.16.1
statsmodels: None
IPython: 4.0.0
sphinx: 1.3.3
patsy: None
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.4.3
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.11
pymysql: None
psycopg2: None
Jinja2: None

jreback · 2016-04-11T16:50:36Z

what you are doing doesn't make any sense but it is correct.

http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.any.html

I guess this should return a boolean. But to be honest I don't know what numpy is doing.

In [2]: np.array(['a','b','c'],dtype='object').any()
Out[2]: 'a'

cc @charris
cc @njsmith

charris · 2016-04-11T17:24:54Z

Hmm..., yes, it is correct

In [1]: 'a' or 'b'
Out[1]: 'a'

I suppose we could call bool on the result for the logical operators.

In [2]: bool('a' or 'b')
Out[2]: True
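The following is a minimal pure-Python sketch of charris's point. It assumes that NumPy's object-dtype `any()`/`all()` behave like a left fold with `or`/`and`. The helper names `object_any` and `object_all` are hypothetical and are not NumPy API. The starting values `False`/`True` are also an assumption, and they only matter for empty input. The fold reproduces the outputs reported in this issue, and wrapping the result in `bool()` gives the expected `[True, True]`:

```python
from functools import reduce


def object_any(values):
    # Hypothetical emulation: fold with `or`, which returns the first truthy
    # operand (or the last operand if none is truthy), not a bool.
    return reduce(lambda acc, x: acc or x, values, False)


def object_all(values):
    # Hypothetical emulation: fold with `and`, which returns the first falsy
    # operand (or the last operand if all are truthy), not a bool.
    return reduce(lambda acc, x: acc and x, values, True)


data = ['a', 'b', 'c', 'd', 'e']
print([object_any(data), object_all(data)])              # ['a', 'e']
print([bool(object_any(data)), bool(object_all(data))])  # [True, True]
print(object_any([1, 2, 3]))                             # 1
```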

jreback · 2016-04-11T18:28:31Z

yep, looks like this is the case for dtype=object in general

In [4]: np.array([1,2,3],dtype=object).any()
Out[4]: 1

jreback · 2016-04-11T18:29:50Z

ok, I think we will add a bool() if it's a scalar on the pandas side for compat.

@jdavies1618 want to do a pull-request?
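jreback's proposal (call `bool()` on the result when it is a scalar) might look roughly like the following. This is a hypothetical sketch, not pandas' actual fix. The decorator name `bool_if_scalar` and the reduction `object_any` are illustrative. Treating strings as scalars while passing other iterables through unchanged is an assumption about what "scalar" should mean here:

```python
from collections.abc import Iterable
from functools import reduce, wraps


def bool_if_scalar(func):
    """Hypothetical decorator: coerce scalar reduction results to bool."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        # Treat str/bytes as scalars even though they are iterable.
        if isinstance(result, (str, bytes)) or not isinstance(result, Iterable):
            return bool(result)
        # Non-scalar results (e.g. one value per column) are left unchanged.
        return result
    return wrapper


@bool_if_scalar
def object_any(values):
    # Emulates the object-dtype reduction: `or` returns an operand, not a bool.
    return reduce(lambda acc, x: acc or x, values, False)


print(object_any(['a', 'b', 'c']))  # True
print(object_any([0, '', None]))    # False
```

The check happens after the reduction, so the underlying computation does not need to change. Only the type of a scalar result is normalised.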

jdavies1618 · 2016-04-11T18:41:46Z

Will do - thanks for the quick feedback!

artheist · 2016-04-13T19:26:07Z

This is weird: it looks like there are some discrepancies in the tests for nanall and nanany in the series test suite (pandas/tests/series/test_analytics.py).

In test_all_any, these three lines seem to assert that the reported bug is the desired behaviour:

# Alternative types, with implicit 'object' dtype.
s = Series(['abc', True])
self.assertEqual('abc', s.any())  # 'abc' || True => 'abc'

Also, in the following test, test_all_any_params, the first lines, which test for NaN, are not coherent. This line seems to state that NaN is considered true:

self.assertTrue(s1.all(skipna=False))  # nan && True => True

but this line contradicts it:

self.assertTrue(np.isnan(s2.any(skipna=False)))  # nan || False => nan

In my opinion, it would make more sense (and be more coherent) to have the same behavior as Python's built-in any and all functions, such that:
any(['abc', True]) -> True
all(['abc', False]) -> False
any([float('NaN'), True]) -> True
all([float('NaN'), True]) -> True
any([float('NaN'), False]) -> True
all([float('NaN'), False]) -> False

If so, I have a commit that fixes it, but I would then also have to change the test suite.
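The expectations discussed in this comment can be checked in plain Python, with no pandas or NumPy involved. The sketch below contrasts two behaviours. The first is the operand-returning `or`/`and` semantics that the existing tests encode. The second is the built-in `any()`/`all()`, which always return a bool. NaN is truthy because it is a non-zero float, so `bool(float('nan'))` is `True`. `fold_any`/`fold_all` are hypothetical helpers used only for illustration:

```python
from functools import reduce

nan = float('nan')


def fold_any(values):
    # `or` returns the first truthy operand (or the last one), not a bool.
    return reduce(lambda acc, x: acc or x, values, False)


def fold_all(values):
    # `and` returns the first falsy operand (or the last one), not a bool.
    return reduce(lambda acc, x: acc and x, values, True)


cases = [['abc', True], ['abc', False], [nan, True], [nan, False]]
for values in cases:
    # Operand-returning fold vs. bool-returning built-ins, side by side.
    print(values,
          'fold:', fold_any(values), fold_all(values),
          'builtin:', any(values), all(values))
```

Under the fold, `['abc', True]` gives `'abc'` for any, `[nan, True]` gives `True` for all, and `[nan, False]` gives `nan` for any. These match the existing test expectations. The built-ins return plain bools in every case.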

jreback · 2016-04-14T02:56:53Z

so what are you proposing for changes?

jdavies1618 · 2016-04-14T17:29:46Z

Here's a pass I took:

--- a/pandas/tests/series/test_analytics.py
+++ b/pandas/tests/series/test_analytics.py
@@ -548,16 +548,19 @@ class TestSeriesAnalytics(TestData, tm.TestCase):

         self.assertTrue(bool_series.any())
         # Alternative types, with implicit 'object' dtype.
+        # Changed for gh-12863
         s = Series(['abc', True])
-        self.assertEqual('abc', s.any())  # 'abc' || True => 'abc'
+        self.assertEqual(True, s.any())  # bool('abc' || True) => True

     def test_all_any_params(self):
         # Check skipna, with implicit 'object' dtype.
+        # Changed for gh-12863
         s1 = Series([np.nan, True])
         s2 = Series([np.nan, False])
-        self.assertTrue(s1.all(skipna=False))  # nan && True => True
+        self.assertTrue(s1.all(skipna=False))  # bool(nan && True) => True
         self.assertTrue(s1.all(skipna=True))
-        self.assertTrue(np.isnan(s2.any(skipna=False)))  # nan || False => nan
+        # Below: bool(nan || False) => True
+        self.assertFalse(np.isnan(s2.any(skipna=False)))
         self.assertFalse(s2.any(skipna=True))

artheist · 2016-04-14T20:34:39Z

More complete tests are needed for proper coverage of all possible situations.
I just made a pull request.

artheist · 2016-04-14T20:35:36Z

Sorry, I was too fast; there are still errors in the tests.

toobaz · 2017-04-22T17:47:38Z

> I guess this should return a boolean. But to be honest I don't know what numpy is doing.

Note that this is considered a bug in numpy and is (or was) being worked on there:
numpy/numpy#4352
numpy/numpy#5267

So that's probably where effort would better be directed.

jreback · 2017-04-23T13:42:52Z

> So that's probably where effort would better be directed.

good luck with that. And even if a PR was actually proposed / accepted, then this bug would linger in pandas even longer. Much better to do a fix here.

ShaharNaveh · 2019-12-22T23:02:55Z

take

For the current state of the code, this change is not required, but it will save someone from having to fix this in future when they want to use a newer OpenSAFELY Python image. The tests currently rely on a quirky behaviour of `.all()` in pandas that has since been fixed. In the version of pandas inside the `python:latest` image (v1.0.1), `.all()` on an `object`-dtype Series can return one of the Series' values instead of a Boolean, and our tests as written pass. In later versions of pandas, `.all()` returns a Boolean, and our tests fail. We can rewrite the tests so that they work both before and after this pandas fix. This was checked by running the tests with the `Dockerfile` as provided, and with the `Dockerfile` edited to use the `v2` OpenSAFELY Python image. See pandas-dev/pandas#12863.
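One way to write an assertion that passes both before and after the pandas fix is to compare truthiness rather than the exact return value. The sketch below simulates the two behaviours in plain Python. `legacy_all` and `fixed_all` are hypothetical stand-ins, not pandas functions: the first is an `and`-fold that returns an operand, and the second is the built-in `all()`:

```python
from functools import reduce


def legacy_all(values):
    # Stand-in for the pre-fix object-dtype behaviour: returns an operand.
    return reduce(lambda acc, x: acc and x, values, True)


def fixed_all(values):
    # Stand-in for the post-fix behaviour: always returns a bool.
    return all(values)


values = ['a', 'b', 'c']
for impl in (legacy_all, fixed_all):
    result = impl(values)
    # `result is True` would fail for legacy_all (it returns 'c'), and
    # `result == 'c'` would fail for fixed_all; bool() works for both.
    assert bool(result) is True
    print(impl.__name__, repr(result), bool(result))
```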

jreback added Dtype Conversions and Usage Question labels Apr 11, 2016

jreback added Compat and Difficulty Novice labels Apr 11, 2016

jreback added this to the Next Major Release milestone Apr 11, 2016

artheist mentioned this issue Apr 14, 2016

Fix ndarray allany #12901

Closed

mroeschke added a commit to mroeschke/pandas that referenced this issue Nov 16, 2016

BUG: Any/All sometimes does not return boolean (pandas-dev#12863)

aa21fa4

TomAugspurger added the good first issue label Oct 11, 2017

jreback removed the Difficulty Novice label Dec 15, 2017

TomAugspurger mentioned this issue Mar 9, 2018

DOC: Improved the docstring of Series.any() #20078

Closed

jorisvandenbossche mentioned this issue Aug 2, 2019

any/all reductions on boolean object-typed Series #27709

Closed

mroeschke removed the Usage Question label Oct 9, 2019

jbrockmendel removed the Effort Low label Oct 21, 2019

ShaharNaveh pushed a commit to ShaharNaveh/pandas that referenced this issue Dec 22, 2019

BUG: Fix for pandas-dev#12863

26ad0b3

For 'any()' and 'all()' not returning bool when ndarray is an object

ShaharNaveh mentioned this issue Dec 22, 2019

BUG: Series.any() and .all() don't return bool values if dtype=object #30416

Closed

github-actions bot assigned ShaharNaveh Dec 22, 2019

ShaharNaveh pushed a commit to ShaharNaveh/pandas that referenced this issue Dec 23, 2019

BUG: Fix for pandas-dev#12863

f9037cf

For 'any()' and 'all()' not returning bool when ndarray is an object

ShaharNaveh pushed a commit to ShaharNaveh/pandas that referenced this issue Dec 23, 2019

BUG: Fix for pandas-dev#12863

876b763

For 'any()' and 'all()' not returning bool when ndarray is an object

ShaharNaveh pushed a commit to ShaharNaveh/pandas that referenced this issue Dec 23, 2019

BUG: Fix for pandas-dev#12863

b971974

For 'any()' and 'all()' not returning bool when ndarray is an object

ShaharNaveh pushed a commit to ShaharNaveh/pandas that referenced this issue Dec 23, 2019

BUG: Fix for pandas-dev#12863

78da081

For 'any()' and 'all()' not returning bool when ndarray is an object

ShaharNaveh pushed a commit to ShaharNaveh/pandas that referenced this issue Dec 23, 2019

BUG: Fix for pandas-dev#12863

48f08c0

For 'any()' and 'all()' not returning bool when ndarray is an object

ShaharNaveh removed their assignment Jan 8, 2020

mroeschke added Bug and removed Compat and good first issue labels Apr 10, 2020

jbrockmendel added Numeric Operations and Reduction Operations labels Sep 20, 2020

jorisvandenbossche mentioned this issue Oct 1, 2020

DOC: Different behaviour with respect to Series.all docs when NA values are present #36692

Open

This was referenced Oct 8, 2020

BUG: any() and all() behavior on string series is different from python #36880

Closed

BUG: dataframe.any() method behaves differently for empty rows and columns #35450

Closed

mzeitlin11 mentioned this issue Jan 10, 2021

BUG: DataFrame.any() with not returning Boolean series with skipna=False across columns with numeric and string types #38962

Closed

This was referenced Apr 21, 2021

BUG: Errors caused by DataFrame.all(..., skipna=False, ...) in rows without na values. #41079

Open

BUG: any/all not returning booleans for object type #41102

Merged

jreback modified the milestones: Contributions Welcome, 1.3 Apr 22, 2021

jreback closed this as completed in #41102 May 6, 2021

mzeitlin11 mentioned this issue May 21, 2021

QST: | (or) behavior when working with two series that have different indexes #41604

Open

jorisvandenbossche mentioned this issue Mar 17, 2023

BUG: any() and all() raise with extension strings #51939

Open

StevenMaude mentioned this issue Jul 4, 2024

Update dependencies 2024-07 opensafely-core/interactive-templates#368

Merged

StevenMaude mentioned this issue Jul 4, 2024

Fix quirks in pandas test behaviour opensafely-core/interactive-templates#370

Merged
