BUG: pandas.DataFrame.any bool_only parameter not working since v0.24.0 #32432

funjo · 2020-03-04T12:35:02Z

Code Sample, a copy-pastable example if possible

This is the code I ran to test DataFrame.any in different pandas versions:

import pandas as pd
pd.__version__
df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
df
df.dtypes
df.any(axis=1)
df.any(axis=1, bool_only=True)

Problem description

In pandas version 0.23.4, the output is as expected when bool_only=True is specified:

>>> import pandas as pd
>>> pd.__version__
'0.23.4'
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df
       A  B
0   True  1
1  False  2
>>> df.dtypes
A     bool
B    int64
>>> df.any(axis=1)
0    True
1    True
dtype: bool
>>> df.any(axis=1, bool_only=True)
0     True
1    False
dtype: bool

However, in pandas version 0.24.0, the final output is wrong for bool_only=True:

>>> import pandas as pd
>>> pd.__version__
'0.24.0'
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df
       A  B
0   True  1
1  False  2
>>> df.dtypes
A     bool
B    int64
>>> df.any(axis=1)
0    True
1    True
dtype: bool
>>> df.any(axis=1, bool_only=True)
0    True
1    True
dtype: bool

Expected Output

>>> df.any(axis=1, bool_only=True)
0     True
1    False
dtype: bool
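A possible workaround on affected versions, offered as a sketch rather than something confirmed in this thread, is to select the boolean columns explicitly before reducing. It assumes that `select_dtypes(include="bool")` picks out the same columns that bool_only=True is meant to keep for this frame:

```python
import pandas as pd

df = pd.DataFrame({"A": [True, False], "B": [1, 2]})

# Keep only the bool-dtype columns, then reduce across each row.
# This sidesteps the bool_only parameter entirely.
bool_df = df.select_dtypes(include="bool")
result = bool_df.any(axis=1)

print(result.tolist())  # [True, False]
```

For this example the result matches the expected output above, because only column A is reduced.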

Output of `pd.show_versions()`

I can only reproduce this bug for pandas version v0.24.0 and above so I'll include details for v0.24.0.

INSTALLED VERSIONS

commit: None
python: 3.7.4.final.0
python-bits: 64
OS: Darwin
OS-release: 18.7.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: en_GB.UTF-8

pandas: 0.24.0
pytest: 5.0.1
pip: 20.0.2
setuptools: 40.8.0
Cython: None
numpy: 1.17.0
scipy: None
pyarrow: 0.14.1
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.8.0
pytz: 2019.2
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None


shaido987 · 2020-03-06T06:53:20Z

A regression was introduced in 0.24.0 that made DataFrame.any and DataFrame.all ignore the bool_only parameter, see here: #25101. It would seem that the fix applied did not cover all cases.

funjo · 2020-03-06T10:53:08Z

@shaido987 Thanks for the link. I can confirm that this issue still exists in the latest version v1.0.1, so the fix did not cover all cases for sure.

>>> import pandas as pd
>>> pd.__version__
'1.0.1'
>>> df = pd.DataFrame({"A": [True, False], "B": [1, 2]})
>>> df.any(axis=1)
0    True
1    True
dtype: bool

jbrockmendel · 2020-09-04T01:23:05Z

In DataFrame._reduce there is a function _get_data:

        def _get_data(axis_matters: bool) -> "DataFrame":
            if filter_type is None:
                data = self._get_numeric_data()
            elif filter_type == "bool":
                if axis_matters:
                    # GH#25101, GH#24434
                    data = self._get_bool_data() if axis == 0 else self
                else:
                    data = self._get_bool_data()
            else:  # pragma: no cover
                msg = (
                    f"Generating numeric_only data with filter_type {filter_type} "
                    "not supported."
                )
                raise NotImplementedError(msg)
            return data

It looks like changing `data = self._get_bool_data() if axis == 0 else self` to just `data = self._get_bool_data()` fixes the example in the OP. Not sure if that breaks anything else.
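To illustrate why that one-line change matters, here is a toy model of the branch in plain Python. It is not pandas code: the helper names (`bool_columns`, `any_rowwise`, `any_axis1_bool_only`) and the `filter_on_axis1` flag are hypothetical, and a "bool column" is approximated by checking that every value is a Python bool.

```python
# Toy model of the filter_type == "bool" branch quoted above (not pandas internals).
# A frame is a dict mapping column names to equal-length lists.

def bool_columns(frame):
    """Keep only the columns whose values are all Python bools."""
    return {
        name: values
        for name, values in frame.items()
        if all(isinstance(v, bool) for v in values)
    }

def any_rowwise(frame):
    """Row-wise any() over the columns of the frame."""
    return [any(row) for row in zip(*frame.values())]

def any_axis1_bool_only(frame, filter_on_axis1):
    # filter_on_axis1=False mimics `... if axis == 0 else self`:
    # for axis=1 the unfiltered frame is reduced, so the int column leaks in.
    # filter_on_axis1=True mimics the proposed unconditional bool filtering.
    data = bool_columns(frame) if filter_on_axis1 else frame
    return any_rowwise(data)

frame = {"A": [True, False], "B": [1, 2]}
print(any_axis1_bool_only(frame, filter_on_axis1=False))  # [True, True]
print(any_axis1_bool_only(frame, filter_on_axis1=True))   # [True, False]
```

In this model, the unfiltered path reproduces the wrong output from the OP because the nonzero integers in B are truthy, while the filtered path gives the expected result.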

simplyrucha mentioned this issue Jul 29, 2020: BUG: dataframe.any() method behaves differently for empty rows and columns #35450 (closed)

jbrockmendel added the Bug and Numeric Operations (Arithmetic, Comparison, and Logical operations) labels Sep 4, 2020

jbrockmendel mentioned this issue Sep 4, 2020: BUG: DataFrame.any with axis=1 and bool_only=True #36106 (merged)

jreback added this to the 1.2 milestone Sep 11, 2020

jreback closed this as completed in #36106 Sep 11, 2020
