Skip to content

BUG: DataFrame.(any|all) inconsistency #34918

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
31 commits
Select commit Hold shift + click to select a range
40a55ec
CLN: avoid unnecessary consolidate_inplace calls
jbrockmendel Jun 21, 2020
ba0c526
Fix bool-object corner case
jbrockmendel Jun 21, 2020
e73204b
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Jun 23, 2020
faf8bc8
add test
jbrockmendel Jun 23, 2020
e9a9d1a
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Jun 25, 2020
fa398e0
more explicit assertions
jbrockmendel Jun 25, 2020
52902a6
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Jul 6, 2020
b5ecafb
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Jul 7, 2020
a65cac0
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Aug 4, 2020
da19314
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Aug 5, 2020
887f6d1
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Aug 25, 2020
1d72508
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 5, 2020
4d6749b
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 8, 2020
103d446
whatsnew
jbrockmendel Sep 8, 2020
5adff4e
revert whitespace change
jbrockmendel Sep 8, 2020
bf3d940
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 11, 2020
fd286dc
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 11, 2020
a772fb7
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 20, 2020
03bb7a9
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 21, 2020
2c4c011
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 22, 2020
3f2cf67
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 22, 2020
214b362
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 22, 2020
3804b0d
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 22, 2020
3f141b5
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 24, 2020
b39e533
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 24, 2020
5f8086e
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 24, 2020
eb8c75d
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Sep 26, 2020
2dc84b9
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Oct 3, 2020
1a27257
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Nov 5, 2020
aca1f84
Merge branch 'master' of https://github.com/pandas-dev/pandas into re…
jbrockmendel Nov 5, 2020
7bfadbc
lint fixup
jbrockmendel Nov 5, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -419,6 +419,7 @@ Numeric
^^^^^^^
- Bug in :func:`to_numeric` where float precision was incorrect (:issue:`31364`)
- Bug in :meth:`DataFrame.any` with ``axis=1`` and ``bool_only=True`` ignoring the ``bool_only`` keyword (:issue:`32432`)
- Bug in :meth:`DataFrame.any` and :meth:`DataFrame.all` with object-dtype columns sometimes returning larger :class:`Series` when operating on a subset of columns (:issue:`34918`)
- Bug in :meth:`Series.equals` where a ``ValueError`` was raised when numpy arrays were compared to scalars (:issue:`35267`)
- Bug in :class:`Series` where two :class:`Series` each have a :class:`DatetimeIndex` with different timezones having those indexes incorrectly changed when performing arithmetic operations (:issue:`33671`)
- Bug in :meth:`pd._testing.assert_almost_equal` was incorrect for complex numeric types (:issue:`28235`)
Expand Down
6 changes: 4 additions & 2 deletions pandas/core/internals/managers.py
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
)
from pandas.core.dtypes.common import (
DT64NS_DTYPE,
is_bool_dtype,
is_dtype_equal,
is_extension_array_dtype,
is_list_like,
Expand Down Expand Up @@ -713,8 +714,9 @@ def get_bool_data(self, copy: bool = False) -> "BlockManager":
copy : bool, default False
Whether to copy the blocks
"""
self._consolidate_inplace()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think consolidating here might be the better option, as it at least ensures consistent behaviour when you have multiple columns independent of consolidation status. The inconsistency between a single column and multiple column is still present of course, but IMO there is nothing to do about this giving our consolidated blocks (well, we could deprecate object dtype being regarded as bool ..)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, we could deprecate object dtype being regarded as bool

This PR does that (well changes outright, not deprecates). AFAICT thats the only way to make the behavior independent of whether the presence of another object-dtype-but-not-bool-like column.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should deprecate this first

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only non-test place where get_bool_data is called from is in DataFrame._reduce which I know you've been working on recently. The topic of how to handle object-dtype blocks has come up there, too. If we end up handling object-dtype blocks column-wise there, that would render this distinction irrelevant.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO we should deprecate this first

what exactly are you suggesting we deprecate? do you have an example of something that breaks on this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche can you respond here re what specific behavior you'd like to deprecate?

return self._combine([b for b in self.blocks if b.is_bool], copy)
# Note: use is_bool_dtype instead of blk.is_bool to exclude
# object-dtype blocks containing all-bool entries.
return self._combine([b for b in self.blocks if is_bool_dtype(b.dtype)], copy)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this may actually be a bug, since it implies the following inconsistent behavior in master:

ser = pd.Series([True, False, True], dtype=object)
ser2 = pd.Series(["A", "B", "C"])

df = ser.to_frame("A")
>>> df._get_bool_data()
       A
0   True
1  False
2   True

df["B"] = ser2
>>> df._get_bool_data()
Empty DataFrame
Columns: []
Index: [0, 1, 2]

adding columns shouldnt make get_bool_data smaller.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a test for this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test added


def get_numeric_data(self, copy: bool = False) -> "BlockManager":
"""
Expand Down
14 changes: 14 additions & 0 deletions pandas/tests/internals/test_internals.py
Original file line number Diff line number Diff line change
Expand Up @@ -687,6 +687,20 @@ def test_get_bool_data(self):
np.array([True, False, True]),
)

def test_get_bool_data_consistency(self):
# GH#34918
ser = Series([True, False, True], dtype=object)
ser2 = Series(["A", "B", "C"])
df = ser.to_frame("A")

expected = DataFrame(index=ser.index)
bd1 = df._get_bool_data()
tm.assert_frame_equal(bd1, expected)

df["B"] = ser2
bd2 = df._get_bool_data()
tm.assert_frame_equal(bd2, expected)

def test_unicode_repr_doesnt_raise(self):
repr(create_mgr("b,\u05d0: object"))

Expand Down