BUG: DataFrame.all(bool_only=True) inconsistency with object dtype #37799
Changes from all commits
77ee0fe
f8736d5
359abd9
87313f4
76a97eb
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -236,6 +236,54 @@ Other enhancements | |
- Improve numerical stability for :meth:`Rolling.skew()`, :meth:`Rolling.kurt()`, :meth:`Expanding.skew()` and :meth:`Expanding.kurt()` through implementation of Kahan summation (:issue:`6929`) | ||
- Improved error reporting for subsetting columns of a :class:`DataFrameGroupBy` with ``axis=1`` (:issue:`37725`) | ||
|
||
.. --------------------------------------------------------------------------- | ||
|
||
.. _whatsnew_120.notable_bug_fixes: | ||
|
||
Notable bug fixes | ||
~~~~~~~~~~~~~~~~~ | ||
|
||
These are bug fixes that might have notable behavior changes. | ||
|
||
Consistency of DataFrame Reductions | ||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ | ||
:meth:`DataFrame.any` and :meth:`DataFrame.all` with ``bool_only=True`` now | ||
determine whether to exclude object-dtype columns on a column-by-column basis, | |
instead of checking if *all* object-dtype columns can be considered boolean. | ||
|
||
This prevents pathological behavior where applying the reduction on a subset | ||
of columns could result in a larger :class:`Series` result. See (:issue:`37799`). | ||
|
||
.. ipython:: python | ||
|
||
df = pd.DataFrame({"A": ["foo", "bar"], "B": [True, False]}, dtype=object) | ||
df["C"] = pd.Series([True, True]) | ||
|
||
|
||
*Previous behavior*: | ||
|
||
.. code-block:: ipython | ||
|
||
In [5]: df.all(bool_only=True) | ||
Out[5]: | ||
C True | ||
dtype: bool | ||
|
||
In [6]: df[["B", "C"]].all(bool_only=True) | ||
Out[6]: | ||
B False | ||
C True | ||
dtype: bool | ||
|
||
*New behavior*: | ||
|
||
.. ipython:: python | ||
|
||
In [5]: df.all(bool_only=True) | ||
Comment: don't need the prompts (remove in a follow-on / other PR)
Comment: so remove the "In [5]: "? Do I also remove the "Out [5]: "?
Comment: yep, IPython will generate them appropriately
||
|
||
In [6]: df[["B", "C"]].all(bool_only=True) | ||
|
||
|
||
.. _whatsnew_120.api_breaking.python: | ||
|
||
Increased minimum version for Python | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -450,6 +450,20 @@ def f(mask, val, idx): | |
|
||
return self.split_and_operate(None, f, inplace) | ||
|
||
def _split(self) -> List["Block"]: | ||
""" | ||
Split a block into a list of single-column blocks. | ||
""" | ||
assert self.ndim == 2 | ||
|
||
new_blocks = [] | ||
Comment: might be worth adding some unit tests for this (e.g. on a block that is empty, single column, and multiple)
Comment: good idea, was accidentally making a copy instead of a view!
Comment: wow, testing is good!
||
for i, ref_loc in enumerate(self.mgr_locs): | ||
Comment: could be done as a list comprehension
Comment: but nbd
||
vals = self.values[slice(i, i + 1)] | ||
|
||
nb = self.make_block(vals, [ref_loc]) | ||
new_blocks.append(nb) | ||
return new_blocks | ||
|
||
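The two review points on `_split` (a list comprehension, and views rather than copies) can be illustrated outside pandas with a small NumPy sketch. `split_rows` is a hypothetical helper that operates on a bare 2D array and a list of locations; it is not the `Block` API and assumes only that `values` is a 2D `numpy.ndarray`.

```python
from typing import List, Sequence, Tuple

import numpy as np


def split_rows(values: np.ndarray, locs: Sequence[int]) -> List[Tuple[int, np.ndarray]]:
    """Pair each location with a one-row, still 2D, slice of ``values``."""
    assert values.ndim == 2
    # Basic slicing (i : i + 1) keeps two dimensions and returns a view.
    return [(loc, values[i : i + 1]) for i, loc in enumerate(locs)]


values = np.arange(6).reshape(2, 3)
pieces = split_rows(values, [4, 7])

# Each piece is a single-row 2D array that shares memory with the parent.
shapes = [piece.shape for _, piece in pieces]
shared = [np.shares_memory(piece, values) for _, piece in pieces]

# Writing through a piece is visible in the parent, confirming it is a view.
pieces[0][1][0, 0] = 99
parent_sees_write = bool(values[0, 0] == 99)

# An empty input yields an empty list.
empty_pieces = split_rows(np.empty((0, 3)), [])
```

Checks like `np.shares_memory` are one way a unit test could catch an accidental copy of the kind mentioned in the review thread.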
def split_and_operate(self, mask, f, inplace: bool) -> List["Block"]: | ||
""" | ||
split the block per-column, and apply the callable f | ||
|
Comment: can you add this PR number here, and is there an actual issue? ping on green.