Skip to content

DEPR: Enforce deprecation of numeric_only=None in DataFrame aggregations #49551

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Nov 9, 2022

Conversation

rhshadrach
Copy link
Member

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

@rhshadrach rhshadrach added Deprecate Functionality to remove in pandas DataFrame DataFrame data structure Reduction Operations sum, mean, min, max, etc. labels Nov 5, 2022
@rhshadrach rhshadrach added this to the 2.0 milestone Nov 5, 2022
@jbrockmendel
Copy link
Member

looks like some frame reductions tests are failing

…/pandas into enforce_df_reductions

� Conflicts:
�	pandas/core/frame.py
�	pandas/tests/apply/test_frame_apply.py
�	pandas/tests/frame/methods/test_quantile.py
�	pandas/tests/frame/test_reductions.py
�	pandas/tests/groupby/test_apply.py
�	pandas/tests/groupby/test_categorical.py
@rhshadrach
Copy link
Member Author

looks like some frame reductions tests are failing

Thanks @jbrockmendel - this got a bit tricky. When the input is numeric but has object dtype, axis=1, and numeric_only=False, in 1.5.x the result is object dtype. When numeric_only=None however, the result is float. I think this behavior originated from #676. It seems odd, but I think a very rare case. I opted to go to with the 1.5.x default (numeric_only=None) behavior here and put a line in the whatsnew. If that looks good, I can open an issue on this for followup.

Also, when working on these fixes, I realized I missed generic.py / series.py for this deprecation. Went unnoticed because None is falsey.

@@ -441,7 +441,7 @@ Removal of prior version deprecations/changes
- Changed behavior of comparison of a :class:`Timestamp` with a ``datetime.date`` object; these now compare as un-equal and raise on inequality comparisons, matching the ``datetime.datetime`` behavior (:issue:`36131`)
- Enforced deprecation of silently dropping columns that raised a ``TypeError`` in :class:`Series.transform` and :class:`DataFrame.transform` when used with a list or dictionary (:issue:`43740`)
- Change behavior of :meth:`DataFrame.apply` with list-like so that any partial failure will raise an error (:issue:`43740`)
-
- Enforced deprecation of silently dropping columns that raised in DataFrame reductions (:issue:`41480`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

specific to numeric_only=None? or "when numeric_only is not specified"?

@@ -10541,25 +10539,22 @@ def _get_data() -> DataFrame:
data = self._get_bool_data()
return data

numeric_only_bool = com.resolve_numeric_only(numeric_only)
if numeric_only is not None or axis == 0:
if numeric_only or axis == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why can't we go through this path unconditionally?

Copy link
Member Author

@rhshadrach rhshadrach Nov 8, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I don't think what I wrote was all too clear, but tried to explain this in #49551 (comment). I think we should take this path unconitionality, but there would be a behavior change for numeric_only=False, axis=1 and object dtype. I plan to open an issue and do a followup here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example to make this more explicit (on main):

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]}, dtype=object)
print(df.sum(axis=1, numeric_only=None))
print(df.sum(axis=1, numeric_only=False))
0    5.0
1    7.0
2    9.0
dtype: float64
0    5
1    7
2    9
dtype: object

Taking this path unconditionally would be to always get object dtype. I think that's the right thing to do, but would be a change in the default (numeric_only=None) behavior and plan to handle in a follow up.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, thanks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened #49603

@jbrockmendel
Copy link
Member

needs rebase, otherwise LGTM

@mroeschke mroeschke merged commit b7ea7c6 into pandas-dev:main Nov 9, 2022
@mroeschke
Copy link
Member

Thanks @rhshadrach

@jbrockmendel
Copy link
Member

nice!

@rhshadrach rhshadrach deleted the enforce_df_reductions branch November 10, 2022 00:10
codamuse pushed a commit to codamuse/pandas that referenced this pull request Nov 12, 2022
…ons (pandas-dev#49551)

* WIP

* DEPR: Enforce deprecation of numeric_only=None in DataFrame aggregations

* Partial reverts

* numeric_only in generic/series, fixup

* cleanup

* Remove docs warning

* fixups

* Fixups
mliu08 pushed a commit to mliu08/pandas that referenced this pull request Nov 27, 2022
…ons (pandas-dev#49551)

* WIP

* DEPR: Enforce deprecation of numeric_only=None in DataFrame aggregations

* Partial reverts

* numeric_only in generic/series, fixup

* cleanup

* Remove docs warning

* fixups

* Fixups
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DataFrame DataFrame data structure Deprecate Functionality to remove in pandas Reduction Operations sum, mean, min, max, etc.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants