BUG/df.agg-with-df-with-missing-values-results-in-IndexError #58864

abeltavares · 2024-05-29T21:45:42Z

closes BUG: df.agg with df with missing values results in IndexError #58810
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

abeltavares · 2024-06-03T08:07:21Z

Ready for review.

mroeschke · 2024-06-03T18:21:03Z

doc/source/whatsnew/v3.0.0.rst

@@ -39,6 +39,7 @@ Other enhancements
 - Users can globally disable any ``PerformanceWarning`` by setting the option ``mode.performance_warnings`` to ``False`` (:issue:`56920`)
 - :meth:`Styler.format_index_names` can now be used to format the index and column names (:issue:`48936` and :issue:`47489`)
 - :class:`.errors.DtypeWarning` improved to include column names when mixed data types are detected (:issue:`58174`)
+- :meth:`DataFrame.agg` now correctly handles missing values without raising an IndexError (:issue:`58810`)


This should be in the bug fix section

mroeschke · 2024-06-03T18:23:33Z

pandas/core/apply.py

+            col_idx_order = list(Index(s.index).get_indexer(fun))
+            col_idx_order = [i for i in col_idx_order if 0 <= i < len(s)]
+            if col_idx_order:
+                s = s.iloc[col_idx_order]


Instead I think you can filter col_idx_order where it's equal to -1. See the get_indexer docstring

Yeah, makes sense.
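For readers following the review, a minimal sketch of the `get_indexer` behaviour referred to above: labels that are not found are reported as -1, and passing -1 on to `iloc` would silently select the last element, which is why those entries are filtered out. The index labels and values here are made up for illustration and are not taken from the PR.

```python
import pandas as pd

# Hypothetical aggregation result with two labels.
index = pd.Index(["mean", "max"])
s = pd.Series([10.0, 20.0], index=index)

# "min" is not in the index, so get_indexer reports -1 for it.
positions = index.get_indexer(["max", "min", "mean"])

# Without filtering, -1 is treated by iloc as "last element".
unfiltered = s.iloc[positions]

# Keep only the positions that actually matched a label.
found = positions[positions != -1]
filtered = s.iloc[found]
```

Comparing against -1 is equivalent to the original range check here, since `get_indexer` only ever returns valid positions or -1.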

mroeschke · 2024-06-03T20:47:15Z

pandas/core/apply.py

+            col_idx_order = list(Index(s.index).get_indexer(fun))
+            col_idx_order = [i for i in col_idx_order if i != -1]
+            if col_idx_order:
+                s = s.iloc[col_idx_order]


Suggested change

-            col_idx_order = list(Index(s.index).get_indexer(fun))
-            col_idx_order = [i for i in col_idx_order if i != -1]
-            if col_idx_order:
-                s = s.iloc[col_idx_order]
+            col_idx_order = Index(s.index).get_indexer(fun)
+            col_idx_order = col_idx_order[col_idx_order != -1]
+            s = s.iloc[col_idx_order]

That won't produce the expected behavior.
Take the "A" example in the docstring: without the if condition, the result will be:

foo   NaN
aab   NaN
bar   NaN
dat   NaN

Which is wrong.

This happens because:

col_idx_order is determined by Index(s.index).get_indexer(fun).

Since s only has one value with index ["mean"] and fun = ["max"], there is no match, so col_idx_order = [-1].

Once the -1 is filtered out, s = s.iloc[col_idx_order] selects nothing and returns an empty Series, producing the wrong behaviour.

Or is this how it is supposed to behave?

Ah OK then you can add back the if not col_idx_order.empty: condition

AttributeError: 'numpy.ndarray' object has no attribute 'empty'
The best way I found was to make it a list and check it that way.

I guess we could use a boolean mask directly on the NumPy array returned by get_indexer and apply only the valid indices.

col_idx_order = Index(s.index).get_indexer(fun)
valid_idx = col_idx_order != -1
if valid_idx.any():
    s = s.iloc[col_idx_order[valid_idx]]

Let me know what you think.

Sure that solution works. Thanks.
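To make the exchange above concrete, here is a small sketch of the "A" case described in the thread (s indexed by ["mean"], fun = ["max"]). It is an illustration under assumed data, not the code in pandas/core/apply.py: the value 2.5 and the variable names `unguarded` and `guarded` are hypothetical.

```python
import pandas as pd

# One aggregated value labelled "mean"; the requested function is "max".
s = pd.Series([2.5], index=["mean"])
fun = ["max"]

# get_indexer returns a NumPy array, so there is no ``.empty`` attribute.
col_idx_order = pd.Index(s.index).get_indexer(fun)
valid_idx = col_idx_order != -1  # boolean mask of labels that matched

# Without the guard, indexing with an empty selection drops the only value.
unguarded = s.iloc[col_idx_order[valid_idx]]

# With the guard, s is left untouched when nothing matched.
guarded = s
if valid_idx.any():
    guarded = s.iloc[col_idx_order[valid_idx]]
```

The `valid_idx.any()` check plays the role of the earlier `if col_idx_order:` list test while keeping the indexer as an array.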

mroeschke · 2024-06-03T20:47:36Z

pandas/core/apply.py


        # assign the new user-provided "named aggregation" as index names, and reindex
        # it based on the whole user-provided names.
-        s.index = reordered_indexes[idx : idx + len(fun)]
+        if len(s) > 0:


Suggested change

-        if len(s) > 0:
+        if not s.empty:
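As a quick check of the suggestion above, the two conditions agree for a Series; `Series.empty` is simply the more idiomatic spelling. The sample Series below are hypothetical.

```python
import pandas as pd

empty = pd.Series([], dtype="float64")
filled = pd.Series([1.0], index=["max"])

# Pair up the two ways of testing for emptiness on each Series.
checks = [(s.empty, len(s) == 0) for s in (empty, filled)]
```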

mroeschke · 2024-06-05T16:59:09Z

Thanks @abeltavares

abeltavares force-pushed the BUG/df.agg-with-df-with-missing-values-results-in-IndexError branch from de69d12 to 5947e3e May 29, 2024 21:45

abeltavares changed the title ~~fix~~ BUG/df.agg-with-df-with-missing-values-results-in-IndexError May 29, 2024

abeltavares force-pushed the BUG/df.agg-with-df-with-missing-values-results-in-IndexError branch from 58df051 to 092b60a May 30, 2024 09:55

rhshadrach added the Bug and Apply (Apply, Aggregate, Transform, Map) labels Jun 1, 2024

abeltavares force-pushed the BUG/df.agg-with-df-with-missing-values-results-in-IndexError branch 3 times, most recently from 7ad3684 to 3aec102 June 2, 2024 21:52

fix

8c6f34b

abeltavares force-pushed the BUG/df.agg-with-df-with-missing-values-results-in-IndexError branch from 3aec102 to 8c6f34b June 2, 2024 21:53

mroeschke reviewed Jun 3, 2024

improve and fix bug entry

4f32bc5

mroeschke reviewed Jun 3, 2024

update

165e2f3

abeltavares requested a review from mroeschke June 5, 2024 06:50

mroeschke approved these changes Jun 5, 2024

mroeschke added this to the 3.0 milestone Jun 5, 2024

mroeschke merged commit f7590e6 into pandas-dev:main Jun 5, 2024
47 checks passed

abeltavares deleted the BUG/df.agg-with-df-with-missing-values-results-in-IndexError branch June 5, 2024 19:32
