Skip to content

BUG: AttributeError raised by map inside groupby reduction #29160

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
2 changes: 1 addition & 1 deletion pandas/core/groupby/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -262,7 +262,7 @@ def aggregate(self, func=None, *args, **kwargs):

try:
return self._python_agg_general(func, *args, **kwargs)
except (AssertionError, TypeError):
except (AssertionError, TypeError, AttributeError):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you need to remove AttributeError from the clause below

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you indicate where the errors come from here as much as possible

raise
except Exception:
result = self._aggregate_named(func, *args, **kwargs)
Expand Down
4 changes: 3 additions & 1 deletion pandas/core/indexes/datetimelike.py
Original file line number Diff line number Diff line change
Expand Up @@ -731,7 +731,9 @@ def astype(self, dtype, copy=True):
# Ensure that self.astype(self.dtype) is self
return self

new_values = self._data.astype(dtype, copy=copy)
# use _eadata since _data links to _index_data which can be
# overwritten in groupby-reduction
new_values = self._eadata.astype(dtype, copy=copy)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain why this is needed in astype, while the comment is about groupby?

And it's not possible to prevent that _data gets written in the first place? (or is it done in a try/except that only fails later .. ). Adding yet another data like attribute doesn't seem ideal ..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And it's not possible to prevent that _data gets written in the first place?

Take a look at how we patch _index_data in libreduction. I spent some time trying to find a way around that and didn't find any.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make the _index_data in a settable property that on setting also updates the underlying data of the EA? (to keep both synced)

(didn't look in detail, so this might be nonsense)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_index_data is set using object.__setattr__, so I don't think that would work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_index_data is set using object.setattr, so I don't think that would work.

Based on a quick test, it seems that setattr still goes through a property's setter (now, if the actual idea can help, I would need to take a further look)

Do you have an example test that hits this (a failing test if you don't do the above change) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have an example test that hits this (a failing test if you don't do the above change) ?

If you make the groupby change to stop catching AttributeError but don't change the index code, there is a test (dont know which off the top of my head) that raises an AttributeError in DatetimeIndex.map because index._data is an ndarray instead of a DTA.


# pass copy=False because any copying will be done in the
# _data.astype call above
Expand Down
2 changes: 2 additions & 0 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -368,6 +368,8 @@ def _simple_new(cls, values, name=None, freq=None, tz=None, dtype=None):
result.name = name
# For groupby perf. See note in indexes/base about _index_data
result._index_data = dtarr._data
result._eadata = dtarr

result._reset_identity()
return result

Expand Down
2 changes: 2 additions & 0 deletions pandas/core/indexes/period.py
Original file line number Diff line number Diff line change
Expand Up @@ -298,6 +298,8 @@ def _simple_new(cls, values, name=None, freq=None, **kwargs):
result._data = values
# For groupby perf. See note in indexes/base about _index_data
result._index_data = values._data
result._eadata = values

result.name = name
result._reset_identity()
return result
Expand Down
1 change: 1 addition & 0 deletions pandas/core/indexes/timedeltas.py
Original file line number Diff line number Diff line change
Expand Up @@ -273,6 +273,7 @@ def _simple_new(cls, values, name=None, freq=None, dtype=_TD_DTYPE):
result.name = name
# For groupby perf. See note in indexes/base about _index_data
result._index_data = tdarr._data
result._eadata = tdarr

result._reset_identity()
return result
Expand Down