BUG: fix AttributeError raised in libreduction #29100

Merged: 5 commits merged into pandas-dev:master on Oct 22, 2019

Conversation

jbrockmendel
Member

Bite-sized chunk of the promised "addressing of recently-added TODOs" for groupby Exception un-suppression. cc @WillAyd @jreback.

@@ -261,7 +261,7 @@ def aggregate(self, func=None, *args, **kwargs):

         try:
             return self._python_agg_general(func, *args, **kwargs)
-        except AssertionError:
+        except (AssertionError, AttributeError):
Contributor

what hits AttributeError here?

Member Author

we have tests that raise AttributeError here (in master) when a Categorical or IntegerArray is passed to SeriesBinGrouper.

@@ -203,7 +203,8 @@ cdef class SeriesBinGrouper:
         self.f = f

         values = series.values
-        if not values.flags.c_contiguous:
+        if util.is_array(values) and not values.flags.c_contiguous:
+            # e.g. Categorical has no `flags` attribute
Contributor

does this work with Categorical otherwise (the BinGrouper)? do we need tests here?

Member Author

we have tests that get here with Categorical and IntegerArray.
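
The guard in the hunk above can be illustrated without pandas or NumPy. The following is only a sketch: `ArrayLike`, `ExtensionLike`, and `is_array` are hypothetical stand-ins (for an ndarray, for Categorical/IntegerArray, and for `util.is_array` respectively), not the real implementations.

```python
from types import SimpleNamespace


class ArrayLike:
    """Hypothetical stand-in for an ndarray: exposes `flags.c_contiguous`."""

    def __init__(self, data, c_contiguous=True):
        self.data = list(data)
        self.flags = SimpleNamespace(c_contiguous=c_contiguous)


class ExtensionLike:
    """Hypothetical stand-in for Categorical/IntegerArray: no `flags` attribute."""

    def __init__(self, data):
        self.data = list(data)


def is_array(obj):
    # Stand-in for the real type check; assumed to mean "is an ndarray".
    return isinstance(obj, ArrayLike)


def needs_copy_unguarded(values):
    # Old condition: assumes every input has `.flags`.
    return not values.flags.c_contiguous


def needs_copy_guarded(values):
    # New condition: only consult `.flags` when the input is a real array.
    return is_array(values) and not values.flags.c_contiguous
```

With these stand-ins, `needs_copy_unguarded(ExtensionLike([...]))` raises AttributeError, while the guarded version short-circuits and returns False.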

@jreback jreback added the Bug label Oct 19, 2019
@jreback jreback added this to the 1.0 milestone Oct 19, 2019
@jreback
Contributor

jreback commented Oct 19, 2019

lgtm.

@jbrockmendel
Member Author

Looks like we have "slow" tests that still raise AttributeError. AFAICT within the groupby/reduce machinery a DatetimeIndex is having dti._data temporarily overwritten with an ndarray (instead of DatetimeArray which it should be). I expect this is related to the _index_data workaround in libreduction.

For now I'm reverting the re-raising of AttributeError and adding the _index_data workaround to my list of groupby follow-ups.
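
A rough illustration of the failure mode described above, using only hypothetical stand-ins (`RichArray` for DatetimeArray, `Index` for DatetimeIndex, `swapped_data` for the temporary overwrite); none of these names come from pandas, and the real mechanism in libreduction may differ.

```python
from contextlib import contextmanager


class RichArray:
    """Hypothetical stand-in for DatetimeArray: wraps raw values and adds methods."""

    def __init__(self, raw):
        self.raw = list(raw)

    def years(self):
        # Values are assumed to be integers shaped like YYYYMMDD.
        return [value // 10000 for value in self.raw]


class Index:
    """Hypothetical stand-in for DatetimeIndex; `_data` is expected to be a RichArray."""

    def __init__(self, raw):
        self._data = RichArray(raw)

    def years(self):
        return self._data.years()


@contextmanager
def swapped_data(index, raw):
    # Temporarily replace the rich container with a plain list,
    # then restore the original no matter what happens inside the block.
    original = index._data
    index._data = raw
    try:
        yield index
    finally:
        index._data = original
```

While the swap is active, any method that relies on the richer type fails with AttributeError, which is the kind of error the slow tests were surfacing.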

@WillAyd
Member

WillAyd commented Oct 21, 2019

lgtm as well.

> Looks like we have "slow" tests that still raise AttributeError.

Hmm do we just not hit these at all in CI?

@jbrockmendel
Member Author

> Hmm do we just not hit these at all in CI?

It's hit in the CI, but it wasn't hit locally before I opened the PR. Currently working on a branch that I think fixes the remaining AttributeError, but longer-term the _index_data thing is pretty nasty.

@jorisvandenbossche
Member

Shouldn't this add a test, or remove an otherwise caught exception?

@jbrockmendel
Member Author

> or remove an otherwise caught exception?

There is one more case where an AttributeError gets raised, so we can't just yet.

@jorisvandenbossche
Member

But the cause of the regression in the case I reported was that the AttributeError was no longer caught? Or for the case in our current tests, that's then probably still caught elsewhere?

@jbrockmendel
Member Author

> But the cause of the regression in the case I reported was that the AttributeError was no longer caught?

The thing that we recently stopped catching (or more specifically, started re-raising) consistently is AssertionError. Without knowing anything about the geopandas case, this is my best guess for what might have been affected.

> Or for the case in our current tests, that's then probably still caught elsewhere?

If you look at the most recent commit for this PR, you'll see that in the status quo (and in this PR, after the last commit), AttributeError is caught in an except Exception block on L266. Does that answer the question?
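
The catch-and-re-raise ordering discussed here can be sketched as follows. This is a simplified, assumed shape rather than the actual body of `aggregate`; `run_with_fallback`, `fast_path`, `slow_path`, and `reraise` are illustrative names only.

```python
def run_with_fallback(fast_path, slow_path, reraise=(AssertionError,)):
    """Try fast_path; re-raise errors listed in `reraise`, fall back on the rest."""
    try:
        return fast_path()
    except reraise:
        # Treated as genuine bugs: surface them instead of hiding them.
        raise
    except Exception:
        # Everything else is still suppressed and routed to the fallback.
        return slow_path()


def broken_fast_path():
    # Mimics the error seen when an object without `flags` reaches the fast path.
    raise AttributeError("'Categorical' object has no attribute 'flags'")
```

With the default `reraise`, an AttributeError from the fast path is swallowed by the broad `except Exception` and the fallback runs. Passing `reraise=(AssertionError, AttributeError)` makes the same error propagate, which is the behavior the follow-up branch aims for.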

@jorisvandenbossche
Member

My example was running into another try/except block (see the one I linked in #29141 (comment)), so I suppose that is simply not covered by our tests.

It was working because _python_agg_general calls self._try_cast on the result, casting back to the extension dtype where possible:

def _python_agg_general(self, func, *args, **kwargs):
    func = self._is_builtin_func(func)
    f = lambda x: func(x, *args, **kwargs)
    # iterate through "columns" ex exclusions to populate output dict
    output = {}
    for name, obj in self._iterate_slices():
        try:
            result, counts = self.grouper.agg_series(obj, f)
        except TypeError:
            continue
        else:
            output[name] = self._try_cast(result, obj, numeric_only=True)
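
The control flow of that excerpt (skip a column on TypeError, otherwise cast the result back) can be tried out in isolation. The sketch below is an assumption-laden simplification: `agg_columns` and its `try_cast` parameter are hypothetical, and plain dicts of lists stand in for the grouped slices.

```python
def agg_columns(columns, func, try_cast=lambda result, original: result):
    """Aggregate each column with func, skipping columns that raise TypeError."""
    output = {}
    for name, values in columns.items():
        try:
            result = func(values)
        except TypeError:
            # The column cannot be aggregated with func; leave it out.
            continue
        else:
            # Only reached when func succeeded: cast back toward the input type.
            output[name] = try_cast(result, values)
    return output
```

For example, `agg_columns({"a": [1, 2, 3], "b": ["x", None]}, sum)` keeps only column `"a"`, because summing the mixed column raises TypeError and is skipped silently.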

@jbrockmendel
Member Author

The branch that fixes the remaining AttributeError (and so starts re-raising all AttributeErrors) is passing locally, will make a PR for that as a follow-up once this goes in.

@jreback jreback merged commit 0a44fea into pandas-dev:master Oct 22, 2019
@jreback
Contributor

jreback commented Oct 22, 2019

thanks

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
bongolegend pushed a commit to bongolegend/pandas that referenced this pull request Jan 1, 2020