REF: avoid catching all exceptions in libreduction #38285

Merged
merged 3 commits into from
Dec 17, 2020
31 changes: 25 additions & 6 deletions pandas/_libs/reduction.pyx
@@ -365,11 +365,7 @@ def apply_frame_axis0(object frame, object f, object names,
chunk = slider.dummy
object.__setattr__(chunk, 'name', names[i])

- try:
- piece = f(chunk)
- except Exception as err:
- # We can't be more specific without knowing something about `f`
- raise InvalidApply("Let this error raise above us") from err
+ piece = f(chunk)

# Need to infer if low level index slider will cause segfaults
require_slow_apply = i == 0 and piece is chunk
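
The hunk above drops the catch-all `except Exception` wrapper around `f(chunk)`. As a minimal pure-Python sketch of what that changes for a caller (the names `InvalidApplySketch`, `call_wrapped`, `call_direct`, `bad`, and `exception_type` are hypothetical and not part of pandas): with the wrapper, every failure arrives as the same sentinel type and the real error is only reachable through `__cause__`; without it, the original exception type propagates unchanged.

```python
class InvalidApplySketch(Exception):
    """Hypothetical stand-in for a sentinel exception type."""


def call_wrapped(f, chunk):
    # Old style: any error raised by `f` is re-raised as the sentinel type.
    try:
        return f(chunk)
    except Exception as err:
        raise InvalidApplySketch("Let this error raise above us") from err


def call_direct(f, chunk):
    # New style: whatever `f` raises propagates unchanged.
    return f(chunk)


def bad(chunk):
    # Hypothetical user function that always fails.
    raise KeyError("missing")


def exception_type(caller):
    # Report which exception type the caller of `caller` actually sees.
    try:
        caller(bad, [1, 2, 3])
    except Exception as err:
        return type(err)
    return None
```

With the wrapper, the caller has to inspect the message or `__cause__` to learn what went wrong; without it, an ordinary `except KeyError` works.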
@@ -406,7 +402,8 @@ cdef class BlockSlider:
"""
cdef:
object frame, dummy, index, block
- list blk_values
+ list blocks, blk_values
+ ndarray orig_blklocs, orig_blknos
ndarray values
Slider idx_slider
char **base_ptrs
@@ -418,6 +415,13 @@ cdef class BlockSlider:
self.dummy = frame[:0]
self.index = self.dummy.index

+ # GH#35417 attributes we need to restore at each step in case
+ # the function modified them.
+ mgr = self.dummy._mgr
+ self.orig_blklocs = mgr.blklocs
+ self.orig_blknos = mgr.blknos
Member

Is this fixing a specific bug? (I don't see any test added.)

Member Author

Without this, we have other tests that fail in Cython but not in Python, but I'm not aware of any bugs this causes in master.

Member

You mean that on master the `try: piece = f(chunk) ...` would fail and thus be elevated to the Python level (where it worked), but this fix ensures those specific cases no longer fail and instead work at the Cython level?

Can you give an example of such a test?

Also, the issue number in the comment does not seem directly related (it may have been needed there as well, but since that is an open PR, it is a somewhat confusing reference).

Member Author

> Can you give an example of such a test?

FAILED pandas/tests/groupby/test_apply_mutate.py::test_mutate_groups - ValueError: shape mismatch: value array of shape (1,6) could not be broa...

Will attempt to clarify.

+ self.blocks = [x for x in self.dummy._mgr.blocks]

self.blk_values = [block.values for block in self.dummy._mgr.blocks]

for values in self.blk_values:
@@ -441,6 +445,9 @@ cdef class BlockSlider:
cdef:
ndarray arr
Py_ssize_t i

+ self._restore_blocks()

# move blocks
for i in range(self.nblocks):
arr = self.blk_values[i]
@@ -460,9 +467,21 @@ cdef class BlockSlider:
cdef:
ndarray arr
Py_ssize_t i

+ self._restore_blocks()

for i in range(self.nblocks):
arr = self.blk_values[i]

# axis=1 is the frame's axis=0
arr.data = self.base_ptrs[i]
arr.shape[1] = 0

+ cdef _restore_blocks(self):
+ """
+ Ensure that we have the original blocks, blknos, and blklocs.
+ """
+ mgr = self.dummy._mgr
+ mgr.blocks = self.blocks
+ mgr._blklocs = self.orig_blklocs
+ mgr._blknos = self.orig_blknos
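
The `_restore_blocks` change follows a snapshot-and-restore pattern: keep references to the original attributes at construction time, then reassign them before each step in case the user function replaced them. A minimal pure-Python sketch of that idea, assuming a toy manager object (`ManagerSketch`, `SliderSketch`, and `mutating` are hypothetical names, not pandas internals):

```python
class ManagerSketch:
    """Hypothetical stand-in for an object whose attributes a callback may replace."""

    def __init__(self, blocks, blknos):
        self.blocks = blocks
        self.blknos = blknos


class SliderSketch:
    def __init__(self, mgr):
        self.mgr = mgr
        # Keep references to the original objects so they can be put back later.
        self.orig_blocks = mgr.blocks
        self.orig_blknos = mgr.blknos

    def restore(self):
        # Reassign the saved references, undoing any attribute replacement.
        self.mgr.blocks = self.orig_blocks
        self.mgr.blknos = self.orig_blknos

    def step(self, f):
        # Restore first, so every call to `f` starts from the original state.
        self.restore()
        return f(self.mgr)


def mutating(mgr):
    # Hypothetical user function that replaces an attribute as a side effect.
    seen = mgr.blocks
    mgr.blocks = ("replaced",)
    return seen
```

Note that this sketch only undoes attribute reassignment. It saves references rather than copies, so in-place mutation of the saved objects themselves would not be reverted.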
11 changes: 4 additions & 7 deletions pandas/core/groupby/ops.py
@@ -198,13 +198,10 @@ def apply(self, f: F, data: FrameOrSeries, axis: int = 0):
try:
result_values, mutated = splitter.fast_apply(f, sdata, group_keys)

- except libreduction.InvalidApply as err:
- # This Exception is raised if `f` triggers an exception
- # but it is preferable to raise the exception in Python.
- if "Let this error raise above us" not in str(err):
- # TODO: can we infer anything about whether this is
- # worth-retrying in pure-python?
- raise
+ except IndexError:
+ # test_apply_mutate this is a rare case in which re-running
Member

Which test is this referring to? (I don't see a `test_apply_mutate`.)

Member Author

`test_apply_mutate.py`; will update to make this clearer.

Contributor

@jbrockmendel is there something you were going to update here?

Member Author

thanks for the reminder, just updated

+ # in python-space may make a difference
+ pass

else:
# If the fast apply path could be used we can return here.
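
The `ops.py` hunk narrows the fallback from "any wrapped error" to `IndexError` only, using `try` / `except` / `else` so that a successful fast path returns immediately and everything else propagates. A minimal pure-Python sketch of that control flow, assuming toy fast and slow implementations (`fast_apply_sketch`, `slow_apply_sketch`, and `apply_sketch` are hypothetical and do not mirror the real pandas functions):

```python
def fast_apply_sketch(f, groups):
    # Hypothetical fast path: raises IndexError for inputs it cannot handle.
    if any(len(g) == 0 for g in groups):
        raise IndexError("fast path cannot handle an empty group")
    return [f(g) for g in groups]


def slow_apply_sketch(f, groups):
    # Hypothetical slow path: handles empty groups explicitly.
    return [f(g) if g else None for g in groups]


def apply_sketch(f, groups):
    try:
        result = fast_apply_sketch(f, groups)
    except IndexError:
        # Only this narrow case is retried on the slow path.
        pass
    else:
        # The fast path succeeded, so return its result directly.
        return result, "fast"
    return slow_apply_sketch(f, groups), "slow"
```

Any other exception raised by `f`, such as a `TypeError`, is not caught here and reaches the caller with its original type.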