-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: pivot_table with nested elements and numpy 1.24 #50682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Decided not to check the numpy version per se since in prior numpy versions this was raising a DeprecationWarning internally |
pandas/core/common.py
Outdated
@@ -232,7 +233,9 @@ def asarray_tuplesafe(values: Iterable, dtype: NpDtype | None = None) -> ArrayLi | |||
elif isinstance(values, ABCIndex): | |||
return values._values | |||
|
|||
if isinstance(values, list) and dtype in [np.object_, object]: | |||
if isinstance(values, list) and ( | |||
dtype in [np.object_, object] or any(is_list_like(val) for val in values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea about performance impact here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in the worst case where this can't short circuit there's a perf hit
In [1]: values = list(range(1000))
In [2]: %timeit pd.core.common.asarray_tuplesafe(values) # PR
279 µs ± 1.6 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [2]: %timeit pd.core.common.asarray_tuplesafe(values) # main
41.8 µs ± 623 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a common use-case?
Would try-except be significantly faster?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. With the try except I get closer to main performance (~44)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm, added two minor suggestions
# Can remove warning filter once NumPy 1.24 is min version | ||
warnings.simplefilter("ignore", np.VisibleDeprecationWarning) | ||
result = np.asarray(values, dtype=dtype) | ||
except ValueError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general I think it's better to just wrap the line that can raise in the try block. Is it a problem with the warning catching in this case? No big deal, I guess the warnings stuff won't raise a ValueError
, but if it does, the behavior won't be as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm in this case I think this is better as is, if we would wrap try-except into the catch_warnings statement, then we would also catch warnings in the except block, which isn't what we want here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah result = np.asarray(values, dtype=dtype)
will raise a warning (due to our usage) w/ numpy < 1.24 and raise an exception with numpy >= 1.24. As mentioned, I don't want to accidentally mask a warning within the except
block.
Co-authored-by: Marc Garcia <[email protected]>
…ents and numpy 1.24) (#50792) Backport PR #50682: BUG: pivot_table with nested elements and numpy 1.24 Co-authored-by: Matthew Roeschke <[email protected]>
doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.I think we have still pinned numpy in our 1.5.x branch but confirmed this still works locally
EDIT: Numpy is no longer pinned on main so this should run in the CI