ENH: Handle extension arrays in algorithms.diff #31025
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -2,6 +2,7 @@ | |
Generic data algorithms. This module is experimental at the moment and not | ||
intended for public consumption | ||
""" | ||
import operator | ||
from textwrap import dedent | ||
from typing import TYPE_CHECKING, Dict, Optional, Tuple, Union | ||
from warnings import catch_warnings, simplefilter, warn | ||
|
@@ -1829,11 +1830,25 @@ def diff(arr, n: int, axis: int = 0): | |
------- | ||
shifted | ||
""" | ||
from pandas.core.arrays import PandasDtype | ||
|
||
n = int(n) | ||
na = np.nan | ||
dtype = arr.dtype | ||
|
||
if dtype.kind == "b": | ||
op = operator.xor | ||
else: | ||
op = operator.sub | ||
|
||
if isinstance(dtype, PandasDtype): | ||
# PandasArray cannot necessarily hold shifted versions of itself. | ||
arr = np.asarray(arr) | ||
dtype = arr.dtype | ||
|
||
if is_extension_array_dtype(dtype): | ||
|
||
return op(arr, arr.shift(n)) | ||
cc @jbrockmendel this routine likely can use some TLC when you have a chance.
||
|
||
is_timedelta = False | ||
is_bool = False | ||
if needs_i8_conversion(arr): | ||
|
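The extension-array branch in the hunk above computes the difference as `op(arr, arr.shift(n))`, with `op` being XOR for boolean data and subtraction otherwise. A minimal pure-Python sketch of that idea follows. It is not the pandas implementation: the `shift` and `diff` helpers are hypothetical stand-ins written for illustration, plain lists replace extension arrays, and `None` stands in for the missing-value marker.

```python
import operator
from typing import Any, List, Sequence


def shift(values: Sequence[Any], n: int) -> List[Any]:
    """Hypothetical helper: move values by n positions, padding with None."""
    size = len(values)
    if n == 0:
        return list(values)
    if abs(n) >= size:
        # Every original element is shifted out of range.
        return [None] * size
    if n > 0:
        return [None] * n + list(values[:-n])
    return list(values[-n:]) + [None] * (-n)


def diff(values: Sequence[Any], n: int) -> List[Any]:
    """Hypothetical helper: element-wise op(values, shift(values, n))."""
    # Booleans cannot be meaningfully subtracted, so use XOR for them,
    # mirroring the dtype.kind == "b" check in the diff above.
    is_bool = all(isinstance(v, bool) for v in values)
    op = operator.xor if is_bool else operator.sub
    shifted = shift(values, n)
    # Positions with no shifted counterpart stay missing (None).
    return [
        None if prev is None else op(cur, prev)
        for cur, prev in zip(values, shifted)
    ]


ints = diff([1, 4, 9, 16], 1)
bools = diff([True, True, False], 1)
backward = diff([1, 4, 9, 16], -1)
```

Expressing the operation through a shift means any array type that supports shifting and the chosen operator gets a `diff` without a dedicated code path, which appears to be the motivation for the branch in this PR.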
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1972,6 +1972,14 @@ class ObjectValuesExtensionBlock(ExtensionBlock): | |
Series[T].values is an ndarray of objects. | ||
""" | ||
|
||
def diff(self, n: int, axis: int = 1) -> List["Block"]: | ||
# Block.shape vs. Block.values.shape mismatch | ||
More technical debt from the 1D arrays inside 2D blocks :(

Oh, and this is on ObjectValuesExtensionBlock, but is only useful for PeriodArray. IntervalArray is the only other array to use this, and doesn't implement

Why is this only needed for ObjectValuesExtensionBlock, and not for ExtensionBlock?

I suppose that in principle, we can hit this from ExtensionBlock. We hit the problem when going from a NonConsolidatable block type (like period) to a consolidatable one (like object). In that case, the values passed to In practice, I think that for most EAs,

It certainly seems fine to ignore this corner case for now.
|
||
# Do the op, get the object-dtype ndarray, and reshape | ||
# to put into an ObjectBlock | ||
new_values = algos.diff(self.values, n, axis=axis) | ||
new_values = np.atleast_2d(new_values) | ||
return [self.make_block(values=new_values)] | ||
|
||
def external_values(self): | ||
return self.values.astype(object) | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -248,6 +248,10 @@ def test_repeat(self, data, repeats, as_series, use_numpy): | |
# Fails creating expected | ||
super().test_repeat(data, repeats, as_series, use_numpy) | ||
|
||
@pytest.mark.skip(reason="algorithms.diff skips PandasArray") | ||
def test_diff(self, data, periods): | ||
why is this skipped?

Either a bug or not implemented behavior in PandasArray.

maybe xfail it then?
||
return super().test_diff(data, periods) | ||
|
||
|
||
@skip_nested | ||
class TestArithmetics(BaseNumPyTests, base.BaseArithmeticOpsTests): | ||
|
would it be more idiomatic to do extract_array up front?

I would say not because we don't need to "extract" the array, since arrays are already passed (this doesn't get passed Series or Index objects). You can of course use extract_array to get rid of PandasArrays, but I think the above is more explicit.