-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: Fix copy semantics in __array__
#60046
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
1c5195c
2183861
404827b
ec08728
4ac6323
9b6c209
77058df
6799f55
4217baf
5e4cb87
357f8a0
9927903
3b000be
421f904
d70405e
5289a82
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -579,11 +579,12 @@ def astype(self, dtype: AstypeArg, copy: bool = True) -> ArrayLike: | |
raise ValueError("Cannot convert float NaN to integer") | ||
|
||
elif len(self.codes) == 0 or len(self.categories) == 0: | ||
result = np.array( | ||
self, | ||
dtype=dtype, | ||
copy=copy, | ||
) | ||
# For NumPy 1.x compatibility we cannot use copy=None. And | ||
# `copy=False` has the meaning of `copy=None` here: | ||
if not copy: | ||
result = np.asarray(self, dtype=dtype) | ||
else: | ||
result = np.array(self, dtype=dtype) | ||
|
||
else: | ||
# GH8628 (PERF): astype category codes instead of astyping array | ||
|
@@ -1663,7 +1664,7 @@ def __array__( | |
Specifies the the dtype for the array. | ||
|
||
copy : bool or None, optional | ||
Unused. | ||
See :func:`numpy.asarray`. | ||
|
||
Returns | ||
------- | ||
|
@@ -1686,13 +1687,18 @@ def __array__( | |
>>> np.asarray(cat) | ||
array(['a', 'b'], dtype=object) | ||
""" | ||
if copy is False: | ||
raise ValueError( | ||
"Unable to avoid copy while creating an array as requested." | ||
) | ||
|
||
ret = take_nd(self.categories._values, self._codes) | ||
if dtype and np.dtype(dtype) != self.categories.dtype: | ||
return np.asarray(ret, dtype) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I did not understand why this is needed. If dtypes match, NumPy should make it a no-op? If dtype is None, it is the same as not passing. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, also not sure why this is here |
||
# When we're a Categorical[ExtensionArray], like Interval, | ||
# we need to ensure __array__ gets all the way to an | ||
# ndarray. | ||
return np.asarray(ret) | ||
|
||
# `take_nd` should already make a copy, so don't force again. | ||
return np.asarray(ret, dtype=dtype) | ||
|
||
def __array_ufunc__(self, ufunc: np.ufunc, method: str, *inputs, **kwargs): | ||
# for binary ops, use our custom dunder methods | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -842,7 +842,7 @@ def __array__( | |
the dtype is inferred from the data. | ||
|
||
copy : bool or None, optional | ||
Unused. | ||
See :func:`numpy.asarray`. | ||
|
||
Returns | ||
------- | ||
|
@@ -879,8 +879,15 @@ def __array__( | |
dtype='datetime64[ns]') | ||
""" | ||
values = self._values | ||
arr = np.asarray(values, dtype=dtype) | ||
if astype_is_view(values.dtype, arr.dtype): | ||
if copy is None: | ||
# Note: branch avoids `copy=None` for NumPy 1.x support | ||
arr = np.asarray(values, dtype=dtype) | ||
else: | ||
arr = np.array(values, dtype=dtype, copy=copy) | ||
|
||
if copy is True: | ||
return arr | ||
if copy is False or astype_is_view(values.dtype, arr.dtype): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. One more issue I noticed while doing the backport: for a similar case in generic.py we changed the Will try to come up with a test case that would fail because of this, and do a follow-up tomorrow. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks, ouch. This is clearly hard to get right without tests :/. The release snippet looks fine to me. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It also only matters for the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking closer, I think this is right? Note that it says EDIT: Not sure if it's worth to skip the check though. It seems like it may be more confusing then anything... There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, that's a good point. I assume the Maybe it's then in There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looking once more at it, we actually already have tests for this, because we test that we get a read-only array using I expanded the test with |
||
arr = arr.view() | ||
arr.flags.writeable = False | ||
return arr | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If
lib.is_range_indexer(self._codes)
i.e. theself.categories._values
are all unique then, a copy could be avoided?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably, but I left it out, it doesn't seem vital to try and improve it. Also fixed things up, because
take_nd
should presumably always return a copy.