-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
[ArrayManager] Array version of putmask logic #44396
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 8 commits
0e612c2
51a932e
a9f520a
4193142
ad113c1
ee29fed
3c9a8a7
2a4e23e
cf34860
7f3a9a7
78017a0
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,114 @@ | ||
""" | ||
Wrappers around array_algos with internals-specific logic | ||
""" | ||
from __future__ import annotations | ||
|
||
import numpy as np | ||
|
||
from pandas.core.dtypes.cast import ( | ||
can_hold_element, | ||
find_common_type, | ||
infer_dtype_from, | ||
) | ||
from pandas.core.dtypes.common import is_interval_dtype | ||
from pandas.core.dtypes.generic import ( | ||
ABCDataFrame, | ||
ABCIndex, | ||
ABCSeries, | ||
) | ||
from pandas.core.dtypes.missing import ( | ||
is_valid_na_for_dtype, | ||
na_value_for_dtype, | ||
) | ||
|
||
from pandas.core.array_algos.putmask import ( | ||
extract_bool_array, | ||
putmask_smart, | ||
putmask_without_repeat, | ||
setitem_datetimelike_compat, | ||
validate_putmask, | ||
) | ||
from pandas.core.arrays import ExtensionArray | ||
from pandas.core.arrays._mixins import NDArrayBackedExtensionArray | ||
|
||
|
||
def putmask_flexible(array: np.ndarray | ExtensionArray, mask, new): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. where is all of this logic from? this doesn't look like 'simple' wrappers to me. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It's logic that for the BlockManager lives on the blocks (the Block.putmask version also calls those There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. great can u then remove from where it's used now There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The BlockManager is still used .. I can try to see if there are some more parts that could be moved into There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, so I could easily remove most of the logic of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. let me put this another way. you are adding a ton of code to methods, is this generally useful? can we reuse elsewhere? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
No, it is only for use within |
||
""" | ||
Putmask implementation for ArrayManager.putmask. | ||
|
||
Flexible version that will upcast if needed. | ||
""" | ||
if isinstance(array, np.ndarray): | ||
return putmask_flexible_ndarray(array, mask=mask, new=new) | ||
else: | ||
return putmask_flexible_ea(array, mask=mask, new=new) | ||
|
||
|
||
def putmask_flexible_ndarray(array: np.ndarray, mask, new): | ||
""" | ||
Putmask implementation for ArrayManager putmask for ndarray. | ||
|
||
Flexible version that will upcast if needed. | ||
""" | ||
mask, noop = validate_putmask(array, mask) | ||
assert not isinstance(new, (ABCIndex, ABCSeries, ABCDataFrame)) | ||
|
||
# if we are passed a scalar None, convert it here | ||
if not array.dtype == "object" and is_valid_na_for_dtype(new, array.dtype): | ||
new = na_value_for_dtype(array.dtype, compat=False) | ||
|
||
if can_hold_element(array, new): | ||
putmask_without_repeat(array, mask, new) | ||
return array | ||
|
||
elif noop: | ||
return array | ||
|
||
dtype, _ = infer_dtype_from(new) | ||
if dtype.kind in ["m", "M"]: | ||
array = array.astype(object) | ||
# convert to list to avoid numpy coercing datetimelikes to integers | ||
new = setitem_datetimelike_compat(array, mask.sum(), new) | ||
# putmask_smart below converts it back to array | ||
np.putmask(array, mask, new) | ||
return array | ||
|
||
new_values = putmask_smart(array, mask, new) | ||
return new_values | ||
|
||
|
||
def _coerce_to_target_dtype(array, new): | ||
dtype, _ = infer_dtype_from(new, pandas_dtype=True) | ||
new_dtype = find_common_type([array.dtype, dtype]) | ||
return array.astype(new_dtype, copy=False) | ||
|
||
|
||
def putmask_flexible_ea(array: ExtensionArray, mask, new): | ||
""" | ||
Putmask implementation for ArrayManager putmask for EA. | ||
|
||
Flexible version that will upcast if needed. | ||
""" | ||
mask = extract_bool_array(mask) | ||
|
||
if isinstance(array, NDArrayBackedExtensionArray): | ||
if not can_hold_element(array, new): | ||
array = _coerce_to_target_dtype(array, new) | ||
return putmask_flexible(array, mask, new) | ||
|
||
try: | ||
array._putmask(mask, new) | ||
except TypeError: | ||
if not is_interval_dtype(array.dtype): | ||
# Discussion about what we want to support in the general | ||
# case GH#39584 | ||
raise | ||
|
||
array = _coerce_to_target_dtype(array, new) | ||
if array.dtype == np.dtype("object"): | ||
# For now at least, only support casting e.g. | ||
# Interval[int64]->Interval[float64], | ||
raise | ||
return putmask_flexible(array, mask, new) | ||
|
||
return array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not feasible to go through 'apply'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. So I forgot that
ArrayManager.apply
indeed already has this "align" logic as well. However, that is not used at the moment (onlyputmask
andwhere
define align keys, and those both still go throughapply_with_block
at the moment; this is also confirmed by codecov that it is unused).But then I would rather remove it from
apply
, instead of going throughapply
, which will also prevent duplication. The reason for doing it in putmask: 1)apply
is already quite complex (eg also dealing with ignoring failures), so being able to remove the "alignment" handling there would be nice (can still share this later withwhere
), and 2)putmask
can update the existing arrays, whileapply
always constructs a new manager, and 3) for the CoW branch I need to add copy-on-write logic toputmask
, which is not needed forapply
in general.