-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: ExtensionArray.fillna #19909
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: ExtensionArray.fillna #19909
Changes from 5 commits
74614cb
67a19ba
280ac94
6705712
69a0f9b
4d6846b
f3b81dc
70efbb8
8fc3851
39096e5
9e44f88
e902f18
17126e6
1106ef2
744c381
9a3fa55
2bc43bc
f22fd7b
c26d261
b342efe
a609f48
a35f93c
3f78dec
1160e15
c9c7a48
05fced6
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -216,6 +216,72 @@ def isna(self): | |
""" | ||
raise AbstractMethodError(self) | ||
|
||
def tolist(self): | ||
# type: () -> list | ||
"""Convert the array to a list of scalars.""" | ||
return list(self) | ||
|
||
def fillna(self, value=None, method=None, limit=None): | ||
""" Fill NA/NaN values using the specified method. | ||
|
||
Parameters | ||
---------- | ||
method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None | ||
Method to use for filling holes in reindexed Series | ||
pad / ffill: propagate last valid observation forward to next valid | ||
backfill / bfill: use NEXT valid observation to fill gap | ||
value : scalar, array-like | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. value before method ? (order of signature) |
||
If a scalar value is passed it is used to fill all missing values. | ||
Alternatively, an array-like 'value' can be given. It's expected | ||
that the array-like have the same length as 'self'. | ||
limit : int, default None | ||
(Not implemented yet for ExtensionArray!) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this can now maybe be passed to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, I had to change things up slightly to use integers and the datetime fillna methods if you want to take a look. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Categorical.fillna can probably do soemthing similar, haven't looked. |
||
If method is specified, this is the maximum number of consecutive | ||
NaN values to forward/backward fill. In other words, if there is | ||
a gap with more than this number of consecutive NaNs, it will only | ||
be partially filled. If method is not specified, this is the | ||
maximum number of entries along the entire axis where NaNs will be | ||
filled. | ||
|
||
Returns | ||
------- | ||
filled : ExtensionArray with NA/NaN filled | ||
""" | ||
from pandas.api.types import is_scalar | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. A I'll see what I can do to simplify this. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How would that be different with |
||
from pandas.util._validators import validate_fillna_kwargs | ||
from pandas.core.missing import pad_1d, backfill_1d | ||
|
||
value, method = validate_fillna_kwargs(value, method) | ||
|
||
mask = self.isna() | ||
|
||
if not is_scalar(value): | ||
if len(value) != len(self): | ||
raise ValueError("Length of 'value' does not match. Got ({}) " | ||
" expected {}".format(len(value), len(self))) | ||
value = value[mask] | ||
|
||
if limit is not None: | ||
msg = ("Specifying 'limit' for 'fillna' has not been implemented " | ||
"yet for {} typed data".format(self.dtype)) | ||
raise NotImplementedError(msg) | ||
|
||
if mask.any(): | ||
if method is not None: | ||
# ffill / bfill | ||
func = pad_1d if method == 'pad' else backfill_1d | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, by |
||
idx = np.arange(len(self), dtype=np.float64) | ||
idx[mask] = np.nan | ||
idx = func(idx, mask=mask).astype(np.int64) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. probably needs to be np.intp (or an ensure_platform_int) for windows? |
||
new_values = self.take(idx) | ||
else: | ||
# fill with value | ||
new_values = self.copy() | ||
new_values[mask] = value | ||
else: | ||
new_values = self.copy() | ||
return type(self)(new_values) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Wait on #19906 before merging this, so I can update this line. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this can now just be |
||
|
||
# ------------------------------------------------------------------------ | ||
# Indexing methods | ||
# ------------------------------------------------------------------------ | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1963,6 +1963,23 @@ def concat_same_type(self, to_concat, placement=None): | |
return self.make_block_same_class(values, ndim=self.ndim, | ||
placement=placement) | ||
|
||
def fillna(self, value, limit=None, inplace=False, downcast=None, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was moved from Categorical.fillna with changes 1.) Removed check for limit, since that's done in |
||
mgr=None): | ||
values = self.values if inplace else self.values.copy() | ||
values = values.fillna(value=value, limit=limit) | ||
return [self.make_block_same_class(values=values, | ||
placement=self.mgr_locs, | ||
ndim=self.ndim)] | ||
|
||
def interpolate(self, method='pad', axis=0, inplace=False, limit=None, | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This was just a move from |
||
fill_value=None, **kwargs): | ||
|
||
values = self.values if inplace else self.values.copy() | ||
return self.make_block_same_class( | ||
values=values.fillna(value=fill_value, method=method, | ||
limit=limit), | ||
placement=self.mgr_locs) | ||
|
||
|
||
class NumericBlock(Block): | ||
__slots__ = () | ||
|
@@ -2522,27 +2539,6 @@ def _try_coerce_result(self, result): | |
|
||
return result | ||
|
||
def fillna(self, value, limit=None, inplace=False, downcast=None, | ||
mgr=None): | ||
# we may need to upcast our fill to match our dtype | ||
if limit is not None: | ||
raise NotImplementedError("specifying a limit for 'fillna' has " | ||
"not been implemented yet") | ||
|
||
values = self.values if inplace else self.values.copy() | ||
values = self._try_coerce_result(values.fillna(value=value, | ||
limit=limit)) | ||
return [self.make_block(values=values)] | ||
|
||
def interpolate(self, method='pad', axis=0, inplace=False, limit=None, | ||
fill_value=None, **kwargs): | ||
|
||
values = self.values if inplace else self.values.copy() | ||
return self.make_block_same_class( | ||
values=values.fillna(fill_value=fill_value, method=method, | ||
limit=limit), | ||
placement=self.mgr_locs) | ||
|
||
def shift(self, periods, axis=0, mgr=None): | ||
return self.make_block_same_class(values=self.values.shift(periods), | ||
placement=self.mgr_locs) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -69,7 +69,10 @@ def test_getitem_scalar(self): | |
|
||
|
||
class TestMissing(base.BaseMissingTests): | ||
pass | ||
|
||
@pytest.mark.skip(reason="Backwards compatability") | ||
def test_fillna_limit_raises(self): | ||
"""Has a different error message.""" | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can just up the error message for |
||
|
||
|
||
class TestMethods(base.BaseMethodsTests): | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this needed for fillna?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indirectly, via the default
fillna
.Series.__iter__
callsSeries.tolist
, which callsSeries._values.tolist
. We could modify that to check the type and just calllist(self._values)
for EAs. I don't have a strong preference vs. adding a default.tolist
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or directly call
values.__iter__
? As iterating through values to make a list and then iterate again through that list sounds like a bit of overhead.For
.tolist()
itself, it is of course consistent with Series and numpy arrays, but personally I am not sure what its value is compared tolist(values)
. I don't think you typically can implement a more efficient conversion to lists than whatlist(values)
will do by default? (i.e. iterating through the values)