-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Accept dict or Series in fillna for categorical Series #18293
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
46de7e9
3024118
2ef5444
5780c43
2f4be6d
a69e696
572d246
2dd2d4b
8f8f316
6ffec6c
c484f49
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1623,8 +1623,12 @@ def fillna(self, value=None, method=None, limit=None): | |
Method to use for filling holes in reindexed Series | ||
pad / ffill: propagate last valid observation forward to next valid | ||
backfill / bfill: use NEXT valid observation to fill gap | ||
value : scalar | ||
Value to use to fill holes (e.g. 0) | ||
value : scalar, dict, Series | ||
If a scalar value is passed it is used to fill all missing values. | ||
Alternatively, a Series or dict can be used to fill in different | ||
values for each index. The value should not be a list. The | ||
value(s) passed should either be in the categories or should be | ||
NaN. | ||
limit : int, default None | ||
(Not implemented yet for Categorical!) | ||
If method is specified, this is the maximum number of consecutive | ||
|
@@ -1665,16 +1669,33 @@ def fillna(self, value=None, method=None, limit=None): | |
|
||
else: | ||
|
||
if not isna(value) and value not in self.categories: | ||
raise ValueError("fill value must be in categories") | ||
# If value is a dict or a Series (a dict value has already | ||
# been converted to a Series) | ||
if isinstance(value, ABCSeries): | ||
if not value[~value.isin(self.categories)].isna().all(): | ||
raise ValueError("fill value must be in categories") | ||
|
||
values_codes = _get_codes_for_values(value, self.categories) | ||
indexer = np.where(values_codes != -1) | ||
values[indexer] = values_codes[values_codes != -1] | ||
|
||
# If value is not a dict or Series it should be a scalar | ||
elif is_scalar(value): | ||
if not isna(value) and value not in self.categories: | ||
raise ValueError("fill value must be in categories") | ||
|
||
mask = values == -1 | ||
if mask.any(): | ||
values = values.copy() | ||
if isna(value): | ||
values[mask] = -1 | ||
else: | ||
values[mask] = self.categories.get_loc(value) | ||
|
||
mask = values == -1 | ||
if mask.any(): | ||
values = values.copy() | ||
if isna(value): | ||
values[mask] = -1 | ||
else: | ||
values[mask] = self.categories.get_loc(value) | ||
else: | ||
raise TypeError('"value" parameter must be a scalar, dict ' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. do we have testing to hit this? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The problem is that I can't think of an example where we would actually hit this specific line. If the user passed a list, tuple, or DataFrame as a value in I have included tests in |
||
'or Series, but you passed a ' | ||
'"{0}"'.format(type(value).__name__)) | ||
|
||
return self._constructor(values, categories=self.categories, | ||
ordered=self.ordered, fastpath=True) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4304,8 +4304,9 @@ def fillna(self, value=None, method=None, axis=None, inplace=False, | |
elif not is_list_like(value): | ||
pass | ||
else: | ||
raise ValueError("invalid fill value with a %s" % | ||
type(value)) | ||
raise TypeError('"value" parameter must be a scalar, dict ' | ||
'or Series, but you passed a ' | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ideally we would have a parametriezed test that hits this (with multiple invalid things that should raise) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, as I said above, I have included tests in |
||
'"{0}"'.format(type(value).__name__)) | ||
|
||
new_data = self._data.fillna(value=value, limit=limit, | ||
inplace=inplace, | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can simplify this to
values[mask] = self.categories.get_indexer([value])[0]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_indexer
seems to cause problems withPeriodIndex
. When I replace the code with your suggestion:There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, this my be a known bug