-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
FIX: REGR: setting numeric value in Categorical Series with enlargement raise internal error #47751
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 3 commits
f520eae
4edc956
a0e2934
5acbf42
59538cd
efdcf1c
83f75ce
7bfb9b7
d25f7eb
d15ce50
c3a7109
476fe52
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -646,6 +646,12 @@ def _maybe_promote(dtype: np.dtype, fill_value=np.nan): | |
|
||
return np.dtype("object"), fill_value | ||
|
||
elif isinstance(dtype, CategoricalDtype): | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could you move this before the isna check and add the Categorical handling of nan values into this block? Keeps the Categorical specific code together Ohterwise this looks good |
||
if fill_value in dtype.categories: | ||
return "category", fill_value | ||
else: | ||
return object, ensure_object(fill_value) | ||
|
||
elif is_float(fill_value): | ||
if issubclass(dtype.type, np.bool_): | ||
dtype = np.dtype(np.object_) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,6 +41,7 @@ | |
is_sequence, | ||
) | ||
from pandas.core.dtypes.concat import concat_compat | ||
from pandas.core.dtypes.dtypes import CategoricalDtype | ||
from pandas.core.dtypes.generic import ( | ||
ABCDataFrame, | ||
ABCSeries, | ||
|
@@ -2119,9 +2120,16 @@ def _setitem_with_indexer_missing(self, indexer, value): | |
new_values = Series([value], dtype=new_dtype)._values | ||
|
||
if len(self.obj._values): | ||
# GH#47677 handle enlarging with a scalar as a special case | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is not the right place for this. You have to create new_values correctly instead of avoiding concat_compat There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I do, here is the value of (Pdb++) new_values
['a']
Categories (1, object): ['a'] The point of avoiding concat_compat is, that in there we explicitly check if the dtypes are the same, and because the categories aren't the same the concatenated series is cast to object. If we'd change that then this will also have an effect on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. No, this is not what I was talking about. But lets start over: If we want to handle this in maybe_promote, we should return the dtype we get as input, not the string If we decide against handling this in maybe_promote, we have to handle this earlier, e.g. in line 2120 at the latest, not here. tolist() is never a good idea, if you want to keep the dtype, this looses all precision information, e.g.
this is converted into CategoricalDtype with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks a lot. Of course you're right, just returning the given dtype works fine. I updated the PR, handling this in |
||
if ( | ||
isinstance(self.obj.dtype, CategoricalDtype) | ||
and new_dtype == "category" | ||
): | ||
new_values = Series(self.obj.tolist() + [value], dtype="category") | ||
# GH#22717 handle casting compatibility that np.concatenate | ||
# does incorrectly | ||
new_values = concat_compat([self.obj._values, new_values]) | ||
else: | ||
new_values = concat_compat([self.obj._values, new_values]) | ||
self.obj._mgr = self.obj._constructor( | ||
new_values, index=new_index, name=self.obj.name | ||
)._mgr | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
release note not needed if only a fix for issue on main. If changing behavior from 1.4.3, as suggested in #47751 (comment) then will need a release note covering just that change.