BUG: CategoricalDtype is not refreshed after index categories are set #46820
Comments
First bad commit: 126a19d. Also cc @jbrockmendel
@Yikun it seems to work with
@samukweku Yes, I can make it work as a workaround:
But I didn't need to do it this way before 1.4.0.
You are trying to modify an immutable object. It's possible it accidentally worked before, but this is in no way correct.
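A minimal sketch of the immutability point, assuming a reasonably recent pandas: element assignment on an `Index` (including `CategoricalIndex`) raises `TypeError`, so "changing" an index means building a new object and assigning it. The variable names are illustrative only.

```python
import pandas as pd

idx = pd.CategoricalIndex([10, 20, 30], categories=[30, 10, 20], ordered=True)

# Element assignment on a pandas Index raises TypeError, because
# Index objects are designed to be immutable.
raised = False
try:
    idx[0] = 30
except TypeError as exc:
    raised = True
    print(f"TypeError: {exc}")

# "Changing" an index therefore means constructing a new one
# (here reusing the same CategoricalDtype) and assigning it where needed.
new_idx = pd.CategoricalIndex([30, 20, 30], dtype=idx.dtype)
print(list(idx), list(new_idx))
```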
@jreback Would we consider supporting index categories
@jreback @jbrockmendel @samukweku Any thoughts? Thanks!
We're actually looking at deprecating all the in-place category-setting behaviors (xref #37643), so unless I'm misunderstanding what you're asking for, I'm -1 on adding this setter.
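Given that in-place category setting is on its way out, here is a sketch of a non-mutating route, assuming a pandas version where `CategoricalIndex.rename_categories` is available. It returns a new index whose dtype is constructed together with the new categories. This is an illustration, not necessarily the workaround referenced earlier in the thread.

```python
import pandas as pd

pdf = pd.DataFrame(
    {"a": pd.Categorical([1, 2, 3, 1, 2, 3])},
    index=pd.CategoricalIndex(
        [10, 20, 30, 20, 30, 10], categories=[30, 10, 20], ordered=True
    ),
)

old_index = pdf.index
# rename_categories returns a NEW CategoricalIndex; nothing is mutated in place.
# Relabeling is positional: 30 -> "z", 10 -> "y", 20 -> "x".
new_index = old_index.rename_categories(["z", "y", "x"])
pdf.index = new_index

# The dtype and the categories come from the same new object, so they agree.
assert pdf.index.dtype.categories.equals(pdf.index.categories)
print(list(pdf.index))  # ['y', 'x', 'z', 'x', 'z', 'y']
```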
@jbrockmendel @jreback Thanks! And sorry for the late reply. Should we consider deprecating this [1]?

[1] https://github.com/pandas-dev/pandas/blob/main/pandas/core/arrays/categorical.py#L742-L743
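For the array-level `Categorical` setter linked in [1], the non-mutating method that, to our understanding, matches its relabeling semantics is `Categorical.rename_categories`: it keeps the codes and swaps the labels positionally. The sketch below contrasts it with `set_categories`, which instead re-matches existing values against the new categories. Variable names are illustrative.

```python
import pandas as pd

cat = pd.Categorical([1, 2, 3, 1], categories=[3, 1, 2])

# rename_categories relabels positionally: 3 -> "z", 1 -> "y", 2 -> "x".
# The codes are kept and a new Categorical (with a matching dtype) is returned.
renamed = cat.rename_categories(["z", "y", "x"])
print(list(renamed))  # ['y', 'x', 'z', 'y']

# set_categories re-matches existing values against the new categories;
# none of 1, 2, 3 appear in ["z", "y", "x"], so every value becomes NaN.
reset = cat.set_categories(["z", "y", "x"])
print(reset.isna().all())

# The original Categorical is left unchanged.
print(list(cat.categories))
```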
### What changes were proposed in this pull request?
Since pandas-dev/pandas@126a19d, pandas behavior has changed. Before pandas 1.4, pandas refreshed the dtype according to the new categories; since pandas 1.4, the `categories.setter` no longer refreshes the dtype.

According to pandas-dev/pandas#46820, full support for the `categories.setter` will not come back. Refreshing only the categories (but not the dtype) is not useful behavior, so it is better to only fix the test and keep the current PS behavior, and then remove this setter support when we remove all deprecated methods.

### Why are the changes needed?
To make CI pass with pandas 1.4.x.

### Does this PR introduce _any_ user-facing change?
No.

### How was this patch tested?
test_categories_setter passed with 1.3.x and also 1.4.x.

Closes #36355 from Yikun/SPARK-38982.

Authored-by: Yikun Jiang <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
Yes, it appears that the index is updated (which should probably raise) but the dtype attribute is not.
What code sample was used, and what was the behavior change? I'm seeing the same behavior with the snippet above back to pandas 1.0.5; in pandas 0.25.3, it raised.
```python
import pandas as pd
import pandas._testing as tm

pdf = pd.DataFrame(
    {
        "a": pd.Categorical([1, 2, 3, 1, 2, 3]),
    },
    index=pd.Categorical([10, 20, 30, 20, 30, 10], categories=[30, 10, 20], ordered=True),
)
pidx = pdf.index
# In-place category assignment on the index (the behavior under discussion)
pidx.categories = ["z", "y", "x"]
# Check whether `pidx.dtype.categories` is refreshed
tm.assert_index_equal(pidx.dtype.categories, pidx.categories)
tm.assert_numpy_array_equal(pidx.dtype.categories._data, pidx.categories._data)
```
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Setting the index categories does not refresh the dtype.
Expected Behavior
(This was also the behavior before 1.4.x.)
Installed Versions
1.4.0+