reset_index() doesn't work with CategoricalIndex columns #19136
Comments
I think that's my preference, but I don't feel strongly about it. I'd just rather not lose all the categorical information. We could give an example like grouped.rename(columns=str).reset_index().head() |
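The `rename(columns=str)` suggestion above could look roughly like the following. This is only a sketch: the `grouped` frame here is a made-up stand-in (the original example is not reproduced in this thread), built with `CategoricalIndex` columns and a `Year`/`Month` MultiIndex to mirror the report.

```python
import pandas as pd

# Hypothetical stand-in for the `grouped` frame from the report:
# categorical column labels and a two-level (Year, Month) index.
grouped = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.MultiIndex.from_tuples(
        [(2017, 1), (2017, 2)], names=["Year", "Month"]
    ),
    columns=pd.CategoricalIndex(["a", "b"]),
)

# rename(columns=str) rebuilds the column labels as plain strings,
# so the columns are no longer a CategoricalIndex and new labels
# (the index level names) can be inserted by reset_index().
flat = grouped.rename(columns=str).reset_index()
print(flat.head())
```

Note that this drops the categorical information from the column labels, which is the trade-off discussed in the comment.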
Hi, I'd like to take this up |
@maykulkarni : Go for it! No need to ask for permission if no one has said they are working on it. |
Actually, CategoricalIndex columns break other things as well, like assigning a new column.
Maybe the error text should be more meaningful? |
Hi, I ran into a similar problem, but I get a different error message. This seems to happen with a
This looks good, but
throws the following error
|
@Gijs-Koot

    import pandas as pd

    def reset_index(df):
        '''Returns DataFrame with index as columns'''
        index_df = df.index.to_frame(index=False)
        df = df.reset_index(drop=True)
        # In merge, the order in which you pass the DataFrames is important
        # if the index contains a Categorical.
        # pd.merge(df, index_df, left_index=True, right_index=True) does not work
        return pd.merge(index_df, df, left_index=True, right_index=True)

It works and keeps the categorical type:

    problem
    solved = reset_index(problem)
    solved
    solved.info()
|
The same error occurs when using
|
@TomAugspurger I'm not sure this is a "good first issue" with "Effort: Low". I thought I would fix it by changing the error message, but in the various examples given above, the error messages are different, and the paths that raise the Fundamentally, I have to wonder if we should allow a This needs some discussion. I ran into this problem in some work that I'm doing, and now I have a workaround by doing Your thoughts? |
So I did a little more investigation, and I think the fundamental problem is that Note that when the columns are backed by a So I see four possible solutions:
IMHO, (1) is easiest, and I think it makes the most sense. With respect to (3), it seems it would be a bit of work to figure out all the places that new columns are created. @TomAugspurger Need your opinion on this. |
I think my opinion has changed since writing this. There are two issues in the original post. First, there's a bug in Second, there's the issue of what to do for

Simple test case:

In [21]: pd.DataFrame(columns=pd.Categorical(['a', 'b'])).reset_index()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-21-5bd5a62d9f80> in <module>()
----> 1 pd.DataFrame(columns=pd.Categorical(['a', 'b'])).reset_index()
~/sandbox/pandas/pandas/core/frame.py in reset_index(self, level, drop, inplace, col_level, col_fill)
4274 # to ndarray and maybe infer different dtype
4275 level_values = _maybe_casted_values(lev, lab)
-> 4276 new_obj.insert(0, name, level_values)
4277
4278 new_obj.index = new_index
~/sandbox/pandas/pandas/core/frame.py in insert(self, loc, column, value, allow_duplicates)
3343 value = self._sanitize_column(column, value, broadcast=False)
3344 self._data.insert(loc, column, value,
-> 3345 allow_duplicates=allow_duplicates)
3346
3347 def assign(self, **kwargs):
~/sandbox/pandas/pandas/core/internals/managers.py in insert(self, loc, item, value, allow_duplicates)
1164
1165 # insert to the axis; this could possibly raise a TypeError
-> 1166 new_axis = self.items.insert(loc, item)
1167
1168 block = make_block(values=value, ndim=self.ndim,
~/sandbox/pandas/pandas/core/indexes/category.py in insert(self, loc, item)
792 code = self.categories.get_indexer([item])
793 if (code == -1) and not (is_scalar(item) and isna(item)):
--> 794 raise TypeError("cannot insert an item into a CategoricalIndex "
795 "that is not already an existing category")
796
TypeError: cannot insert an item into a CategoricalIndex that is not already an existing category
@Dr-Irv I don't think we should change the behavior of |
* Transform Pandas Columns CategoricalIndex in list #8420 pandas-dev/pandas#19136
* add test conversion DF with CategoricalIndex column to CDS
* remove white spaces for code quality
I tried converting the CategoricalIndex to a plain Index and it worked for me.
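A minimal sketch of that workaround, assuming a small made-up frame (`df`, its values, and the index name `key` are illustrative and not taken from the thread): cast the `CategoricalIndex` columns to a plain object-dtype `Index` before calling `reset_index()`.

```python
import pandas as pd

# Illustrative frame with CategoricalIndex columns and a named index.
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    index=pd.Index(["x", "y"], name="key"),
    columns=pd.CategoricalIndex(["a", "b"]),
)

# Cast the column labels to a plain object-dtype Index so that
# inserting a label outside the existing categories is allowed.
df.columns = df.columns.astype(object)

result = df.reset_index()
print(result)
```

As with `rename(columns=str)`, the categorical dtype of the column labels is lost; only the data columns and the former index survive as ordinary columns.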
|
This looks fixed on master now. Could use a test
|
Code Sample, a copy-pastable example if possible
Problem description
reset_index() should work with dataframes that have any types of columns.
Expected Output
Two extra columns holding the MultiIndex levels; in the example above, Year and Month.
Should the columns' type change to string? Otherwise, the documentation should specify that reset_index() doesn't work with specific types of columns.
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
pandas: 0.22.0
pytest: None
pip: 9.0.1
setuptools: 36.4.0
Cython: 0.26
numpy: 1.13.3
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.1.0
sphinx: None
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: None
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.1
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 0.9999999
sqlalchemy: 1.2.0
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None