df.append should retain columns type if same type #18359
Comments
Yeah, this is kind of messy. This should all use
Hi, I've started to look into this. At the moment it seems like
No, I meant append: you need to append the union of differences (I think this is the symmetric_difference).
>>> d = pd.api.types.CategoricalDtype('A B C'.split())
>>> c1 = pd.CategoricalIndex('A B'.split(), dtype=d)
>>> c2 = pd.CategoricalIndex('B C'.split(), dtype=d)
>>> c1.symmetric_difference(c2)
Index(['A', 'C'], dtype='object')

Notice the index type; the values are also not suitable for appending. Instead, just do:

>>> c2.append(c1.difference(c2))
CategoricalIndex(['B', 'C', 'A'], categories=['A', 'B', 'C'], ordered=False, dtype='category')

which gives the same result (in this case, maybe in general) as:

>>> c2.union(c1)
CategoricalIndex(['B', 'C', 'A'], categories=['A', 'B', 'C'], ordered=False, dtype='category')
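The append-the-difference approach from this comment can be checked as a standalone script. This is a sketch against a recent pandas release (printed reprs vary between versions), and it only exercises the append path, not union().

```python
import pandas as pd

d = pd.CategoricalDtype(["A", "B", "C"])
c1 = pd.CategoricalIndex(["A", "B"], dtype=d)
c2 = pd.CategoricalIndex(["B", "C"], dtype=d)

# Labels present in c1 but missing from c2; difference() keeps the
# CategoricalIndex type and its dtype.
extra = c1.difference(c2)

# Appending an index that shares c2's dtype keeps the result categorical.
combined = c2.append(extra)
print(combined)
```

Because both pieces share the same CategoricalDtype, nothing forces a fallback to an object Index.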
This issue still exists with DataFrame.append: it changes the index type to pandas.core.indexes.base.Index, even though both DataFrames being combined had a pandas.core.indexes.datetimes.DatetimeIndex. Can this be looked into, please? I also tried passing inplace=True to the append call, but it did not help.
@avnishbm this is a very old issue. If you think you have a bug, please show a reproducible example in a new issue, tested against the latest released version.
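A minimal reproducible check for the DatetimeIndex report might look like the following. It assumes a recent pandas, where DataFrame.append no longer exists (it was removed in pandas 2.0), so pd.concat stands in for it; the frames and dates are illustrative only.

```python
import pandas as pd

# Two frames whose row index is a DatetimeIndex (illustrative data).
df1 = pd.DataFrame({"x": [1, 2]}, index=pd.date_range("2020-01-01", periods=2))
df2 = pd.DataFrame({"x": [3, 4]}, index=pd.date_range("2020-01-03", periods=2))

# Stack the rows; pd.concat is the replacement for the removed DataFrame.append.
result = pd.concat([df1, df2])

# Report the version and the resulting index type, as a bug report should.
print(pd.__version__)
print(type(result.index))
```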
Currently df.append loses the columns index type if the columns are a CategoricalIndex. df.append(ser).columns should return a CategoricalIndex equal to idx.
pandas 0.21 has the new CategoricalDtype, so it's now easy to compare CategoricalIndex instances for strict type equality. Hence this issue should be much easier to solve than before.
Solution proposal
In frame.py::DataFrame.append there is this line:
This line converts CategoricalIndex columns to normal indexes, so by adding some checks on types and dtypes it should be easy to return the correct index. If the above were changed to something like this instead:
then I think this issue can be solved (I haven't checked all the details yet; maybe some adjustments are needed). I'd appreciate comments on whether this approach is OK.
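A minimal sketch of the proposed idea: build the combined columns by appending the missing labels as an Index rather than concatenating Python lists, and fall back to object only when the categorical dtypes differ. The helper name combine_columns is hypothetical, and this is an illustration of the approach, not the code pandas actually uses in DataFrame.append.

```python
import pandas as pd


def combine_columns(columns: pd.Index, other_index: pd.Index) -> pd.Index:
    """Hypothetical helper: existing columns followed by labels found only in other_index.

    Appends the missing labels as an Index instead of round-tripping through
    Python lists, so a CategoricalIndex survives when the dtypes match.
    """
    extra = other_index.difference(columns)
    if isinstance(columns, pd.CategoricalIndex) and columns.dtype != other_index.dtype:
        # Categories (or orderedness) differ: fall back to a plain object Index.
        columns = columns.astype(object)
        extra = extra.astype(object)
    return columns.append(extra)


dtype = pd.CategoricalDtype(["a", "b", "c"])
idx = pd.CategoricalIndex(["a", "b"], dtype=dtype)        # frame columns
ser_index = pd.CategoricalIndex(["b", "c"], dtype=dtype)  # index of the appended Series

# List-based combination: the categorical type is lost.
old_style = pd.Index(idx.tolist() + ser_index.difference(idx).tolist())
# Index-based combination: the categorical type is kept.
new_style = combine_columns(idx, ser_index)
# Mismatched dtypes fall back to an object Index.
mixed = combine_columns(idx, pd.Index(["c", "d"]))

print(type(old_style).__name__, type(new_style).__name__)
```

The explicit dtype check is a design choice for clarity: it keeps the categorical path only when the CategoricalDtype comparison says both sides are the same type.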