df.append should retain columns type if same type #18359

topper-123 · 2017-11-18T20:49:13Z

Currently, df.append loses the type of the columns index if the columns are a CategoricalIndex:

>>> idx = pd.CategoricalIndex('a b'.split())
>>> df = pd.DataFrame([[1, 2]], columns=idx)
>>> ser = pd.Series([3, 4], index=idx, name=1)
>>> df.append(ser).columns
Index(['a', 'b'], dtype='object')

df.append(ser).columns should return a CategoricalIndex equal to idx.

pandas 0.21 introduced the new CategoricalDtype, so it is now easy to compare CategoricalIndex instances for strict type equality. Hence this issue should be much easier to solve than before.
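The strict comparison described above can be sketched as follows. This is a minimal illustration, not code from the issue: `same_index_type` is a hypothetical helper name, and the snippet assumes a pandas version that ships `CategoricalDtype` (0.21 or later).

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def same_index_type(left: pd.Index, right: pd.Index) -> bool:
    """Hypothetical helper: True only if both indexes share class and dtype."""
    # For categoricals, dtype equality compares the categories (and orderedness),
    # not just the generic "category" label.
    return type(left) is type(right) and left.dtype == right.dtype


ab = pd.CategoricalIndex(['a', 'b'])
ab_again = pd.CategoricalIndex(['a', 'b'])
abc = pd.CategoricalIndex(['a', 'b'], dtype=CategoricalDtype(['a', 'b', 'c']))

print(same_index_type(ab, ab_again))  # same categories on both sides
print(same_index_type(ab, abc))       # the extra category 'c' makes the dtypes differ
print(ab.dtype == 'category', abc.dtype == 'category')  # both still count as "category"
```

Comparing only against the string `'category'` would treat both indexes as equal, which is why the full dtype comparison matters here.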

Solution proposal

In frame.py::DataFrame.append there is this line:

combined_columns = self.columns.tolist() + self.columns.union(
                    other.index).difference(self.columns).tolist()

This line converts CategoricalIndex columns into a plain Index. By checking the types and dtypes first, it should be easy to return the correct index. So if the above were something like this instead:

same_types = type(self.columns) == type(other.index)
same_dtypes = self.columns.dtype == other.index.dtype
if same_types and same_dtypes:
    combined_columns = self.columns.union(other.index)
else:
    combined_columns = self.columns.tolist() + self.columns.union(
        other.index).difference(self.columns).tolist()

then I think this issue can be solved (I haven't checked all the details yet, so some adjustments may be needed). I'd appreciate comments on whether this approach is OK.
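A standalone sketch of the proposed logic, written as a hypothetical free function `combine_columns` rather than the actual `DataFrame.append` internals (which differ between pandas versions). It returns the union directly when class and dtype match, and otherwise falls back to the original list-based construction.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def combine_columns(columns: pd.Index, other_index: pd.Index) -> pd.Index:
    """Hypothetical helper mirroring the proposal in this issue."""
    same_types = type(columns) is type(other_index)
    same_dtypes = columns.dtype == other_index.dtype
    if same_types and same_dtypes:
        # Same index class and dtype: let the set operation keep the index type.
        return columns.union(other_index)
    # Fallback (the original code path): existing labels first, then new ones,
    # rebuilt from plain Python lists, so any special index type is lost.
    new_labels = columns.union(other_index).difference(columns)
    return pd.Index(columns.tolist() + new_labels.tolist())


idx = pd.CategoricalIndex(['a', 'b'])
print(combine_columns(idx, idx))  # stays a CategoricalIndex
print(combine_columns(pd.Index(['a', 'b']), pd.CategoricalIndex(['b', 'c'])))  # plain Index
```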


jreback · 2017-11-19T16:35:21Z

yeah, this is kind of messy. This should all use Index.append; then none of this is an issue. We shouldn't be using .tolist() at all.
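A minimal sketch of the difference pointed at here, assuming both indexes already share one `CategoricalDtype`: rebuilding the labels from `.tolist()` produces a plain `Index`, while `Index.append` keeps the `CategoricalIndex` and its categories.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

dtype = CategoricalDtype(['a', 'b', 'c'])
columns = pd.CategoricalIndex(['a', 'b'], dtype=dtype)
new_labels = pd.CategoricalIndex(['c'], dtype=dtype)

# Round-tripping through Python lists discards the categorical dtype.
via_tolist = pd.Index(columns.tolist() + new_labels.tolist())

# Index.append concatenates the index objects and keeps the shared dtype.
via_append = columns.append(new_labels)

print(type(via_tolist).__name__, type(via_append).__name__)
```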

topper-123 · 2017-11-28T21:33:48Z

Hi, I've started to look into this.

ATM it seems like .union is actually very robust, and I'm leaning towards simply using combined_columns = self.columns.union(other.index), but I wonder why you pointed to Index.append. Did you mean Index.union?

jreback · 2017-11-28T23:09:10Z

no, I meant append: you need to append the union of the differences (I think this is the symmetric_difference).

topper-123 · 2017-11-29T16:14:05Z

symmetric_difference doesn't work:

>>> d = pd.api.types.CategoricalDtype('A B C'.split())
>>> c1 = pd.CategoricalIndex('A B'.split(), dtype=d)
>>> c2 = pd.CategoricalIndex('B C'.split(), dtype=d)
>>> c1.symmetric_difference(c2)
Index(['A', 'C'], dtype='object')  # note the plain Index type; also, these values are not the right ones to append
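The problem with the values can be checked directly: the symmetric difference contains labels from both sides, including 'C', which `c2` already has, so appending it to `c2` yields a duplicate label. A small sketch (the index type returned by `symmetric_difference` may vary between pandas versions, so only the labels are inspected):

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

d = CategoricalDtype(['A', 'B', 'C'])
c1 = pd.CategoricalIndex(['A', 'B'], dtype=d)
c2 = pd.CategoricalIndex(['B', 'C'], dtype=d)

sym = c1.symmetric_difference(c2)  # labels in exactly one of c1, c2: 'A' and 'C'
appended = c2.append(sym)          # 'C' now appears twice

print(sorted(sym), appended.has_duplicates)
```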

Plain difference works:

>>> c2.append(c1.difference(c2))
CategoricalIndex(['B', 'C', 'A'], categories=['A', 'B', 'C'], ordered=False, dtype='category')

This gives the same result as union (in this case, and maybe in general):

>>> c2.union(c1)
CategoricalIndex(['B', 'C', 'A'], categories=['A', 'B', 'C'], ordered=False, dtype='category')
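Whether append-of-difference and `union` agree more generally can be spot-checked on a few more inputs. The sketch below compares them as label sets rather than with `equals`, since `union` may sort its result depending on the pandas version and the `sort` argument, so the order can differ from the output above. `append_new_labels` is a hypothetical helper name.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype


def append_new_labels(left: pd.Index, right: pd.Index) -> pd.Index:
    """Hypothetical helper: keep left's order, then add labels found only in right."""
    return left.append(right.difference(left))


d = CategoricalDtype(['A', 'B', 'C', 'D'])
pairs = [
    (pd.CategoricalIndex(['B', 'C'], dtype=d), pd.CategoricalIndex(['A', 'B'], dtype=d)),
    (pd.CategoricalIndex(['D', 'A'], dtype=d), pd.CategoricalIndex(['C'], dtype=d)),
]

for left, right in pairs:
    appended = append_new_labels(left, right)
    unioned = left.union(right)
    # Same labels either way; only the order may differ.
    print(list(appended), list(unioned), set(appended) == set(unioned))
```

The append-based version has the advantage that the existing columns always keep their original positions.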

avnishbm · 2021-03-12T14:28:16Z

This issue still exists with DataFrame.append: it changes the index type to pandas.core.indexes.base.Index, even though both DataFrames being merged had an index of type pandas.core.indexes.datetimes.DatetimeIndex! Could this be looked into, please?

I have also tried using inplace=True for the append call, but it was of no use!

jreback · 2021-03-12T14:45:25Z

@avnishbm this is a very old issue

if you think you have a bug, then please show a reproducible example in a new issue, testing against the latest released version
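A reproducible check of the kind requested here might look like the sketch below. It uses `pd.concat` because `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0; the frame contents are made up for illustration. Concatenating two frames that both have a `DatetimeIndex` is expected to keep that index type.

```python
import pandas as pd

# Two small frames with made-up data, each indexed by a DatetimeIndex.
df1 = pd.DataFrame({'x': [1, 2]}, index=pd.date_range('2021-01-01', periods=2))
df2 = pd.DataFrame({'x': [3, 4]}, index=pd.date_range('2021-01-03', periods=2))

combined = pd.concat([df1, df2])

print(pd.__version__)                # report the version being tested against
print(type(combined.index).__name__)
```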

avnishbm · 2021-03-14T21:32:02Z

@jreback have raised a new issue providing the details: #40435

topper-123 changed the title ~~df.append should retain columns type~~ df.append should retain columns type if same type Nov 18, 2017

jreback added the Compat, Dtype Conversions, and Reshaping labels Nov 19, 2017

jreback added this to the Next Major Release milestone Nov 19, 2017

jreback added the Difficulty Intermediate label Nov 19, 2017

jreback closed this as completed Nov 28, 2017

jreback reopened this Nov 28, 2017

topper-123 mentioned this issue Dec 31, 2017

ENH: DataFrame.append preserves columns dtype if possible #19021 (merged)

jreback modified the milestones: Next Major Release, 0.23.0 Jan 14, 2018

jreback modified the milestones: 0.23.0, Next Major Release Apr 14, 2018

jreback closed this as completed in #19021 Apr 20, 2018
