BUG: `pd.concat(..., axis="columns")` inconsistently keeps/drops index name #37464

DanielFEvans · 2020-10-28T10:06:04Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

Code Sample, a copy-pastable example

import pandas as pd

df1 = pd.DataFrame({"IndexColumn": [1, 2, 3], "DataColumn1": ["x", "y", "z"]})
df2 = pd.DataFrame({"IndexColumn": [4, 1, 3], "DataColumn2": ["j", "k", "l"]})

df3 = pd.DataFrame({"IndexColumn": ["a", "b", "c"], "DataColumn1": ["x", "y", "z"]})
df4 = pd.DataFrame({"IndexColumn": ["d", "a", "c"], "DataColumn2": ["j", "k", "l"]})

df1_2_merged = pd.concat([df1.set_index("IndexColumn"), df2.set_index("IndexColumn")], axis="columns")
print("Integer index retains name:", df1_2_merged.index.name == "IndexColumn")
# True

df3_4_merged = pd.concat([df3.set_index("IndexColumn"), df4.set_index("IndexColumn")], axis="columns")
print("String index retains name:", df3_4_merged.index.name == "IndexColumn")
# False

Problem description

Calling pd.concat([df, other_df], axis="columns") on two DataFrames whose indexes share the same name sometimes retains the index name and sometimes drops it. In the example above, the name is kept for the integer indexes but dropped for the string indexes. The index values themselves are kept, and the behaviour is otherwise as expected.

It seems logical to always keep the index name when both inputs share it, rather than having inconsistent behaviour.

This may be the same as #35847. However, that discusses Index.union(), and taking the union of the df3 and df4 indices directly results in the name being propagated to the output, unlike the example above:

>>> df3_index = df3.set_index("IndexColumn").index
>>> df4_index = df4.set_index("IndexColumn").index
>>> df3_index.union(df4_index)
Index(['a', 'b', 'c', 'd'], dtype='object', name='IndexColumn')

(I've not been able to test the master branch as it appears to now require Python 3.7, and I haven't got a Py3.7 environment handy currently)

Expected Output

In the code example above, I'd expect both to output "True", i.e. that the index name is copied through to the output.
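Until a fixed release is available, the expected behaviour can be approximated by restoring the name after the concat. This is only a minimal sketch of a workaround, not how pandas handles it internally: the helper name `concat_columns_keep_index_name` is made up for illustration, and it assumes that the name should be restored only when every input index carries the same non-None name.

```python
import pandas as pd


def concat_columns_keep_index_name(frames):
    """Concatenate along columns, restoring a shared index name afterwards.

    Hypothetical helper for illustration only; not part of pandas.
    """
    frames = list(frames)
    result = pd.concat(frames, axis="columns")
    # Collect the distinct index names across all inputs.
    names = {frame.index.name for frame in frames}
    if len(names) == 1:
        name = names.pop()
        if name is not None:
            # rename_axis returns a new DataFrame with the index name set.
            result = result.rename_axis(index=name)
    return result


df3 = pd.DataFrame({"IndexColumn": ["a", "b", "c"], "DataColumn1": ["x", "y", "z"]})
df4 = pd.DataFrame({"IndexColumn": ["d", "a", "c"], "DataColumn2": ["j", "k", "l"]})

merged = concat_columns_keep_index_name(
    [df3.set_index("IndexColumn"), df4.set_index("IndexColumn")]
)
print("String index retains name:", merged.index.name == "IndexColumn")
```

If the inputs carry different index names, the helper leaves the result exactly as pd.concat returned it, so it does not hide a genuine mismatch.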

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : db08276
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1127.19.1.el7.x86_64
Version : #1 SMP Thu Aug 20 14:39:03 CDT 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.1.3
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 45.2.0
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.16.0
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext)
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.6.2
fastparquet : None
gcsfs : None
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.2.1
sqlalchemy : None
tables : 3.5.2
tabulate : 0.8.6
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.51.0


jreback · 2020-10-28T10:35:14Z

this is likely fixed in master

simonjayhawkins · 2020-10-28T16:48:46Z

> this is likely fixed in master

yep. closing as duplicate of #35847. no additional tests required.

DanielFEvans · 2020-10-28T16:50:46Z

Thanks both - will keep an eye out for Pandas 1.2 (and get on with upgrading my Python environments).

DanielFEvans added the Bug and Needs Triage labels Oct 28, 2020

simonjayhawkins closed this as completed Oct 28, 2020

simonjayhawkins added the Duplicate Report label and removed the Bug and Needs Triage labels Oct 28, 2020
