BUG: pd.concat(..., axis="columns") inconsistently keeps/drops index name #37464

Closed
2 of 3 tasks
DanielFEvans opened this issue Oct 28, 2020 · 3 comments
Labels: Duplicate Report (Duplicate issue or pull request)

Comments

@DanielFEvans
Contributor

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [ ] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

import pandas as pd

df1 = pd.DataFrame({"IndexColumn": [1, 2, 3], "DataColumn1": ["x", "y", "z"]})
df2 = pd.DataFrame({"IndexColumn": [4, 1, 3], "DataColumn2": ["j", "k", "l"]})

df3 = pd.DataFrame({"IndexColumn": ["a", "b", "c"], "DataColumn1": ["x", "y", "z"]})
df4 = pd.DataFrame({"IndexColumn": ["d", "a", "c"], "DataColumn2": ["j", "k", "l"]})

df1_2_merged = pd.concat([df1.set_index("IndexColumn"), df2.set_index("IndexColumn")], axis="columns")
print("Integer index retains name:", df1_2_merged.index.name == "IndexColumn")
# True

df3_4_merged = pd.concat([df3.set_index("IndexColumn"), df4.set_index("IndexColumn")], axis="columns")
print("String index retains name:", df3_4_merged.index.name == "IndexColumn")
# False

Problem description

pd.concat([df, other_df], axis="columns") on two DataFrames whose indexes share the same name sometimes retains that name and sometimes drops it. In the example above, the name survives for the integer index but is lost for the string index. The index values themselves are kept, and the behaviour is otherwise as expected.

It seems logical to always keep the index name if it is the same between both, rather than having inconsistent behaviour.
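
Until the behaviour is consistent, one possible workaround is to reapply the name after concatenating. The sketch below is not part of pandas: `concat_keep_index_name` is a hypothetical helper, and it assumes the desired rule is "restore the name only when every input index carries the same one".

```python
import pandas as pd


def concat_keep_index_name(frames):
    """Hypothetical helper: column-wise concat that restores a shared index name."""
    result = pd.concat(frames, axis="columns")
    # Collect the distinct index names of the inputs.
    names = {frame.index.name for frame in frames}
    # Reapply the name only if all inputs agree on it.
    if len(names) == 1:
        result.index = result.index.rename(names.pop())
    return result


df3 = pd.DataFrame({"IndexColumn": ["a", "b", "c"], "DataColumn1": ["x", "y", "z"]})
df4 = pd.DataFrame({"IndexColumn": ["d", "a", "c"], "DataColumn2": ["j", "k", "l"]})

merged = concat_keep_index_name([df3.set_index("IndexColumn"), df4.set_index("IndexColumn")])
print(merged.index.name)
```

Because the helper sets the name explicitly, the result does not depend on which pandas version performs the concat.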

This may be the same as #35847. However, that discusses Index.union(), and taking the union of the df3 and df4 indices directly results in the name being propagated to the output, unlike the example above:

>>> df3_index = df3.set_index("IndexColumn").index
>>> df4_index = df4.set_index("IndexColumn").index
>>> df3_index.union(df4_index)
Index(['a', 'b', 'c', 'd'], dtype='object', name='IndexColumn')
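
For comparison, here is a minimal sketch of how `Index.union()` treats names when called directly: a name shared by both operands is kept, while differing names give an unnamed result. The index values are the ones from the example above; the name `OtherName` is made up for illustration.

```python
import pandas as pd

left = pd.Index(["a", "b", "c"], name="IndexColumn")
right = pd.Index(["d", "a", "c"], name="IndexColumn")

# Both operands share a name, so the union keeps it.
same_name_union = left.union(right)
print(same_name_union.name)

# The operands disagree on the name, so the union is unnamed.
renamed_right = right.rename("OtherName")
different_name_union = left.union(renamed_right)
print(different_name_union.name)
```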

(I've not been able to test the master branch as it appears to now require Python 3.7, and I haven't got a Py3.7 environment handy currently)

Expected Output

In the code example above, I'd expect both to output "True", i.e. that the index name is copied through to the output.
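
The expectation can be written as a small check that runs the same concat for both index types. This is only a sketch: `concat_retains_index_name` is a hypothetical helper, not something from the pandas test suite. On pandas 1.1.3 the report above implies the string case comes back False; the expected result is True for both.

```python
import pandas as pd


def concat_retains_index_name(index_values_a, index_values_b, name="IndexColumn"):
    """Hypothetical check: True if a column-wise concat keeps the shared index name."""
    left = pd.DataFrame(
        {"DataColumn1": ["x", "y", "z"]}, index=pd.Index(index_values_a, name=name)
    )
    right = pd.DataFrame(
        {"DataColumn2": ["j", "k", "l"]}, index=pd.Index(index_values_b, name=name)
    )
    merged = pd.concat([left, right], axis="columns")
    return merged.index.name == name


# Same index values as df1/df2 (integers) and df3/df4 (strings) above.
results = {
    "integer": concat_retains_index_name([1, 2, 3], [4, 1, 3]),
    "string": concat_retains_index_name(["a", "b", "c"], ["d", "a", "c"]),
}
print(results)
```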

Output of pd.show_versions()

INSTALLED VERSIONS

commit : db08276
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 3.10.0-1127.19.1.el7.x86_64
Version : #1 SMP Thu Aug 20 14:39:03 CDT 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_GB.UTF-8
LOCALE : en_GB.UTF-8

pandas : 1.1.3
numpy : 1.18.1
pytz : 2020.1
dateutil : 2.8.1
pip : 20.2.4
setuptools : 45.2.0
Cython : 0.29.21
pytest : 5.4.3
hypothesis : 5.16.0
sphinx : 2.4.0
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext)
jinja2 : 2.11.2
IPython : None
pandas_datareader: None
bs4 : 4.9.3
bottleneck : 1.3.2
fsspec : 0.6.2
fastparquet : None
gcsfs : None
matplotlib : 3.1.2
numexpr : 2.7.1
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.2.1
sqlalchemy : None
tables : 3.5.2
tabulate : 0.8.6
xarray : None
xlrd : 1.2.0
xlwt : None
numba : 0.51.0

DanielFEvans added the Bug and Needs Triage labels Oct 28, 2020
@jreback
Contributor

jreback commented Oct 28, 2020

this is likely fixed in master

@simonjayhawkins
Member

> this is likely fixed in master

yep. closing as duplicate of #35847. no additional tests required.

simonjayhawkins added the Duplicate Report label and removed the Bug and Needs Triage labels Oct 28, 2020
@DanielFEvans
Contributor Author

Thanks both - will keep an eye out for Pandas 1.2 (and get on with upgrading my Python environments).
