BUG: Concat automatically sorts index when axis=1 #37937

Closed
marlenezw opened this issue Nov 18, 2020 · 3 comments · Fixed by #47685
Labels
Bug, Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)

Comments

@marlenezw

marlenezw commented Nov 18, 2020


Problem description

Hi! I've noticed that when concatenating DataFrames along axis=1, integer indexes are automatically sorted even when sort=False. This does not happen along axis=1 with a string index, and it also diverges from the behaviour along axis=0. This is a bit confusing, so I thought it might be a bug.

In the _unique_indices API, integer indexes are combined using a union, result = result.union(other), which sorts the values by default. So the resulting index is sorted regardless of the value of sort.
I think this can be fixed by changing the code to
result = result.union(other, sort=sort) or something along those lines.
I've added examples below. Thank you :D
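To illustrate the default described above, here is a minimal sketch that calls Index.union directly on two small integer indexes (the names left and right are only for illustration). It shows the documented sort parameter of Index.union, not the internal concat code path itself.

```python
import pandas as pd

left = pd.Index([3, 1, 6])
right = pd.Index([1, 2, 3])

# Default (sort=None): the union is sorted when the values are comparable.
print(list(left.union(right)))              # [1, 2, 3, 6]

# sort=False: labels of `left` come first, in their original order,
# followed by the labels of `right` that were not already present.
print(list(left.union(right, sort=False)))  # [3, 1, 6, 2]
```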

Example of difference between stringIndex and intIndex behaviour

# This is what happens with a string index (union of the indexes, in order of appearance)
>>> p1 = pd.DataFrame(
...                 {"a": [1, 2, 3], "b": [4, 5, 6]}, index=["p", "q", "r"]
...             )
>>> p2 = pd.DataFrame(
...                 {"c": [7, 8, 9], "d": [10, 11, 12]}, index=["r", "p", "z"]
...             )
>>> pd.concat([p1,p2], axis=1)
     a    b    c     d
p  1.0  4.0  8.0  11.0
q  2.0  5.0  NaN   NaN
r  3.0  6.0  7.0  10.0
z  NaN  NaN  9.0  12.0
>>> pd.concat([p2,p1], axis=1)
     c     d    a    b
r  7.0  10.0  3.0  6.0
p  8.0  11.0  1.0  4.0
z  9.0  12.0  NaN  NaN
q  NaN   NaN  2.0  5.0

# This is what happens with an int index (sorted, regardless of the order of the inputs)

>>> p_int1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[1,2,3])
>>> p_int2 = pd.DataFrame({"c": [7, 8, 9], "d": [10, 11, 12]}, index=[3,1,6])
>>> pd.concat([p_int1,p_int2], axis=1)
     a    b    c     d
1  1.0  4.0  8.0  11.0
2  2.0  5.0  NaN   NaN
3  3.0  6.0  7.0  10.0
6  NaN  NaN  9.0  12.0
>>> pd.concat([p_int2,p_int1], axis=1)
     c     d    a    b
1  8.0  11.0  1.0  4.0
2  NaN   NaN  2.0  5.0
3  7.0  10.0  3.0  6.0
6  9.0  12.0  NaN  NaN

Example of expected output

>>> p_int1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[1,2,3])
>>> p_int2 = pd.DataFrame({"c": [7, 8, 9], "d": [10, 11, 12]}, index=[3,1,6])
>>> pd.concat([p_int2,p_int1], axis=1, sort=False)

     c     d    a    b
3  7.0  10.0  3.0  6.0
1  8.0  11.0  1.0  4.0
6  9.0  12.0  NaN  NaN
2  NaN   NaN  2.0  5.0
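Until concat itself honours sort=False here, one possible workaround is to compute the unsorted union of the indexes explicitly and reindex the result. This is only a sketch of a workaround under that assumption, not the fix proposed in this issue; the variable names order and result are illustrative.

```python
import pandas as pd

p_int1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[1, 2, 3])
p_int2 = pd.DataFrame({"c": [7, 8, 9], "d": [10, 11, 12]}, index=[3, 1, 6])

# Row labels in order of first appearance across the inputs, without sorting.
order = p_int2.index.union(p_int1.index, sort=False)

# Reindexing puts the rows in that order whether or not concat sorted them.
result = pd.concat([p_int2, p_int1], axis=1, sort=False).reindex(order)
print(result)
```

Because the reindex step is applied after the concat, the same code should give the expected row order on versions with and without the sorting behaviour.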

#### Output of ``pd.show_versions()`` (pandas 1.1.4)

<details>

>>> pd.concat([p_int2,p_int1], axis=1)
     c     d    a    b
1  8.0  11.0  1.0  4.0
2  NaN   NaN  2.0  5.0
3  7.0  10.0  3.0  6.0
6  9.0  12.0  NaN  NaN

</details>
@marlenezw added the Bug and Needs Triage labels Nov 18, 2020
@mzeitlin11
Member

I tried making a change like this, but it looks to be blocked by #36289, because the change causes pandas/tests/reshape/concat/test_dataframe.py::TestDataFrameConcat::test_concat_duplicate_indexes to fail. Once the PR associated with that issue is merged, this could probably be fixed (though there were some other test failures caused by the changed index order).

@marlenezw
Author

Looks like #36289 was solved by #36299. I'll try to give this a shot in case the changes help here too 😄

@ant1j

ant1j commented Jun 12, 2021

Same situation here, with an object/categorical MultiIndex and pandas.concat.
I have not yet managed to reproduce it with a trimmed-down example.

It looks like the sort parameter is not propagated on this line, and this is where the indexes come back sorted although they should not be (in my case, sort=False from pandas.concat).
https://github.com/pandas-dev/pandas/blob/master/pandas/core/indexes/api.py#L221
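As a rough sketch of what propagating the flag could look like, the helper below folds a list of indexes together and passes the caller's sort choice to every Index.union call. The function union_indexes_sketch is hypothetical and written only for illustration; it is not the pandas internal at the linked line, and the actual fix may differ.

```python
import pandas as pd


def union_indexes_sketch(indexes, sort=False):
    """Hypothetical helper: union several indexes, forwarding `sort`."""
    # Index.union accepts sort=None (sort if possible) or sort=False (no sort).
    union_sort = None if sort else False
    result = indexes[0]
    for other in indexes[1:]:
        # Forward the flag instead of relying on the sorting default.
        result = result.union(other, sort=union_sort)
    return result


idx = [pd.Index([3, 1, 6]), pd.Index([1, 2, 3]), pd.Index([9, 0])]
print(list(union_indexes_sketch(idx, sort=False)))  # [3, 1, 6, 2, 9, 0]
print(list(union_indexes_sketch(idx, sort=True)))   # [0, 1, 2, 3, 6, 9]
```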

@mzeitlin11 added the Reshaping label and removed the Needs Triage label Jul 1, 2021
@mzeitlin11 added this to the Contributions Welcome milestone Jul 1, 2021