Hi! I've noticed that when concatenating DataFrames along axis=1, integer indexes are automatically sorted even when sort=False. This is not the case when axis=1 and we have a string index. It also differs from the behaviour with axis=0. This is a bit confusing, so I thought it might be a bug.
In the _unique_indices API, integer indexes are combined using a union, result = result.union(other), which sorts the values by default. So the resulting index is sorted regardless of the value passed for sort.
I think this can be fixed by changing the code to result = result.union(other, sort=sort) or something along those lines.
I've added examples below. Thank you :D
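To illustrate the point about union, here is a minimal sketch that calls Index.union directly on the same labels used in the examples. The variable names are only for illustration, and this shows the public Index.union method rather than the internal concat code path itself.

```python
import pandas as pd

left = pd.Index([3, 1, 6])
right = pd.Index([1, 2, 3])

# Default (sort=None): the combined labels come back sorted.
default_union = left.union(right)

# sort=False: no sorting is attempted on the combined labels.
unsorted_union = left.union(right, sort=False)

print(list(default_union))
print(list(unsorted_union))
```

If concat passed its own sort argument through to union in this way, the integer case would presumably behave like the string case.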
Example of difference between stringIndex and intIndex behaviour
# This is what happens with a string index (union of indexes, in order of appearance)
>>> p1 = pd.DataFrame(
...     {"a": [1, 2, 3], "b": [4, 5, 6]}, index=["p", "q", "r"]
... )
>>> p2 = pd.DataFrame(
...     {"c": [7, 8, 9], "d": [10, 11, 12]}, index=["r", "p", "z"]
... )
>>> pd.concat([p1, p2], axis=1)
     a    b    c     d
p  1.0  4.0  8.0  11.0
q  2.0  5.0  NaN   NaN
r  3.0  6.0  7.0  10.0
z  NaN  NaN  9.0  12.0
>>> pd.concat([p2, p1], axis=1)
     c     d    a    b
r  7.0  10.0  3.0  6.0
p  8.0  11.0  1.0  4.0
z  9.0  12.0  NaN  NaN
q  NaN   NaN  2.0  5.0

# This is what happens with an int index (sorted, regardless of order of input)
>>> p_int1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[1, 2, 3])
>>> p_int2 = pd.DataFrame({"c": [7, 8, 9], "d": [10, 11, 12]}, index=[3, 1, 6])
>>> pd.concat([p_int1, p_int2], axis=1)
     a    b    c     d
1  1.0  4.0  8.0  11.0
2  2.0  5.0  NaN   NaN
3  3.0  6.0  7.0  10.0
6  NaN  NaN  9.0  12.0
>>> pd.concat([p_int2, p_int1], axis=1)
     c     d    a    b
1  8.0  11.0  1.0  4.0
2  NaN   NaN  2.0  5.0
3  7.0  10.0  3.0  6.0
6  9.0  12.0  NaN  NaN
Example of expected output
>>> p_int1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[1,2,3])
>>> p_int2 = pd.DataFrame({"c": [7, 8, 9], "d": [10, 11, 12]}, index=[3,1,6])
>>> pd.concat([p_int2,p_int1], axis=1, sort=False)
     c     d    a    b
3  7.0  10.0  3.0  6.0
1  8.0  11.0  1.0  4.0
6  9.0  12.0  NaN  NaN
2  NaN   NaN  2.0  5.0
#### Output of ``pd.show_versions(1.1.4)``
<details>
>>> pd.concat([p_int2,p_int1], axis=1)
     c     d    a    b
1  8.0  11.0  1.0  4.0
2  NaN   NaN  2.0  5.0
3  7.0  10.0  3.0  6.0
6  9.0  12.0  NaN  NaN
</details>
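In the meantime, one possible workaround (a sketch only, not something taken from the pandas docs or from this thread) is to build the desired row order explicitly and reindex the concatenated frame. The name wanted_order is hypothetical; the DataFrames are the ones from the examples above.

```python
import pandas as pd

p_int1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[1, 2, 3])
p_int2 = pd.DataFrame({"c": [7, 8, 9], "d": [10, 11, 12]}, index=[3, 1, 6])

# Build the row order we want: labels in order of first appearance.
wanted_order = p_int2.index.union(p_int1.index, sort=False)

# Concatenate, then reorder the rows explicitly so the result does not
# depend on how concat itself orders the combined index.
result = pd.concat([p_int2, p_int1], axis=1).reindex(wanted_order)
print(result)
```

Because the reindex step fixes the order after the fact, this should give the same row order whether or not concat sorts the integer index internally.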
Tried making a change like this, looks to be blocked by #36289 because this change causes pandas/tests/reshape/concat/test_dataframe.py::TestDataFrameConcat::test_concat_duplicate_indexes to fail. Once the PR associated with that is merged, this could probably be fixed (though there were some other test failures because of changed index order)