Skip to content

BUG: pd.concat produces frames with inconsistent order when concating the ones with categorical indices #46019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 24 commits into from

Conversation

tyuyoshi
Copy link
Contributor

@tyuyoshi tyuyoshi commented Feb 16, 2022

@pep8speaks
Copy link

pep8speaks commented Feb 16, 2022

Hello @tyuyoshi! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-03-25 07:30:01 UTC

@tyuyoshi tyuyoshi marked this pull request as ready for review February 16, 2022 23:13
@jreback jreback added Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Feb 27, 2022
@jreback jreback added this to the 1.5 milestone Feb 27, 2022
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the example from the original OP as well and put a test in pandas/tests/reshape/test_concat.py

@@ -267,6 +267,7 @@ Categorical
^^^^^^^^^^^
- Bug in :meth:`Categorical.view` not accepting integer dtypes (:issue:`25464`)
- Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains ``NaN`` values incorrectly raising instead of casting to ``float64`` (:issue:`45362`)
- Bug in :meth:`CategoricalIndex._concat` when concatenating categorical indexes that have the same categories returning indexes in improper order (:issue:`44099`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you refer to pd.concat as that's the most user visible bug here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thank you for review!

I added description in this commit(aacc75c)

@tyuyoshi
Copy link
Contributor Author

can you add the example from the original OP as well and put a test in pandas/tests/reshape/test_concat.py

Thanks! I added the test case in this commit(a66a60f)

@tyuyoshi tyuyoshi requested a review from jreback February 28, 2022 06:41
["a", "b", "c", "b", "a", "c"], categories=["a", "b", "c"]
),
)
tm.assert_frame_equal(concat([df1, df2]), expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use
result = concat([df1, df2])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I changed at this commit 91d2155

expected = CategoricalIndex(
["a", "b", "c", "b", "a", "c"], categories=["a", "b", "c"]
)
tm.assert_index_equal(ci1.append(ci2), expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use result=ci1.append(ci2)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also changed at this commit 91d2155

@@ -267,6 +267,7 @@ Categorical
^^^^^^^^^^^
- Bug in :meth:`Categorical.view` not accepting integer dtypes (:issue:`25464`)
- Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains ``NaN`` values incorrectly raising instead of casting to ``float64`` (:issue:`45362`)
- Bug in :meth:`CategoricalIndex._concat` when concatenating categorical indexes that have the same categories returning indexes in improper order; this method can be called in :meth:`concat` (:issue:`44099`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update the note here; this should be about the user facing case in the OP

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I added description line at reshaping section that refer to concat func in this commit 3425b85

Is this the correct fix that you pointed out?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback Hi! Thanks for all your reviews.

I would like to request a review from you again.
I finally change doc/source/whatsnew/v1.5.0.rst in this commit(0e62a36)

@tyuyoshi tyuyoshi requested a review from jreback March 2, 2022 06:41
@github-actions
Copy link
Contributor

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Apr 25, 2022
["a", "b", "c", "b", "a", "c"], categories=["a", "b", "c"]
),
)
result = concat([df1, df2])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you make an asv for this type of concatanation and then show how it performs previously. i am worried that the list conversion of the codes is very expensive.

@mroeschke
Copy link
Member

Thanks for the pull request, but it appears to be stale. If interested in continuing, please merge in the main branch, address the review, and we can reopen.

@mroeschke mroeschke closed this May 24, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Categorical Categorical Data Type Reshaping Concat, Merge/Join, Stack/Unstack, Explode Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.concat produces frames with inconsistent order when concating the ones with categorical indices
4 participants