Skip to content

Add base test for Series[ExtensionArray].sort_values with sort=False #23515

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

akulagrawal
Copy link

@akulagrawal akulagrawal commented Nov 5, 2018

Add base test for Series[ExtensionArray].sort_values with sort=False
Addresses pandas-dev#23074
@pep8speaks
Copy link

Hello @akulagrawal! Thanks for submitting the PR.

@gfyoung gfyoung added Testing pandas testing functions or related to the test suite ExtensionArray Extending pandas with custom dtypes or arrays. labels Nov 5, 2018
@gfyoung gfyoung requested a review from TomAugspurger November 5, 2018 20:18
@TomAugspurger
Copy link
Contributor

Some failing tests that'll need to be fixed up. I haven't looked closely.

This raises an ExtensionArray API question: should we require that .value_counts() preserve order? Though IIRC, I think that @jorisvandenbossche wanted to remove .values_counts from the API?

At the very least, we should ensure that our EAs implement value_counts correctly, so that the following match

In [21]: ser = pd.Series(pd.core.arrays.integer_array([1, 2, 2, 2, 1, 0]))

In [22]: ser.value_counts(sort=False)
Out[22]:
2    3
1    2
0    1
dtype: int64

In [23]: ser.astype(int).value_counts(sort=False)
Out[23]:
0    1
1    2
2    3
dtype: int64

@akulagrawal
Copy link
Author

akulagrawal commented Nov 6, 2018

Oh my bad! I am a beginner and this is my first contribution to pandas. As for the match, according to my system the output is:

In [21]: ser = pd.Series([1, 2, 2, 2, 1, 0])
In [22]: ser.value_counts(sort=False)
Out[22]:
0 1
1 2
2 3
dtype: int64
In [23]: ser.astype(int).value_counts(sort=False)
Out[23]:
0 1
1 2
2 3
dtype: int64

I understand they are incorrect but they don't match with yours atleast

@TomAugspurger
Copy link
Contributor

Yeah, I'm not sure why those don't match. Presumably there's whatever internally does this hashing doesn't preserve order.

I'd recommend holding off on this PR and maybe finding another issue until we can decide exactly what we want the requirements on order to be here.

@akulagrawal
Copy link
Author

Oh... first of all there was some error in my last comment. I meant they matched on my system (output updated in the comment), though both seem to be incorrect.
Alright, moving onto other issue. Thanks @TomAugspurger !

@jorisvandenbossche
Copy link
Member

Closing the PR (to shorten our backlog of open PRs). We still have the issue to track it, so if we decide to keep value_counts on that issue, we can simply reopen this PR. @akulagrawal That is OK?

@akulagrawal
Copy link
Author

yes sure :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants