Skip to content

The indexes of DataFrame.describe(percentiles=[0.29, 0.57, 0.58]) are incorrect #48298

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Sep 6, 2022

Conversation

weikhor
Copy link
Contributor

@weikhor weikhor commented Aug 29, 2022

@mroeschke mroeschke added the Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff label Aug 29, 2022
@@ -1721,10 +1721,10 @@ def format_percentiles(

percentiles = 100 * percentiles

int_idx = np.isclose(percentiles.astype(int), percentiles)
int_idx = np.isclose(percentiles.round().astype(int), percentiles)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you make percentiles.round().astype(int) a dedicated variable so it can be used here and below?

Copy link
Member

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks fairly good. Could use a whatsnew note in 1.6.rst

@weikhor weikhor requested a review from mroeschke August 31, 2022 01:03
@weikhor weikhor requested a review from mroeschke August 31, 2022 18:06

if np.all(int_idx):
out = percentiles.astype(int).astype(str)
out = percentiles_round_type.astype(str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the example in the original issue hit this path or the one below? Just want to make sure we have 2 tests that didn't work before and hit this change and the one below to work correctly now

Copy link
Contributor Author

@weikhor weikhor Sep 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the example in the original issue hit this path or the one below? Just want to make sure we have 2 tests that didn't work before and hit this change and the one below to work correctly now

a) The example from issue will hit this path.

pd.DataFrame(data=[1, 2, 3], columns=['value']).describe(percentiles=[0.29, 0.57, 0.58])
if np.all(int_idx):
        out = percentiles_round_type.astype(str)
        return [i + "%" for i in out]

b) The example below I own create will hit below path.

pd.DataFrame(data=[1, 2, 3], columns=['value']).describe(percentiles=[0.291, 0.57, 0.58])
out[int_idx] = percentiles[int_idx].round().astype(int).astype(str)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add the pd.DataFrame(data=[1, 2, 3], columns=['value']).describe(percentiles=[0.29, 0.57, 0.58]) case as a unit test?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have add test

@weikhor weikhor requested a review from mroeschke September 2, 2022 21:33
@mroeschke mroeschke added this to the 1.6 milestone Sep 6, 2022
@mroeschke mroeschke merged commit 82ce975 into pandas-dev:main Sep 6, 2022
@mroeschke
Copy link
Member

Thanks @weikhor!

@weikhor weikhor deleted the dataframe_percentiles_46362 branch September 6, 2022 17:35
@mroeschke mroeschke modified the milestones: 1.6, 2.0 Oct 13, 2022
noatamir pushed a commit to noatamir/pandas that referenced this pull request Nov 9, 2022
… incorrect (pandas-dev#48298)

* add test

* add test

* add

* Update doc/source/whatsnew/v1.6.0.rst

Co-authored-by: Matthew Roeschke <[email protected]>

* add

Co-authored-by: Matthew Roeschke <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: The indexes of DataFrame.describe(percentiles=[0.29, 0.57, 0.58]) are incorrect
2 participants