Skip to content

BUG: Use float64 for row counter in rank() #18274

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

BUG: Use float64 for row counter in rank() #18274

wants to merge 1 commit into from

Conversation

proinsias
Copy link
Contributor

@proinsias proinsias commented Nov 14, 2017

As reported in #18271, for a large Series s of floats, Series.rank(pct=True).max() may not be <=1 as expected.

This is due to the use of a float for the row counter. This updates the counter to use a float64.

@proinsias proinsias changed the title BUG: Use float64 for row counter in rank() GH18271 BUG: Use float64 for row counter in rank() Nov 14, 2017
@codecov
Copy link

codecov bot commented Nov 14, 2017

Codecov Report

Merging #18274 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18274      +/-   ##
==========================================
- Coverage    91.4%   91.38%   -0.03%     
==========================================
  Files         164      164              
  Lines       49878    49878              
==========================================
- Hits        45590    45580      -10     
- Misses       4288     4298      +10
Flag Coverage Δ
#multiple 89.18% <ø> (-0.01%) ⬇️
#single 39.41% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.38% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 69472f9...bfee0fe. Read the comment docs.

@codecov
Copy link

codecov bot commented Nov 14, 2017

Codecov Report

Merging #18274 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #18274      +/-   ##
==========================================
+ Coverage   91.37%   91.38%   +<.01%     
==========================================
  Files         164      164              
  Lines       49880    49880              
==========================================
+ Hits        45580    45583       +3     
+ Misses       4300     4297       -3
Flag Coverage Δ
#multiple 89.19% <ø> (+0.02%) ⬆️
#single 39.42% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.8% <0%> (-0.1%) ⬇️
pandas/core/series.py 95% <0%> (ø) ⬆️
pandas/plotting/_converter.py 65.25% <0%> (+1.81%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ef4e30b...d509360. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does your test from the original issue repro the failure?

@@ -162,5 +162,5 @@ Other
^^^^^

- Improved error message when attempting to use a Python keyword as an identifier in a numexpr query (:issue:`18221`)
-
- Fixed incorrect maximum :func:`Series.rank` percentile for large arrays (:issue:`18271`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to 0.21.1

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to Numeric section

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both done. Thanks!

@jreback jreback added the Numeric Operations Arithmetic, Comparison, and Logical operations label Nov 14, 2017
@proinsias
Copy link
Contributor Author

When I rerun the same code from #18271, I get the same result in Rank_Pct and Rank_Pct_Manual:

                abc          Rank      Rank_Pct  Rank_Pct_Manual
count  2.000000e+07  2.000000e+07  2.000000e+07     2.000000e+07
mean   4.999223e-01  1.000000e+07  5.000000e-01     5.000000e-01
std    2.886891e-01  5.773503e+06  2.886751e-01     2.886751e-01
min    1.036192e-08  1.000000e+00  5.000000e-08     5.000000e-08
25%    2.498756e-01  5.000001e+06  2.500000e-01     2.500000e-01
50%    4.999781e-01  1.000000e+07  5.000000e-01     5.000000e-01
75%    7.499111e-01  1.500000e+07  7.500000e-01     7.500000e-01
max    1.000000e+00  2.000000e+07  1.000000e+00     1.000000e+00

As reported in #18271, for a large `Series` `s` of `float`s,
`Series.rank(pct=True).max()` may not be `<=1` as expected.

This is due to the use of a `float` for the row counter. This updates
the counter to use a `float64`.
@jreback
Copy link
Contributor

jreback commented Nov 15, 2017

tests

@jreback
Copy link
Contributor

jreback commented Nov 22, 2017

pls rebase and update to comments

@jreback
Copy link
Contributor

jreback commented Jan 21, 2018

closing as stale. if you want to continue working, pls ping.

@jreback jreback closed this Jan 21, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Series.rank(pct=True).max() != 1 for a large series of floats
2 participants