-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
ENH: series rank has a percentage rank option #5978
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
can you add a vbench for both cases (the original and new one?) |
@@ -940,6 +940,7 @@ New features | |||
- Access to historical Google Finance data in pandas.io.data (:issue:`3814`) | |||
- DataFrame plotting methods can sample column colors from a Matplotlib | |||
colormap via the ``colormap`` keyword. (:issue:`3860`) | |||
- ``Series.rank()`` has a percentage rank option (:issue: `5971`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you put it accidentally in pandas 0.12 section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that is fixed.
any thoughts on 'percentage' vs. say 'per'? @jorisvandenbossche @y-p ? |
I prefer 'pct' to 'per' @jorisvandenbossche @y-p |
@MichaelWS so change to that.. |
done |
also, I have never worked with vbench before, so I am not sure if I am doing what is intended |
add a benchmark in vb_suite/groupby.py.....just copy-paste what you want to do
|
I am getting some odd errors. I will have to do some more research later today on it. sqlalchemy.exc.IntegrityError: (IntegrityError) column checksum is not unique u'INSERT INTO benchmarks (checksum, name, description) VALUES (?, ?, ?)' ('ea1993ef61c3cc4e871d2cce3c5d983c', 'eval_frame_chained_cmp_python', None) |
@@ -3894,6 +3894,11 @@ def test_rank(self): | |||
iranks = iseries.rank() | |||
exp = iseries.astype(float).rank() | |||
assert_series_equal(iranks, exp) | |||
iseries = Series(np.arange(5)) + 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a couple of more tests, maybe all nan series and for groupby, group that has 1 element
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added more tests with partial nan's and duplicate values. nan's will always be nan's so not sure if we would ever catch a bug if all nan's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but that is the check; make sure you propogate nans; the edge cases are always important to test (and usually the hardest to get right)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
easy enough. I added that as well.
this looks fine |
easy enough. That is done. |
@@ -79,9 +79,12 @@ Improvements to existing features | |||
- ``plot(legend='reverse')`` will now reverse the order of legend labels for most plot kinds. | |||
(:issue:`6014`) | |||
- Allow multi-index slicers (:issue:`6134`, :issue:`4036`, :issue:`3057`, :issue:`2598`, :issue:`5641`) | |||
<<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
your rebase introduced these....need to edit
can you reset the release.rst to master HEAD then add your change again. Their were some formatting updates which you are reverting....this should only be your 1 line change |
ENH: series rank has a percentage rank option
thanks @MichaelWS ! |
Thanks @jreback |
@MichaelWS I think we need to have this issue is here: #6717 |
closes #5971
This pull allows people to compute percentage ranks in cython for a series. I found myself computing this all the time and it will make this calculation much faster.