FIX: add support for desc order when ranking infs with nans #19538 #20091

peterpanmj · 2018-03-10T02:19:39Z

Please include the output of the validation script below between the "```" ticks:

################################################################################
################ Docstring (pandas._libs.algos.rank_1d_object)  ################
################################################################################

Fast NaN-friendly version of scipy.stats.rankdata

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
        Summary does not end with dot
        No extended summary found
        No returns section found
        See Also section not found
        No examples section found
(pandas_dev)

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

closes Rank Mixes np.nan with np.inf values #19538
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

pep8speaks · 2018-03-10T02:19:41Z

Hello @peterpanmj! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 25, 2018 at 10:11 Hours UTC

codecov · 2018-03-10T05:19:54Z

Codecov Report

Merging #20091 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20091      +/-   ##
==========================================
- Coverage   91.72%    91.7%   -0.03%     
==========================================
  Files         150      150              
  Lines       49149    49149              
==========================================
- Hits        45083    45071      -12     
- Misses       4066     4078      +12

Flag	Coverage Δ
#multiple	`90.08% <ø> (-0.03%)`	⬇️
#single	`41.85% <ø> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/plotting/_converter.py	`65.07% <0%> (-1.74%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52cffa3...1622515. Read the comment docs.

codecov · 2018-03-10T05:20:25Z

Codecov Report

Merging #20091 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #20091      +/-   ##
==========================================
- Coverage   91.84%   91.82%   -0.02%     
==========================================
  Files         152      152              
  Lines       49259    49249      -10     
==========================================
- Hits        45241    45225      -16     
- Misses       4018     4024       +6

Flag	Coverage Δ
#multiple	`90.21% <0%> (-0.02%)`	⬇️
#single	`41.89% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/io/formats/terminal.py	`16.43% <0%> (-4.55%)`	⬇️
pandas/plotting/_converter.py	`65.07% <0%> (-1.74%)`	⬇️
pandas/core/series.py	`93.84% <0%> (-0.01%)`	⬇️
pandas/core/panel.py	`97.29% <0%> (-0.01%)`	⬇️
pandas/core/frame.py	`97.18% <0%> (-0.01%)`	⬇️
pandas/core/strings.py	`98.32% <0%> (ø)`	⬆️
pandas/io/formats/format.py	`98.24% <0%> (ø)`	⬆️
pandas/core/generic.py	`95.85% <0%> (ø)`	⬆️
pandas/core/base.py	`96.8% <0%> (ø)`	⬆️
pandas/util/_decorators.py	`82.4% <0%> (+0.14%)`	⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 63a662d...9e47c0b. Read the comment docs.

jreback

can you add a whatsnew note

jreback · 2018-03-10T13:02:49Z

pandas/tests/series/test_rank.py

@@ -263,8 +264,11 @@ def test_rank_tie_methods_on_infs_nans(self):
        chunk = 3
        disabled = set([('object', 'first')])

-        def _check(s, expected, method='average', na_option='keep'):
-            result = s.rank(method=method, na_option=na_option)
+        def _check(s, expected, method='average', na_option='keep',


can you parametrize this test

jreback · 2018-03-10T13:04:01Z

pandas/tests/series/test_rank.py

-                _check(iseries, order, method, na_opt)
+                    order = [ranks[0], [np.nan] * chunk, ranks[1]]
+                _check(iseries, order, method, na_opt, True)
+                _check(iseries, order[::-1], method, na_opt, False)

    def test_rank_methods_series(self):


can you change this to use the @td.skip_if_no_scipy decorator instead
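
For readers unfamiliar with skip decorators, here is a minimal stdlib-only sketch of the idea. It is a stand-in and not the pandas helper itself: `skip_if_no_scipy` below is a hypothetical local name built on `unittest.skipUnless`, and `RankMethodsDemo` is an illustrative test class that does not exist in the pandas test suite.

```python
import importlib.util
import unittest

# Hypothetical stand-in: skip the decorated test unless scipy is importable.
skip_if_no_scipy = unittest.skipUnless(
    importlib.util.find_spec("scipy") is not None,
    "scipy is not installed",
)


class RankMethodsDemo(unittest.TestCase):
    @skip_if_no_scipy
    def test_against_scipy(self):
        # Import inside the test so collecting it never requires scipy.
        from scipy.stats import rankdata
        self.assertEqual(list(rankdata([10, 30, 20])), [1.0, 3.0, 2.0])


# Run the single test; it is reported as skipped when scipy is missing.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(RankMethodsDemo)
result = unittest.TestResult()
suite.run(result)
print(result.testsRun, len(result.skipped), len(result.failures))
```

The decorator keeps the skip condition next to the test definition, so the test body needs no manual import guard.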

jreback · 2018-03-10T13:04:49Z

can you add the simple example from the issue as a test as well.
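
As a rough illustration of the behaviour this PR targets, here is a pure-Python sketch of average ranking that leaves NaN unranked, similar in spirit to `na_option='keep'`. It is not the Cython code in `pandas._libs.algos`; the function name `rank_keep_nan` is hypothetical, and the input values are the series used elsewhere in this thread.

```python
import math


def rank_keep_nan(values, ascending=True):
    """Average-rank floats, leaving NaN positions as NaN."""
    # Only non-NaN positions can be ordered; inf and -inf are ordinary
    # floats here and simply sort to the extremes.
    idx = [i for i, v in enumerate(values) if not math.isnan(v)]
    idx.sort(key=lambda i: values[i], reverse=not ascending)

    ranks = [math.nan] * len(values)
    start = 0
    while start < len(idx):
        # Extend `end` to the last element tied with idx[start].
        end = start
        while (end + 1 < len(idx)
               and values[idx[end + 1]] == values[idx[start]]):
            end += 1
        # Tied values share the mean of their 1-based positions.
        average = (start + end + 2) / 2
        for k in range(start, end + 1):
            ranks[idx[k]] = average
        start = end + 1
    return ranks


data = [1.0, math.nan, math.inf, -math.inf, 25.0]
asc = rank_keep_nan(data, ascending=True)
desc = rank_keep_nan(data, ascending=False)
print(asc)   # [2.0, nan, 4.0, 1.0, 3.0]
print(desc)  # [3.0, nan, 1.0, 4.0, 2.0]
```

In the descending case the NaN stays unranked while `inf` takes rank 1 and `-inf` takes the last rank.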

peterpanmj · 2018-03-13T02:28:57Z

The command "ci/lint.sh" exited with 1.

The Travis CI build failed after I made some changes to "pandas\tests\series\test_rank.py". I have no idea what the cause is.

jreback · 2018-03-13T10:29:19Z

somehow lots of commits got in there. merge in master and push again.

jreback · 2018-03-14T10:53:30Z

pandas/tests/series/test_rank.py

        exp_ranks = {
            'average': ([2, 2, 2], [5, 5, 5], [8, 8, 8]),
            'min': ([1, 1, 1], [4, 4, 4], [7, 7, 7]),
            'max': ([3, 3, 3], [6, 6, 6], [9, 9, 9]),
            'first': ([1, 2, 3], [4, 5, 6], [7, 8, 9]),
            'dense': ([1, 1, 1], [2, 2, 2], [3, 3, 3])
        }
-        na_options = ('top', 'bottom', 'keep')
+
+        def _check(s, method, na_option, ascending):


can you inline these below
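
To make the `exp_ranks` table easier to read, the following pure-Python sketch reproduces those numbers for nine values made of three tied chunks. It is an illustration and not the pandas implementation: `rank_ties` is a hypothetical helper, and the middle chunk of zeros is an assumption chosen only so that the data has three groups of ties.

```python
import math


def rank_ties(values, method="average"):
    """Rank NaN-free floats in ascending order with the given tie method."""
    # sorted() is stable, so tied values keep their original order,
    # which is what the 'first' method relies on.
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    dense = 0
    start = 0
    while start < len(order):
        # Extend `end` to the last element tied with order[start].
        end = start
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[start]]):
            end += 1
        dense += 1
        for k in range(start, end + 1):
            if method == "average":
                rank = (start + end + 2) / 2
            elif method == "min":
                rank = start + 1
            elif method == "max":
                rank = end + 1
            elif method == "first":
                rank = k + 1
            elif method == "dense":
                rank = dense
            else:
                raise ValueError("unknown method: " + method)
            ranks[order[k]] = float(rank)
        start = end + 1
    return ranks


values = [-math.inf] * 3 + [0.0] * 3 + [math.inf] * 3
exp_ranks = {
    'average': ([2, 2, 2], [5, 5, 5], [8, 8, 8]),
    'min': ([1, 1, 1], [4, 4, 4], [7, 7, 7]),
    'max': ([3, 3, 3], [6, 6, 6], [9, 9, 9]),
    'first': ([1, 2, 3], [4, 5, 6], [7, 8, 9]),
    'dense': ([1, 1, 1], [2, 2, 2], [3, 3, 3])
}
all_match = all(
    rank_ties(values, method) == [float(r) for chunk in chunks for r in chunk]
    for method, chunks in exp_ranks.items()
)
print(all_match)
```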

jreback · 2018-03-14T10:53:41Z

pandas/tests/series/test_rank.py

+            _check(iseries, method, na_option, ascending)
+
+    def test_rank_desc_mix_nans_infs(self):
+        iseries = Series([1, np.nan, np.inf, -np.inf, 25])


can you add the issue number

peterpanmj · 2018-03-16T08:25:33Z

lint.sh gives an error. I don't know why.

TomAugspurger · 2018-03-16T12:49:18Z

pandas/tests/series/test_rank.py

-            result = s.rank(method=method, na_option=na_option)
+        def _check(s, method, na_option, ascending):
+            exp_ranks = {
+                    'average': ([2, 2, 2], [5, 5, 5], [8, 8, 8]),


These should start four spaces to the right of the e in exp_ranks.

Likewise with each one below it.

TomAugspurger

Added comments on the linting errors in https://travis-ci.org/pandas-dev/pandas/jobs/353646265#L3004

Run flake8 pandas/tests/series/test_rank.py locally before pushing to validate your fixes.

TomAugspurger · 2018-03-16T12:49:29Z

pandas/tests/series/test_rank.py

-            result = s.rank(method=method, na_option=na_option)
+        def _check(s, method, na_option, ascending):
+            exp_ranks = {
+                    'average': ([2, 2, 2], [5, 5, 5], [8, 8, 8]),


Likewise with each one below it.

TomAugspurger · 2018-03-16T12:49:41Z

pandas/tests/series/test_rank.py

+            _check(iseries, method, na_option, ascending)
+
+    def test_rank_desc_mix_nans_infs(self):
+        #GH 19538


Space after #

TomAugspurger · 2018-03-16T12:49:49Z

pandas/tests/series/test_rank.py

+
+    def test_rank_desc_mix_nans_infs(self):
+        #GH 19538
+        #check descending ranking when mix nans and infs


Space after #
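
Taken together, the style comments above ask for a layout roughly like the sketch below: dictionary entries indented four spaces past the start of `exp_ranks`, and block comments written as `# ` followed by the text. The wrapper function `expected_ranks_demo` is hypothetical and exists only so the snippet runs on its own; the comment wording is lightly adapted from the diff.

```python
def expected_ranks_demo():
    # GH 19538
    # check descending ranking when mixing nans and infs
    exp_ranks = {
        'average': ([2, 2, 2], [5, 5, 5], [8, 8, 8]),
        'min': ([1, 1, 1], [4, 4, 4], [7, 7, 7]),
        'max': ([3, 3, 3], [6, 6, 6], [9, 9, 9]),
        'first': ([1, 2, 3], [4, 5, 6], [7, 8, 9]),
        'dense': ([1, 1, 1], [2, 2, 2], [3, 3, 3])
    }
    return exp_ranks


ranks = expected_ranks_demo()
print(sorted(ranks))
```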

jreback · 2018-03-25T14:01:47Z

@WillAyd if you'd have a look

WillAyd · 2018-03-25T16:30:02Z

@peterpanmj nice work - does this work for GroupBy ranking the same way?

peterpanmj · 2018-03-27T12:28:46Z

@WillAyd As long as rank_1d_{dtype} is called, it is fine, but group by rank is still broken.

In [51]: df = pd.DataFrame([1, np.nan, np.inf, -np.inf, 25])

In [52]: df['key'] = 'foo'

In [53]: df.groupby("key").rank()      # not working properly for infinity
Out[53]:
     0
0  2.0
1  NaN
2  NaN
3  1.0
4  3.0

In [54]: df.rank()     # this is ok
Out[54]:
     0  key
0  2.0  3.0
1  NaN  3.0
2  4.0  3.0
3  1.0  3.0
4  3.0  3.0

In [55]: df.groupby("key").apply(lambda x:x.rank())   # this is also ok
Out[55]:
     0  key
0  2.0  3.0
1  NaN  3.0
2  4.0  3.0
3  1.0  3.0
4  3.0  3.0

WillAyd · 2018-03-27T17:43:08Z

Do you want to try and tackle the groupby implementation as part of this? I've linked the relevant function below - it isn't that drastically different from the algos rank implementation

pandas/pandas/_libs/groupby_helper.pxi.in

Line 415 in 766a480

def group_rank_{{name}}(ndarray[float64_t, ndim=2] out,

peterpanmj · 2018-03-29T12:53:55Z

I will try, but it will probably take a lot of effort.
I don't understand why out has ndim=2. What is the dtype of labels?

WillAyd · 2018-03-29T14:57:39Z

Labels are int64 - each unique grouping will have its own label. out has ndim=2 to match the other Cython call signatures, but you'll see in this that only the first dimension ever gets written to (I think this will be cleaned up in the future).
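
The shape of that call can be pictured with a small pure-Python sketch. Everything here is an assumption made for illustration: `group_rank_sketch` is a hypothetical function, nested lists stand in for the 2-D arrays, ties are ignored (ranks are positional), and none of this is the Cython `group_rank` code.

```python
import math


def group_rank_sketch(out, values, labels, ascending=True):
    """Write positional per-group ranks of values[i][0] into out[i][0].

    `out` and `values` are N x 1 nested lists standing in for 2-D arrays;
    `labels` holds one integer group id per row.
    """
    # Collect the row numbers that belong to each group label.
    groups = {}
    for row, label in enumerate(labels):
        groups.setdefault(label, []).append(row)

    for rows in groups.values():
        # NaN rows stay unranked; inf and -inf rank like any other float.
        valid = [r for r in rows if not math.isnan(values[r][0])]
        valid.sort(key=lambda r: values[r][0], reverse=not ascending)
        for r in rows:
            out[r][0] = math.nan
        for position, r in enumerate(valid, start=1):
            out[r][0] = float(position)


values = [[1.0], [math.nan], [math.inf], [-math.inf], [25.0]]

# One group: every row carries the same label.
out_one = [[0.0] for _ in values]
group_rank_sketch(out_one, values, labels=[0, 0, 0, 0, 0])

# Two groups: rows 0-2 share label 0, rows 3-4 share label 1.
out_two = [[0.0] for _ in values]
group_rank_sketch(out_two, values, labels=[0, 0, 0, 1, 1])
print(out_one)
print(out_two)
```

Only column 0 of `out` is ever written, which mirrors the remark above that the second dimension exists to match the other call signatures.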

jreback · 2018-03-30T20:27:51Z

can do the groupby fixes in another PR :> @peterpanmj or @WillAyd can you open an issue?

jreback · 2018-03-30T20:37:46Z

thanks @peterpanmj

WillAyd · 2018-03-30T23:22:06Z

Opened #20561 for the GroupBy piece. @peterpanmj if you started working on it then by all means continue and let me know if you need help

peterpanmj added 3 commits March 10, 2018 09:38

fix rank issue when asec is false

9a6a4b9

fix the non_na_idx

dba2258

add test ranking inf/nan in descending order

5eb57b0

jreback added the Bug and Numeric Operations labels Mar 10, 2018

fix some style errors in test_rank

1622515

jreback requested changes Mar 10, 2018

parametrize test_rank_tie_methods_on_infs_nans, add a small test for p…

f6d0fd0

…andas-dev#19538

Merge branch 'master' into rank_desc

45f60fe

peterpanmj force-pushed the rank_desc branch from 8254cec to 45f60fe Compare March 14, 2018 07:21

jreback requested changes Mar 14, 2018

add issue number and move expected results into comparing method

8b0467f

, add some comments

TomAugspurger reviewed Mar 16, 2018

peterpanmj added 2 commits March 19, 2018 13:04

fix some pep8 errors

b503743

update whatsnew

62e96f6

jreback approved these changes Mar 25, 2018

jreback added this to the 0.23.0 milestone Mar 25, 2018

WillAyd mentioned this pull request Mar 29, 2018

PERF: non-numeric fillna #20300

Open

Merge branch 'master' into PR_TOOL_MERGE_PR_20091

9e47c0b

jreback merged commit 0a00365 into pandas-dev:master Mar 30, 2018

WillAyd mentioned this pull request Mar 30, 2018

GroupBy Rank Operations With Infinity Incorrect #20561

Closed

jorisvandenbossche mentioned this pull request Apr 19, 2018

PERF: Cythonize Groupby Rank #19481

Merged

4 tasks

peterpanmj deleted the rank_desc branch April 20, 2018 06:16

kornilova203 pushed a commit to kornilova203/pandas that referenced this pull request Apr 23, 2018

FIX: add support for desc order when ranking infs with nans pandas-de…

b3e5292

…v#19538 (pandas-dev#20091)
