Skip to content

FIX: add support for desc order when ranking infs with nans #19538 #20091

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 30, 2018

Conversation

peterpanmj
Copy link
Contributor

@peterpanmj peterpanmj commented Mar 10, 2018

Please include the output of the validation script below between the "```" ticks:

################################################################################
################ Docstring (pandas._libs.algos.rank_1d_object)  ################
################################################################################

Fast NaN-friendly version of scipy.stats.rankdata

################################################################################
################################## Validation ##################################
################################################################################

Errors found:
        Summary does not end with dot
        No extended summary found
        No returns section found
        See Also section not found
        No examples section found
(pandas_dev)

Checklist for other PRs (remove this part if you are doing a PR for the pandas documentation sprint):

@pep8speaks
Copy link

pep8speaks commented Mar 10, 2018

Hello @peterpanmj! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on March 25, 2018 at 10:11 Hours UTC

@jreback jreback added Bug Numeric Operations Arithmetic, Comparison, and Logical operations labels Mar 10, 2018
@codecov
Copy link

codecov bot commented Mar 10, 2018

Codecov Report

Merging #20091 into master will decrease coverage by 0.02%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20091      +/-   ##
==========================================
- Coverage   91.72%    91.7%   -0.03%     
==========================================
  Files         150      150              
  Lines       49149    49149              
==========================================
- Hits        45083    45071      -12     
- Misses       4066     4078      +12
Flag Coverage Δ
#multiple 90.08% <ø> (-0.03%) ⬇️
#single 41.85% <ø> (ø) ⬆️
Impacted Files Coverage Δ
pandas/plotting/_converter.py 65.07% <0%> (-1.74%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 52cffa3...1622515. Read the comment docs.

@codecov
Copy link

codecov bot commented Mar 10, 2018

Codecov Report

Merging #20091 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #20091      +/-   ##
==========================================
- Coverage   91.84%   91.82%   -0.02%     
==========================================
  Files         152      152              
  Lines       49259    49249      -10     
==========================================
- Hits        45241    45225      -16     
- Misses       4018     4024       +6
Flag Coverage Δ
#multiple 90.21% <0%> (-0.02%) ⬇️
#single 41.89% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/io/formats/terminal.py 16.43% <0%> (-4.55%) ⬇️
pandas/plotting/_converter.py 65.07% <0%> (-1.74%) ⬇️
pandas/core/series.py 93.84% <0%> (-0.01%) ⬇️
pandas/core/panel.py 97.29% <0%> (-0.01%) ⬇️
pandas/core/frame.py 97.18% <0%> (-0.01%) ⬇️
pandas/core/strings.py 98.32% <0%> (ø) ⬆️
pandas/io/formats/format.py 98.24% <0%> (ø) ⬆️
pandas/core/generic.py 95.85% <0%> (ø) ⬆️
pandas/core/base.py 96.8% <0%> (ø) ⬆️
pandas/util/_decorators.py 82.4% <0%> (+0.14%) ⬆️
... and 2 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 63a662d...9e47c0b. Read the comment docs.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a whatsnew note

@@ -263,8 +264,11 @@ def test_rank_tie_methods_on_infs_nans(self):
chunk = 3
disabled = set([('object', 'first')])

def _check(s, expected, method='average', na_option='keep'):
result = s.rank(method=method, na_option=na_option)
def _check(s, expected, method='average', na_option='keep',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parametrize this test

_check(iseries, order, method, na_opt)
order = [ranks[0], [np.nan] * chunk, ranks[1]]
_check(iseries, order, method, na_opt, True)
_check(iseries, order[::-1], method, na_opt, False)

def test_rank_methods_series(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ca you change this to use the @td.skip_if_no_scipy decorator instead

@jreback
Copy link
Contributor

jreback commented Mar 10, 2018

can you add the simple example from the issue as a test as well.

@peterpanmj
Copy link
Contributor Author

The command "ci/lint.sh" exited with 1.

The Travis CI build failed after some modification on "pandas\tests\series\test_rank.py". I have no idea what is the cause.

@jreback
Copy link
Contributor

jreback commented Mar 13, 2018

somehow lots of commits go there. merge in master and push again.

exp_ranks = {
'average': ([2, 2, 2], [5, 5, 5], [8, 8, 8]),
'min': ([1, 1, 1], [4, 4, 4], [7, 7, 7]),
'max': ([3, 3, 3], [6, 6, 6], [9, 9, 9]),
'first': ([1, 2, 3], [4, 5, 6], [7, 8, 9]),
'dense': ([1, 1, 1], [2, 2, 2], [3, 3, 3])
}
na_options = ('top', 'bottom', 'keep')

def _check(s, method, na_option, ascending):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you inline these below

_check(iseries, method, na_option, ascending)

def test_rank_desc_mix_nans_infs(self):
iseries = Series([1, np.nan, np.inf, -np.inf, 25])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add the issue number

@peterpanmj
Copy link
Contributor Author

lint.sh gives error. Don't know why.

result = s.rank(method=method, na_option=na_option)
def _check(s, method, na_option, ascending):
exp_ranks = {
'average': ([2, 2, 2], [5, 5, 5], [8, 8, 8]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should start four spaces to the right of the e in exp_ranks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise with each one below it.

Copy link
Contributor

@TomAugspurger TomAugspurger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added comments on the linking errors in https://travis-ci.org/pandas-dev/pandas/jobs/353646265#L3004

Run flake8 pandas/tests/series/test_rank.py locally before pushing to validate your fixes.

result = s.rank(method=method, na_option=na_option)
def _check(s, method, na_option, ascending):
exp_ranks = {
'average': ([2, 2, 2], [5, 5, 5], [8, 8, 8]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise with each one below it.

_check(iseries, method, na_option, ascending)

def test_rank_desc_mix_nans_infs(self):
#GH 19538
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after #


def test_rank_desc_mix_nans_infs(self):
#GH 19538
#check descending ranking when mix nans and infs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Space after #

@jreback
Copy link
Contributor

jreback commented Mar 25, 2018

@WillAyd if you'd have a look

@jreback jreback added this to the 0.23.0 milestone Mar 25, 2018
@WillAyd
Copy link
Member

WillAyd commented Mar 25, 2018

@peterpanmj nice work - does this work for GroupBy ranking the same way?

@peterpanmj
Copy link
Contributor Author

@WillAyd As long as rank_1d_{dtype} is called, it is fine, but group by rank is still broken.

In [51]: df = pd.DataFrame([1, np.nan, np.inf, -np.inf, 25])

In [52]: df['key'] = 'foo'

In [53]: df.groupby("key").rank()      # not working properly for infinity
Out[53]:
     0
0  2.0
1  NaN
2  NaN
3  1.0
4  3.0

In [54]: df.rank()     # this is ok
Out[54]:
     0  key
0  2.0  3.0
1  NaN  3.0
2  4.0  3.0
3  1.0  3.0
4  3.0  3.0

In [55]: df.groupby("key").apply(lambda x:x.rank())   # this is also ok
Out[55]:
     0  key
0  2.0  3.0
1  NaN  3.0
2  4.0  3.0
3  1.0  3.0
4  3.0  3.0

@WillAyd
Copy link
Member

WillAyd commented Mar 27, 2018

Do you want to try and tackle the groupby implementation as part of this? I've linked the relevant function below - it isn't that drastically different from the algos rank implementation

def group_rank_{{name}}(ndarray[float64_t, ndim=2] out,

@peterpanmj
Copy link
Contributor Author

I will try but it should take a lot of efforts.
I don't under stand why out has ndim=2 ? What is the dtype of labels ?

@WillAyd
Copy link
Member

WillAyd commented Mar 29, 2018

Labels are int64 - each unique grouping will have it's own label. out has ndim=2 to match the other Cython call signatures, but you'll see in this that only the first dimension ever gets written to (I think this will be cleaned up in the future).

@jreback
Copy link
Contributor

jreback commented Mar 30, 2018

can do the groupby fixes in another PR :> @peterpanmj or @WillAyd can you open an issue?

@jreback jreback merged commit 0a00365 into pandas-dev:master Mar 30, 2018
@jreback
Copy link
Contributor

jreback commented Mar 30, 2018

thanks @peterpanmj

@WillAyd
Copy link
Member

WillAyd commented Mar 30, 2018

Opened #20561 for the GroupBy piece. @peterpanmj if you started working on it then by all means continue and let me know if you need help

@peterpanmj peterpanmj deleted the rank_desc branch April 20, 2018 06:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Numeric Operations Arithmetic, Comparison, and Logical operations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Rank Mixes np.nan with np.inf values
5 participants