Skip to content

Replace (private) _checknull with (public) is_nan #22146

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Aug 7, 2018

Conversation

jbrockmendel
Copy link
Member

util._checknull(x) is equivalent to x is None or isnan(x). On several occasions I have had to go remind myself what _checknull covers. It is much clearer to write out x is None or util.is_nan(x), which is what this PR does.

In at least one place where _checknull is currently used, the is None part of it can be skipped.

Also gets rid of usages of private functions, and trades a no-docstring func for a nice-docstring func.

Small assorted cleanups along the way.

@codecov
Copy link

codecov bot commented Aug 1, 2018

Codecov Report

Merging #22146 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #22146   +/-   ##
=======================================
  Coverage   92.06%   92.06%           
=======================================
  Files         169      169           
  Lines       50694    50694           
=======================================
  Hits        46671    46671           
  Misses       4023     4023
Flag Coverage Δ
#multiple 90.47% <ø> (ø) ⬆️
#single 42.32% <ø> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update def50e2...70e339e. Read the comment docs.

-------
is_nan : bool
"""
return isinstance(val, float) and val != val
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be is_float_object (e.g. this excludes compex)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is functionally equivalent to the current implementation.

(The current version uses PyFloatCheck(val) instead of isinstance(val, float), but cython converts the latter to the former)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this handle complex as well? I would rather use the same check as we do elsewhere, as this is confusing when to use one vs the other.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this handle complex as well?

No. This also only checks for python float, as opposed to is_float_object which also checks for numpy float (the examples I've checked with isinstance(np.float64(2.3), float) have come back True, not sure what the Falsey case is).

_libs.missing.checknull begins with the check you might have in mind:

if is_float_object(val) or is_complex_object(val):
    return val != val
[...]

ATM _checknull is used in exactly 19 places, at least one of which is in a float-only branch. Based on a quick look it isn't obvious which usages (at least one) are specifically-for-float. I'll take a look.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After looking these over, I'm inclined to think it makes sense to use is_float_object but continue excluding complex. At least two of the usages are floating-specific, and complex is sufficiently corner-case-y that I'd rather check for it explicitly-or-not-at-all.

It also looks like checknull_with_nat and is_null_period may be unnecessary, will look at that in an upcoming pass.

@jreback jreback added Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate Clean labels Aug 1, 2018
@jreback jreback added this to the 0.24.0 milestone Aug 1, 2018
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

comments

@jreback
Copy link
Contributor

jreback commented Aug 7, 2018

can you rebase and merge on green (just to be sure).

@jbrockmendel jbrockmendel merged commit 0041681 into pandas-dev:master Aug 7, 2018
@jbrockmendel jbrockmendel deleted the priv5 branch August 7, 2018 17:12
Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018
* make public is_nan instead of private _checknull

* cleanup non-py syntax

* fixup missed usage

* Cleanup import and whitespace

* have is_nan check for np.float_

* fix segfaults
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Missing-data np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants