-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: GH11206 where pd.isnull did not consider numpy NaT null #11212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
d4a2acc
to
e937ebc
Compare
lgtm. can you run a perf check on the null benchmarks. just confirm nothing has substantially changed. |
597d9d4
to
d6b80ce
Compare
Updated. Timing for
I also changed the code slightly so that the logic is further short-circuited short-circuited. |
the benchmark u are showing is 3x increase? |
Yes.That's the with lots of nulls. I will dig a little to see if that can be avoided. |
these routines are hit everywhere, so i am for special casing certain things. IOW, that is why |
d6b80ce
to
97a5c6a
Compare
Updated. |
result[i] = val is NaT or _checknull(val) | ||
return result.view(np.bool_) | ||
result[i] = _check_all_nulls(val) | ||
return result.view(dtype=np.bool_) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot to remove this. I will remove in the next update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what are you fixing here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you mean the _check_all_nulls
? Only object arrays hit this path so I thought it'd be safe to have the most generic version of checknull
here?
For the dtype
thing, I forgot to remove it after some experiment, but will remove it in the next update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep that is fine
97a5c6a
to
24d174b
Compare
udpated |
@@ -1146,7 +1146,7 @@ Bug Fixes | |||
- Bug in ``Categorical.__iter__`` may not returning correct ``datetime`` and ``Period`` (:issue:`10713`) | |||
- Bug in indexing with a ``PeriodIndex`` on an object with a ``PeriodIndex`` (:issue:`4125`) | |||
- Bug in ``read_csv`` with ``engine='c'``: EOF preceded by a comment, blank line, etc. was not handled correctly (:issue:`10728`, :issue:`10548`) | |||
|
|||
- Bug in ``isnull`` where ``numpy.datetime64('NaT')`` in a ``numpy.array`` was not determined to be null(:issue:`11206`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's put this in 0.17.1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pls move to 0.17.1
ok. I'm traveling and will get to this next week. |
@kawochen can you update (move whatsnew to 0.17.1) |
Will do (still on vacation)
|
24d174b
to
a0ca190
Compare
updated |
can u rebase and run a perf check |
Saw some horrifying perf hit (30X). I will look at the compiled code later. |
ok gr8! |
so using
the perf hit is 4X. But if I that put into an |
are you using |
The perf hit is different for numpy 1.10 vs 1.9 (and asv probably uses the newest version) |
@kawochen pushing, but if you finish in the next few days, pls ping |
a0ca190
to
d38bd8a
Compare
@@ -707,12 +708,28 @@ NaT = NaTType() | |||
|
|||
iNaT = util.get_nat() | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe blow away _checknull_with_nat
(and replace with _check_all_nulls
and see if any timings are affected?
Replacing wouldn't work in all cases, because sometimes you don't want But I see that So maybe when others or I clean up this module, we can see how to organize or deal with all the lower level scalar check null functions |
BUG: GH11206 where pd.isnull did not consider numpy NaT null
@kawochen thanks! always nice PR's from you! keep it up! |
closes #11206