Skip to content

PERF: remove auto-boxing on isnull/notnull #5154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 8, 2013
Merged

Conversation

jreback
Copy link
Contributor

@jreback jreback commented Oct 8, 2013

  • API/CLN: provide isnull/notnull on NDFrame objects
  • PERF: remove auto-boxing on isnull/notnull

related #5135

PERF: remove auto-boxing on isnull/notnull
@jreback jreback mentioned this pull request Oct 8, 2013
13 tasks
jreback added a commit that referenced this pull request Oct 8, 2013
PERF: remove auto-boxing on isnull/notnull
@jreback jreback merged commit 35b0b0e into pandas-dev:master Oct 8, 2013
@jtratner
Copy link
Contributor

jtratner commented Oct 8, 2013

hold on - this is a not-insignificant change here - doesn't this mean you lose indexing on isnull() and notnull()? I think it's fine to have pd.isnull() return unboxed, but NDFrame.isnull() and NDFrame.notnull() should definitely box by default. Maybe it could just short-circuit the constructor instead? The boxing behavior is available on 0.10 and later, so it's not reasonable to change without discussing. You lose a lot of expressiveness this way. Maybe you could add a keyword argument "box=True" to it so that internals can call and not box result.

@jtratner
Copy link
Contributor

jtratner commented Oct 8, 2013

(I'm assuming it's available in 0.11 and 0.12, can't test it right now)

@jreback
Copy link
Contributor Author

jreback commented Oct 8, 2013

@jtratner I preserved the orginal behavior (I had actually changed this in internal refactoring for 0.13). the isnull called directly don't need to actually create a Series (I was going to add a box argument, but its not necessary.

This doesn't affect > 1dim objects anyhow didn't even have df.isnull(); pd.isnull(df) returns the correct (boxed version)
Here's 0.12

In [1]: s = Series([1,np.nan])

In [2]: df = DataFrame(dict(A = s))

In [3]: s.isnull()
Out[3]: 
0    False
1     True
dtype: bool

In [4]: df.isnull()
AttributeError

In [5]: pd.isnull(s)
Out[5]: 
0    False
1     True
dtype: bool

In [6]: pd.isnull(df)
Out[6]: 
       A
0  False
1   True

The difference is:
a Series will not box (this happens to work in 0.12)

In [5]: pd.isnull(s)
Out[5]: array([False,  True], dtype=bool)

I could make this work (but will need a box keyword that is True by default by set False by almost
every routine that used this internally. Which I think is a needless complication.

The API is actually much cleaner as isnull/notnull exist on all objects.

and boolean indexing will work exactly the same. I don't actually think this is an API change at all.

@jreback
Copy link
Contributor Author

jreback commented Oct 8, 2013

hmm...on second thought it DOES box Series.....ok...easy enough to do

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants