Comparisons result in different dtypes for empty DataFrames #15077

Closed
jcrist opened this issue Jan 7, 2017 · 1 comment
Labels: Bug; Dtype Conversions (unexpected or buggy dtype conversions)
Milestone: 0.20.0

Comments

@jcrist
Contributor

jcrist commented Jan 7, 2017

The comparison methods (lt, gt, etc.) return incorrect dtypes for empty DataFrames: the result keeps the input dtypes instead of being all bool. Interestingly, the equivalent operators (<, >, etc.) return the correct dtypes, and the comparison methods also return the correct dtype for an empty Series.

In [1]: import pandas as pd

In [2]: df = pd.DataFrame({'x': [1, 2, 3], 'y': [1., 2., 3.]})

In [3]: empty = df.iloc[:0]

In [4]: df.lt(2).dtypes
Out[4]:
x    bool
y    bool
dtype: object

In [5]: empty.lt(2).dtypes   # Should be all bool, but isn't
Out[5]:
x      int64
y    float64
dtype: object

In [6]: (df < 2).dtypes
Out[6]:
x    bool
y    bool
dtype: object

In [7]: (empty < 2).dtypes   # Things do work if you use the operator though
Out[7]:
x    bool
y    bool
dtype: object

In [8]: df.x.lt(2).dtype
Out[8]: dtype('bool')

In [9]: empty.x.lt(2).dtype    # Correct dtype for empty series
Out[9]: dtype('bool')

In [10]: pd.__version__
Out[10]: '0.19.2'
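
The session above can be condensed into a small check that runs every comparison method against both the populated and the empty frame. This is only a sketch: the helper name comparison_dtypes_are_bool is made up for illustration, and the expectation that every entry is True assumes a pandas release that includes the fix (on 0.19.2 the empty-frame entries would be False).

```python
import pandas as pd

# Same frames as in the report above.
df = pd.DataFrame({"x": [1, 2, 3], "y": [1.0, 2.0, 3.0]})
empty = df.iloc[:0]


def comparison_dtypes_are_bool(frame, value=2):
    """Hypothetical helper: map each comparison method name to whether
    every column of its result has a bool dtype."""
    report = {}
    for name in ("lt", "gt", "le", "ge", "eq", "ne"):
        result = getattr(frame, name)(value)
        report[name] = all(
            pd.api.types.is_bool_dtype(dtype) for dtype in result.dtypes
        )
    return report


full_report = comparison_dtypes_are_bool(df)
empty_report = comparison_dtypes_are_bool(empty)

print("non-empty:", full_report)
print("empty:    ", empty_report)
```

Comparing the two reports side by side makes it easy to see whether the empty case diverges from the non-empty one on a given pandas version.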

TomAugspurger added the Bug and Dtype Conversions labels on Jan 7, 2017
TomAugspurger added this to the 0.20.0 milestone on Jan 7, 2017

@mralgos
Contributor

mralgos commented Jan 11, 2017

@jcrist If the DataFrame is empty, the comparison methods against a constant do not perform any operation. They simply return the input DataFrame, which is why the original dtypes come back unchanged.

To fix the problem, it seems to be enough to remove the first if statement in the _combine_const function in pandas/core/frame.py. I'm running the tests.
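
To illustrate why an early return on empty input leaks the original dtypes, here is a toy model in plain Python. It is not the pandas source: ToyFrame, combine_const_buggy and combine_const_fixed are hypothetical names, and dtypes are represented as plain strings. The only point is the shape of the bug described above, where the empty branch hands back the input untouched.

```python
import operator


class ToyFrame:
    """Hypothetical stand-in for a DataFrame.

    columns maps a column name to a (dtype_name, values) pair.
    """

    def __init__(self, columns):
        self.columns = columns

    @property
    def empty(self):
        # Empty when no column holds any values.
        return all(len(values) == 0 for _, values in self.columns.values())

    def dtypes(self):
        return {name: dtype for name, (dtype, _) in self.columns.items()}


def combine_const_fixed(frame, other, func):
    # Always apply the comparison, so the result is bool even with no rows.
    return ToyFrame({
        name: ("bool", [func(value, other) for value in values])
        for name, (_, values) in frame.columns.items()
    })


def combine_const_buggy(frame, other, func):
    # The shortcut described above: empty input is returned as-is,
    # so its int64/float64 dtypes survive the comparison.
    if frame.empty:
        return frame
    return combine_const_fixed(frame, other, func)


empty = ToyFrame({"x": ("int64", []), "y": ("float64", [])})

buggy_dtypes = combine_const_buggy(empty, 2, operator.lt).dtypes()
fixed_dtypes = combine_const_fixed(empty, 2, operator.lt).dtypes()

print(buggy_dtypes)
print(fixed_dtypes)
```

In this model, dropping the empty check is enough to make both paths agree, which mirrors the fix proposed in the comment.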

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this issue Mar 21, 2017
…as-dev#15077

closes pandas-dev#15077

Author: Giacomo Ferroni <[email protected]>
Author: Giacomo Ferroni <[email protected]>
Author: mralgos <[email protected]>

Closes pandas-dev#15115 from mralgos/gh15077 and squashes the following commits:

a5ca359 [mralgos] Merge branch 'master' into gh15077
dc0803b [Giacomo Ferroni] Merge branch 'master' into gh15077
b2f2d1e [Giacomo Ferroni] Merge branch 'gh15077' of https://github.com/mralgos/pandas into gh15077
fcbcb5b [Giacomo Ferroni] Apply review changes
9723c5d [Giacomo Ferroni] Merge branch 'master' into gh15077
eb7d9fd [Giacomo Ferroni] Delete blank lines
28437bb [Giacomo Ferroni] Check for bool dtype return added also for Series. Minor update to whatsnew
19296f1 [Giacomo Ferroni] Added test for gh15077 (cf. gh15115) and whatsnew note
ea11867 [Giacomo Ferroni] [gh15077] Bugfix