.ne fails if comparing a list of columns containing column name 'dtype' #22383 #22416

Merged on Dec 31, 2018 (11 commits)
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -1426,6 +1426,7 @@ Interval
 Indexing
 ^^^^^^^^

+- Bug in :meth:`DataFrame.ne` fails if columns contain column name "dtype" (:issue:`22383`)
 - The traceback from a ``KeyError`` when asking ``.loc`` for a single missing label is now shorter and more clear (:issue:`21557`)
 - :class:`PeriodIndex` now emits a ``KeyError`` when a malformed string is looked up, which is consistent with the behavior of :class:`DatetimeIndex` (:issue:`22803`)
 - When ``.ix`` is asked for a missing integer label in a :class:`MultiIndex` with a first level of integer type, it now raises a ``KeyError``, consistently with the case of a flat :class:`Int64Index`, rather than falling back to positional indexing (:issue:`21593`)
11 changes: 6 additions & 5 deletions pandas/core/computation/expressions.py
@@ -10,6 +10,7 @@

 import numpy as np

+from pandas import DataFrame
 import pandas.core.common as com
 from pandas.core.computation.check import _NUMEXPR_INSTALLED
 from pandas.core.config import get_option
@@ -160,12 +161,12 @@ def _where_numexpr(cond, a, b):

 def _has_bool_dtype(x):
     try:
-        return x.dtype == bool
-    except AttributeError:
-        try:
+        if isinstance(x, DataFrame):

Contributor

Can you use
from pandas.core.dtypes.generic import ABCDataFrame
instead?

Contributor Author

@jreback OK, I fixed it.

             return 'bool' in x.dtypes
-        except AttributeError:
-            return isinstance(x, (bool, np.bool_))
+        else:
+            return x.dtype == bool
+    except AttributeError:
+        return isinstance(x, (bool, np.bool_))


def _bool_arith_check(op_str, a, b, not_allowed=frozenset(('/', '//', '**')),
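
Why the original `x.dtype` probe misfires can be sketched as follows. A DataFrame has no `dtype` attribute of its own, so the access normally raises `AttributeError`, but when a column is named `dtype` the same access returns that column instead. The helper `has_bool_dtype` below is a hypothetical stand-in written for illustration, not the patched pandas function; it checks the type first and then inspects the per-column dtype values, which is an assumption about the intended behavior rather than a copy of the diff.

```python
import numpy as np
import pandas as pd

plain = pd.DataFrame({"a": [0, 0], "b": ["aa", "bb"]})
shadowed = pd.DataFrame({"a": [0, 0], "dtype": ["aa", "bb"]})

# Without a 'dtype' column, attribute access fails as the old code expected.
try:
    plain.dtype
    plain_raises = False
except AttributeError:
    plain_raises = True

# With a 'dtype' column, the same access silently returns that column.
shadowed_result = shadowed.dtype


def has_bool_dtype(x):
    """Hypothetical stand-in for illustration; not the pandas helper."""
    if isinstance(x, pd.DataFrame):
        # Never touch x.dtype on a frame: it may resolve to a column.
        return any(dt == bool for dt in x.dtypes)
    try:
        # Series and ndarray expose a real dtype attribute.
        return x.dtype == bool
    except AttributeError:
        # Plain Python or NumPy scalars have no dtype to compare.
        return isinstance(x, (bool, np.bool_))
```

Checking the type before touching `.dtype` keeps the helper's answer a plain boolean regardless of how the columns are named.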
24 changes: 24 additions & 0 deletions pandas/tests/test_expressions.py
@@ -22,6 +22,13 @@

 _frame = DataFrame(randn(10000, 4), columns=list('ABCD'), dtype='float64')
 _frame2 = DataFrame(randn(100, 4), columns=list('ABCD'), dtype='float64')
+_issued_frame = DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],

Contributor

These are only used once, so just construct them inside the test.

Contributor Author

@jreback If I construct them inside the @pytest.mark.parametrize arguments, the code gets very long. Is that okay? For example:

@pytest.mark.parametrize("test_input,expected", [
    (DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
                [0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
                [0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype']).
     loc[:, ['a', 'dtype']].
     ne(DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
                   [0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
                   [0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype']).
        loc[:, ['a', 'dtype']]),
     DataFrame([[False, False], [False, False],
                [False, False], [False, False],
                [False, False]], columns=['a', 'dtype'])),

Could you give me a tip on how to write this case more cleanly?

Contributor Author

@jreback I removed them and now construct the frames inside the test. Thank you.

+                           [0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
+                           [0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype'])
+_issued_frame2 = DataFrame([[0, 3, 2, 'aa'], [0, 4, 2, 'aa'],
+                            [0, 1, 1, 'bb'], [0, 1, 2, 'bb'],
+                            [0, 2, 9, 'bb'], ['cc', 4, 3, 1]],
+                           columns=['a', 'b', 'c', 'dtype'])
 _mixed = DataFrame({'A': _frame['A'].copy(),
                     'B': _frame['B'].astype('float32'),
                     'C': _frame['C'].astype('int64'),
@@ -443,3 +450,20 @@ def test_bool_ops_warn_on_arithmetic(self):
                     r = f(df, True)
                     e = fe(df, True)
                     tm.assert_frame_equal(r, e)
+
+    @pytest.mark.parametrize("test_input,expected", [
+        (_issued_frame.loc[:, ['a', 'dtype']].
+         ne(_issued_frame.loc[:, ['a', 'dtype']]),
+         DataFrame([[False, False], [False, False],
+                    [False, False], [False, False],
+                    [False, False]], columns=['a', 'dtype'])),
+        (_issued_frame2.loc[:, ['a', 'dtype']].
+         ne(_issued_frame2.loc[:, ['a', 'dtype']]),
+         DataFrame([[False, False], [False, False],
+                    [False, False], [False, False],
+                    [False, False], [False, False]],
+                   columns=['a', 'dtype'])),
+    ])
+    def test_bool_ops_column_name_dtype(self, test_input, expected):
+        # GH 22383 - .ne fails if columns containing column name 'dtype'
+        assert_frame_equal(test_input, expected)
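
One way to keep the frames out of module scope, as requested in review, is to pass only the raw rows to the test and build the frames inside it. The sketch below is a standalone illustration rather than the code that was merged: `check_ne_with_dtype_column` is a hypothetical helper name, and it assumes a pandas version that already includes this fix.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def check_ne_with_dtype_column(rows):
    """Compare a frame that has a 'dtype' column with itself via .ne()."""
    frame = pd.DataFrame(rows, columns=["a", "b", "c", "dtype"])
    subset = frame.loc[:, ["a", "dtype"]]
    result = subset.ne(subset)
    # A frame never differs from itself, so every cell should be False.
    expected = pd.DataFrame(False, index=subset.index, columns=subset.columns)
    assert_frame_equal(result, expected)
    return result


rows = [[0, 1, 2, "aa"], [0, 1, 2, "aa"], [0, 1, 5, "bb"]]
result = check_ne_with_dtype_column(rows)
```

In a pytest suite, the `rows` lists could be supplied through `pytest.mark.parametrize`, which keeps each parameter short while the frame construction stays inside the test body.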