
.ne fails if comparing a list of columns containing column name 'dtype' #22383 #22416

Merged
merged 11 commits into from
Dec 31, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.rst
@@ -1420,6 +1420,7 @@ Interval
Indexing
^^^^^^^^

- Bug in :meth:`DataFrame.ne` fails if columns contain column name "dtype" (:issue:`22383`)
- The traceback from a ``KeyError`` when asking ``.loc`` for a single missing label is now shorter and more clear (:issue:`21557`)
- :class:`PeriodIndex` now emits a ``KeyError`` when a malformed string is looked up, which is consistent with the behavior of :class:`DatetimeIndex` (:issue:`22803`)
- When ``.ix`` is asked for a missing integer label in a :class:`MultiIndex` with a first level of integer type, it now raises a ``KeyError``, consistently with the case of a flat :class:`Int64Index`, rather than falling back to positional indexing (:issue:`21593`)
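The scenario behind the new changelog entry can be sketched as follows. This is a minimal illustration, assuming a pandas version that already includes the fix; the data values are made up, and only the column name `dtype` matters.

```python
import pandas as pd

# Illustrative data (an assumption); the column named 'dtype' is the relevant part.
df = pd.DataFrame(
    [[0, 1, 2, "aa"], [0, 1, 5, "bb"]],
    columns=["a", "b", "c", "dtype"],
)

# Select a list of columns that includes 'dtype', as in the issue title.
subset = df.loc[:, ["a", "dtype"]]

# Comparing the frame with itself: no element differs, so every cell
# should be False on a version that includes the fix.
result = subset.ne(subset)
print(result)
```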
2 changes: 2 additions & 0 deletions pandas/core/computation/expressions.py
@@ -160,6 +160,8 @@ def _where_numexpr(cond, a, b):

def _has_bool_dtype(x):
    try:
        if not isinstance(x.dtype, np.dtype):
Contributor

huh? what is this

Contributor Author

@jreback I changed the code so that x.dtype is not accessed when x is a DataFrame. I will wait for your review. Thank you!

            x = x.rename({'dtype': 'temporary_dtype'}, axis=1)
        return x.dtype == bool
    except AttributeError:
        try:
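To make the discussion in this hunk concrete, here is a small sketch of why `x.dtype` is ambiguous for a DataFrame that has a column named `dtype`. The helper `has_bool_dtype_sketch` is hypothetical and is not claimed to be the code in this PR or in pandas; it only illustrates one alternative, checking the container type before touching `.dtype`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [0, 1], "dtype": ["aa", "bb"]})

# A DataFrame has no `dtype` attribute of its own, so attribute access
# falls through to the column named 'dtype' and returns that column.
shadowed = df.dtype
print(type(shadowed).__name__)


def has_bool_dtype_sketch(x):
    """Hypothetical helper: branch on the container type before reading .dtype."""
    if isinstance(x, pd.DataFrame):
        # `.dtypes` (plural) is a real DataFrame property with one dtype per column.
        return any(dt == bool for dt in x.dtypes)
    if isinstance(x, (pd.Series, np.ndarray)):
        return x.dtype == bool
    # Plain scalars have no dtype; fall back to a type check.
    return isinstance(x, (bool, np.bool_))
```

With this shape, a column named `dtype` is never read through attribute access, so no renaming is needed.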
20 changes: 20 additions & 0 deletions pandas/tests/test_expressions.py
@@ -443,3 +443,23 @@ def test_bool_ops_warn_on_arithmetic(self):
r = f(df, True)
e = fe(df, True)
tm.assert_frame_equal(r, e)

    @pytest.mark.parametrize("test_input,expected", [
        (DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
                    [0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
                    [0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype']),
         DataFrame([[False, False], [False, False],
                    [False, False], [False, False],
                    [False, False]], columns=['a', 'dtype'])),
        (DataFrame([[0, 3, 2, 'aa'], [0, 4, 2, 'aa'],
                    [0, 1, 1, 'bb'], [0, 1, 2, 'bb'],
                    [0, 2, 9, 'bb'], ['cc', 4, 3, 1]],
                   columns=['a', 'b', 'c', 'dtype']),
         DataFrame([[False, False], [False, False],
                    [False, False], [False, False],
                    [False, False], [False, False]],
                   columns=['a', 'dtype'])),
    ])
    def test_bool_ops_column_name_dtype(self, test_input, expected):
        # GH 22383 - .ne fails if columns containing column name 'dtype'
        assert_frame_equal(test_input.loc[:, ['a', 'dtype']].ne(test_input.loc[:, ['a', 'dtype']]), expected)
Contributor

use

result = 
expected = 
assert_frame_equal(result, expected)
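
One way the `result` / `expected` layout suggested above might look is sketched below. This is an illustration only, not the test that was merged: the data is shortened, the test is written as a plain function rather than a parametrized method, and `assert_frame_equal` is imported from `pandas.testing`.

```python
import pandas as pd
from pandas.testing import assert_frame_equal


def test_bool_ops_column_name_dtype():
    # GH 22383: comparing columns that include one named 'dtype'.
    df = pd.DataFrame(
        [[0, 1, 2, "aa"], [0, 1, 5, "bb"]],
        columns=["a", "b", "c", "dtype"],
    )
    subset = df.loc[:, ["a", "dtype"]]

    # Compute the value under test first, then spell out the expectation,
    # so the final assertion stays short.
    result = subset.ne(subset)
    expected = pd.DataFrame(
        [[False, False], [False, False]],
        columns=["a", "dtype"],
    )
    assert_frame_equal(result, expected)
```

Splitting the long one-line assertion this way keeps each step readable without moving any data to module level.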

Contributor Author

@jreback
If I follow your suggestion, I think the code will get long. Could I declare the DataFrame variable at the top instead?

Example:

_issued_frame = DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
                           [0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
                           [0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype'])

@pytest.mark.parametrize("test_input,expected", [
    (_issued_frame.loc[:, ['a', 'dtype']].ne(_issued_frame.loc[:, ['a', 'dtype']]),
     DataFrame([[False, False], [False, False],
                [False, False], [False, False],
                [False, False]], columns=['a', 'dtype']))
])
def test_bool_ops_column_name_dtype(self, test_input, expected):
    # GH 22383 - .ne fails if columns containing column name 'dtype'
    assert_frame_equal(test_input, expected)