.ne fails if comparing a list of columns containing column name 'dtype' #22383 #22416

baidoosik · 2018-08-19T06:53:34Z

Hello, this is my first PR in open source, so I expect some things are wrong.

If this PR has problems, please tell me and I will fix them.

This issue occurs when the .ne function is called on a DataFrame that has a column named 'dtype'.

I asked for advice on which approach is better:

i) Make an exception when a DataFrame instance is created with a column named 'dtype'.
ii) Make an exception in the ne function.

I was advised to try first, so I tried modifying the _has_bool_dtype function.

closes .ne fails with ambiguous message if comparing a list of columns containing column name 'dtype' #22383
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
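
The failure described above can be reproduced in isolation. The following is a minimal sketch, not code from the PR: the frame and its values are made up for illustration, and the only assumption is that pandas resolves attribute access on a DataFrame to a column when a column of that name exists. A helper that checks `x.dtype == bool` therefore receives a Series instead of an `AttributeError`.

```python
import pandas as pd

# Illustrative frame (not from the PR): one column is literally named "dtype".
df = pd.DataFrame({"a": [0, 1, 2], "dtype": [3, 4, 5]})

# Attribute access falls back to column lookup, so this is a Series,
# not a NumPy dtype, and no AttributeError is raised.
col = df.dtype
print(type(col).__name__)

# Comparing that Series with `bool` is elementwise, and using the
# result in a boolean context raises the "ambiguous" ValueError.
error_type = None
try:
    if col == bool:
        pass
except ValueError as err:
    error_type = type(err)
    print(err)
```

This is why special-casing DataFrames before touching `x.dtype` avoids the problem.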

gfyoung · 2018-08-19T07:19:30Z

doc/source/whatsnew/v0.24.0.txt

@@ -654,6 +654,8 @@ Indexing
 - Fixed ``DataFrame[np.nan]`` when columns are non-unique (:issue:`21428`)
 - Bug when indexing :class:`DatetimeIndex` with nanosecond resolution dates and timezones (:issue:`11679`)
 - Bug where indexing with a Numpy array containing negative values would mutate the indexer (:issue:`21867`)
+- Bug Dataframe.ne fails if columns containing column name 'dtype'
+


Rewrite as follows:

:meth:`DataFrame.ne` fails if columns contain column name "dtype" (:issue:`22383`)

The ":meth:" syntax enables nicer rendering in our online docs.

We always reference an issue if possible with our whatsnew entries. The ":issue:" syntax likewise enables nicer rendering in our online docs.

pandas/core/computation/expressions.py

gfyoung · 2018-08-19T07:21:52Z

pandas/tests/test_expressions.py

@@ -442,3 +442,19 @@ def test_bool_ops_warn_on_arithmetic(self):
                    r = f(df, True)
                    e = fe(df, True)
                    tm.assert_frame_equal(r, e)
+
+    def test_bool_ops_column_name_dtype(self):
+        # GH 22383 -  .ne fails if columns containing column name 'dtype'


Nit: extra space after the dash.

baidoosik · 2018-08-19T10:11:57Z

@gfyoung Thank you for your review :) I modified the code you reviewed.

The reason I renamed 'dtype' to 'temporary_dtype' is that when a DataFrame has a 'dtype' column and ne() is called, the DataFrame's dtype member variable is not used and the error from issue 22383 occurs.
So I temporarily change the column name.

Thank you!

gfyoung · 2018-08-19T10:13:01Z

Indeed, I saw! It seems like you have test failures (based on your previous commit). Should look into that.

baidoosik · 2018-08-19T11:22:27Z

@gfyoung
Could I ask how to check the test failures (based on my previous commit)? I only ran the test file where I added my test. Should I check all tests?

Thank you for your help.

baidoosik · 2018-08-19T12:48:26Z

@gfyoung Ah, sorry. I understand now. I will check ci/circleci. Thank you :)

baidoosik · 2018-08-19T14:52:44Z

@gfyoung Thanks to you, only the continuous integration test is left, but I don't know how to run that test.
If the error is caused by branches that were merged while I did not rebase, please tell me that I need to rebase.
Thank you very much!

codecov · 2018-08-19T14:57:34Z

Codecov Report

Merging #22416 into master will not change coverage.
The diff coverage is 60%.

@@           Coverage Diff           @@
##           master   #22416   +/-   ##
=======================================
  Coverage   31.89%   31.89%           
=======================================
  Files         166      166           
  Lines       52421    52421           
=======================================
  Hits        16722    16722           
  Misses      35699    35699

Flag	Coverage Δ
#multiple	`30.29% <60%> (ø)`	⬆️
#single	`31.89% <60%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/computation/expressions.py	`58.82% <60%> (ø)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 100ffff...1fe9eff.

pandas/core/computation/expressions.py

TomAugspurger · 2018-08-21T01:56:01Z

FYI, http://pandas-docs.github.io/pandas-docs-travis/contributing.html may be helpful.

jreback · 2018-08-23T10:29:46Z

pandas/tests/test_expressions.py

+
+    def test_bool_ops_column_name_dtype(self):
+        # GH 22383 - .ne fails if columns containing column name 'dtype'
+        df_has_error = DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],


call this df

jreback · 2018-08-23T10:30:10Z

pandas/tests/test_expressions.py

+                        [0, 1, 5, 'bb'], ['cc', 4, 4, 4]],
+                       columns=['a', 'b', 'c', 'd'])
+        result = df_has_error.loc[:, ['a', 'dtype']].ne(df_has_error.loc[:,
+                                                        ['a', 'dtype']])


make an expected= line

instead of calling the function on a good df, just construct the resulting dataframe directly

jreback · 2018-08-23T10:30:50Z

pandas/tests/test_expressions.py

@@ -442,3 +442,19 @@ def test_bool_ops_warn_on_arithmetic(self):
                    r = f(df, True)
                    e = fe(df, True)
                    tm.assert_frame_equal(r, e)
+
+    def test_bool_ops_column_name_dtype(self):


can you parameterize on both eq and ne here

@jreback Thank you for your advice! I don't understand how to parameterize on eq and ne, or why to parameterize... Sorry, I am not good at understanding English. Thank you!

@baidoosik : This is what @jreback is talking about:

https://docs.pytest.org/en/latest/parametrize.html

Perhaps that will help clarify?

@pytest.mark.parametrize("index,has_tz", [
    (pd.date_range('2015-01-01 10:00', freq='D', periods=3, tz='US/Eastern'), True),  # datetimetz
    (pd.timedelta_range('1 days', freq='D', periods=3), False),  # td
    (pd.period_range('2015-01-01', freq='D', periods=3), False)  # period
])

Is the code style above what you are recommending?

Yes! That's right.

pep8speaks · 2018-09-08T05:25:39Z

Hello @baidoosik! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 27, 2018 at 17:46 Hours UTC

gfyoung · 2018-09-08T05:30:02Z

pandas/tests/test_expressions.py

@@ -442,3 +441,16 @@ def test_bool_ops_warn_on_arithmetic(self):
                    r = f(df, True)
                    e = fe(df, True)
                    tm.assert_frame_equal(r, e)
+
+    def test_has_bool_ops_column_name_dtype(self, eq, ne):


Hmm...that's not quite what @jreback was looking for:

https://docs.pytest.org/en/latest/parametrize.html

The idea is that you have only one variable in your signature, and for each possible value of that variable (ne and eq in this case), you run the test.
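
To make the suggestion concrete, here is a minimal sketch of a test parametrized on a single variable, as described above. It is not the test that was merged: the function name, the data, and the way `expected` is derived from the method name are all assumptions made for illustration.

```python
import pandas as pd
import pytest
from pandas.testing import assert_frame_equal


@pytest.mark.parametrize("method", ["eq", "ne"])
def test_compare_with_dtype_column(method):
    # Illustrative data; the column named "dtype" is what matters.
    df = pd.DataFrame(
        [[0, 1, 2, 3], [0, 1, 2, 4], [0, 1, 5, 6]],
        columns=["a", "b", "c", "dtype"],
    )
    subset = df.loc[:, ["a", "dtype"]]

    # Look up DataFrame.eq or DataFrame.ne by name and
    # compare the frame with itself.
    result = getattr(subset, method)(subset)

    # Constructed directly: all True for eq, all False for ne.
    expected = pd.DataFrame(
        method == "eq", index=subset.index, columns=subset.columns
    )
    assert_frame_equal(result, expected)
```

pytest runs the function once per value in the list, so one test body covers both comparison methods.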

jreback · 2018-11-23T21:19:28Z

@baidoosik can you merge master and update

baidoosik · 2018-11-25T01:58:05Z

@jreback Okay, thank you! I will try it.

baidoosik · 2018-11-25T03:13:03Z

@jreback My development environment is not working properly right now, so I will fix it, change the test code to use the parametrize style, and then push. Thank you!

jreback · 2018-12-03T01:41:28Z

@baidoosik can you merge master

baidoosik · 2018-12-04T07:03:50Z

@jreback I will update today! Really sorry!

jreback · 2018-12-23T23:11:16Z

@baidoosik pls merge master and update

baidoosik · 2018-12-25T15:56:33Z

@jreback Today I merged master and applied parametrize in the test code. Thank you. If CI fails, I will fix it!

jreback · 2018-12-25T16:34:18Z

pandas/core/computation/expressions.py

@@ -160,6 +160,8 @@ def _where_numexpr(cond, a, b):

 def _has_bool_dtype(x):
    try:
+        if not isinstance(x.dtype, np.dtype):


huh? what is this

@jreback I changed the code so that if x is a DataFrame, x.dtype is not accessed. I will wait for your review. Thank you!

jreback · 2018-12-25T16:34:44Z

pandas/tests/test_expressions.py

+    ])
+    def test_bool_ops_column_name_dtype(self, test_input, expected):
+        # GH 22383 - .ne fails if columns containing column name 'dtype'
+        assert_frame_equal(test_input.loc[:, ['a', 'dtype']].ne(test_input.loc[:, ['a', 'dtype']]), expected)


use

result = ...
expected = ...
assert_frame_equal(result, expected)

@jreback
If I follow your guidance, I think the code will be long. So could I declare the DataFrame variable at the top of the file?

Example:

_issued_frame = DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
                           [0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
                           [0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype'])

@pytest.mark.parametrize("test_input,expected", [
    (_issued_frame.loc[:, ['a', 'dtype']].ne(_issued_frame.loc[:, ['a', 'dtype']]),
     DataFrame([[False, False], [False, False], [False, False],
                [False, False], [False, False]], columns=['a', 'dtype']))
])
def test_bool_ops_column_name_dtype(self, test_input, expected):
    # GH 22383 - .ne fails if columns containing column name 'dtype'
    assert_frame_equal(test_input, expected)

jreback · 2018-12-27T17:01:44Z

pandas/tests/test_expressions.py

@@ -22,6 +22,13 @@

 _frame = DataFrame(randn(10000, 4), columns=list('ABCD'), dtype='float64')
 _frame2 = DataFrame(randn(100, 4), columns=list('ABCD'), dtype='float64')
+_issued_frame = DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],


these are only used once, so just construct them inside the test

@jreback If I construct them in the @pytest.mark.parametrize arguments, the code becomes very long. Is that okay?
Example:
@pytest.mark.parametrize("test_input,expected", [
    (DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
                [0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
                [0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype']).
     loc[:, ['a', 'dtype']].
     ne(DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
                   [0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
                   [0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype']).
        loc[:, ['a', 'dtype']]),
     DataFrame([[False, False], [False, False],
                [False, False], [False, False],
                [False, False]], columns=['a', 'dtype'])),

So could you give me a tip on how to write good code for this case?

@jreback I removed them and now construct the frames in the test. Thank you.

jreback · 2018-12-29T15:51:04Z

pandas/tests/test_expressions.py

+                      columns=['a', 'b', 'c', 'dtype'])
+            .loc[:, ['a', 'dtype']]),
+         DataFrame([[False, False], [False, False],
+                   [False, False]], columns=['a', 'dtype'])),


can you update as I indicated

@jreback I modified the code! Is it okay?

jreback

looks good. couple of minor things. ping on green.

jreback · 2018-12-30T20:18:06Z

pandas/tests/test_expressions.py

+    ])
+    def test_bool_ops_column_name_dtype(self, test_input, expected):
+        # GH 22383 - .ne fails if columns containing column name 'dtype'
+        result = test_input.loc[:, ['a', 'dtype']].\


can you use parens rather than \ around this.

@jreback I changed the line-break position.
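
For reference, the style requested in this thread looks roughly like the sketch below. The frame is made up for illustration and is not the data from the PR; only the parenthesized continuation is the point.

```python
import pandas as pd

# Illustrative input (not from the PR).
test_input = pd.DataFrame(
    [[0, 1, 2, 3], [0, 1, 5, 6]], columns=["a", "b", "c", "dtype"]
)

# Wrapping the expression in parentheses lets it span several lines
# without a trailing backslash.
result = (
    test_input.loc[:, ["a", "dtype"]]
    .ne(test_input.loc[:, ["a", "dtype"]])
)
```

Parentheses keep the continuation valid even if whitespace or a comment is later added at the end of a line, which a backslash does not tolerate.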

jreback · 2018-12-30T20:18:48Z

pandas/core/computation/expressions.py

-        return x.dtype == bool
-    except AttributeError:
-        try:
+        if isinstance(x, DataFrame):


can you use
from pandas.core.dtypes.generic import ABCDataFrame
instead

@jreback OK! I fixed it.
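
The shape of the change discussed in this thread can be sketched as follows. This is a hypothetical stand-in, not the merged body of `_has_bool_dtype`, which the thread only shows in fragments: the name `has_bool_dtype` and the dtype check inside the DataFrame branch are my own choices. What is taken from the review is the idea of testing for a DataFrame first, using `ABCDataFrame` from `pandas.core.dtypes.generic` (an internal pandas module), so that `x.dtype` is never evaluated on a frame.

```python
import numpy as np
import pandas as pd
from pandas.core.dtypes.generic import ABCDataFrame  # internal pandas module


def has_bool_dtype(x):
    """Hypothetical stand-in for the private helper; not the merged code."""
    if isinstance(x, ABCDataFrame):
        # Inspect per-column dtypes and never touch x.dtype,
        # which may resolve to a column named "dtype".
        return any(dt == bool for dt in x.dtypes)
    try:
        # Series and ndarrays expose a single dtype.
        return x.dtype == bool
    except AttributeError:
        # Plain scalars have no dtype attribute.
        return isinstance(x, (bool, np.bool_))


frame = pd.DataFrame({"a": [0, 1], "dtype": [2, 3]})
print(has_bool_dtype(frame))
print(has_bool_dtype(pd.Series([True, False])))
```

Checking the type up front, instead of relying on `AttributeError`, is what makes a column named 'dtype' harmless here.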

jreback · 2018-12-31T13:19:09Z

thanks!

baidoosik · 2018-12-31T14:44:40Z

@jreback Thank you so much for your kind review! Thanks to your kindness, I made my first open source contribution! Thank you!

pandas-dev#22383 (pandas-dev#22416)

gfyoung added the Bug and Indexing labels Aug 19, 2018

gfyoung reviewed Aug 19, 2018

gfyoung reviewed Aug 19, 2018

baidoosik force-pushed the myfirstissue branch from b5b1660 to c0d88b8 Compare August 19, 2018 14:57

baidoosik force-pushed the myfirstissue branch from c0d88b8 to 621fb1d Compare August 20, 2018 13:12

TomAugspurger reviewed Aug 21, 2018

jreback requested changes Aug 23, 2018

gfyoung reviewed Sep 8, 2018

fix test code and update master

f0e1fbb

baidoosik force-pushed the myfirstissue branch from 36ab522 to f0e1fbb Compare December 25, 2018 15:53

jreback requested changes Dec 25, 2018

baidoosik added 2 commits December 27, 2018 15:24

Merge branch 'master' into myfirstissue

0b11c33

reflect review

cf9ed13

fix import sorting error

17c0b9e

jreback requested changes Dec 27, 2018

baidoosik added 2 commits December 28, 2018 02:42

Merge branch 'master' into myfirstissue

07b90f6

delete global variable

7e586ce

jreback requested changes Dec 29, 2018

baidoosik added 2 commits December 30, 2018 10:45

Merge branch 'master' into myfirstissue

e49f1dd

reflect review

390f93e

jreback requested changes Dec 30, 2018

jreback added this to the 0.24.0 milestone Dec 30, 2018

baidoosik added 3 commits December 31, 2018 12:06

Merge branch 'master' into myfirstissue

4d0a515

replace DataFrame with ABCDataFrame

5546cae

fix parens

1fe9eff

jreback approved these changes Dec 31, 2018

jreback merged commit cae4616 into pandas-dev:master Dec 31, 2018

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

.ne fails if comparing a list of columns containing column name 'dtype'

ee0a7f1

pandas-dev#22383 (pandas-dev#22416)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

.ne fails if comparing a list of columns containing column name 'dtype'

d232832

pandas-dev#22383 (pandas-dev#22416)
