Skip to content

.ne fails if comparing a list of columns containing column name 'dtype' #22383 #22416

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Dec 31, 2018

Conversation

baidoosik
Copy link
Contributor

@baidoosik baidoosik commented Aug 19, 2018

hello, this is my first pr in open source.
so , i think some things have problem.

if this pr has problem , tell me. i will fix again.

this issue occur Dataframe has column defined 'dtype' when execute .ne function.

i asked some advice which way is more better.

i) when make dataframe instance column name is dtype, make exception.
ii) make exception in ne function.

i receive answer try first. so i try to modifiy _has_bool_dtype function.

@gfyoung gfyoung added Bug Indexing Related to indexing on series/frames, not to indexes themselves labels Aug 19, 2018
@@ -654,6 +654,8 @@ Indexing
- Fixed ``DataFrame[np.nan]`` when columns are non-unique (:issue:`21428`)
- Bug when indexing :class:`DatetimeIndex` with nanosecond resolution dates and timezones (:issue:`11679`)
- Bug where indexing with a Numpy array containing negative values would mutate the indexer (:issue:`21867`)
- Bug Dataframe.ne fails if columns containing column name 'dtype'

Copy link
Member

@gfyoung gfyoung Aug 19, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rewrite as follows:

:meth:`DataFrame.ne` fails if columns contain column name "dtype" (:issue:`22383`)
  • The "meth" syntax enables nicer rendering in our online docs.
  • We always reference an issue if possible with our whatsnew entries. The syntax is to enable a nicer rendering in our online docs.

@@ -442,3 +442,19 @@ def test_bool_ops_warn_on_arithmetic(self):
r = f(df, True)
e = fe(df, True)
tm.assert_frame_equal(r, e)

def test_bool_ops_column_name_dtype(self):
# GH 22383 - .ne fails if columns containing column name 'dtype'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: extra space after the dash.

@baidoosik
Copy link
Contributor Author

@gfyoung thank you your review :) i modified the code you reviewed.

and reason i rename 'dtype' to 'temporary_dtype' is when dataframe has 'dtype' column and call ne() function, dataframe's dtype member variable don't use and occur error as issue 22383.
so i temporarily change column name.

thank you!

@gfyoung
Copy link
Member

gfyoung commented Aug 19, 2018

Indeed, I saw! It seems like you have test failures (based on your previous commit). Should look into that.

@baidoosik
Copy link
Contributor Author

baidoosik commented Aug 19, 2018

image

@gfyoung
could i ask how to check test failures (based on your previous commit). i just run test file which i added test. shoud i check all tests?

thank you your help.

@baidoosik
Copy link
Contributor Author

@gfyoung ah, sorry. i understood now. i will check ci/circleci . thank you :)

@baidoosik
Copy link
Contributor Author

baidoosik commented Aug 19, 2018

@gfyoung just left continuous intergration test thanks to you. but i don't know how to run that test.
if error's cause is some branch hare merged , but i did not rebase, tell me you need rebase.
i'm really thank you!

@codecov
Copy link

codecov bot commented Aug 19, 2018

Codecov Report

Merging #22416 into master will not change coverage.
The diff coverage is 60%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master   #22416   +/-   ##
=======================================
  Coverage   31.89%   31.89%           
=======================================
  Files         166      166           
  Lines       52421    52421           
=======================================
  Hits        16722    16722           
  Misses      35699    35699
Flag Coverage Δ
#multiple 30.29% <60%> (ø) ⬆️
#single 31.89% <60%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/computation/expressions.py 58.82% <60%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 100ffff...1fe9eff. Read the comment docs.

@TomAugspurger
Copy link
Contributor

FYI, http://pandas-docs.github.io/pandas-docs-travis/contributing.html may be helpful.


def test_bool_ops_column_name_dtype(self):
# GH 22383 - .ne fails if columns containing column name 'dtype'
df_has_error = DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call this df

[0, 1, 5, 'bb'], ['cc', 4, 4, 4]],
columns=['a', 'b', 'c', 'd'])
result = df_has_error.loc[:, ['a', 'dtype']].ne(df_has_error.loc[:,
['a', 'dtype']])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make an expected= line

instead of calling the function on a good df, just construct the resulting dataframe directly

@@ -442,3 +442,19 @@ def test_bool_ops_warn_on_arithmetic(self):
r = f(df, True)
e = fe(df, True)
tm.assert_frame_equal(r, e)

def test_bool_ops_column_name_dtype(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you parameterize on both eq and ne here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback thank you your advice~! i can not understand how to parameterize eq and ne and why do parameterize... sorry i am not good at undertstanding english... thank you!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@baidoosik : This is what @jreback is talking about:

https://docs.pytest.org/en/latest/parametrize.html

Perhaps that will help clarify?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pytest.mark.parametrize("index,has_tz", [
    (pd.date_range('2015-01-01 10:00', freq='D', periods=3,
                   tz='US/Eastern'), True),  # datetimetz
    (pd.timedelta_range('1 days', freq='D', periods=3), False),  # td
    (pd.period_range('2015-01-01', freq='D', periods=3), False)  # period
])

you recommend like above code style, is it right?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! That's right.

@pep8speaks
Copy link

pep8speaks commented Sep 8, 2018

Hello @baidoosik! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on December 27, 2018 at 17:46 Hours UTC

@@ -442,3 +441,16 @@ def test_bool_ops_warn_on_arithmetic(self):
r = f(df, True)
e = fe(df, True)
tm.assert_frame_equal(r, e)

def test_has_bool_ops_column_name_dtype(self, eq, ne):
Copy link
Member

@gfyoung gfyoung Sep 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm...that's not quite what @jreback was looking for:

https://docs.pytest.org/en/latest/parametrize.html

The idea is that you have only one variable in your signature, and for each possible value of that variable (ne and eq in this case), you run the test.

@jreback
Copy link
Contributor

jreback commented Nov 23, 2018

@baidoosik can you merge master and update

@baidoosik
Copy link
Contributor Author

@jreback okay thank you ! i try it

@baidoosik
Copy link
Contributor Author

@jreback now my development environment is not properly working. so i will fix my environment and f test code to use parameterize style. and i will push. thank you!

@jreback
Copy link
Contributor

jreback commented Dec 3, 2018

@baidoosik can you merge master

@baidoosik
Copy link
Contributor Author

@jreback today i will update ! really sorry !

@jreback
Copy link
Contributor

jreback commented Dec 23, 2018

@baidoosik pls merge master and update

@baidoosik
Copy link
Contributor Author

@jreback today i merge master and apply to parameterize in test code..! thank you. if CI is failed, i will fix it !

@@ -160,6 +160,8 @@ def _where_numexpr(cond, a, b):

def _has_bool_dtype(x):
try:
if not isinstance(x.dtype, np.dtype):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh? what is this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback i change the code if x is dataframe type, dont access x.dtype. i will wait your review. thank you!

])
def test_bool_ops_column_name_dtype(self, test_input, expected):
# GH 22383 - .ne fails if columns containing column name 'dtype'
assert_frame_equal(test_input.loc[:, ['a', 'dtype']].ne(test_input.loc[:, ['a', 'dtype']]), expected)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use

result = 
expected = 
assert_frame_equal(result, expected)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback
if i follow your guide, i think code to be long. so, could i declare dataframe variable on the top?

example

_issued_frame = DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
[0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
[0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype']

@pytest.mark.parametrize("test_input,expected", [
    (_issued_frame.loc[:, ['a', 'dtype']].ne(_issued_frame.loc[:, ['a', 'dtype']]),
     DataFrame([[False, False], [False, False],
                [False, False], [False, False],
                [False, False]], columns=['a', 'dtype']))
])
def test_bool_ops_column_name_dtype(self, test_input, expected):
    # GH 22383 - .ne fails if columns containing column name 'dtype'
    assert_frame_equal(test_input, expected)

@@ -22,6 +22,13 @@

_frame = DataFrame(randn(10000, 4), columns=list('ABCD'), dtype='float64')
_frame2 = DataFrame(randn(100, 4), columns=list('ABCD'), dtype='float64')
_issued_frame = DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are only used once, so just construct them inside the test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback if i construct them on @pytest.mark.parametrize's argument, code is so long. is it okay?
example)
@pytest.mark.parametrize("test_input,expected", [
(DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
[0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
[0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype']).
loc[:, ['a', 'dtype']].
ne(DataFrame([[0, 1, 2, 'aa'], [0, 1, 2, 'aa'],
[0, 1, 5, 'bb'], [0, 1, 5, 'bb'],
[0, 1, 5, 'bb']], columns=['a', 'b', 'c', 'dtype']).
loc[:, ['a', 'dtype']]),
DataFrame([[False, False], [False, False],
[False, False], [False, False],
[False, False]], columns=['a', 'dtype'])),

so, could you give me some tip to good code for this case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback i remove them. and construct in the test. thank you

columns=['a', 'b', 'c', 'dtype'])
.loc[:, ['a', 'dtype']]),
DataFrame([[False, False], [False, False],
[False, False]], columns=['a', 'dtype'])),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you update as I indicated

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback i modify code! is it okay?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. couple of minor things. ping on green.

])
def test_bool_ops_column_name_dtype(self, test_input, expected):
# GH 22383 - .ne fails if columns containing column name 'dtype'
result = test_input.loc[:, ['a', 'dtype']].\
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use parens rather than \ around this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback i changed new line position

return x.dtype == bool
except AttributeError:
try:
if isinstance(x, DataFrame):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use
from pandas.core.dtypes.generic import ABCDataFrame
instead

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback ok! i fixed.

@jreback jreback added this to the 0.24.0 milestone Dec 30, 2018
@jreback jreback merged commit cae4616 into pandas-dev:master Dec 31, 2018
@jreback
Copy link
Contributor

jreback commented Dec 31, 2018

thanks!

@baidoosik
Copy link
Contributor Author

@jreback thanks you so much your kind review! thanks to your kindness, i contributed opensource at first! thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Indexing Related to indexing on series/frames, not to indexes themselves
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.ne fails with abiguous message if comparing a list of columns containing column name 'dtype'
5 participants