BUG: Pandas any() returning false with true values present (GH #23070) #24434

Merged: 10 commits merged into pandas-dev:master on Dec 30, 2018

makbigc
Contributor

makbigc commented Dec 26, 2018

@@ -7291,7 +7291,7 @@ def f(x):
 if filter_type is None or filter_type == 'numeric':
     data = self._get_numeric_data()
 elif filter_type == 'bool':
-    data = self._get_bool_data()
+    data = self._get_data()
Contributor

You can just make this:

data = self

No need for anything else here.
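To illustrate why the `filter_type == 'bool'` branch returned False, here is a toy model in plain Python. It is a sketch, not pandas internals: the column layout, the dtype tags, and the helpers `select_bool_columns` and `any_axis1` are hypothetical. The idea it shows is that a bool-only filter drops every non-bool column, so a row-wise any() sees no values at all and returns False for every row, while reducing over all columns (analogous to `data = self`) sees the truthy values.

```python
from datetime import datetime

# Hypothetical toy "frame" (not pandas internals): column name -> (dtype tag, values).
# None stands in for a missing value (NaN/NaT).
frame = {
    "A": ("float", [1.0, None, 3.0, None]),
    "B": ("datetime", [datetime(1960, 2, 15), datetime(1960, 2, 16), None, None]),
}
N_ROWS = 4


def select_bool_columns(frame):
    """Keep only columns tagged 'bool', like a bool-only data filter."""
    return {name: col for name, col in frame.items() if col[0] == "bool"}


def any_axis1(frame, n_rows):
    """Row-wise any(): True if some non-missing value in the row is truthy."""
    result = []
    for i in range(n_rows):
        row = [values[i] for _dtype, values in frame.values()]
        # Skip missing values, roughly like skipna=True.
        result.append(any(bool(v) for v in row if v is not None))
    return result


buggy = any_axis1(select_bool_columns(frame), N_ROWS)  # no columns survive the filter
fixed = any_axis1(frame, N_ROWS)  # every column is reduced, like data = self
print(buggy)  # [False, False, False, False]
print(fixed)  # [True, True, True, False]
```

Because the filtered frame has zero columns, any() over an empty row is False regardless of the data, which matches the symptom in the issue title.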

@@ -1394,6 +1394,23 @@ def test_any_all_extra(self):
         # df.any(1, bool_only=True)
         # df.all(1, bool_only=True)

+    def test_any_datetime(self):
def test_any_datetime(self):
Contributor

Can you remove the commented-out code above (or check whether it does anything)?

jreback added the Bug and Dtype Conversions labels on Dec 27, 2018

codecov bot commented Dec 27, 2018

Codecov Report

Merging #24434 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24434      +/-   ##
==========================================
- Coverage    92.3%    92.3%   -0.01%     
==========================================
  Files         163      163              
  Lines       51987    51982       -5     
==========================================
- Hits        47989    47984       -5     
  Misses       3998     3998
Flag Coverage Δ
#multiple 90.71% <100%> (-0.01%) ⬇️
#single 42.99% <33.33%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/internals/managers.py 95.94% <100%> (+0.01%) ⬆️
pandas/core/frame.py 96.91% <100%> (ø) ⬆️
pandas/core/generic.py 96.63% <100%> (ø) ⬆️
pandas/core/tools/datetimes.py 85.58% <0%> (-0.18%) ⬇️
pandas/core/arrays/interval.py 93.02% <0%> (-0.03%) ⬇️
pandas/core/arrays/base.py 97.59% <0%> (-0.02%) ⬇️
pandas/core/indexes/datetimelike.py 97.58% <0%> (-0.01%) ⬇️
pandas/core/arrays/categorical.py 95.34% <0%> (-0.01%) ⬇️
pandas/core/indexes/multi.py 95.58% <0%> (-0.01%) ⬇️
pandas/core/indexes/base.py 96.27% <0%> (-0.01%) ⬇️
... and 2 more

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 08c920e...85a702d.

codecov bot commented Dec 27, 2018

Codecov Report

Merging #24434 into master will decrease coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24434      +/-   ##
==========================================
- Coverage   92.31%   92.31%   -0.01%     
==========================================
  Files         166      166              
  Lines       52335    52412      +77     
==========================================
+ Hits        48313    48382      +69     
- Misses       4022     4030       +8
Flag Coverage Δ
#multiple 90.73% <100%> (-0.01%) ⬇️
#single 43.05% <0%> (ø) ⬆️
Impacted Files Coverage Δ
pandas/core/frame.py 96.91% <100%> (ø) ⬆️
pandas/core/indexes/datetimelike.py 96.23% <0%> (-1.37%) ⬇️
pandas/core/indexes/period.py 92.46% <0%> (-0.23%) ⬇️
pandas/core/arrays/datetimelike.py 95.5% <0%> (-0.16%) ⬇️
pandas/core/arrays/datetimes.py 97.74% <0%> (-0.03%) ⬇️
pandas/core/indexes/datetimes.py 96.33% <0%> (+0.19%) ⬆️
pandas/core/indexes/timedeltas.py 90.65% <0%> (+0.4%) ⬆️

Powered by Codecov. Last update aeff38d...9007017.
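The headline coverage numbers in both Codecov reports can be recomputed from their Hits and Lines rows. This is a minimal sketch assuming there are no partially covered lines (the reports list none), so coverage is simply hits divided by lines; it does not model how Codecov rounds the percentages it displays, so its four-decimal figures will not match the two-decimal ones in the tables exactly. The helper name `coverage_pct` is hypothetical.

```python
# (hits, lines) pairs copied from the two Codecov reports above.
REPORTS = {
    "first report": {"master": (47989, 51987), "#24434": (47984, 51982)},
    "second report": {"master": (48313, 52335), "#24434": (48382, 52412)},
}


def coverage_pct(hits, lines):
    """Line coverage in percent, assuming no partial lines: hits / lines."""
    return 100.0 * hits / lines


for name, report in REPORTS.items():
    before = coverage_pct(*report["master"])
    after = coverage_pct(*report["#24434"])
    # Misses are the tracked lines that were not hit.
    misses_before = report["master"][1] - report["master"][0]
    misses_after = report["#24434"][1] - report["#24434"][0]
    print(f"{name}: {before:.4f}% -> {after:.4f}% "
          f"(delta {after - before:+.4f} points, misses {misses_before} -> {misses_after})")
```

Both deltas come out well under 0.01 percentage points, consistent with the "decrease coverage by <.01%" summary, and the computed misses match the Misses rows (3998 -> 3998 and 4022 -> 4030).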

@@ -1369,6 +1369,7 @@ Timezones
 - Bug in :class:`Timestamp` constructor where a ``dateutil.tz.tzutc`` timezone passed with a ``datetime.datetime`` argument would be converted to a ``pytz.UTC`` timezone (:issue:`23807`)
 - Bug in :func:`to_datetime` where ``utc=True`` was not respected when specifying a ``unit`` and ``errors='ignore'`` (:issue:`23758`)
 - Bug in :func:`to_datetime` where ``utc=True`` was not respected when passing a :class:`Timestamp` (:issue:`24415`)
+- Bug in :meth:`DataFrame.any` returns wrong value when the axis is one and the data is of datetime type (:issue:`23070`)
Contributor

in double-backticks say axis=1 and data type is datetimelike

Contributor Author

makbigc commented Dec 29, 2018

test_any_all fails under Python 2. An error is raised at line 239:

class NonzeroFail(object):
    def __nonzero__(self):
        raise ValueError

mixed['_nonzero_fail_'] = NonzeroFail()
if has_bool_only:
    getattr(mixed, opname)(axis=0, bool_only=True)
    getattr(mixed, opname)(axis=1, bool_only=True)

  1. Why test_any_all passes in Python 3 but fails in Python 2
    The class NonzeroFail implements only the __nonzero__ method. In Python 2, bool() calls __nonzero__, which raises the error. In Python 3, bool() ignores __nonzero__: it calls __bool__ and, if that is not defined, __len__; if neither is defined, the object is treated as true. https://docs.python.org/3/reference/datamodel.html

  2. Why test_any_all fails with this PR
    data is now set to self instead of self._get_bool_data(). _get_bool_data() drops the object block that holds the NonzeroFail column, so before this PR that column never reached the reduction; with data = self it does.
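The Python 3 half of point 1 can be checked directly. This is a minimal sketch: `NonzeroFail` is copied from the test, while `BoolFail`, `EmptyLen`, and the helper `truthiness_raises` are hypothetical additions for comparison. It shows that Python 3 never calls `__nonzero__`, does call `__bool__`, and falls back to `__len__` when `__bool__` is missing.

```python
class NonzeroFail(object):
    # Python 2 truth hook; Python 3 ignores __nonzero__ entirely.
    def __nonzero__(self):
        raise ValueError


class BoolFail(object):
    # Hypothetical: the Python 3 truth hook, which bool() does call.
    def __bool__(self):
        raise ValueError


class EmptyLen(object):
    # Hypothetical: no __bool__, so bool() falls back to __len__.
    def __len__(self):
        return 0


def truthiness_raises(obj):
    """Return True if bool(obj) raises ValueError."""
    try:
        bool(obj)
    except ValueError:
        return True
    return False


print(bool(NonzeroFail()))            # True: no __bool__ or __len__, so it is true
print(truthiness_raises(BoolFail()))  # True: __bool__ is called and raises
print(bool(EmptyLen()))               # False: __len__() returns 0
```

Under Python 2 the first call would instead reach `__nonzero__` and raise, which is why the test only failed there.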

pandas/pandas/core/frame.py

Lines 7293 to 7294 in 70bc919

elif filter_type == 'bool':
    data = self._get_bool_data()

The test with a DataFrame containing NonzeroFail passed only because of the bug we are fixing. That is why the author of the comment block we deleted was surprised that the test with NonzeroFail passed. NonzeroFail has been removed from the test.

jreback added this to the 0.24.0 milestone on Dec 30, 2018
jreback merged commit 36ab8c9 into pandas-dev:master on Dec 30, 2018
Contributor

jreback commented Dec 30, 2018

thanks @makbigc

thoo added a commit to thoo/pandas that referenced this pull request Dec 30, 2018
* upstream/master:
  DOC: Fixing broken references in the docs (pandas-dev#24497)
  DOC: Splitting api.rst in several files (pandas-dev#24462)
  Fix misdescription in escapechar (pandas-dev#24490)
  Floor and ceil methods during pandas.eval which are provided by numexpr (pandas-dev#24355)
  BUG: Pandas any() returning false with true values present (GH pandas-dev#23070) (pandas-dev#24434)
  Misc separable pieces of pandas-dev#24024 (pandas-dev#24488)
  use capsys.readouterr() as named tuple (pandas-dev#24489)
  REF/TST: replace capture_stderr with pytest capsys fixture (pandas-dev#24496)
  TST- Fixing issue with test_parquet test unexpectedly passing (pandas-dev#24480)
  DOC: Doc build for a single doc made much faster, and clean up (pandas-dev#24428)
  BUG: Fix+test timezone-preservation in DTA.repeat (pandas-dev#24483)
  Implement reductions from pandas-dev#24024 (pandas-dev#24484)
thoo added a commit to thoo/pandas that referenced this pull request Dec 30, 2018
…strings

* upstream/master:
  TST: Skip db tests unless explicitly specified in -m pattern (pandas-dev#24492)
  Mix EA into DTA/TDA; part of 24024 (pandas-dev#24502)
  DOC: Fix building of a single API document (pandas-dev#24506)
  DOC: Fixing broken references in the docs (pandas-dev#24497)
  DOC: Splitting api.rst in several files (pandas-dev#24462)
  Fix misdescription in escapechar (pandas-dev#24490)
  Floor and ceil methods during pandas.eval which are provided by numexpr (pandas-dev#24355)
  BUG: Pandas any() returning false with true values present (GH pandas-dev#23070) (pandas-dev#24434)
  Misc separable pieces of pandas-dev#24024 (pandas-dev#24488)
  use capsys.readouterr() as named tuple (pandas-dev#24489)
  REF/TST: replace capture_stderr with pytest capsys fixture (pandas-dev#24496)
  TST- Fixing issue with test_parquet test unexpectedly passing (pandas-dev#24480)
  DOC: Doc build for a single doc made much faster, and clean up (pandas-dev#24428)
  BUG: Fix+test timezone-preservation in DTA.repeat (pandas-dev#24483)
  Implement reductions from pandas-dev#24024 (pandas-dev#24484)
Labels
Bug Dtype Conversions Unexpected or buggy dtype conversions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pandas any() returning false with true values present
2 participants