DataFrame.any(axis={0, 1}) returns inconsistent values for non-zero timedelta #17667

Closed
nmusolino opened this issue Sep 25, 2017 · 6 comments · Fixed by #28942
Labels: good first issue, Needs Tests (Unit test(s) needed to prevent regressions)


@nmusolino
Contributor

nmusolino commented Sep 25, 2017

Steps to reproduce

In [1]: import pandas

In [2]: df = pandas.DataFrame({'a': pandas.Series([0, 0]),
   ...:                        't': pandas.Series([pandas.to_timedelta(0, 's'), pandas.to_timedelta(1, 'ms')])})

In [3]: df
Out[3]:
   a               t
0  0        00:00:00
1  0 00:00:00.001000

In [4]: df.dtypes
Out[4]:
a              int64
t    timedelta64[ns]
dtype: object

In [5]: df.any(axis=0)
Out[5]:
a    False
t     True
dtype: bool

In [6]: df.any(axis=1)
Out[6]:
0    False
1    False
dtype: bool

Problem description

pandas.DataFrame.any() returns "whether any element is True over the selected axis".

In the example above, the results of df.any(axis=ax) are clearly inconsistent between the two axis directions (cells [5] and [6]).

If an element in column "t" leads to a result of True in df.any(axis=0), then the same element should lead to a result of True in df.any(axis=1).

Expected output

As a user, I expected DataFrame.any() to return True for rows or columns with a non-zero timedelta value. That is, I expected a timedelta column to have the same semantics as integers: zero values are treated as False, while other values are treated as True.

In other words, this is the output I expected:

In [6]: df.any(axis=1)
Out[6]:
0    False
1     True
dtype: bool
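
To make the expected semantics concrete, here is a small sketch that uses only the Python standard library, not pandas. It relies on the fact that `datetime.timedelta(0)` is falsy while any non-zero `timedelta` is truthy, the same rule that integers follow. The `rows` data is an assumption that simply mirrors the example frame above.

```python
from datetime import timedelta

# Each tuple is one row of the example frame: (a, t).
rows = [
    (0, timedelta(0)),
    (0, timedelta(milliseconds=1)),
]

# Reduce along each row (the analogue of axis=1).
any_per_row = [any(bool(value) for value in row) for row in rows]

# Reduce along each column (the analogue of axis=0).
any_per_column = [any(bool(value) for value in column) for column in zip(*rows)]

print(any_per_row)     # [False, True]
print(any_per_column)  # [False, True]
```

Under this rule the non-zero timedelta counts as True in whichever direction the reduction runs, which is the consistency the report asks for.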

Output of pd.show_versions()

In [7]: pandas.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.4.5.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.19.1
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.8.2
IPython: 5.1.0
sphinx: 1.4.8
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.7
blosc: 1.5.0
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.4.0
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: 0.9.3
lxml: 3.6.4
bs4: 4.5.3
html5lib: 0.999
httplib2: 0.9.2
apiclient: None
sqlalchemy: 1.1.3
pymysql: None
psycopg2: 2.6.2 (dt dec pq3 ext lo64)
jinja2: 2.8
boto: 2.43.0
pandas_datareader: None
@nmusolino nmusolino changed the title from "DataFrame.any() returns inconsistent values for non-zero timedelta depending on the value of axis" to "DataFrame.any(axis={0, 1}) returns inconsistent values for non-zero timedelta" on Sep 25, 2017
@TomAugspurger TomAugspurger added this to the Next Major Release milestone Sep 25, 2017
@louispotok
Contributor

louispotok commented Sep 29, 2017

I agree this behavior seems odd. I was surprised to discover that the code is explicitly checking for this situation. This is the relevant code: https://github.com/pandas-dev/pandas/blob/v0.20.3/pandas/core/frame.py#L5055

if axis == 1 and self._is_mixed_type and self._is_datelike_mixed_type:
    numeric_only = True

This check was added here, though I don't know why.
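
One possible reading of how that check leads to the output in cell [6]: if `numeric_only = True` causes the timedelta column to be left out of the row-wise reduction (an assumption, though it matches the reported result), only column `a` takes part. The sketch below imitates that by selecting the integer column by hand; it does not call any pandas internals.

```python
import pandas as pd

df = pd.DataFrame({
    "a": pd.Series([0, 0]),
    "t": pd.Series([pd.to_timedelta(0, "s"), pd.to_timedelta(1, "ms")]),
})

# Assumption: numeric_only=True keeps only the integer column "a".
numeric_part = df[["a"]]

# With "t" excluded, neither row contains a truthy value.
print(numeric_part.any(axis=1).tolist())  # [False, False]
```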

@jreback
Contributor

jreback commented Oct 1, 2017

DataFrame._reduce is quite complex because it handles many things. This should actually work block-by-block and dispatch based on the uniform dtype. See how something like isna works on the internals; this would be very similar.
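
As a rough user-level illustration of the "block-by-block" idea, the sketch below groups columns by dtype, reduces each homogeneous group on its own, and ORs the partial results together. It uses only public pandas API and is not how pandas' internals are written; `rowwise_any_by_dtype` is a hypothetical helper, and missing values (NaT/NaN) are deliberately not handled.

```python
import pandas as pd


def rowwise_any_by_dtype(df: pd.DataFrame) -> pd.Series:
    """Row-wise any(), computed one dtype group at a time (hypothetical helper)."""
    # Group column names by dtype so that each group is homogeneous.
    columns_by_dtype = {}
    for name, dtype in df.dtypes.items():
        columns_by_dtype.setdefault(dtype, []).append(name)

    result = pd.Series(False, index=df.index)
    for dtype, names in columns_by_dtype.items():
        block = df[names]
        # Pick the "zero" that matches the group's dtype ("m" is timedelta64).
        zero = pd.Timedelta(0) if dtype.kind == "m" else 0
        # A cell is truthy when it differs from zero; OR across groups.
        result = result | (block != zero).any(axis=1)
    return result


df = pd.DataFrame({
    "a": pd.Series([0, 0]),
    "t": pd.Series([pd.to_timedelta(0, "s"), pd.to_timedelta(1, "ms")]),
})
print(rowwise_any_by_dtype(df).tolist())  # [False, True]
```

Because each group has a single dtype, the comparison against zero never has to fall back to a mixed-type code path, which is the property the comment above is after.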

@mroeschke
Member

This looks fixed on master. Could use a test.

In [47]: df.any(axis=1)
Out[47]:
0    False
1     True
dtype: bool

In [48]: pd.__version__
Out[48]: '0.26.0.dev0+533.gd8f9be7e3'

@mroeschke mroeschke added the "good first issue" and "Needs Tests" (Unit test(s) needed to prevent regressions) labels and removed the "Bug", "Difficulty Intermediate", and "Timedelta" (Timedelta data type) labels on Oct 11, 2019
@rohitsanj
Contributor

@mroeschke: Should the test be specifically for this dataframe:

df = pandas.DataFrame({'a': pandas.Series([0, 0]),
                       't': pandas.Series([pandas.to_timedelta(0, 's'), pandas.to_timedelta(1, 'ms')])})

@mroeschke
Member

@rohitsanj correct

@rohitsanj
Contributor

rohitsanj commented Oct 12, 2019

@mroeschke I've submitted a PR to add the test - #28942
Please take a look and let me know what you think. Thanks!
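
For readers who want to check the behavior locally, a regression test along these lines might look like the sketch below. This is not the test from #28942; the function name `test_any_mixed_int_timedelta` is hypothetical, and the assertions assume a pandas version in which the issue is fixed.

```python
import pandas as pd


def test_any_mixed_int_timedelta():
    # Same frame as in the original report (GH 17667).
    df = pd.DataFrame({
        "a": pd.Series([0, 0]),
        "t": pd.Series([pd.to_timedelta(0, "s"), pd.to_timedelta(1, "ms")]),
    })

    # Column-wise: only "t" contains a non-zero value.
    result = df.any(axis=0)
    expected = pd.Series([False, True], index=["a", "t"])
    pd.testing.assert_series_equal(result, expected)

    # Row-wise: only row 1 contains a non-zero value.
    result = df.any(axis=1)
    expected = pd.Series([False, True])
    pd.testing.assert_series_equal(result, expected)
```

Checking both axes in one test ties the two reductions together, so a regression in either direction would be caught.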

@simonjayhawkins simonjayhawkins modified the milestones: Contributions Welcome, 1.0 Oct 13, 2019