BUG: tz-aware datetime with column-wise comparisons failing with np.minimum/maximum #15552

Closed
adbull opened this issue Mar 2, 2017 · 5 comments · Fixed by #27367
Labels: good first issue; Needs Tests (Unit test(s) needed to prevent regressions); Numeric Operations (Arithmetic, Comparison, and Logical operations); Timezones (Timezone data dtype)


@adbull
Contributor

adbull commented Mar 2, 2017

Code Sample, a copy-pastable example if possible

>>> import numpy as np
>>> import pandas as pd

>>> naive = pd.to_datetime(['now'])
>>> utc = naive.tz_localize('UTC')

>>> np.minimum(naive, naive)
>>> np.minimum(utc, utc)
>>> np.minimum(pd.Series(naive), pd.Series(naive))
>>> np.minimum(pd.Series(utc), pd.Series(utc))

  File "bug.py", line 10, in <module>
    np.minimum(pd.Series(utc), pd.Series(utc))
  File "~/anaconda3/lib/python3.5/site-packages/pandas/core/series.py", line 498, in __array_prepare__
    op=context[0].__name__))
TypeError: Series with dtype datetime64[ns, UTC] cannot perform the numpy op minimum

Problem description

When a tz-aware datetime is placed in a Series, the numpy operations fmin/fmax/minimum/maximum throw an error, even though these operations work fine on a tz-aware DatetimeIndex.
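
As a possible workaround (not something proposed in this thread), an element-wise minimum of two tz-aware Series can be computed without going through a numpy ufunc at all, using a comparison plus `Series.where`. The sample timestamps and variable names below are made up for illustration.

```python
import pandas as pd

# Two tz-aware Series with illustrative (made-up) timestamps.
a = pd.Series(pd.to_datetime(["2017-03-02 12:00", "2017-03-02 18:00"])).dt.tz_localize("UTC")
b = pd.Series(pd.to_datetime(["2017-03-02 15:00", "2017-03-02 09:00"])).dt.tz_localize("UTC")

# Keep the value from a where it is the smaller one, otherwise take b.
# The comparison and the selection both stay inside pandas, so the
# timezone is never handed to numpy.
elementwise_min = a.where(a <= b, b)

print(elementwise_min)
```

Swapping `<=` for `>=` gives the element-wise maximum in the same way.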

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Linux
OS-release: 4.9.8-100.fc24.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: C
LOCALE: None.None

pandas: 0.19.2
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.8.0
xarray: 0.9.1
IPython: 4.2.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.3.0
numexpr: 2.6.2
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: None

@jreback
Contributor

jreback commented Mar 2, 2017

This is related to #15553.

That said, there is only so much that can be done when passing things directly to numpy functions like this. (It's actually not the passing but the returning that is the problem: some numpy functions are friendly and some are not.)

I suppose this could be made to work, but below is a much more idiomatic way to do this. See the second part for the actual issue.

naive

In [61]: s = Series(pd.date_range('20130101', periods=4))

In [62]: s2 = s[::-1]

In [63]: df = DataFrame({'A':s, 'B':s2.values})

In [64]: df
Out[64]: 
           A          B
0 2013-01-01 2013-01-04
1 2013-01-02 2013-01-03
2 2013-01-03 2013-01-02
3 2013-01-04 2013-01-01

In [65]: df.max(axis=1)
Out[65]: 
0   2013-01-04
1   2013-01-03
2   2013-01-03
3   2013-01-04
dtype: datetime64[ns]

doesn't raise, but incorrect results for tz-aware

In [67]: s = Series(pd.date_range('20130101', periods=4, tz='US/Eastern'))

In [68]: s2 = s[::-1]

In [69]: df = DataFrame({'A':s, 'B':s2.values})

In [70]: df
Out[70]: 
                          A                   B
0 2013-01-01 00:00:00-05:00 2013-01-04 05:00:00
1 2013-01-02 00:00:00-05:00 2013-01-03 05:00:00
2 2013-01-03 00:00:00-05:00 2013-01-02 05:00:00
3 2013-01-04 00:00:00-05:00 2013-01-01 05:00:00

In [71]: df.max(axis=1)
Out[71]: 
0   NaN
1   NaN
2   NaN
3   NaN
dtype: float64
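
Note that `s2.values` in the session above returns plain UTC `datetime64` data, which is why column B prints without an offset and the frame ends up mixing tz-aware and tz-naive columns. A minimal sketch of the same frame with both columns kept tz-aware, with the row-wise maximum computed via `Series.where` rather than `df.max(axis=1)`, could look like this. The variable names are illustrative, and this is an assumption about one way to avoid the mixed frame, not the fix that was applied in pandas.

```python
import pandas as pd

s = pd.Series(pd.date_range("2013-01-01", periods=4, tz="US/Eastern"))

# .values hands back plain datetime64 data in UTC, so the timezone is gone.
naive_values = s[::-1].values

# Reversing the Series and resetting its index keeps the timezone instead.
df = pd.DataFrame({"A": s, "B": s[::-1].reset_index(drop=True)})

# Row-wise maximum of the two tz-aware columns, selected inside pandas.
row_max = df["A"].where(df["A"] >= df["B"], df["B"])

print(row_max)
```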

@jreback added the Bug, Difficulty Intermediate, Indexing, Numeric Operations, and Timezones labels on Mar 2, 2017
@jreback added this to the Next Major Release milestone on Mar 2, 2017
@jreback changed the title from "BUG: tz-aware datetime Series throws error on fmin/fmax/minimum/maximum" to "BUG: tz-aware datetime with column-wise comparisons failing" on Mar 2, 2017
@jreback
Contributor

jreback commented Mar 2, 2017

And @adbull, regarding this statement:

"When a tz-aware datetime is placed in a Series, the numpy operations fmin/fmax/minimum/maximum throw an error, even though these operations work fine on a tz-aware DatetimeIndex."

This is not true at all. The operations only look like they are working. Because of the same issue as above (numpy has no clue about timezones, let alone missing values), the results are completely wrong: they are tz-shifted incorrectly.

In [78]: i = pd.DatetimeIndex(df.A)
Out[78]: 
0   2013-01-01 00:00:00-05:00
1   2013-01-02 00:00:00-05:00
2   2013-01-03 00:00:00-05:00
3   2013-01-04 00:00:00-05:00
Name: A, dtype: datetime64[ns, US/Eastern]

In [79]: np.maximum(i, i)
Out[79]: DatetimeIndex(['2013-01-01 05:00:00-05:00', '2013-01-02 05:00:00-05:00', '2013-01-03 05:00:00-05:00', '2013-01-04 05:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', name='A', freq='D')
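
The shift in the output above (05:00-05:00 instead of 00:00-05:00) is consistent with UTC wall-clock values being re-labelled as local time. The following sketch reproduces that effect by hand. It is an illustration of the mechanism under that assumption, not the code path pandas actually took, and the variable names are made up.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2013-01-01", periods=4, tz="US/Eastern")

# .values drops the timezone and exposes UTC wall-clock times to numpy.
raw = np.maximum(idx.values, idx.values)

# Wrong: treating the UTC wall-clock values as if they were local time.
shifted = pd.DatetimeIndex(raw).tz_localize("US/Eastern")

# Right: the values are UTC, so localize as UTC and then convert.
restored = pd.DatetimeIndex(raw).tz_localize("UTC").tz_convert("US/Eastern")

print(shifted[0])
print(restored[0])
```

Here `shifted` is off by the UTC offset, while `restored` matches the original index.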

@mroeschke changed the title from "BUG: tz-aware datetime with column-wise comparisons failing" to "BUG: tz-aware datetime with column-wise comparisons failing with np.minimum/maximum" on Jul 26, 2018
@mroeschke
Member

This works on master now. Could use a test as always.
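
A regression test for the original report might look roughly like the sketch below. The test name and the sample timestamp are hypothetical. It deliberately checks only the value and that the timezone survives, since the result dtype of this call has differed between pandas versions according to this thread. It assumes a pandas version in which the call no longer raises.

```python
import numpy as np
import pandas as pd


def test_minimum_on_tz_aware_series():
    # Hypothetical test name; mirrors the failing call from the report.
    utc = pd.to_datetime(["2017-03-02 12:00"]).tz_localize("UTC")
    ser = pd.Series(utc)

    result = np.minimum(ser, ser)

    # The minimum of a Series with itself should be the same instant,
    # and the element should still carry timezone information.
    assert len(result) == 1
    assert result.iloc[0] == ser.iloc[0]
    assert result.iloc[0].tzinfo is not None
```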

@mroeschke added the good first issue and Needs Tests labels on Jan 4, 2019
@jbrockmendel self-assigned this on Jan 28, 2019
@jbrockmendel
Member

It looks like the np.minimum(pd.Series(utc), pd.Series(utc)) case incorrectly returns a tz-naive result on master.

@mroeschke removed the Needs Tests and good first issue labels on Jun 30, 2019
@mroeschke
Member

This looks fixed on master again:

In [24]: np.minimum(pd.Series(utc), pd.Series(utc))
Out[24]:
0   2019-07-12 05:14:37.896979+00:00
dtype: datetime64[ns, UTC]

In [25]: pd.__version__
Out[25]: '0.25.0rc0+50.g5a7a8e1de'

@mroeschke added the good first issue and Needs Tests labels and removed the Bug, Difficulty Intermediate, and Indexing labels on Jul 12, 2019
@jreback modified the milestones: Contributions Welcome, 0.25.0 on Jul 17, 2019
4 participants