ERR: tz conversions at the min/max boundaries should fail if overflow #12677

Closed
ciamac opened this issue Mar 20, 2016 · 6 comments
Labels: Error Reporting (Incorrect or improved errors from pandas), Timezones (Timezone data dtype)

Comments

@ciamac

ciamac commented Mar 20, 2016

Timestamp comparisons with pd.Timestamp.max don't seem to work correctly when there are timezones, as in the code sample below. This is with Pandas 0.16.2.

Code Sample, a copy-pastable example if possible

import pandas as pd
import datetime

# should print True and does
print pd.Timestamp(datetime.date(2010,1,1)).tz_localize('UTC')<pd.Timestamp.max.tz_localize('UTC')
# should print True but prints False
print pd.Timestamp(datetime.date(2010,1,1)).tz_localize('US/Eastern')<pd.Timestamp.max.tz_localize('US/Eastern')

output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.10.final.0
python-bits: 64
OS: Darwin
OS-release: 15.3.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.16.2
nose: None
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.1
statsmodels: None
IPython: 4.0.1
sphinx: None
patsy: 0.4.1
dateutil: 2.4.2
pytz: 2015.7
bottleneck: 1.0.0
numexpr: 2.4.6
matplotlib: 1.5.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
@jreback
Contributor

jreback commented Mar 20, 2016

This works for me in 0.16.2, i.e. True and False are printed.

In [1]: # should print True and does

In [2]: print pd.Timestamp(datetime.date(2010,1,1)).tz_localize('UTC')<pd.Timestamp.max.tz_localize('UTC')
True

In [3]: # should print False but prints True

In [4]: print pd.Timestamp(datetime.date(2010,1,1)).tz_localize('US/Eastern')<pd.Timestamp.max.tz_localize('US/Eastern')
False

In [5]: pd.__version__
Out[5]: u'0.18.0'

Please show an IPython session where you print the version and run the code.

jreback closed this as completed Mar 20, 2016
jreback added the Usage Question and Timezones (Timezone data dtype) labels Mar 20, 2016
@ciamac
Author

ciamac commented Mar 20, 2016

I've pasted an ipython session below. Sorry, my comments were wrong. Both instances should print True but the latter prints False.

In [1]: import pandas as pd

In [2]: import datetime

In [3]: 

In [3]: # should print True and does

In [4]: print pd.Timestamp(datetime.date(2010,1,1)).tz_localize('UTC')<pd.Timestamp.max.tz_localize('UTC')
True

In [5]: # should print True but prints False

In [6]: print pd.Timestamp(datetime.date(2010,1,1)).tz_localize('US/Eastern')<pd.Timestamp.max.tz_localize('US/Eastern')
False

In [7]: pd.__version__
Out[7]: '0.16.2'

@ciamac
Author

ciamac commented Mar 20, 2016

Based on your snippet, it looks like the bug is still in 0.18.0.

@jreback
Contributor

jreback commented Mar 21, 2016

So this is fine

In [5]: pd.Timestamp(datetime.date(2010,1,1)).tz_localize('US/Eastern')<(pd.Timestamp.max-pd.Timedelta('1d')).tz_localize('US/Eastern')
Out[5]: True

This happens because of wrap-around. In other words, once a number can no longer be represented at the limits, it wraps around to the other side.

In [6]: pd.Timestamp.max.tz_localize('US/Eastern').tz_convert('UTC').value
Out[6]: -9223354276854775809

In [7]: pd.Timestamp.max.value
Out[7]: 9223372036854775807

I suppose you could do a doc-note. But trying to do anything very near the edge points is bound to hit this issue (it is not restricted to time zones specifically).
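
The wrap-around described above can be reproduced with plain integer arithmetic. This is a minimal sketch, not pandas internals: `wrap_int64` is a hypothetical helper that mimics signed 64-bit overflow, and the 17760-second offset is an assumption obtained by working backwards from the two values printed above, not a number taken from pandas or pytz.

```python
INT64_MAX = 2**63 - 1  # same value as pd.Timestamp.max.value shown above


def wrap_int64(n):
    """Reduce an arbitrary Python int to the signed 64-bit range.

    Hypothetical helper: it imitates what happens when a C int64
    addition silently overflows (two's-complement wrap-around).
    """
    return (n + 2**63) % 2**64 - 2**63


# Assumption: an offset of 17760 seconds, inferred from the outputs above.
offset_ns = 17760 * 10**9

# Adding the offset to the maximum pushes the value past the limit,
# so it reappears at the negative end of the range.
wrapped = wrap_int64(INT64_MAX + offset_ns)
print(wrapped)  # -9223354276854775809
```

A timestamp that wraps to a large negative value compares as earlier than any ordinary date, which would explain the False result in the US/Eastern comparison.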

@jreback
Contributor

jreback commented Mar 21, 2016

I suppose we could raise an error on pd.Timestamp.max.tz_localize('US/Eastern'), like this one:

In [8]: pd.Timestamp.max + pd.Timedelta('1d')
OverflowError: Python int too large to convert to C long

Interested in doing a PR?
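
One way to picture the proposed behavior is a bounds check before the offset is applied. The sketch below is only an illustration in plain Python: `to_utc_ns_checked` is a hypothetical name, it assumes a fixed UTC offset (real zones such as US/Eastern have offsets that vary over time), and it does not reflect how pandas actually implements `tz_localize`.

```python
INT64_MIN = -2**63
INT64_MAX = 2**63 - 1
NS_PER_SECOND = 10**9


def to_utc_ns_checked(wall_ns, utc_offset_seconds):
    """Convert a wall-clock value in nanoseconds to UTC nanoseconds.

    Hypothetical helper. utc_offset_seconds follows the usual convention
    local = UTC + offset, e.g. -5 * 3600 for a fixed UTC-05:00 zone.
    Raises OverflowError instead of wrapping around.
    """
    # Python ints are unbounded, so the sum itself cannot overflow here;
    # the range check stands in for the int64 limit of the real storage.
    utc_ns = wall_ns - utc_offset_seconds * NS_PER_SECOND
    if not INT64_MIN <= utc_ns <= INT64_MAX:
        raise OverflowError("localized timestamp does not fit in int64")
    return utc_ns


# A value far from the boundary converts normally.
print(to_utc_ns_checked(0, -5 * 3600))

# The maximum value in a zone behind UTC is rejected instead of wrapping.
try:
    to_utc_ns_checked(INT64_MAX, -5 * 3600)
except OverflowError as exc:
    print("rejected:", exc)
```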

jreback reopened this Mar 21, 2016
jreback added the Error Reporting (Incorrect or improved errors from pandas) and Difficulty Intermediate labels and removed the Usage Question label Mar 21, 2016
jreback added this to the Next Major Release milestone Mar 21, 2016
jreback changed the title from "timestamp comparison with timezones" to "ERR: tz conversions at the min/max boundaries should fail if overflow" Mar 21, 2016
@ciamac
Author

ciamac commented Mar 22, 2016

I think raising an error is a good way to handle this. It would also be good to have a function pd.Timestamp.max_tz_localize(tz) which would return the maximal timestamp in that time zone.

I would love to help with the PR, but I don't have enough experience in this code base to fix the problem.
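
The idea of a per-zone maximum can be sketched with the same integer arithmetic. Nothing below is an existing pandas function: `max_localizable_wall_ns` is a hypothetical name, and the sketch assumes a fixed UTC offset, whereas a real implementation would have to look up the offset that applies near the boundary in the given time zone.

```python
INT64_MAX = 2**63 - 1
NS_PER_SECOND = 10**9


def max_localizable_wall_ns(utc_offset_seconds):
    """Largest wall-clock value (ns) that can be localized without overflow.

    Hypothetical helper. utc_offset_seconds follows the convention
    local = UTC + offset, e.g. -5 * 3600 for a fixed UTC-05:00 zone.
    """
    offset_ns = utc_offset_seconds * NS_PER_SECOND
    # Two constraints: the wall-clock value itself must fit in int64,
    # and its UTC equivalent (wall - offset) must fit as well.
    return min(INT64_MAX, INT64_MAX + offset_ns)


# Behind UTC: the maximum wall-clock value is pulled back by the offset.
print(INT64_MAX - max_localizable_wall_ns(-5 * 3600))  # 18000000000000

# Ahead of UTC: the wall-clock limit itself is the binding constraint.
print(max_localizable_wall_ns(9 * 3600) == INT64_MAX)  # True
```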
