BUG: pd.date_range gives wrong overflow error type when freq='B' #24252

Closed
topper-123 opened this issue Dec 12, 2018 · 7 comments · Fixed by #26651
Labels: Datetime (Datetime data dtype), Error Reporting (Incorrect or improved errors from pandas), Frequency (DateOffsets)

@topper-123
Contributor

Code Sample, a copy-pastable example if possible

>>> import pandas as pd
>>> date = pd.Timestamp.max.floor("D").to_pydatetime().date()  # datetime.date(2262, 4, 11)
>>> freq = 'B'
>>> pd.date_range(date, periods=1, freq=freq)
OverflowError: int too big to convert
>>> freq = 'D'
>>> pd.date_range(date, periods=1, freq=freq)
OutOfBoundsDatetime: Cannot generate range with start=9223286400000000000 and periods=1

Problem description

These two date_range overflows raise different error types. Both should reasonably raise OutOfBoundsDatetime.
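
To compare the two cases side by side on whatever pandas version is installed, a small probe like the one below may help. This is only a sketch: `probe` is a hypothetical helper name, and the exception types it reports depend on the pandas version, so no expected output is shown.

```python
import datetime as dt

import pandas as pd


def probe(start, freq):
    """Return the name of the exception pd.date_range raises, or "ok"."""
    try:
        pd.date_range(start, periods=1, freq=freq)
    except Exception as exc:  # broad on purpose: we only report the type
        return type(exc).__name__
    return "ok"


# Same start date as in the report above.
start = dt.date(2262, 4, 11)
for freq in ("B", "D"):
    print(freq, probe(start, freq))
```

If both frequencies print the same exception name (or both print "ok"), the inconsistency described here is not present in that version.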

INSTALLED VERSIONS

commit: 8221ee9
python: 3.7.1.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.24.0.dev0+1257.g8221ee91d
pytest: 4.0.0
pip: 18.1
setuptools: 40.6.2
Cython: 0.29
numpy: 1.15.4
scipy: None
pyarrow: 0.11.1
xarray: None
IPython: 7.1.1
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: 0.4.0
matplotlib: 3.0.1
openpyxl: None
xlrd: 1.1.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

@mroeschke
Member

I suspect that, in general, Tick-based offsets will raise the latter error while DateOffset-based offsets will raise the former.

Additionally, the OutOfBoundsDatetime message looks like it can be improved; it would probably be more helpful to report the date instead of the epoch time in nanoseconds.
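
As a rough illustration of what a date-based message could look like, the sketch below converts the nanosecond value from the report into a date using only the standard library. The helper names and the message wording are hypothetical, not what pandas emits.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def ns_to_datetime(ns):
    """Convert nanoseconds since the Unix epoch to a naive datetime.

    Sub-microsecond precision is dropped, which is fine for a message.
    """
    return EPOCH + timedelta(microseconds=ns // 1000)


def describe_start(ns, periods):
    # Hypothetical wording, modelled on the message quoted in the report.
    when = ns_to_datetime(ns).isoformat()
    return f"Cannot generate range with start={when} and periods={periods}"


print(describe_start(9223286400000000000, 1))
# Cannot generate range with start=2262-04-11T00:00:00 and periods=1
```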

@mroeschke added the Datetime, Error Reporting, and Frequency labels on Dec 12, 2018
@topper-123
Contributor Author

Hey, @mroeschke, did your PR resolve this issue?

I'm in doubt, as the second example now raises an OverflowError, while I assumed the correct solution would be for both to raise an OutOfBoundsDatetime.
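
Until the behaviour is consistent, calling code that wants a single error type could normalize it on its own side. The wrapper below is a user-side sketch only (the name `date_range_strict` is made up for illustration) and is not how pandas itself addresses this.

```python
import pandas as pd
from pandas.errors import OutOfBoundsDatetime


def date_range_strict(*args, **kwargs):
    """Call pd.date_range, re-raising OverflowError as OutOfBoundsDatetime."""
    try:
        return pd.date_range(*args, **kwargs)
    except OverflowError as err:
        # Keep the original error chained so the traceback stays informative.
        raise OutOfBoundsDatetime(f"date_range out of bounds: {err}") from err


# In-bounds calls behave exactly like pd.date_range.
print(date_range_strict("2000-01-03", periods=3, freq="B"))
```

Because OutOfBoundsDatetime subclasses ValueError, callers can then catch one exception type for both cases.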

@mroeschke
Member

I don't think @jbrockmendel solved this issue. On master, the first example still raises an OverflowError, while the second example actually returns a result:

In [5]: >>> freq = 'D'
   ...: >>> pd.date_range(date, periods=1, freq=freq)
Out[5]: DatetimeIndex(['2262-04-11'], dtype='datetime64[ns]', freq='D')

@jbrockmendel
Member

@mroeschke I just opened #26651, which addresses the example here. I had trouble coming up with an example to test non-overflow at the bottom of the overflow range. Suggestions welcome.

@mroeschke
Member

This seems a little suspicious toward the bottom of the overflow range (on master):

In [5]: end = pd.Timestamp.min.ceil("D").to_pydatetime()

In [6]: end
Out[6]: datetime.datetime(1677, 9, 22, 0, 0)

In [7]: rng = pd.date_range(start=None, end=end, periods=-1, freq='B')

In [8]: rng
Out[8]: DatetimeIndex([], dtype='datetime64[ns]', freq='B')

In [11]: rng = pd.date_range(start=None, end=end, periods=1, freq='-1B')
OverflowError: Python int too large to convert to C long

@jreback added this to the 0.25.0 milestone on Jun 6, 2019
@jbrockmendel
Member

@mroeschke under #24252, [8] above is unchanged, but [11] returns DatetimeIndex(['1677-09-22'], dtype='datetime64[ns]', freq='-1B'). The latter looks right to me, while the empty DatetimeIndex still looks wrong. Does that match what you expect? If so, I'll add a test to 24252 for [11].

@mroeschke
Member

I think [7] and [11] should both raise an OutOfBoundsDatetime as well, since they are both (essentially) equivalent to pd.Timestamp.min - pd.Timedelta('1D').
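
The arithmetic behind that equivalence can be checked without pandas. The sketch below assumes datetime64[ns] values are stored as signed 64-bit nanosecond counts since the Unix epoch, with the smallest int64 value reserved for NaT; the helper names are illustrative only.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
# Assumption: the smallest int64 is reserved for NaT, so the usable minimum is one above it.
NS_MIN = -(2**63) + 1
NS_MAX = 2**63 - 1


def to_ns(ts):
    """Nanoseconds between the Unix epoch and a naive datetime."""
    delta = ts - EPOCH
    return (delta.days * 86_400 + delta.seconds) * 10**9 + delta.microseconds * 1_000


def fits_datetime64_ns(ts):
    """True if ts is representable as an int64 nanosecond count."""
    return NS_MIN <= to_ns(ts) <= NS_MAX


end = datetime(1677, 9, 22)  # pd.Timestamp.min.ceil("D") from [6] above
print(fits_datetime64_ns(end))                      # True
print(fits_datetime64_ns(end - timedelta(days=1)))  # False
```

Stepping back even one calendar day from `end` leaves the representable range, which is why a range ending at `end` with a backward step cannot produce a valid earlier element.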
