BUG: int Overflow with DateFormatter #52895

Open
2 of 3 tasks
joooeey opened this issue Apr 24, 2023 · 16 comments
Labels: Bug; datetime.date (stdlib datetime.date support); Visualization (plotting)

@joooeey
Contributor

joooeey commented Apr 24, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import datetime as dt
from matplotlib import dates
import pandas as pd

dates.set_epoch("2023-01-01")  # suggested workaround doesn't help
start = dt.datetime(2022, 4, 4, 7)
end = dt.datetime(2022, 4, 4, 16)
index = pd.date_range(start, end, freq="s")
df = pd.DataFrame(index=index)
df["col"] = 1
ax = df.plot()
ax.xaxis.set_major_formatter(dates.DateFormatter("%H:%M"))  # breaks

Issue Description

See also: #18348

When setting a matplotlib.dates.DateFormatter, I get the following exception:

OverflowError: int too big to convert

Full Traceback:

runfile('/home/lukas/.config/spyder-py3/temp.py', wdir='/home/lukas/.config/spyder-py3')
[autoreload of pandas.core.arrays.timedeltas failed: Traceback (most recent call last):
  File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/extensions/autoreload.py", line 273, in check
    superreload(m, reload, self.old_objects)
  File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/extensions/autoreload.py", line 471, in superreload
    module = reload(module)
             ^^^^^^^^^^^^^^
  File "/home/lukas/mambaforge/envs/moma/lib/python3.11/importlib/__init__.py", line 169, in reload
    _bootstrap._exec(spec, module)
  File "<frozen importlib._bootstrap>", line 621, in _exec
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/pandas/core/arrays/timedeltas.py", line 34, in <module>
    from pandas._libs.tslibs.fields import (
ImportError: cannot import name 'get_timedelta_days' from 'pandas._libs.tslibs.fields' (/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/pandas/_libs/tslibs/fields.cpython-311-x86_64-linux-gnu.so)
]
[autoreload of pandas._testing failed: Traceback (most recent call last):
  File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/extensions/autoreload.py", line 273, in check
    superreload(m, reload, self.old_objects)
  File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/extensions/autoreload.py", line 471, in superreload
    module = reload(module)
             ^^^^^^^^^^^^^^
  File "/home/lukas/mambaforge/envs/moma/lib/python3.11/importlib/__init__.py", line 169, in reload
    _bootstrap._exec(spec, module)
  File "<frozen importlib._bootstrap>", line 621, in _exec
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/pandas/_testing/__init__.py", line 914, in <module>
    cython_table = pd.core.common._cython_table.items()
                   ^^^^^^^^^^^^^^
AttributeError: module 'pandas.core' has no attribute 'common'
]
Error in callback <function _draw_all_if_interactive at 0x7f4dc0f96020> (for post_execute):
Traceback (most recent call last):

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/pyplot.py:120 in _draw_all_if_interactive
    draw_all()

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/_pylab_helpers.py:132 in draw_all
    manager.canvas.draw_idle()

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/backend_bases.py:2082 in draw_idle
    self.draw(*args, **kwargs)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/backends/backend_agg.py:400 in draw
    self.figure.draw(self.renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:95 in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
    return draw(artist, renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/figure.py:3140 in draw
    mimage._draw_list_compositing_images(

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/image.py:131 in _draw_list_compositing_images
    a.draw(renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
    return draw(artist, renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axes/_base.py:3064 in draw
    mimage._draw_list_compositing_images(

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/image.py:131 in _draw_list_compositing_images
    a.draw(renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
    return draw(artist, renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axis.py:1376 in draw
    ticks_to_draw = self._update_ticks()

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axis.py:1263 in _update_ticks
    major_labels = self.major.formatter.format_ticks(major_locs)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/ticker.py:218 in format_ticks
    return [self(value, i) for i, value in enumerate(values)]

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/ticker.py:218 in <listcomp>
    return [self(value, i) for i, value in enumerate(values)]

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:651 in __call__
    result = num2date(x, self.tz).strftime(self.fmt)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:544 in num2date
    return _from_ordinalf_np_vectorized(x, tz).tolist()

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/numpy/lib/function_base.py:2329 in __call__
    return self._vectorize_call(func=func, args=vargs)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/numpy/lib/function_base.py:2412 in _vectorize_call
    outputs = ufunc(*inputs)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:359 in _from_ordinalf
    np.timedelta64(int(np.round(x * MUSECONDS_PER_DAY)), 'us'))

OverflowError: int too big to convert

Traceback (most recent call last):

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/core/formatters.py:340 in __call__
    return printer(obj)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/IPython/core/pylabtools.py:152 in print_figure
    fig.canvas.print_figure(bytes_io, **kw)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/backend_bases.py:2342 in print_figure
    self.figure.draw(renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:95 in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
    return draw(artist, renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/figure.py:3140 in draw
    mimage._draw_list_compositing_images(

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/image.py:131 in _draw_list_compositing_images
    a.draw(renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
    return draw(artist, renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axes/_base.py:3064 in draw
    mimage._draw_list_compositing_images(

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/image.py:131 in _draw_list_compositing_images
    a.draw(renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/artist.py:72 in draw_wrapper
    return draw(artist, renderer)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axis.py:1376 in draw
    ticks_to_draw = self._update_ticks()

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/axis.py:1263 in _update_ticks
    major_labels = self.major.formatter.format_ticks(major_locs)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/ticker.py:218 in format_ticks
    return [self(value, i) for i, value in enumerate(values)]

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/ticker.py:218 in <listcomp>
    return [self(value, i) for i, value in enumerate(values)]

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:651 in __call__
    result = num2date(x, self.tz).strftime(self.fmt)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:544 in num2date
    return _from_ordinalf_np_vectorized(x, tz).tolist()

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/numpy/lib/function_base.py:2329 in __call__
    return self._vectorize_call(func=func, args=vargs)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/numpy/lib/function_base.py:2412 in _vectorize_call
    outputs = ufunc(*inputs)

  File ~/mambaforge/envs/moma/lib/python3.11/site-packages/matplotlib/dates.py:359 in _from_ordinalf
    np.timedelta64(int(np.round(x * MUSECONDS_PER_DAY)), 'us'))

OverflowError: int too big to convert

<Figure size 432x288 with 1 Axes>

Expected Behavior

No exceptions are raised, and the major ticks selected by the AutoDateLocator are formatted in hh:mm format.

Installed Versions

INSTALLED VERSIONS

commit : 37ea63d
python : 3.11.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-135-generic
Version : #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.0.1
numpy : 1.24.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/_distutils_hack/init.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")

@joooeey joooeey added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Apr 24, 2023
@rhshadrach rhshadrach added the datetime.date stdlib datetime.date support label Apr 25, 2023
@DeaMariaLeon DeaMariaLeon removed the Needs Triage Issue that has not been reviewed by a pandas team member label Apr 25, 2023
@MarcoGorelli
Member

MarcoGorelli commented Apr 25, 2023

Thanks @joooeey for the report

this actually works if you use matplotlib directly

image

This confirms that the issue's indeed on the pandas.plotting side

I suspect (but haven't checked) that somewhere in pandas.plotting, there's an assumption that pandas datetimes still all use nanosecond resolution

Contributions would be welcome, this one might not be too tricky for newcomers

@MarcoGorelli MarcoGorelli added the Visualization plotting label Apr 25, 2023
@PrimeF
Contributor

PrimeF commented Apr 25, 2023

take

@PrimeF
Contributor

PrimeF commented Apr 25, 2023

From my understanding so far, the issue is that matplotlib requires the number of days from the epoch (01-01-1970) for the datetime conversion, while pandas passes a float value of seconds elapsed since the epoch.

The missing piece is to understand where exactly this float value in seconds comes from in pandas. It possibly comes from TimeSeries_DateFormatter.set_locs(), but I need to investigate further, which I'll continue to do over the next few days.
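
To make the unit mismatch concrete, here is a minimal standard-library sketch of the arithmetic in the last frame of the traceback (`int(np.round(x * MUSECONDS_PER_DAY))`). It assumes matplotlib's default 1970-01-01 epoch, and the helper name `to_microseconds` is made up for illustration; it does not call matplotlib or pandas.

```python
import datetime as dt

MUSECONDS_PER_DAY = 24 * 60 * 60 * 1_000_000  # microseconds in one day
INT64_MAX = 2**63 - 1  # largest value a 64-bit timedelta can hold

EPOCH = dt.datetime(1970, 1, 1)  # assumption: default matplotlib epoch
start = dt.datetime(2022, 4, 4, 7)  # first timestamp from the report

days_since_epoch = (start - EPOCH) / dt.timedelta(days=1)
seconds_since_epoch = (start - EPOCH) / dt.timedelta(seconds=1)


def to_microseconds(x):
    """Hypothetical helper: interpret x as a day count and convert to microseconds."""
    return int(round(x * MUSECONDS_PER_DAY))


# A value expressed in days stays far below the 64-bit limit ...
days_fit = to_microseconds(days_since_epoch) <= INT64_MAX
# ... but a value in seconds, misread as days, exceeds it.
seconds_fit = to_microseconds(seconds_since_epoch) <= INT64_MAX

print(days_fit, seconds_fit)
```

If the x value really is a count of seconds, multiplying it by the microseconds-per-day constant pushes it past what a 64-bit integer can represent, which is consistent with the OverflowError in the report.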

@MarcoGorelli
Member

thanks for the investigation @PrimeF ! sounds good - take your time, no hurry

@PrimeF
Contributor

PrimeF commented Apr 26, 2023

@MarcoGorelli: I've looked into the issue further and the Period values in seconds actually originate from:

elif lib.infer_dtype(values, skipna=False) == "period":
    # https://github.com/pandas-dev/pandas/issues/24304
    # convert ndarray[period] -> PeriodIndex
    return PeriodIndex(values, freq=axis.freq).asi8

In order to get the date in elapsed days since epoch as required by the operations performed in the DateFormatter (see https://github.com/matplotlib/matplotlib/blob/2e2d2d5f574ad43ba87fc893098345db5eb1eacc/lib/matplotlib/dates.py#L357) one could instead leverage this branch

elif isinstance(values, (list, tuple, np.ndarray, Index)):
    return [get_datevalue(x, axis.freq) for x in values]

and pass freq='D'.

However, this solution seems highly "customised" to this issue, so it would require some specific conditions to avoid erroneously entering the branch and breaking other things down the line.
I am not sure this approach is the best one to follow, so I would appreciate any feedback.

@MarcoGorelli
Member

thanks for the investigation, I'll take a look

@ford--prefect

This bug has been discussed on the matplotlib side, and there it sounded like an issue with a different definition of the epoch. Maybe this helps. I'll also add a reference to this issue on the matplotlib issue.

@azjps

azjps commented Dec 13, 2023

I dug into this a bit since the PeriodConverter logic looked very strange to me (as an end user), and it seems the bulk of it was written in 2012.

It seems possible to me that PeriodConverter was only ever intended to be used with freq="D". Otherwise, PeriodConverter._convert_1d() can return very different values depending on the frequency of the given periodic time series:

period = Period("2023-11-11 00:01.000", "T")
get_datevalue(period, "D")  # 19672, convention that matplotlib uses e.g. matplotlib.dates.date2num(pd.Timestamp("2023-11-11 00:01.000"))
get_datevalue(period, "T")  # 28327681
get_datevalue(period, "L")  # 169947861900

So if a small time frequency like "T" (minutes) is given, at some point matplotlib will call something like num2date() on view limits around 28327681 to format the x-labels, at which point an OverflowError will be raised. Contrast this with DatetimeConverter, which carefully calls mdates.date2num() and so returns values around 19672.
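
The spread between those values can be reproduced without pandas. The sketch below counts whole units elapsed since 1970-01-01 for the same timestamp. Treating a period ordinal as "whole units since the epoch" is an assumption made for illustration, not a call into pandas, and `units_since_epoch` is a hypothetical helper.

```python
import datetime as dt

EPOCH = dt.datetime(1970, 1, 1)
ts = dt.datetime(2023, 11, 11, 0, 1)  # same instant as the Period above


def units_since_epoch(moment, unit):
    """Hypothetical stand-in for a period ordinal: whole `unit`s since 1970-01-01."""
    return (moment - EPOCH) // unit


as_days = units_since_epoch(ts, dt.timedelta(days=1))
as_minutes = units_since_epoch(ts, dt.timedelta(minutes=1))

print(as_days)     # 19672
print(as_minutes)  # 28327681
```

Only the day count follows the convention that mdates.date2num() uses; the minute count describes the same instant on a scale 1440 times larger.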

and pass freq='D'.

Unfortunately changing the frequency from "T" to "D" here would lose resolution, e.g. all of the timestamps in the above example would get cast to 19672. It seems to me that the most reasonable thing to do would be to just defer to DatetimeConverter and possibly remove PeriodConverter entirely, although I could be missing some context where PeriodConverter is actually useful.

Another frustrating issue that led me here is that the pandas plotting converters will try to auto-infer a periodic frequency, even if the index to be plotted is a non-periodic DatetimeIndex:

import pandas as pd
import matplotlib.dates as mdates
ts = pd.Timestamp("2023-11-11 00:00:00", tz="UTC")
ts_plus_n = lambda x: ts + pd.Timedelta("1S") * x  # add x seconds
s = pd.Series([0, 1, 2, 3], index=[ts, ts_plus_n(1), ts_plus_n(2), ts_plus_n(99)])
# assert s.index.inferred_freq is None and s.iloc[0:3].index.inferred_freq == "S"

In [2]: s.iloc[0:4].plot().xaxis.set_major_formatter(mdates.DateFormatter("%X"))  # plots as expected
In [3]: s.iloc[0:3].plot().xaxis.set_major_formatter(mdates.DateFormatter("%X"))  # OverflowError!

@MarcoGorelli
Member

MarcoGorelli commented Dec 13, 2023

It seems to me that the most reasonable thing to do would be to just defer to DatetimeConverter and possibly remove PeriodConverter entirely, although I could be missing some context where PeriodConverter is actually useful.

I welcome removing anything Period-related, fancy trying this out and making a PR?

@azjps

azjps commented Dec 17, 2023

Seems like there is some history to this and it's been debated a few times: #7670 #9053 #15071 #18768 #26253

#15071 provides the best summary: the main benefit (or drawback, depending on one's viewpoint) of PeriodConverter is that some custom tick labelling can be done by pandas.

jorisvandenbossche: The question is what the reason is that we convert DatetimeIndex to periods for plotting. The reasons I can think of:
Performance. Currently, the regular plotting is faster (so for a regular series ts.plot() is faster as ts.plot(x_compat=True)). However, I think this could be solved as most of the time is spent in converting the datetimes to floats (which should be vectorizable).
Nicer tick label locations and formatting. This is a clear plus, our (convoluted) ticklocators and formatters give much nicer results as the default matplotlib (IMO)

Wes even replied at one point

I am not especially attached to it -- if you can unify / have a single code path for plotting without significantly changing functionality, sounds good to me.

IMHO the user should explicitly request the period-based formatting instead of pandas trying to automagically figure it out, but removing PeriodConverter will likely cause the default tick formatting for periodic time series to change, so it might still be a bit controversial.

@azjps

azjps commented Dec 24, 2023

I think the following could be done as first steps:

  1. No longer infer periodic frequency when plotting DatetimeIndex. It seems to cause more confusion than it resolves: see, for example, the issues linked above and a few Stack Overflow questions like https://stackoverflow.com/questions/21189954/. It's sufficient to remove this branch (a FutureWarning can be emitted if a softer transition is desired):
    freq = getattr(index, "inferred_freq", None)
  2. In pandas/plotting/_matplotlib/converter.py, we can change PeriodConverter._convert_1d
    return PeriodIndex(values, freq=axis.freq).asi8
    to call mdates.date2num() (for example by calling DatetimeConverter) so that all time series plots will be plotted with the x value units that matplotlib expects. To soften the transition and allow the period tick formatting code in TimeSeries_Date{Locator,Formatter} to still work, the reverse mapping lambda x: Period(mdates.num2date(x), freq=self.freq).ordinal can be applied inside their methods.
  3. TimeSeries_* can be renamed to PeriodSeries_* to make the intent more clear.
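
Step 2 relies on being able to move between matplotlib's day numbers and period ordinals in both directions. A rough standard-library sketch of that round trip, assuming the default 1970-01-01 epoch and a fixed number of units per day; both function names are hypothetical and this is not the pandas implementation:

```python
MINUTES_PER_DAY = 24 * 60


def ordinal_to_days(ordinal, units_per_day):
    """Forward mapping: period ordinal -> fractional days since the epoch."""
    return ordinal / units_per_day


def days_to_ordinal(days, units_per_day):
    """Reverse mapping: fractional days since the epoch -> period ordinal."""
    # round() absorbs the tiny floating-point error left by the division above
    return round(days * units_per_day)


minute_ordinal = 28327681  # the freq="T" value quoted earlier in the thread
days = ordinal_to_days(minute_ordinal, MINUTES_PER_DAY)
restored = days_to_ordinal(days, MINUTES_PER_DAY)

print(days, restored)
```

With a mapping like this, the axis could carry day numbers, which other matplotlib features expect, while the period-based tick code still receives ordinals. Frequencies without a fixed length, such as months, would need a calendar-aware conversion instead.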

The pandas period tick formatter does seem a bit nicer than matplotlib's in some situations (although worse in others), so I'm more hesitant than before to try to remove the period-related code entirely.

@azjps

azjps commented Dec 26, 2023

I linked a few more directly related issues, and there are a few more linked from the issues in my comment above, so roughly a double-digit number of issues on the tracker stem from the same underlying problem. Since pandas is changing the time unit when plotting, any other matplotlib feature involving x-coordinates (mdates.DateFormatter, format_coord, tooltips, axvspan or any other artist plotted with datetimes, shared x-axes, etc.) may break. I've checked that most of these issues are resolved by the fix proposed in the previous comment.

For tracking, there's a few other classes of issues related to Period plotting:

@TonalidadeHidrica

Sorry if this is off-topic, but I'm posting this for those who may encounter the same issue. I'm using matplotlib and encountered this error. I was feeding Unix epoch timestamps in seconds to the x axis and then tried to use matplotlib.dates to format the data, but I got the overflow error. It turns out that _from_ordinalf expects "days after epoch" instead of "seconds after epoch", so I divided the timestamp by 24 * 60 * 60, and then everything worked fine.
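
The division described above can be sketched as follows. This is a standard-library approximation only: it assumes the default 1970-01-01 epoch, `format_day_number` is a hypothetical stand-in for what a date formatter does with a tick value, and matplotlib itself is not called.

```python
import datetime as dt

SECONDS_PER_DAY = 24 * 60 * 60
# Assumption: the default matplotlib epoch, in UTC
EPOCH = dt.datetime(1970, 1, 1, tzinfo=dt.timezone.utc)


def unix_seconds_to_days(seconds):
    """Convert Unix seconds to fractional days since the epoch."""
    return seconds / SECONDS_PER_DAY


def format_day_number(days, fmt="%H:%M"):
    """Hypothetical stand-in for formatting a day-number tick value."""
    return (EPOCH + dt.timedelta(days=days)).strftime(fmt)


x = unix_seconds_to_days(1_649_055_600)  # 2022-04-04 07:00:00 UTC
label = format_day_number(x)
print(label)
```

Passing the converted values as x data keeps them in the range the date formatter is designed for.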

@azhu-tower
Contributor

I didn't get a chance to complete this since the previous comments, but here's a proof of concept that uses DatetimeConverter instead of PeriodConverter for PeriodIndex: main...azhu-tower:pandas:period-converter. An example with this commit is the plot in this comment: #55110 (comment). IIRC, profiling showed that the commit introduces some slowness, likely due to calling date2num on each tick instead of across all ticks.

@ba05

ba05 commented Jun 24, 2024

Anyone have a PR? I'm still encountering this issue with the latest version.

@alecmkrueger

I am also encountering this issue with the latest version.
