BUG: int Overflow with DateFormatter #52895
Comments
Thanks @joooeey for the report. This actually works if you use matplotlib directly. This confirms that the issue is indeed on the pandas.plotting side. I suspect (but haven't checked) that somewhere in pandas.plotting there's an assumption that pandas datetimes still all use nanosecond resolution. Contributions would be welcome; this one might not be too tricky for newcomers. |
take |
From my understanding so far, the issue is that matplotlib requires the number of days from the epoch (01-01-1970) for the datetime conversion, while pandas passes a float value of seconds elapsed since the epoch. The missing piece is to understand where exactly this float value in seconds comes from in pandas. Possibly from |
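To illustrate the unit mismatch described above, here is a minimal stdlib-only sketch. It does not call pandas or matplotlib; `days_to_datetime` and `overflows` are hypothetical helpers, and the 1970-01-01 epoch is assumed to match matplotlib's default. It only shows why a seconds-scale number read as a day count falls outside the range a datetime can represent.

```python
from datetime import datetime, timedelta, timezone

# Assumption: day counts are measured from 1970-01-01 UTC, which is
# matplotlib's default epoch in recent releases.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def days_to_datetime(days):
    """Interpret a number as days since EPOCH (hypothetical helper)."""
    return EPOCH + timedelta(days=days)


def overflows(value):
    """Return True if reading `value` as a day count is out of range."""
    try:
        days_to_datetime(value)
    except OverflowError:
        return True
    return False


ts = datetime(2023, 11, 11, 0, 1, tzinfo=timezone.utc)
seconds = (ts - EPOCH).total_seconds()  # seconds since the epoch
days = seconds / 86_400                 # the same instant as a day count

print(days_to_datetime(days))  # round-trips to the original timestamp
print(overflows(days))         # a genuine day count is in range
print(overflows(seconds))      # seconds misread as days are not
```

The same failure appears with minute-scale values, since tens of millions of "days" already lands far beyond year 9999.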
thanks for the investigation @PrimeF ! sounds good - take your time, no hurry |
@MarcoGorelli: I've looked into the issue further and the Period values in seconds actually originate from: pandas/pandas/plotting/_matplotlib/converter.py Lines 252 to 255 in 103d3b2
In order to get the date in elapsed days since epoch as required by the operations performed in the DateFormatter (see https://github.com/matplotlib/matplotlib/blob/2e2d2d5f574ad43ba87fc893098345db5eb1eacc/lib/matplotlib/dates.py#L357) one could instead leverage this branch pandas/pandas/plotting/_matplotlib/converter.py Lines 256 to 257 in 103d3b2
freq='D' .
However, this solution seems to be highly "customised" to this issue, and thus would require some specific conditions to avoid erroneously jumping into the branch and breaking other things down the line. |
thanks for the investigation, I'll take a look |
This bug has been discussed on the matplotlib side and there it sounded like being an issue with a different definition of the epoch. Maybe this helps. I'll also add a reference to this issue on the matplotlib issue. |
I dug into this a bit since the It seems to me possibly that
period = Period("2023-11-11 00:01.000", "T")
get_datevalue(period, "D")  # 19672, the convention that matplotlib uses, e.g. matplotlib.dates.date2num(pd.Timestamp("2023-11-11 00:01.000"))
get_datevalue(period, "T")  # 28327681
get_datevalue(period, "L")  # 169947861900
So if a small time frequency like "T" (minutes) is given, at some point matplotlib will call something like
Unfortunately, changing the frequency from "T" to "D" here would lose resolution, e.g. all of the timestamps in the above example would get cast to 19672. It seems to me that the most reasonable thing to do would be to just defer to
Another frustrating issue that led me here is that the pandas plotting converters will try to auto-infer a periodic frequency, even if the index to be plotted is a non-periodic
import pandas as pd
import matplotlib.dates as mdates
ts = pd.Timestamp("2023-11-11 00:00:00", tz="UTC")
ts_plus_n = lambda x: ts + pd.Timedelta("1S") * x  # add x seconds
s = pd.Series([0, 1, 2, 3], index=[ts, ts_plus_n(1), ts_plus_n(2), ts_plus_n(99)])
# assert s.index.inferred_freq is None and s.iloc[0:3].index.inferred_freq == "S"
In [2]: s.iloc[0:4].plot().xaxis.set_major_formatter(mdates.DateFormatter("%X"))  # plots as expected
In [3]: s.iloc[0:3].plot().xaxis.set_major_formatter(mdates.DateFormatter("%X"))  # OverflowError! |
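The resolution trade-off mentioned in this comment can be sketched with plain arithmetic. This is only an illustration under assumptions: `ordinal` is a hypothetical helper that counts whole units since 1970-01-01 UTC, similar in spirit to a period ordinal, and it is not the pandas `get_datevalue` implementation. The timestamps mirror the four index values used in the example above.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
start = datetime(2023, 11, 11, tzinfo=timezone.utc)
# Same offsets as the series index above: 0, 1, 2 and 99 seconds.
stamps = [start + timedelta(seconds=n) for n in (0, 1, 2, 99)]


def ordinal(ts, unit_seconds):
    """Whole units elapsed since EPOCH (hypothetical helper)."""
    return int((ts - EPOCH).total_seconds() // unit_seconds)


# Second-resolution ordinals keep the points apart but are seconds-scale.
second_ordinals = [ordinal(ts, 1) for ts in stamps]
# Day-resolution ordinals are matplotlib-scale but collapse to one value.
day_ordinals = [ordinal(ts, 86_400) for ts in stamps]
# Fractional days stay matplotlib-scale and keep the points distinct.
fractional_days = [(ts - EPOCH).total_seconds() / 86_400 for ts in stamps]

print(second_ordinals)
print(day_ordinals)
print(fractional_days)
```

Fractional day counts are what a direct datetime-to-number conversion would produce, which is one reason deferring to matplotlib's own conversion avoids both the overflow and the loss of resolution.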
I welcome removing anything Period-related, fancy trying this out and making a PR? |
Seems like there is some history to this, and it's been debated a few times: #7670 #9053 #15071 #18768 #26253. #15071 provides the best summary; the main benefit (or drawback, depending on one's viewpoint) of
Wes even replied at one point
IMHO the user should explicitly request the period-based formatting instead of |
I think the following could be done as first steps:
The pandas period tick formatter does seem a bit nicer than matplotlib's in some situations (although worse in others), so I'm more hesitant than before to try to remove the period-related code entirely. |
I linked a few more directly related issues, and there are a few more that are linked to the issues in my comment above, so there are roughly a double-digit number of issues on the tracker due to the same underlying issue here. Since pandas is changing the time unit when plotting, any other matplotlib feature involving x-coordinates ( For tracking, there are a few other classes of issues related to Period plotting:
|
Sorry if this is off-topic, but I'm posting this for those who may encounter the same issue. I'm using matplotlib and encountered this error. I was feeding unix epoch as seconds to the x axis, and then tried to use matplotlib.time to format the data, but I got the overflow error. It turns out that |
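For readers in the same situation, one common way to avoid this class of error (not necessarily what this commenter ended up doing) is to convert Unix epoch seconds into datetime objects or day counts before they reach the axis. The sketch below is stdlib-only; the helper names and sample values are made up for illustration.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def epoch_seconds_to_days(seconds):
    """Convert Unix epoch seconds to days since 1970-01-01 (hypothetical helper)."""
    return seconds / SECONDS_PER_DAY


def epoch_seconds_to_datetime(seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Made-up sample data: three instants on 2023-11-11 (UTC).
raw = [1_699_660_800, 1_699_660_860, 1_699_664_400]

as_days = [epoch_seconds_to_days(s) for s in raw]
as_datetimes = [epoch_seconds_to_datetime(s) for s in raw]

# Formatting the converted values works without any overflow.
labels = [d.strftime("%H:%M") for d in as_datetimes]
print(as_days)
print(labels)
```

Passing datetime objects is usually the less error-prone choice, because the plotting library then handles the conversion to its own internal unit.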
Didn't get a chance to try completing this since the previous comments, but here's a proof-of-concept with using |
Anyone have a PR? I'm still encountering this issue with the latest version. |
I am also encountering this issue with the latest version. |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
See also: #18348
When setting a matplotlib.dates.DateFormatter, I get the following exception:
Full Traceback:
Expected Behavior
No exceptions are raised, and the major ticks selected by the AutoDateLocator are formatted in hh:mm format.
Installed Versions
INSTALLED VERSIONS
commit : 37ea63d
python : 3.11.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-135-generic
Version : #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 2.0.1
numpy : 1.24.2
pytz : 2023.3
dateutil : 2.8.2
setuptools : 67.6.1
pip : 23.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : 8.12.0
pandas_datareader: None
bs4 : None
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.1
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : None
snappy : None
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None
/home/lukas/mambaforge/envs/moma/lib/python3.11/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")