Plotting on a datetime axis with different levels of granularity creates misleading plots #15874

Open
nedyoxall opened this issue Apr 3, 2017 · 3 comments
@nedyoxall

Issue setup:

import pandas as pd
import datetime as dt
import matplotlib.pyplot as plt
import numpy as np

np.random.seed(123)
df = pd.DataFrame(index=pd.date_range("2017-01-01", "2017-01-07"), 
                  data=np.random.random(size=7))

Problem description

If you plot a Series with a DatetimeIndex of daily granularity, and subsequently add other things (e.g. axvline, scatter...) to that plot at a finer granularity (e.g. hourly), then the other things you add will be 'snapped' to the nearest day.

If, for example, we try to plot a vertical line at 12pm on the 4th Jan, the line will snap to midnight:

fig, ax = plt.subplots()

df.plot(ax=ax)
ax.axvline(dt.datetime(2017, 1, 4, 12), color='r')

[image: resulting plot, with the red line drawn at midnight on 4 Jan]

Reversing the order in which the things are plotted fixes the problem:

ax.axvline(dt.datetime(2017, 1, 4, 12), color='r')
df.plot(ax=ax)

[image: resulting plot, with the red line drawn at 12pm on 4 Jan]

An alternative fix is to give the DatetimeIndex a finer granularity, for example by shifting every timestamp by one second:

df.index = df.index + pd.to_timedelta("1 sec")

So, to wrap up, not a major problem, but certainly something that could catch you unawares... Perhaps this is more an issue on the matplotlib side though?

Output of pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Darwin
OS-release: 16.5.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.19.2
nose: None
pip: 9.0.1
setuptools: 27.2.0
Cython: None
numpy: 1.12.0
scipy: 0.18.1
statsmodels: None
xarray: None
IPython: 5.2.2
sphinx: None
patsy: None
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: None
tables: None
numexpr: None
matplotlib: 2.0.0
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.5.3
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: None
pandas_datareader: None

@TomAugspurger
Contributor

So, this is unfortunate... The real fix will be fixing #15071, which is a good bit of work. Since the index is regular frequency, we use Periods instead of date times / timestamps.

Another workaround is to draw the axvline with a Period:

In [89]: ax.axvline(pd.Period('2017-01-04T12'), color='r')
Out[89]: <matplotlib.lines.Line2D at 0x111f91518>

I don't think there's any way we can reliably detect this and warn the user though, at least not without false positives :/
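To illustrate the point about Periods: the sketch below only shows what happens when a midday timestamp is expressed as a daily Period, without any plotting. That the plot axis performs an equivalent conversion is an assumption drawn from the comment above, not something this snippet demonstrates.

```python
import datetime as dt

import pandas as pd

noon = dt.datetime(2017, 1, 4, 12)

# A daily Period has no room for the time of day, so the hour is dropped.
daily = pd.Period(noon, freq="D")
print(daily)             # 2017-01-04
print(daily.start_time)  # 2017-01-04 00:00:00

# A Period built from a string that includes the hour keeps it.
hourly = pd.Period("2017-01-04T12")
print(hourly.start_time)  # 2017-01-04 12:00:00
```

If the axis is working in daily Periods, a value given at 12:00 would land on the start of that day, which matches the snapping described in the report.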

@TomAugspurger TomAugspurger added the Visualization plotting label Apr 3, 2017
@nedyoxall
Author

Aha. I suspected this could be pretty complicated under the hood!

Is there anywhere sensible we could update things in the documentation to warn people? If there is, I'll try and submit a PR sometime over the next couple of days.

(By the way that Period fix didn't work for me - at least not when substituted directly for the ax.axvline... line that I had previously.)

@TomAugspurger
Contributor

> Is there anywhere sensible we could update things in the documentation to warn people?

The best is probably near where we document x_compat (which you could use instead).
We can make a small section about this whole issue.
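A minimal sketch of the x_compat route, reusing the setup from the original report. The `Agg` backend and the `x_pos` check are additions for illustration only, and behaviour may differ between pandas and matplotlib versions.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend, chosen here only so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123)
df = pd.DataFrame(index=pd.date_range("2017-01-01", "2017-01-07"),
                  data=np.random.random(size=7))

fig, ax = plt.subplots()

# x_compat=True asks pandas not to use its period-based time-series axis.
df.plot(ax=ax, x_compat=True)
line = ax.axvline(dt.datetime(2017, 1, 4, 12), color="r")

# Position of the line in axis units (days as a float); the fractional
# part shows whether the time of day survived the conversion.
x_pos = line.get_xdata(orig=False)[0]
print(x_pos % 1)
```

With x_compat the axis should be in matplotlib's date units, so a fractional part of about 0.5 would mean the line sits at midday rather than at midnight.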

> (By the way that Period fix didn't work for me - at least not when substituted directly for the ax.axvline... line that I had previously.)

Huh, I thought that would do it, sorry.

jgehrcke added a commit to jgehrcke/ci-analysis that referenced this issue Dec 7, 2020
There's a lot of magic going on between how the datetime64 values
actually encode datetime in plots. Sharing an axis across (sub)plots is
brittle w.r.t. these differences. Work around this, here: make it so that
individual timestamps have a non-zero value for seconds, by simply adding
one second, shifting the whole data set by one second to the left. That
prevents, I guess, an optimization from kicking in that would detect that
individual timestamps fall on the full hour or on integer multiples of 30 or 15
minutes. Also see

 pandas-dev/pandas#15874
 pandas-dev/pandas#15071
 pandas-dev/pandas#31074
 pandas-dev/pandas#29705
 pandas-dev/pandas#29719
 pandas-dev/pandas#18571
 pandas-dev/pandas#11574
 pandas-dev/pandas#22586
@mroeschke mroeschke added the Bug label May 8, 2021