Misaligned X axis when plotting datetime indexed series with regular and irregular time index #29705
Comments
I think I have encountered the same or a similar issue. I have irregular transactional data with a date timestamp, which I group and aggregate to produce regular mean balances. When data with different timestamps is plotted on the same axes, some of it is drawn with an offset.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
tr = pd.date_range("2014-04-01", "2019-12-31")
n = 300
ts = np.random.choice(tr, n)
tx = np.random.sample(size=n) * np.random.choice(
(10, -15, -1, -0.5), n, p=[0.1, 0.025, 0.8, 0.075]
)
df = pd.DataFrame(data={"date": ts, "tx": tx})
df = df.sort_values(by=["date"]).reset_index()
df["balance"] = df["tx"].cumsum()
fig, axs = plt.subplots(2, 2)
mbalance = df.groupby(pd.Grouper(key="date", freq="M"))[["balance"]].mean()
print("monthly")
print(mbalance)
mbalance.plot(linestyle="none", marker="o", title="month", ax=axs[0, 0])
# company year ending is March
qbalance = df.groupby(pd.Grouper(key="date", freq="Q-MAR"))[["balance"]].mean()
print("quarterly")
print(qbalance)
qbalance.plot(linestyle="none", marker="s", title="quarter", ax=axs[0, 1])
abalance = df.groupby(pd.Grouper(key="date", freq="A-MAR"))[["balance"]].mean()
print("annually")
print(abalance)
abalance.plot(linestyle="none", marker="d", title="annum", ax=axs[1, 0])
mbalance.plot(linestyle="none", marker="o", ax=axs[1, 1])
qbalance.plot(linestyle="none", marker="s", ax=axs[1, 1])
abalance.plot(linestyle="none", marker="d", ax=axs[1, 1])
plt.legend()
When plotted onto individual subplot axes, each data set is rendered correctly. When the data sets are overlaid on one axis, the quarterly and annual data are offset from their true positions.
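To see where the overlaid markers ought to land, it can help to print the bin labels that the three groupers produce, since each mean is drawn at its label. This is a minimal sketch on a small hypothetical data set (not the random data above), and it assumes that `pd.offsets.MonthEnd()`, `QuarterEnd(startingMonth=3)` and `YearEnd(month=3)` are equivalent to the `"M"`, `"Q-MAR"` and `"A-MAR"` aliases used in the example.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the transaction data: one row per day for a
# single company year (April 2014 to March 2015) with a rising balance.
dates = pd.date_range("2014-04-01", "2015-03-31")
df = pd.DataFrame({"date": dates, "balance": np.arange(len(dates), dtype=float)})

# Offset objects stand in for the "M", "Q-MAR" and "A-MAR" aliases above,
# because newer pandas releases have renamed those strings.
groupers = {
    "monthly": pd.offsets.MonthEnd(),
    "quarterly": pd.offsets.QuarterEnd(startingMonth=3),
    "annual": pd.offsets.YearEnd(month=3),
}
means = {
    name: df.groupby(pd.Grouper(key="date", freq=offset))[["balance"]].mean()
    for name, offset in groupers.items()
}

for name, frame in means.items():
    # Each mean is labelled with the last day of its bin; that label is
    # the x position the point should have on a date axis.
    print(name, list(frame.index.strftime("%Y-%m-%d")))
```

The quarterly means should sit on the last day of June, September, December and March, and the annual mean on 31 March; comparing those dates with the overlaid plot shows how far each marker has drifted.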
A further example to illustrate the issue: when the original data is plotted first, there is no alignment problem and everything is displayed correctly.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
tr = pd.date_range("2014-04-01", "2019-12-31")
n = 300
ts = np.random.choice(tr, n)
tx = np.random.sample(size=n) * np.random.choice(
(10, -15, -1, -0.5), n, p=[0.1, 0.025, 0.8, 0.075]
)
df = pd.DataFrame(data={"date": ts, "tx": tx})
df = df.sort_values(by=["date"]).reset_index()
df["balance"] = df["tx"].cumsum()
fig, axs = plt.subplots(1, 2)
mbalance = df.groupby(pd.Grouper(key="date", freq="M"))[["balance"]].mean()
print("monthly")
print(mbalance)
qbalance = df.groupby(pd.Grouper(key="date", freq="Q-MAR"))[["balance"]].mean()
print("quarterly")
print(qbalance)
abalance = df.groupby(pd.Grouper(key="date", freq="A-MAR"))[["balance"]].mean()
print("annually")
print(abalance)
df.plot(x="date", y="balance", marker="+", ax=axs[0])
mbalance.plot(linestyle="none", marker="o", ax=axs[0])
qbalance.plot(linestyle="none", marker="s", ax=axs[0])
abalance.plot(linestyle="none", marker="d", ax=axs[0])
plt.legend()
mbalance.plot(linestyle="none", marker="o", ax=axs[1])
qbalance.plot(linestyle="none", marker="s", ax=axs[1])
abalance.plot(linestyle="none", marker="d", ax=axs[1])
plt.legend()
Since the order of plotting clearly matters, could the cause be how pandas decides to represent dates on a numeric scale? Here's another example showing the problem and how the order matters, this time with two regularly spaced series, so my initial title may not be entirely accurate. It seems to happen when pandas deals with known frequencies, either set on the index or inferred from it. Why does pandas convert the dates to different numeric values depending on the order of plotting?
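One possible explanation, offered as an assumption rather than something confirmed in this thread: for a series with a known frequency, pandas may position points by Period ordinals (whole periods counted since 1970 at that frequency) instead of by plain date numbers. The sketch below does not call any plotting code; it only shows that the same calendar day maps to very different numbers under these encodings.

```python
import pandas as pd

# The same calendar day expressed in different numeric encodings.
day = pd.Timestamp("2014-04-30")

# Days since 1970-01-01: a plain date-number encoding.
days_since_epoch = (day - pd.Timestamp("1970-01-01")).days

# Period ordinals count whole periods since 1970 at the period's own frequency.
daily_ordinal = pd.Period(day, freq="D").ordinal
monthly_ordinal = pd.Period(day, freq="M").ordinal

print(days_since_epoch, daily_ordinal, monthly_ordinal)  # prints: 16190 16190 531
```

If an axis is first set up in one encoding and a later series is converted with another, the two would disagree about where a given date lies, which would fit the order dependence described above.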
Looking at the
I came across this three-year-old issue, #15071, which seems to be related.
There's a lot of magic going on in how datetime64 values are actually encoded as dates in plots, and sharing an axis across (sub)plots is brittle with respect to these differences. The workaround used here: make sure individual timestamps have a non-zero seconds value by simply adding one second to each of them, which shifts the whole data set one second later. I guess this prevents an optimization from kicking in that detects when individual timestamps fall on the full hour or on integer multiples of 30 or 15 minutes. Also see pandas-dev/pandas#15874 pandas-dev/pandas#15071 pandas-dev/pandas#31074 pandas-dev/pandas#29705 pandas-dev/pandas#29719 pandas-dev/pandas#18571 pandas-dev/pandas#11574 pandas-dev/pandas#22586
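The one-second workaround could look like the sketch below. The helper names `shift_by_one_second` and `first_on_boundary` are hypothetical; the second one only mirrors the boundary check guessed at in the comment above and is not pandas' internal logic.

```python
import pandas as pd


def shift_by_one_second(s: pd.Series) -> pd.Series:
    """Hypothetical helper: return a copy of `s` with every timestamp one second later."""
    shifted = s.copy()
    shifted.index = shifted.index + pd.Timedelta(seconds=1)
    return shifted


def first_on_boundary(s: pd.Series, minutes: int) -> bool:
    """Hypothetical check: does the first timestamp fall exactly on a multiple of `minutes`?"""
    first = s.index[0]
    return first == first.floor(pd.offsets.Minute(minutes))


# A regular 15-minute series whose timestamps sit on round clock times.
s = pd.Series(
    [1.0, 2.0, 3.0],
    index=pd.date_range("2019-01-01 10:00", periods=3, freq=pd.offsets.Minute(15)),
)
shifted = shift_by_one_second(s)

# The original starts on a 15-minute boundary; the shifted copy does not.
print(first_on_boundary(s, 15), first_on_boundary(shifted, 15))
```

Applying the same shift to every series on the axis keeps them aligned with one another, and a one-second offset is invisible on axes that span hours or days.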
Code Sample, a copy-pastable example if possible
Problem description
The data on the second axis is not placed correctly along the X axis: the irregular point data is shifted relative to the other data. This seems to be caused by the first entry of the irregular time series having a timestamp that does not occur in the regular time series.
The following changes cause the data to be plotted correctly, I assume because the first entries then have matching timestamps:
It does not matter whether later data points in the irregular series have timestamps that do not occur in the regular series.
Changing the plotting order on the second axis, i.e. plotting the irregular time series before the regular one, also places the data correctly along the X axis:
p2.plot(ax=axs_ts[1],style='o', zorder=10)
ts2.plot(ax=axs_ts[1], zorder=1)
However, this causes other issues, such as different X axis labels, and it also fails when sharex=True is used.
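Because plotting the irregular series first avoids the offset, the ordering can be automated. This is a sketch under the assumption, taken from the comments in this thread rather than from the pandas documentation, that the problem depends on whether pandas knows or can infer a frequency for the index; `has_known_freq` and `irregular_first` are hypothetical helpers, and every series is assumed to have a DatetimeIndex.

```python
import pandas as pd


def has_known_freq(s: pd.Series) -> bool:
    """True if the series' DatetimeIndex has a freq set or pandas can infer one."""
    if s.index.freq is not None:
        return True
    if len(s.index) < 3:
        # pd.infer_freq needs at least three timestamps; treat short series as irregular.
        return False
    return pd.infer_freq(s.index) is not None


def irregular_first(series_list):
    """Hypothetical helper: order series so irregular ones are plotted first.

    sorted() is stable and False sorts before True, so the original order
    is kept within the irregular group and within the regular group.
    """
    return sorted(series_list, key=has_known_freq)


regular = pd.Series(
    range(24),
    index=pd.date_range("2019-01-01", periods=24, freq=pd.offsets.Hour()),
    name="regular",
)
irregular = pd.Series(
    [5.0, 7.0, 6.0],
    index=pd.to_datetime(["2019-01-01 00:30", "2019-01-01 02:10", "2019-01-01 03:45"]),
    name="irregular",
)

print([s.name for s in irregular_first([regular, irregular])])
```

Each series can then be plotted onto the same axes in the returned order, for example by calling `.plot(ax=ax)` in a loop. This only automates the ordering trick, so the caveats about the X axis labels and `sharex=True` still apply.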
Possibly related issues:
#11574 - Misaligned x axis using shared axis (one series plotted per axis), not when plotted on the same axis as here.
#18571 - Misaligned x axis using twinx(), possibly same issue as here?
Expected Output
Plotting the data at the correct x axis coordinates.
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None
pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.6.0.post20191030
Cython : 0.29.13
pytest : 5.2.4
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.6
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.11
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.6