Misaligned X axis when plotting datetime indexed series with regular and irregular time index #29705

rhkarls · 2019-11-19T07:39:01Z

Code Sample, a copy-pastable example if possible

# Create sample data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

p1d = {'2016-08-10 10:00:00':     2.290438,
'2016-10-12 08:20:00':     1.314112,
'2016-11-15 12:45:00':     0.213702,
'2017-04-27 18:30:00':     0.256794,
'2017-05-30 11:10:00':     4.112614,
'2017-07-19 09:18:00':    10.600000}

p1 = pd.Series(p1d)
p1.index=pd.to_datetime(p1.index)

p2d = {'2016-08-09 09:15:00':    1.57970,
'2016-10-11 13:15:00':    0.73000,
'2017-04-27 12:30:00':    0.15900,
'2017-05-31 16:10:00':    1.65440,
'2018-05-24 12:00:00':    0.79260,
'2018-10-25 11:20:00':    0.34500}

p2 = pd.Series(p2d)
p2.index=pd.to_datetime(p2.index)


p3d = {'2016-11-15 09:00:00':    0.094900,
'2017-04-28 11:10:00':    0.055600,
'2017-05-30 16:00:00':    0.659600,
'2017-06-09 17:15:00':    0.300200,
'2018-05-24 16:45:00':    0.329800,
'2018-09-18 15:40:00':    0.200452}

p3 = pd.Series(p3d)
p3.index = pd.to_datetime(p3.index)

ts_index = pd.date_range('2016-01-01','2018-12-31',freq='H')

ts1 = pd.Series(index=ts_index, data=np.random.uniform(low=0.2,high=10,
                                                          size=ts_index.size))
ts2 = pd.Series(index=ts_index, data=np.random.uniform(low=0.1,high=3,
                                                          size=ts_index.size))
ts3 = pd.Series(index=ts_index, data=np.random.uniform(low=0.05,high=1,
                                                          size=ts_index.size))

# plot
fig_ts, axs_ts = plt.subplots(3,1,sharex=False)

ts1.plot(ax=axs_ts[0])
p1.plot(ax=axs_ts[0],style='o')

ts2.plot(ax=axs_ts[1])
p2.plot(ax=axs_ts[1],style='o')

ts3.plot(ax=axs_ts[2])
p3.plot(ax=axs_ts[2],style='o')

fig_ts.tight_layout()

Problem description

Data on the second axis is not plotted at the correct X axis positions: the irregular point data is shifted relative to the regular series. The cause seems to be that the first timestamp of the irregular time series does not coincide with any timestamp of the regular time series.

The following changes cause the data to be plotted correctly, presumably because the first entries then have matching timestamps:

Changing the first timestamp of p2 to a full hour (e.g. '2016-08-09 09:15:00' to '2016-08-09 09:00:00')
Changing the frequency of ts2 to '15T' (15 minutes)
It does not matter whether later data points in the irregular series have timestamps that are not shared with the regular series.
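
Whether the first timestamp of the irregular series falls on the grid of the regular series can be checked directly. A minimal sketch, assuming the regular index is evenly spaced; `first_timestamp_on_grid` is a hypothetical helper, not part of pandas, p2 is shortened from the example above, and '15min' stands in for the '15T' alias used in the report:

```python
import pandas as pd


def first_timestamp_on_grid(irregular: pd.Series, regular_index: pd.DatetimeIndex) -> bool:
    """Hypothetical helper: True if irregular's first timestamp lies on regular_index's grid.

    Assumes regular_index is evenly spaced, so its step can be read from the
    first two entries.
    """
    step = regular_index[1] - regular_index[0]
    offset = irregular.index[0] - regular_index[0]
    return bool(offset % step == pd.Timedelta(0))


hourly = pd.date_range("2016-01-01", "2018-12-31", freq="h")              # like ts_index / ts2
quarter_hourly = pd.date_range("2016-01-01", "2018-12-31", freq="15min")  # the '15T' variant
p2 = pd.Series(
    [1.57970, 0.73000],
    index=pd.to_datetime(["2016-08-09 09:15:00", "2016-10-11 13:15:00"]),
)

print(first_timestamp_on_grid(p2, hourly))          # False: 09:15 is off the hourly grid
print(first_timestamp_on_grid(p2, quarter_hourly))  # True: 09:15 is on the 15-minute grid
```

Both changes listed above (moving p2's first timestamp to 09:00, or switching ts2 to a 15-minute grid) make this check return True.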

Changing the plot order for the second axis, i.e. plotting the irregular time series before the regular one, also places the data at the correct position along the X axis:
p2.plot(ax=axs_ts[1],style='o', zorder=10)
ts2.plot(ax=axs_ts[1], zorder=1)

However, this causes other issues, such as different X axis labels, and the workaround also fails when sharex=True is used.
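
Another workaround, not from this thread but commonly suggested for overlays like this, is to pass x_compat=True to every .plot() call on the axes. As far as I understand, this makes pandas skip its frequency-based time-series axis and use ordinary date conversion for both series, so the plotting order should no longer matter. A sketch under that assumption; `overlay_with_x_compat` is a hypothetical helper and the data is a small synthetic stand-in for ts2 and p2:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def overlay_with_x_compat(regular: pd.Series, irregular: pd.Series):
    """Hypothetical helper: overlay a regular and an irregular series on one Axes.

    x_compat=True is passed to both calls so that neither series is drawn on
    pandas' frequency-based time-series x axis.
    """
    fig, ax = plt.subplots()
    regular.plot(ax=ax, x_compat=True)
    irregular.plot(ax=ax, style="o", x_compat=True)
    return fig, ax


rng = np.random.default_rng(0)
idx = pd.date_range("2016-08-01", "2016-08-31", freq="h")
ts2 = pd.Series(rng.uniform(0.1, 3, idx.size), index=idx)
# First timestamp deliberately off the hourly grid, as in the report.
p2 = pd.Series(
    [1.5797, 0.73],
    index=pd.to_datetime(["2016-08-09 09:15:00", "2016-08-20 12:00:00"]),
)

fig, ax = overlay_with_x_compat(ts2, p2)
```

The trade-off is that the tick labels may look different from pandas' default time-series formatting.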

Possibly related issues:
#11574 - Misaligned x axis with shared axes (one series plotted per axis), rather than with several series on the same axis as here.
#18571 - Misaligned x axis using twinx(), possibly the same issue as here?

Expected Output

Plotting the data at the correct x axis coordinates.

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None

pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.6.0.post20191030
Cython : 0.29.13
pytest : 5.2.4
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.6
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.11
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.6


kurtforrester · 2020-01-04T18:59:51Z

I think I have encountered the same or a similar issue.

I have irregular transactional data with a date timestamp. I group and aggregate the data to produce mean balances at regular intervals. When series with different timestamps are plotted on the same axes, the data are drawn offset from their true positions.

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

tr = pd.date_range("2014-04-01", "2019-12-31")

n = 300
ts = np.random.choice(tr, n)
tx = np.random.sample(size=n) * np.random.choice(
    (10, -15, -1, -0.5), n, p=[0.1, 0.025, 0.8, 0.075]
)

df = pd.DataFrame(data={"date": ts, "tx": tx})
df = df.sort_values(by=["date"]).reset_index()
df["balance"] = df["tx"].cumsum()

fig, axs = plt.subplots(2, 2)

mbalance = df.groupby(pd.Grouper(key="date", freq="M"))[["balance"]].mean()
print("monthly")
print(mbalance)
mbalance.plot(linestyle="none", marker="o", title="month", ax=axs[0, 0])

# company year ending is March
qbalance = df.groupby(pd.Grouper(key="date", freq="Q-MAR"))[["balance"]].mean()
print("quarterly")
print(qbalance)
qbalance.plot(linestyle="none", marker="s", title="quarter", ax=axs[0, 1])

abalance = df.groupby(pd.Grouper(key="date", freq="A-MAR"))[["balance"]].mean()
print("annually")
print(abalance)
abalance.plot(linestyle="none", marker="d", title="annum", ax=axs[1, 0])

mbalance.plot(linestyle="none", marker="o", ax=axs[1, 1])
qbalance.plot(linestyle="none", marker="s", ax=axs[1, 1])
abalance.plot(linestyle="none", marker="d", ax=axs[1, 1])

plt.legend()

When plotted onto individual subplot axes, each dataset is rendered correctly. When the datasets are overlaid, the quarterly and annual data are offset from their true positions.
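
One way to sidestep the frequency-dependent x axis when overlaying aggregates of different frequency may be to draw them with matplotlib's Axes.plot directly, so that every series goes through the same date conversion. A minimal sketch under that assumption, with synthetic data; it uses offset objects (MonthEnd, QuarterEnd(startingMonth=3), YearEnd(month=3)) in place of the "M", "Q-MAR" and "A-MAR" aliases, since the preferred alias spellings have changed across pandas releases:

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the transaction data in the comment above.
rng = np.random.default_rng(0)
days = pd.date_range("2014-04-01", "2019-12-31").to_numpy()
df = pd.DataFrame({"date": pd.to_datetime(rng.choice(days, 300)), "tx": rng.normal(size=300)})
df = df.sort_values("date").reset_index(drop=True)
df["balance"] = df["tx"].cumsum()

# Offset objects equivalent to the "M", "Q-MAR" and "A-MAR" aliases.
freqs = {
    "month": pd.offsets.MonthEnd(),
    "quarter": pd.offsets.QuarterEnd(startingMonth=3),
    "annum": pd.offsets.YearEnd(month=3),
}
aggregates = {
    name: df.groupby(pd.Grouper(key="date", freq=freq))["balance"].mean()
    for name, freq in freqs.items()
}

fig, ax = plt.subplots()
# Plain Axes.plot: every series goes through the same datetime-to-float
# conversion, whatever frequency its index carries.
for (name, series), marker in zip(aggregates.items(), ["o", "s", "d"]):
    ax.plot(series.index.to_numpy(), series.to_numpy(), linestyle="none", marker=marker, label=name)
ax.legend()
```

Quarter and year ends are also month ends, so overlapping points should sit exactly on top of each other.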

kurtforrester · 2020-01-04T19:27:55Z

A further example of the issue: when the original (unaggregated) data is plotted first, there is no alignment problem and everything displays correctly.

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

tr = pd.date_range("2014-04-01", "2019-12-31")

n = 300
ts = np.random.choice(tr, n)
tx = np.random.sample(size=n) * np.random.choice(
    (10, -15, -1, -0.5), n, p=[0.1, 0.025, 0.8, 0.075]
)

df = pd.DataFrame(data={"date": ts, "tx": tx})
df = df.sort_values(by=["date"]).reset_index()
df["balance"] = df["tx"].cumsum()

fig, axs = plt.subplots(1, 2)

mbalance = df.groupby(pd.Grouper(key="date", freq="M"))[["balance"]].mean()
print("monthly")
print(mbalance)

qbalance = df.groupby(pd.Grouper(key="date", freq="Q-MAR"))[["balance"]].mean()
print("quarterly")
print(qbalance)

abalance = df.groupby(pd.Grouper(key="date", freq="A-MAR"))[["balance"]].mean()
print("annually")
print(abalance)

df.plot(x="date", y="balance", marker="+", ax=axs[0])
mbalance.plot(linestyle="none", marker="o", ax=axs[0])
qbalance.plot(linestyle="none", marker="s", ax=axs[0])
abalance.plot(linestyle="none", marker="d", ax=axs[0])

plt.legend()

mbalance.plot(linestyle="none", marker="o", ax=axs[1])
qbalance.plot(linestyle="none", marker="s", ax=axs[1])
abalance.plot(linestyle="none", marker="d", ax=axs[1])

plt.legend()

rhkarls · 2020-01-16T19:50:50Z

Since the plotting order clearly matters, could the cause be how pandas decides to represent dates on a numeric scale?

Here's another example showing the problem and how the order matters, this time with two regularly spaced series, so the title I originally used may not be accurate. It seems to happen when pandas deals with known frequencies, either defined on the index or inferred from it. Why does pandas convert the dates to different numeric values depending on the order of plotting?
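
The difference between a frequency defined on the index and an inferred one can be seen on the series from this example: building the index from a plain list of datetimes leaves index.freq unset, while pandas still recognises the hourly spacing through inferred_freq. A minimal sketch; that the plotting code falls back to the inferred frequency when index.freq is None is my understanding, not something confirmed in this thread:

```python
import datetime as dt

import numpy as np
import pandas as pd

# The regularly spaced series from the example below, built from a plain list.
N = 1000
base_dt = dt.datetime(2000, 1, 1)
dt_range = [base_dt + dt.timedelta(hours=x) for x in range(N)]
s = pd.Series(index=dt_range, data=np.linspace(0, 10, N) * 2)

print(s.index.freq)           # None: no frequency was attached to the index
print(s.index.inferred_freq)  # the hourly alias ("H" or "h", depending on the pandas version)

# Three timestamps from p2 in the original report: no common spacing, so nothing is inferred.
irregular = pd.to_datetime(["2016-08-09 09:15:00", "2016-10-11 13:15:00", "2017-04-27 12:30:00"])
print(pd.infer_freq(irregular))  # None
```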

import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
import pandas as pd

N = 1000
y = np.linspace(0, 10, N)

base_dt = dt.datetime(2000,1,1)
dt_range = [base_dt + dt.timedelta(hours=x) for x in range(N)]
s = pd.Series(index=dt_range, data=y*2)

# Alignment is good when plotting with pyplot first
plt.figure()
plt.plot(dt_range,y, color='b')
s.plot(color='r')

# Alignment is bad when plotting with pandas.plot() first
plt.figure()
s.plot(color='r')
plt.plot(dt_range,y, color='b') # ends up in the 2050s

The xlim values differ greatly depending on which plot comes first:

plt.figure()
s.plot(color='r')
plt.xlim() # (262968.0, 263967.0)

plt.figure()
plt.plot(dt_range,y, color='b')
s.plot(color='r')
plt.xlim() # (730120.0, 730161.625)
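
The two xlim ranges appear to correspond to two different x axis units. A worked check, assuming (a) pandas draws a series with an hourly frequency using hourly Period ordinals, i.e. hours since 1970-01-01, and (b) matplotlib uses its pre-3.3 default date epoch, where a date is a day count with 0001-01-01 as day 1 (the report used matplotlib 3.1.1; since 3.3 the default epoch is 1970-01-01, so newer versions print different numbers). Under these assumptions, reading a matplotlib day number as an hour count also explains the line ending up in the 2050s:

```python
import datetime as dt

import pandas as pd

start = dt.datetime(2000, 1, 1)
last = start + dt.timedelta(hours=999)  # the 1000th hourly sample

# (a) Hourly Period ordinals: hours elapsed since 1970-01-01 00:00.
print(pd.Period(start, freq="h").ordinal)  # 262968
print(pd.Period(last, freq="h").ordinal)   # 263967

# (b) Pre-3.3 matplotlib date numbers: proleptic Gregorian day ordinal plus fraction of a day.
print(start.toordinal())                  # 730120
print(last.toordinal() + last.hour / 24)  # 730161.625

# Reading 730120 (days) as 730120 hours after 1970-01-01 lands in 2053.
print(pd.Timestamp("1970-01-01") + pd.Timedelta(hours=start.toordinal()))  # 2053-04-16 16:00:00
```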

rhkarls · 2020-01-16T20:10:44Z

I came across this three-year-old issue, #15071, which seems to be related.

There is a lot of magic in how datetime64 values are actually encoded as dates in plots, and sharing an axis across (sub)plots is brittle with respect to these differences. The workaround used here is to give every individual timestamp a non-zero seconds value by simply adding one second, which shifts the whole data set one second later. I guess this prevents an optimization from kicking in that would detect that individual timestamps fall on the full hour or on integer multiples of 30 or 15 minutes. Also see pandas-dev/pandas#15874, pandas-dev/pandas#15071, pandas-dev/pandas#31074, pandas-dev/pandas#29705, pandas-dev/pandas#29719, pandas-dev/pandas#18571, pandas-dev/pandas#11574, pandas-dev/pandas#22586
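
The one-second workaround described above can be sketched as follows, assuming a Series with a DatetimeIndex. This only illustrates the idea as stated; `shift_off_round_times` is a hypothetical helper, and since the explanation of why it helps is itself a guess, its effect on plotting is unverified here:

```python
import pandas as pd


def shift_off_round_times(series: pd.Series, seconds: int = 1) -> pd.Series:
    """Hypothetical helper: return a copy with every timestamp moved later by `seconds`.

    The intent, per the comment above, is that no timestamp falls exactly on a
    full hour or on a multiple of 15 or 30 minutes. Values are unchanged.
    """
    shifted = series.copy()
    shifted.index = series.index + pd.Timedelta(seconds=seconds)
    return shifted


p2 = pd.Series(
    [1.5797, 0.73, 0.159],
    index=pd.to_datetime(["2016-08-09 09:15:00", "2016-10-11 13:15:00", "2017-04-27 12:30:00"]),
)
print(shift_off_round_times(p2).index)
```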

rhkarls changed the title ~~Misaligned X axis plotting datetime indexed series with and without frequency~~ Misaligned X axis when plotting datetime indexed series with regular and irregular time index Nov 19, 2019

HenrikSpa mentioned this issue Nov 19, 2019

Misaligned X axis when plotting datetime with same frequency but different step base #29719

Open

jbrockmendel added the Visualization plotting label Dec 11, 2019

jgehrcke mentioned this issue Dec 7, 2020

Abstract figures. A multi-plot figure as a summary. Dragons. jgehrcke/ci-analysis#4

Merged

mroeschke added the Bug label Jul 23, 2021

astrojuanlu mentioned this issue Jan 30, 2022

Misaligned shared axes between timeseries plots of different frequency. #11574

Open

azjps mentioned this issue Dec 26, 2023

BUG: int Overflow with DateFormatter #52895

Open
