Misaligned X axis when plotting datetime indexed series with regular and irregular time index #29705

Open
rhkarls opened this issue Nov 19, 2019 · 4 comments

rhkarls commented Nov 19, 2019

Code Sample, a copy-pastable example if possible

# Create sample data
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

p1d = {'2016-08-10 10:00:00':     2.290438,
'2016-10-12 08:20:00':     1.314112,
'2016-11-15 12:45:00':     0.213702,
'2017-04-27 18:30:00':     0.256794,
'2017-05-30 11:10:00':     4.112614,
'2017-07-19 09:18:00':    10.600000}

p1 = pd.Series(p1d)
p1.index=pd.to_datetime(p1.index)

p2d = {'2016-08-09 09:15:00':    1.57970,
'2016-10-11 13:15:00':    0.73000,
'2017-04-27 12:30:00':    0.15900,
'2017-05-31 16:10:00':    1.65440,
'2018-05-24 12:00:00':    0.79260,
'2018-10-25 11:20:00':    0.34500}

p2 = pd.Series(p2d)
p2.index=pd.to_datetime(p2.index)


p3d = {'2016-11-15 09:00:00':    0.094900,
'2017-04-28 11:10:00':    0.055600,
'2017-05-30 16:00:00':    0.659600,
'2017-06-09 17:15:00':    0.300200,
'2018-05-24 16:45:00':    0.329800,
'2018-09-18 15:40:00':    0.200452}

p3 = pd.Series(p3d)
p3.index = pd.to_datetime(p3.index)

ts_index = pd.date_range('2016-01-01','2018-12-31',freq='H')

ts1 = pd.Series(index=ts_index, data=np.random.uniform(low=0.2,high=10,
                                                          size=ts_index.size))
ts2 = pd.Series(index=ts_index, data=np.random.uniform(low=0.1,high=3,
                                                          size=ts_index.size))
ts3 = pd.Series(index=ts_index, data=np.random.uniform(low=0.05,high=1,
                                                          size=ts_index.size))

# plot
fig_ts, axs_ts = plt.subplots(3,1,sharex=False)

ts1.plot(ax=axs_ts[0])
p1.plot(ax=axs_ts[0],style='o')

ts2.plot(ax=axs_ts[1])
p2.plot(ax=axs_ts[1],style='o')

ts3.plot(ax=axs_ts[2])
p3.plot(ax=axs_ts[2],style='o')

fig_ts.tight_layout()

[Image: screenshot of the resulting figure]

Problem description

Data on the second axes is plotted at the wrong X positions: the irregular point data is shifted relative to the regular series. This seems to happen because the first timestamp of the irregular series (09:15) does not coincide with any timestamp in the regular hourly series.

Either of the following changes causes the data to be plotted correctly, I assume because the first entry of the irregular series then matches a timestamp in the regular series:

  • Changing the first datetime of p2 to a full hour (e.g. '2016-08-09 09:15:00' to '2016-08-09 09:00:00')
  • Changing the frequency of ts2 to '15T' (15 minutes)

It does not matter if later data points in the irregular series have timestamps that are not shared with the regular series.

Changing the plot order on the second axes, so that the irregular series is plotted before the regular one, also places the data correctly along the X axis:
p2.plot(ax=axs_ts[1],style='o', zorder=10)
ts2.plot(ax=axs_ts[1], zorder=1)

This does, however, cause other issues such as different X axis labels, and it also fails when sharex=True is used.
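One further option that may be worth trying, although it is not part of the original report, is pandas' documented `x_compat=True` plotting argument, which switches off the frequency-aware tick handling so that x values stay in matplotlib's own date units. The sketch below is only an illustration: the names `hourly` and `points` are hypothetical, the data is a shortened version of ts2 and p2, a non-interactive backend is assumed, and it has not been verified against the full figure above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for this sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Regular hourly series; a Timedelta is used as the frequency so the sketch
# does not depend on frequency alias names, which differ between pandas versions.
hourly_index = pd.date_range("2016-08-01", "2017-05-31", freq=pd.Timedelta(hours=1))
hourly = pd.Series(np.random.uniform(0.1, 3, hourly_index.size), index=hourly_index)

# Irregular point data whose first timestamp (09:15) is not on a full hour.
points = pd.Series(
    [1.5797, 0.73, 0.159],
    index=pd.to_datetime(
        ["2016-08-09 09:15:00", "2016-10-11 13:15:00", "2017-04-27 12:30:00"]
    ),
)

fig, ax = plt.subplots()
hourly.plot(ax=ax, x_compat=True)             # keep matplotlib date units
points.plot(ax=ax, style="o", x_compat=True)  # same units for the markers

# Convert the first marker's x coordinate back to a datetime to check where it landed.
marker_line = ax.get_lines()[1]
first_x = float(marker_line.get_xdata(orig=False)[0])
first_point = mdates.num2date(first_x).replace(tzinfo=None)
print(first_point)
```

With `x_compat=True` the pandas-specific tick formatting is lost, so the axis labels will look like those of a plain matplotlib date plot.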

Possibly related issues:
#11574 - Misaligned x axis using shared axis (one series plotted per axis), not when plotted on the same axis as here.
#18571 - Misaligned x axis using twinx(), possibly same issue as here?

Expected Output

Plotting the data at the correct x axis coordinates.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 78 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : en
LOCALE : None.None

pandas : 0.25.3
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.1
pip : 19.3.1
setuptools : 41.6.0.post20191030
Cython : 0.29.13
pytest : 5.2.4
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.6
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.1
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.11
tables : 3.6.1
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.6

rhkarls changed the title from "Misaligned X axis plotting datetime indexed series with and without frequency" to "Misaligned X axis when plotting datetime indexed series with regular and irregular time index" Nov 19, 2019
jbrockmendel added the Visualization plotting label Dec 11, 2019
@kurtforrester

I think I too have encountered the same/similar issue.

I have irregular transactional data with a date timestamp. I group and aggregate the data to produce mean balances at regular (monthly, quarterly and annual) intervals. When these series, which have different timestamps, are plotted on the same axes, the data is drawn with an offset.

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

tr = pd.date_range("2014-04-01", "2019-12-31")

n = 300
ts = np.random.choice(tr, n)
tx = np.random.sample(size=n) * np.random.choice(
    (10, -15, -1, -0.5), n, p=[0.1, 0.025, 0.8, 0.075]
)

df = pd.DataFrame(data={"date": ts, "tx": tx})
df = df.sort_values(by=["date"]).reset_index()
df["balance"] = df["tx"].cumsum()

fig, axs = plt.subplots(2, 2)

mbalance = df.groupby(pd.Grouper(key="date", freq="M"))[["balance"]].mean()
print("monthly")
print(mbalance)
mbalance.plot(linestyle="none", marker="o", title="month", ax=axs[0, 0])

# company year ending is March
qbalance = df.groupby(pd.Grouper(key="date", freq="Q-MAR"))[["balance"]].mean()
print("quarterly")
print(qbalance)
qbalance.plot(linestyle="none", marker="s", title="quarter", ax=axs[0, 1])

abalance = df.groupby(pd.Grouper(key="date", freq="A-MAR"))[["balance"]].mean()
print("annually")
print(abalance)
abalance.plot(linestyle="none", marker="d", title="annum", ax=axs[1, 0])

mbalance.plot(linestyle="none", marker="o", ax=axs[1, 1])
qbalance.plot(linestyle="none", marker="s", ax=axs[1, 1])
abalance.plot(linestyle="none", marker="d", ax=axs[1, 1])

plt.legend()

When plotted onto individual subplot axes, each dataset is rendered correctly. When the datasets are overlaid on the same axes, the quarterly and annual data are offset from their true positions.
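As a possible workaround sketch (an assumption, not something confirmed in this thread), the overlay can be drawn with matplotlib's `Axes.plot` directly, passing each index as the x values, so that all three series share matplotlib's date units instead of going through pandas' frequency-based conversion. Offset objects stand in for the 'M', 'Q-MAR' and 'A-MAR' aliases only to keep the sketch independent of alias renames between pandas versions; the random data and the name `aggregates` are illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for this sketch
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Illustrative random transactions, similar in shape to the example above.
rng = np.random.default_rng(0)
days = pd.date_range("2014-04-01", "2019-12-31")
df = pd.DataFrame(
    {"date": days[rng.integers(0, len(days), 300)], "tx": rng.normal(size=300)}
)
df = df.sort_values("date").reset_index(drop=True)
df["balance"] = df["tx"].cumsum()

# Month end, quarter end and year end, with the company year ending in March.
offsets = {
    "month": pd.offsets.MonthEnd(),
    "quarter": pd.offsets.QuarterEnd(startingMonth=3),
    "annum": pd.offsets.YearEnd(month=3),
}
aggregates = {
    label: df.groupby(pd.Grouper(key="date", freq=offset))["balance"].mean()
    for label, offset in offsets.items()
}

# Plot through matplotlib so every series uses the same date-to-number conversion.
fig, ax = plt.subplots()
for (label, series), marker in zip(aggregates.items(), "osd"):
    ax.plot(series.index, series.to_numpy(), linestyle="none", marker=marker, label=label)
ax.legend()
fig.autofmt_xdate()
```

The trade-off is the same as with any plain matplotlib date plot: pandas' period-aware tick labels are not used.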

@kurtforrester

A further example of the issue. When the original (unaggregated) data is plotted first, as on the left axes below, there is no alignment problem and everything displays correctly.

import numpy as np
import pandas as pd

import matplotlib.pyplot as plt

tr = pd.date_range("2014-04-01", "2019-12-31")

n = 300
ts = np.random.choice(tr, n)
tx = np.random.sample(size=n) * np.random.choice(
    (10, -15, -1, -0.5), n, p=[0.1, 0.025, 0.8, 0.075]
)

df = pd.DataFrame(data={"date": ts, "tx": tx})
df = df.sort_values(by=["date"]).reset_index()
df["balance"] = df["tx"].cumsum()

fig, axs = plt.subplots(1, 2)

mbalance = df.groupby(pd.Grouper(key="date", freq="M"))[["balance"]].mean()
print("monthly")
print(mbalance)

qbalance = df.groupby(pd.Grouper(key="date", freq="Q-MAR"))[["balance"]].mean()
print("quarterly")
print(qbalance)

abalance = df.groupby(pd.Grouper(key="date", freq="A-MAR"))[["balance"]].mean()
print("annually")
print(abalance)

df.plot(x="date", y="balance", marker="+", ax=axs[0])
mbalance.plot(linestyle="none", marker="o", ax=axs[0])
qbalance.plot(linestyle="none", marker="s", ax=axs[0])
abalance.plot(linestyle="none", marker="d", ax=axs[0])

plt.legend()

mbalance.plot(linestyle="none", marker="o", ax=axs[1])
qbalance.plot(linestyle="none", marker="s", ax=axs[1])
abalance.plot(linestyle="none", marker="d", ax=axs[1])

plt.legend()

rhkarls commented Jan 16, 2020

Since the order of plotting clearly matters, could the cause be how pandas decides to represent dates on a numeric scale?

Here's another example showing the problem and how the order matters. This time it uses two regularly spaced series, so the initial title I used might not be entirely accurate. It seems to happen when pandas deals with known frequencies, either defined on the index or inferred from it. Why does it convert the dates to different numeric values depending on the order of plotting?

import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
import pandas as pd

N = 1000
y = np.linspace(0, 10, N)

base_dt = dt.datetime(2000,1,1)
dt_range = [base_dt + dt.timedelta(hours=x) for x in range(N)]
s = pd.Series(index=dt_range, data=y*2)

# Alignment is good when plotting with pyplot first
plt.figure()
plt.plot(dt_range,y, color='b')
s.plot(color='r')

# Alignment is bad when plotting with pandas.plot() first
plt.figure()
s.plot(color='r')
plt.plot(dt_range,y, color='b') # ends up in the 2050's
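For reference, the index of `s` above is built from a plain list of datetimes, so it carries no explicit frequency, yet pandas can still infer one. A small sketch of that distinction (the variable names `explicit` and `inferred` are illustrative):

```python
import datetime as dt
import pandas as pd

base_dt = dt.datetime(2000, 1, 1)
dt_range = [base_dt + dt.timedelta(hours=x) for x in range(1000)]
s = pd.Series(range(1000), index=dt_range)

explicit = s.index.freq            # no frequency was set on the index
inferred = pd.infer_freq(s.index)  # hourly alias; its spelling depends on the pandas version
print(explicit, inferred)
```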

Looking at the xlim values after plotting shows very different numbers:

plt.figure()
s.plot(color='r')
plt.xlim() # (262968.0, 263967.0)

plt.figure()
plt.plot(dt_range,y, color='b')
s.plot(color='r')
plt.xlim() # (730120.0, 730161.625)
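The two sets of limits are consistent with two different numeric encodings of the same dates, which can be checked by hand. The sketch below assumes that pandas places a series with a known hourly frequency on the axis as hourly period ordinals (hours since 1970-01-01), while matplotlib 3.1.1, the version listed in the report, counts days with 0001-01-01 as day 1. Newer matplotlib releases default to a 1970 epoch, so the second pair of numbers would differ there.

```python
import datetime as dt
import pandas as pd

N = 1000
base_dt = dt.datetime(2000, 1, 1)
last_dt = base_dt + dt.timedelta(hours=N - 1)

# Hourly period ordinals: hours elapsed since 1970-01-01 00:00.
hour = pd.offsets.Hour()
period_first = pd.Period(base_dt, freq=hour).ordinal
period_last = pd.Period(last_dt, freq=hour).ordinal

# Legacy matplotlib date numbers: proleptic Gregorian ordinal days plus a day fraction.
legacy_first = base_dt.toordinal()
legacy_last = legacy_first + (N - 1) / 24

# Reading a legacy day number as if it were an hour count since 1970.
misread = pd.Timestamp("1970-01-01") + pd.Timedelta(hours=legacy_first)

print(period_first, period_last)  # 262968 263967
print(legacy_first, legacy_last)  # 730120 730161.625
print(misread.year)               # 2053
```

Under these assumptions, values of about 730120 drawn on an axis that is scaled in hours since 1970 land in 2053, which would match the line that "ends up in the 2050's".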

rhkarls commented Jan 16, 2020

I came across the three-year-old issue #15071, which seems to be related to this one.

jgehrcke added a commit to jgehrcke/ci-analysis that referenced this issue Dec 7, 2020
There's a lot of magic going on between how the datetime64 values
actually encode datetime in plots. Sharing an axis across (sub)plots is
brittle w.r.t. these differences. Work around this, here: make it so that
individual timestamps have a non-zero value for seconds, by simply adding
one second, shifting the whole data set by one second to the left. That
prevents, I guess, an optimization to hit in which would see that
individual timestamps hit the full hour or integer multiples of 30 or 15
minutes. Also see

 pandas-dev/pandas#15874
 pandas-dev/pandas#15071
 pandas-dev/pandas#31074
 pandas-dev/pandas#29705
 pandas-dev/pandas#29719
 pandas-dev/pandas#18571
 pandas-dev/pandas#11574
 pandas-dev/pandas#22586
mroeschke added the Bug label Jul 23, 2021