
BUG: pandas.DataFrame.plot results in inconsistent/incompatible xticks depending on the date span of the data #43972

Open
trenton3983 opened this issue Oct 11, 2021 · 1 comment

trenton3983 commented Oct 11, 2021

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the master branch of pandas.

Reproducible Example

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# sample data
dates1 = ['2021-08-26', '2021-08-27', '2021-08-30', '2021-08-31',
          '2021-09-01', '2021-09-02', '2021-09-03', '2021-09-07',
          '2021-09-08', '2021-09-09', '2021-09-10', '2021-09-13',
          '2021-09-14', '2021-09-15', '2021-09-16', '2021-09-17',
          '2021-09-20', '2021-09-21', '2021-09-22', '2021-09-23',
          '2021-09-24', '2021-09-27', '2021-09-28', '2021-09-29',
          '2021-09-30', '2021-10-01', '2021-10-04', '2021-10-05',
          '2021-10-06', '2021-10-07', '2021-10-08']

dates2 = ['2021-08-29', '2021-09-05', '2021-09-12', '2021-09-19', '2021-09-26']

np.random.seed(365)
y1 = np.random.randn(len(dates1)).cumsum()
y2 = np.random.randn(len(dates2)).cumsum()

# dataframe with more than a month span
df1 = pd.DataFrame({'date':pd.to_datetime(dates1), 'y1':y1})
df1.set_index('date', inplace=True)

# dataframe with less than a month span
df2 = pd.DataFrame({'date':pd.to_datetime(dates2), 'y2':y2})
df2.set_index('date', inplace=True)

Issue Description

  • See SO: Plotting two pandas time-series on the same axes with matplotlib - unexpected behavior
  • Using pandas.DataFrame.plot to plot data whose dates span more than one month together with a second dataset, on secondary_y, whose dates span less than a month produces unexpected results in how the API formats and plots the xticks. The result is an incorrect visualization in subplot 0.
  • In both plotting orders (subplots 0 and 1), it is not clear what numeric format the dates have been converted to for plotting.
  • If dates2 spans at least a month, the issue doesn't occur. (e.g. dates2 = ['2021-08-29', '2021-09-05', '2021-09-12', '2021-09-19', '2021-09-26', '2021-09-29']).
fig, axs = plt.subplots(2, 2, figsize=[12, 12])
axs = axs.flat

print('Note the difference in xticks depending on the date span')
df1.plot(ax=axs[0], title='x-axis is incorrect when the dataframe with\nmore than a month of dates is plotted first')
print(f'axs[0]: {axs[0].get_xticks()}')
df2.plot(ax=axs[0], secondary_y=True)
print(f'axs[0]: {axs[0].get_xticks()}')

df2.plot(ax=axs[1], color='tab:orange', title='x-axis is correct when the dataframe with\nless than a month of dates is plotted first')
print(f'axs[1]: {axs[1].get_xticks()}')
df1.plot(ax=axs[1], color='tab:blue', secondary_y=True)
print(f'axs[1]: {axs[1].get_xticks()}')

df1.y1.plot(ax=axs[2], color='tab:blue', title='More than a month of data')
print(f'axs[2]: {axs[2].get_xticks()}')
df2.y2.plot(ax=axs[3], color='tab:orange', title='Less than a month of data')
print(f'axs[3]: {axs[3].get_xticks()}')

plt.tight_layout()
  • Printed output
Note the difference in xticks depending on the date span
axs[0]: [18871. 18878. 18885. 18892. 18901. 18908.]
axs[0]: [ 2696  4175  6784  9393 12002 14611 17220 18908]
axs[1]: [2696 2697 2700]
axs[1]: [2696 2697 2701 2702]
axs[2]: [18871. 18878. 18885. 18892. 18901. 18908.]
axs[3]: [2696 2697 2700]
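The two kinds of tick values in the printed output can be decoded by hand. The sketch below assumes that the float ticks are matplotlib date numbers (days since matplotlib's default epoch, 1970-01-01) and that the integer ticks are pandas Period ordinals at the weekly frequency 'W-SUN' inferred from dates2. Both assumptions are read off the values themselves, not stated by pandas. Under them, the second axs[0] line, [2696 ... 18908], shows a shared x-axis stretched from a weekly ordinal to a date number, which would explain the broken subplot 0.

```python
import pandas as pd
from matplotlib.dates import num2date

# Float tick seen in axs[2] (and axs[0] before df2 is added).
# Assumption: a matplotlib date number, i.e. days since the default epoch.
float_tick = 18871.0
as_date = num2date(float_tick).date()
print(as_date)  # 2021-09-01

# Integer tick seen in axs[3] (and axs[1]).
# Assumption: a pandas Period ordinal at the weekly frequency inferred from dates2.
int_tick = 2696
week = pd.Period(ordinal=int_tick, freq='W-SUN')
print(week.end_time.date())  # 2021-08-29, the first date in dates2

# Going the other way: the first date in dates2 maps to the same ordinal.
first_week_ordinal = pd.Period('2021-08-29', freq='W-SUN').ordinal
print(first_week_ordinal)  # 2696
```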


Expected Behavior

  • Plotting directly with matplotlib (Axes.plot, plus Axes.twinx for the secondary axis) produces the correct result
fig, axs = plt.subplots(2, 2, figsize=[20, 8], sharey=False, sharex=False)
axs = axs.flatten()

axs[0].plot(df1.index, df1.y1, marker='.', color='tab:blue')
print(f'axs[0]: {axs[0].get_xticks()}')
ax4 = axs[0].twinx()
ax4.plot(df2.index, df2.y2, marker='.', color='tab:orange')
print(f'ax4: {ax4.get_xticks()}')

axs[1].plot(df2.index, df2.y2, marker='.', color='tab:orange')
print(f'axs[1]: {axs[1].get_xticks()}')
ax5 = axs[1].twinx()
ax5.plot(df1.index, df1.y1, marker='.', color='tab:blue')
print(f'ax5: {ax5.get_xticks()}')

axs[2].plot(df1.index, df1.y1, marker='.', color='tab:blue')
print(f'axs[2]: {axs[2].get_xticks()}')
axs[3].plot(df2.index, df2.y2, marker='.', color='tab:orange')
print(f'axs[3]: {axs[3].get_xticks()}')
  • Printed output
axs[0]: [18871. 18878. 18885. 18892. 18901. 18908.]
ax4: [18871. 18878. 18885. 18892. 18901. 18908.]
axs[1]: [18868. 18871. 18875. 18879. 18883. 18887. 18891. 18895.]
ax5: [18871. 18878. 18885. 18892. 18901. 18908.]
axs[2]: [18871. 18878. 18885. 18892. 18901. 18908.]
axs[3]: [18868. 18871. 18875. 18879. 18883. 18887. 18891. 18895.]
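If staying with the pandas plotting API is preferred, one possible workaround, not confirmed anywhere in this thread, is to pass x_compat=True to the call that plots the regularly spaced data. x_compat is an existing DataFrame.plot keyword that turns off pandas' time-series-specific x-axis handling, so in this sketch both frames are expected to end up in matplotlib date units. The Agg backend and the x-limit check are assumptions added only so the sketch runs headless and can be checked.

```python
import matplotlib
matplotlib.use('Agg')  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same sample data as the reproducible example above
dates1 = ['2021-08-26', '2021-08-27', '2021-08-30', '2021-08-31',
          '2021-09-01', '2021-09-02', '2021-09-03', '2021-09-07',
          '2021-09-08', '2021-09-09', '2021-09-10', '2021-09-13',
          '2021-09-14', '2021-09-15', '2021-09-16', '2021-09-17',
          '2021-09-20', '2021-09-21', '2021-09-22', '2021-09-23',
          '2021-09-24', '2021-09-27', '2021-09-28', '2021-09-29',
          '2021-09-30', '2021-10-01', '2021-10-04', '2021-10-05',
          '2021-10-06', '2021-10-07', '2021-10-08']
dates2 = ['2021-08-29', '2021-09-05', '2021-09-12', '2021-09-19', '2021-09-26']

np.random.seed(365)
y1 = np.random.randn(len(dates1)).cumsum()
y2 = np.random.randn(len(dates2)).cumsum()
df1 = pd.DataFrame({'date': pd.to_datetime(dates1), 'y1': y1}).set_index('date')
df2 = pd.DataFrame({'date': pd.to_datetime(dates2), 'y2': y2}).set_index('date')

fig, ax = plt.subplots(figsize=[8, 4])
df1.plot(ax=ax, title='df1 first, df2 on secondary_y with x_compat=True')
# x_compat=True asks pandas to skip its Period-based time-series x-axis
# handling, which df2 would otherwise get because its dates are exactly weekly
df2.plot(ax=ax, secondary_y=True, x_compat=True)

# If the workaround holds, the shared x-limits stay in matplotlib date units
# (around 18,870) instead of stretching down to Period ordinals (around 2,696)
xmin, xmax = ax.get_xlim()
print(f'ax xlim: {xmin:.1f} to {xmax:.1f}')
plt.close(fig)
```

This only sidesteps the inconsistent units for this call; it does not address the underlying behavior described in the comment below.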


Installed Versions

INSTALLED VERSIONS

commit : 73c6825
python : 3.8.11.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19043
machine : AMD64
processor : Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252

pandas : 1.3.3
numpy : 1.20.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.0.1
setuptools : 58.0.4
Cython : 0.29.24
pytest : 6.2.4
hypothesis : None
sphinx : 4.2.0
blosc : None
feather : None
xlsxwriter : 3.0.1
lxml.etree : 4.6.3
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.27.0
pandas_datareader: 0.10.0
bs4 : 4.10.0
bottleneck : 1.3.2
fsspec : 2021.08.1
fastparquet : None
gcsfs : None
matplotlib : 3.4.3
numexpr : 2.7.3
odfpy : None
openpyxl : 3.0.9
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.7.1
sqlalchemy : 1.4.22
tables : 3.6.1
tabulate : None
xarray : None
xlrd : 2.0.1
xlwt : 1.3.0
numba : 0.53.1

@trenton3983 added the Bug and Needs Triage (Issue that has not been reviewed by a pandas team member) labels Oct 11, 2021
@trenton3983 changed the title of this issue Oct 13, 2021
@mroeschke added the Visualization (plotting) label and removed the Needs Triage label Oct 16, 2021
azjps commented Dec 26, 2023

This is the same bug as #52895 (and the other issues linked in #52895 (comment)). The issue is actually that the pandas locators infer that dates2 is periodic and enter a different code path, which plots the periodic time series in different units. In the second example, where dates2 contains 2021-09-29, dates2 is no longer periodic, which inadvertently avoids the issue of pandas plotting in different units.
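The periodicity described in the comment above can be checked with pandas.infer_freq. This sketch assumes that the frequency inferred by pandas' plotting code for an index without an explicit freq matches what infer_freq returns. It only shows that dates2 is regular (weekly, anchored on Sunday) while the variant containing 2021-09-29 is not.

```python
import pandas as pd

# dates2 from the reproducible example: five consecutive Sundays
dates2 = ['2021-08-29', '2021-09-05', '2021-09-12', '2021-09-19', '2021-09-26']
# The variant from the issue description that does not trigger the bug
dates2_extra = dates2 + ['2021-09-29']

freq_regular = pd.infer_freq(pd.to_datetime(dates2))
freq_extra = pd.infer_freq(pd.to_datetime(dates2_extra))

print(freq_regular)  # W-SUN
print(freq_extra)    # None
```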
