Weird DatetimeIndex + secondary y axis plotting issue #18768

Open
soxofaan opened this issue Dec 13, 2017 · 6 comments
Labels
Bug Datetime Datetime data dtype Visualization plotting

Comments

@soxofaan
Contributor

soxofaan commented Dec 13, 2017

Today I stumbled on a weird issue when plotting two series with a DatetimeIndex, one of them on a secondary y axis.

Code that illustrates the problem:

import matplotlib.pyplot as plt
import pandas as pd

a = pd.Series(
    [16, 13, 11],
    index=pd.to_datetime(['2017-09-13', '2017-09-14', '2017-09-16'], format='%Y-%m-%d')
)
b = pd.Series(
    [23, 27, 25],
    index=pd.to_datetime(['2017-09-13', '2017-09-14', '2017-09-15'], format='%Y-%m-%d')
)

# Combinations of `secondary_y` values to try out.
secondary_ys = [(False, False), (False, True), (True, False), (True, True)]

fig, axes = plt.subplots(figsize=(10, 2), ncols=4, nrows=1)
for (sya, syb), ax in zip(secondary_ys, axes.flat):
    a.plot(ax=ax, style='o-', secondary_y=sya)
    b.plot(ax=ax, style='x-', secondary_y=syb)
    ax.set_title('a:%r - b:%r' % (sya, syb))
  • b is a straightforward time series with successive days
  • a has a one day jump in the last item
  • the for loop tries out all the possible combinations of assigning these two series to the primary or secondary y axis in a plot
  • Whenever one or more series is assigned to the secondary y axis, the x axis is completely confused:

[Image: screen shot 2017-12-13 at 18 21 48]

More in depth experimentation (and version information) can be found in the notebook at https://gist.github.com/soxofaan/9fdfdeafb8fb555dd8547bc48a27e2f3
(also notice the inconsistent x axis label formatting and orientation)

@soxofaan
Contributor Author

FYI: I just upgraded to pandas 0.21.1 and the issue still manifests itself.

@TomAugspurger
Contributor

Thanks for checking on the new release.

Could you try dropping some breakpoints in pandas/plotting/_core.py to see what's going on? Maybe somewhere around

from pandas.plotting._timeseries import (_maybe_resample,
? I think we might store some data on the axes object, and I suspect that could be confusing things when you have a secondary y. Or maybe it's something else :)

@TomAugspurger TomAugspurger added Datetime Datetime data dtype Visualization plotting labels Dec 13, 2017
@TomAugspurger TomAugspurger added this to the Next Major Release milestone Dec 13, 2017
@soxofaan
Contributor Author

Ok, this was quite a rabbit hole to debug, let's see if I can manage to write down what I found.

The main reason that the lines in a graph with a primary and a secondary axis do not cover the same x range is the difference in conversion between:

  • pandas.plotting._converter.DatetimeConverter converts a date to the number of days since 0001-01-01, plus one. Example: 2017-12-16 -> 736679
  • pandas.plotting._converter.PeriodConverter converts to the number of days since 1970-01-01. Example: 2017-12-16 -> 17516
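
The two numbering schemes can be reproduced with the standard library alone. This is only a sketch of the arithmetic, not the converters' actual code, and the two helper names are made up for illustration:

```python
from datetime import date


def days_since_year_one_plus_one(d):
    # Hypothetical helper: the proleptic Gregorian ordinal,
    # where 0001-01-01 maps to 1 (days since 0001-01-01, plus one).
    return d.toordinal()


def days_since_unix_epoch(d):
    # Hypothetical helper: whole days elapsed since 1970-01-01.
    return (d - date(1970, 1, 1)).days


d = date(2017, 12, 16)
datetime_style = days_since_year_one_plus_one(d)
period_style = days_since_unix_epoch(d)

print(datetime_style)                 # 736679
print(period_style)                   # 17516
print(datetime_style - period_style)  # 719163, the constant offset between the two
```

The two scales differ by a constant offset of 719163 days, so values from one scheme fall far outside the x limits set up with the other.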

Short description of what is happening, starting with the same data as above:

a = pd.Series([16, 13, 11], index=pd.to_datetime(['2017-09-13', '2017-09-14', '2017-09-16'], format='%Y-%m-%d'))
b = pd.Series([23, 27, 25], index=pd.to_datetime(['2017-09-13', '2017-09-14', '2017-09-15'], format='%Y-%m-%d'))

First plot on primary axis:

a.plot(ax=ax, style='o-', secondary_y=False)

In LinePlot._make_plot, the self._is_ts_plot() check returns False:

if self._is_ts_plot():

This is because (among other reasons) the frequency variables freq and ax_freq in pandas.plotting._timeseries._use_dynamic_x are None for the ("irregular") series a:

def _use_dynamic_x(ax, data):

As a result, a DatetimeConverter is set up as the converter for the x axis.
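
The missing frequency for a can be checked on the index directly. In this sketch pd.infer_freq stands in for whatever lookup _use_dynamic_x performs internally (that equivalence is an assumption), and b is included only for contrast:

```python
import pandas as pd

a = pd.Series(
    [16, 13, 11],
    index=pd.to_datetime(['2017-09-13', '2017-09-14', '2017-09-16']),
)
b = pd.Series(
    [23, 27, 25],
    index=pd.to_datetime(['2017-09-13', '2017-09-14', '2017-09-15']),
)

# a skips 2017-09-15, so no regular frequency can be inferred from its index.
freq_a = pd.infer_freq(a.index)
# b has three successive days, so a daily frequency is inferred.
freq_b = pd.infer_freq(b.index)

print(freq_a)  # None
print(freq_b)  # D
```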

Second plot on secondary axis:

b.plot(ax=ax, style='o-', secondary_y=True)

For the secondary axis, a twin axes is created.
Now, in LinePlot._make_plot, the self._is_ts_plot() check returns True:

  • in pandas.plotting._timeseries._use_dynamic_x we now have freq='D' (and ax_freq = None) because the b series has a clean index of successive days.
  • a bit further on there is
    else: # do not use tsplot if irregular was plotted first
    which, if I understand the comment correctly, is meant to cover the problem of mixing "irregular" and "clean" time indexes. However, there is an additional check len(ax.get_lines()) > 0, which evaluates to False: the previous plot was drawn on the primary axes, which cannot be reached from here, so zero lines are counted.
  • the control flow in _use_dynamic_x goes a bit further, but finally returns True
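
The zero line count is easy to reproduce with plain matplotlib, without pandas. This is a minimal sketch that assumes ax.twinx() behaves like the twin axes pandas creates for secondary_y=True; the x values are placeholders:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [16, 13, 11])  # line drawn on the primary axes
twin = ax.twinx()                 # twin axes sharing the same x axis

# Each axes keeps its own list of lines, so the twin starts out empty
# even though the primary axes already holds a line.
primary_lines = len(ax.get_lines())
twin_lines = len(twin.get_lines())

print(primary_lines, twin_lines)  # 1 0
```

A check like len(ax.get_lines()) > 0 on the twin axes therefore cannot see what was plotted on the primary axes first.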

Back in LinePlot._make_plot, the series with a DatetimeIndex is now converted to one with a PeriodIndex, and as a result a PeriodConverter is now associated with the axes.

Summary

During the plot of a, the axes are initially set up using the DatetimeConverter mapping (e.g. for xlim stuff). During the plot of b, an incompatible PeriodConverter enters the stage. The end result is confusion in the x axis conversion for the two line plots.

I guess there are several possible solutions (but my understanding of the inner workings of pandas and matplotlib is limited, so I don't really know the best way forward in terms of breaking things and keeping things manageable):

  • make sure DatetimeConverter and PeriodConverter are "more compatible"
  • for pandas.plotting._timeseries._use_dynamic_x in a secondary axes context: make it possible to check if there are already lines in the corresponding primary axes, to avoid replacing the DatetimeConverter with PeriodConverter
  • don't do the conversion to a PeriodIndex series/dataframe in LinePlot._make_plot in the first place

I hope this sheds some light on what is happening here.

(note that I didn't look into the case where both plots are done on the secondary axes, but I guess it is related)

@ajoros

ajoros commented Apr 27, 2018

Has there been any progress on this issue at all? I have been struggling to find some solutions.

@TomAugspurger
Contributor

I haven't had time to go through @soxofaan's (excellent) debugging yet.

@biddisco

biddisco commented May 1, 2019

Is there any kind of workaround that can be used to disable the labels on either the primary or secondary plot, so that we can get what we want with the first dataset and then overlay the second without the labels disappearing? (I experimented with different formatters/locators without success.)

@mroeschke mroeschke added the Bug label Jun 12, 2021
@mroeschke mroeschke removed this from the Contributions Welcome milestone Oct 13, 2022
5 participants