BUG in plotting timeseries data with twinx (different data representation on each ax) #14322

Open
cygenb0ck opened this issue Sep 29, 2016 · 8 comments · Fixed by #14330

@cygenb0ck
cygenb0ck commented Sep 29, 2016

Simplified the example.
While reporting this issue I updated pandas from 0.13 to 0.18.1. With 0.13 I was able to add the whole DataFrame to my plot; the error only appeared after I selected rows between two dates. After the update the behaviour became worse: adding the whole DataFrame to the plot now also produces the error.

A small, complete example of the issue

import pandas
import dateutil.parser
import matplotlib.pyplot as plt

p_vals = {
    'x_vals' : [
        "2006-12-17 00:00:00+01:00",
        "2006-12-18 00:00:00+01:00",
        "2006-12-19 00:00:00+01:00",
        "2006-12-20 00:00:00+01:00",
        "2006-12-21 00:00:00+01:00",
        "2006-12-22 00:00:00+01:00",
        "2006-12-23 00:00:00+01:00",
        "2006-12-24 00:00:00+01:00",
        "2006-12-25 00:00:00+01:00",
        "2006-12-26 00:00:00+01:00",
    ],
    'y_vals' : [
        10,9,8,7,6,5,4,3,2,1
    ]
}

p_vals2 = {
    'x_vals' : [
        "2006-12-17 00:00:00+01:00",
        "2006-12-18 00:00:00+01:00",
        "2006-12-19 00:00:00+01:00",
        "2006-12-20 00:00:00+01:00",
        "2006-12-21 00:00:00+01:00",
    ],
    'y_vals' : [
        1,2,3,4,5
    ]
}

p_vals['x_vals'] = [ dateutil.parser.parse(x) for x in p_vals['x_vals'] ]
p_vals2['x_vals'] = [ dateutil.parser.parse(x) for x in p_vals2['x_vals'] ]

df = pandas.DataFrame(data = [1,2,3,4,5], index=["2006-12-17","2006-12-18","2006-12-19","2006-12-20","2006-12-21"])
df.index = pandas.to_datetime(df.index, format="%Y-%m-%d")

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

ax1.plot(p_vals['x_vals'], p_vals['y_vals'], color="r")
#ax2.plot(p_vals2['x_vals'], p_vals2['y_vals'], color="b") # works as intended, see second attached image
df.plot(ax=ax2, color="b") # hides data on ax1, see first image

plt.show()

Expected Output

Output of pd.show_versions()

## INSTALLED VERSIONS

commit: None
python: 3.4.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.19.0-69-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.18.1
nose: 1.3.1
pip: 8.1.2
setuptools: 3.3
Cython: None
numpy: 1.11.1
scipy: 0.13.3
statsmodels: None
xarray: None
IPython: 1.2.1
sphinx: None
patsy: None
dateutil: 2.5.3
pytz: 2016.6.1
blosc: None
bottleneck: None
tables: 3.1.1
numexpr: 2.2.2
matplotlib: 1.5.2
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.2.1
html5lib: 0.999
httplib2: None
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
boto: None
pandas_datareader: None

[Attached image: pandas_how_it_looks (actual output)]
[Attached image: pandas_how_it_should_look (expected output)]

@TomAugspurger
Contributor

Pandas 0.13 is quite old, can you try with a more recent version? Also see if you can simplify your example a bit.

> also the x label look strange.

What do you mean by strange?

@cygenb0ck
Author

I just updated to pandas 0.18.1. Sorry for not trying with an updated pandas version first.
Now both of my plot calls hide the data on the first axis.

Sorry for my bad wording. By "strange" I just meant that it looks different.

@jorisvandenbossche
Member

@cygenb0ck Can you try to simplify the example? E.g., try to make it reproducible by creating the data in code instead of reading a CSV file. Also try to remove anything that is not essential to the problem.
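
As a rough illustration of "creating the data in code", a frame with a daily DatetimeIndex can be built in a few lines. This is only a sketch: the column name `value` and the specific dates are placeholders, not taken from the original CSV.

```python
import numpy as np
import pandas as pd

# Five consecutive days; stands in for whatever the CSV file contained.
index = pd.date_range("2006-12-17", periods=5, freq="D")

# "value" is a made-up column name, used only for illustration.
df = pd.DataFrame({"value": np.arange(1, 6)}, index=index)
```

Anyone can paste such a snippet and run it, which makes the report much easier to reproduce.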

jorisvandenbossche added the Visualization (plotting) and Needs Info (Clarification about behavior needed to assess issue) labels Sep 30, 2016
cygenb0ck changed the title from "After selecting rows between dates and plotting with matplotlib, plotted rows hide first axis" to "Plotting DataFrame on second axis hides data on first axis - was: Plotting a DataFrame on second axis hides data on first axis" Oct 1, 2016
@cygenb0ck
Author

@jorisvandenbossche
simplified the example and changed the subject

cygenb0ck changed the title from "Plotting DataFrame on second axis hides data on first axis - was: Plotting a DataFrame on second axis hides data on first axis" to "Plotting DataFrame on second axis hides data on first axis - was: After selecting rows between dates and plotting with matplotlib, plotted rows hide first axis" Oct 1, 2016
jorisvandenbossche removed the Needs Info (Clarification about behavior needed to assess issue) label Oct 1, 2016
@jorisvandenbossche
Member

@cygenb0ck Thanks a lot! That let me look at it, and it's a bit of a gotcha with the dates.

To start, it's not an issue with twinx itself. E.g., if you try the following similar example (but without datetimes), you will see it works as expected:

import pandas as pd
import matplotlib.pyplot as plt

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

ax1.plot([1,3,2], color="r")

df3 = pd.DataFrame({'col': [2,5,1]})
df3.plot(ax=ax2, color="b")
# df3['col'].plot(ax=ax2, color="b") # to plot one column not full dataframe

The reason it does not work with your example data is not that the plot is overwritten, but that the data on the first ax now fall outside the visible range (if you zoom out far enough, you will see both lines). This happens because the dates are handled differently in the two cases.
The underlying cause is a problem in pandas' plotting machinery when combining irregular and regular time series in one plot. Because your data on ax1 carry a time component (although they have daily frequency), they are regarded as irregular, while the data on ax2 are regular. Related issues are #6608, #9053, #13341. We should definitely solve this ...
However, in this case it also seems specific to twinx, since without twinx the second data set is plotted fine.
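
To make "handled differently" a bit more concrete, here is a small sketch of two numeric representations of the same day. It rests on assumptions not stated in this thread: that pandas places regular series on the x-axis as period ordinals, and that matplotlib 1.5 (the version in the report) counted days from 0001-01-01, which matches Python's proleptic ordinal. Newer matplotlib releases default to a 1970 epoch, so the exact numbers can differ there.

```python
import datetime as dt

import pandas as pd

# A regular daily index, like the DataFrame in the report.
idx = pd.to_datetime(["2006-12-17", "2006-12-18", "2006-12-19"])
inferred = pd.infer_freq(idx)  # "D": pandas sees a regular daily series

# Period ordinal: days since 1970-01-01
# (assumed x-unit for a regular series plotted by pandas).
period_ordinal = pd.Period("2006-12-17", freq="D").ordinal  # 13499

# Proleptic Gregorian ordinal: days since 0001-01-01
# (assumed x-unit for datetimes in matplotlib 1.5).
proleptic_ordinal = dt.date(2006, 12, 17).toordinal()  # 732662

print(inferred, period_ordinal, proleptic_ordinal)
```

If both representations end up on a shared x-axis, the two lines sit several hundred thousand units apart, which would explain why one of them disappears until you zoom out.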

A workaround for now is to also plot on ax2 with the matplotlib plot call:

fig, ax1 = plt.subplots()
ax1.plot(p_vals['x_vals'], p_vals['y_vals'], color="r")
ax2 = ax1.twinx()
ax2.plot(df.index, df[0].values, color="b")

jorisvandenbossche changed the title from "Plotting DataFrame on second axis hides data on first axis - was: After selecting rows between dates and plotting with matplotlib, plotted rows hide first axis" to "BUG in plotting timeseries data with twinx (different data representation on each ax)" Oct 1, 2016
@jorisvandenbossche
Member

Apparently, using x_compat=True is also a way to get this working:

fig, ax1 = plt.subplots()
ax1.plot(p_vals['x_vals'], p_vals['y_vals'], color="r")
ax2 = ax1.twinx()
df.plot(ax=ax2, x_compat=True, color="b")

It's mentioned in the docs: http://pandas.pydata.org/pandas-docs/stable/visualization.html#suppressing-tick-resolution-adjustment (although for another reason, I am not that familiar with this keyword)
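
For anyone who wants to check this without the data from the report, here is a self-contained sketch. The dates and the column name `col` are placeholders, and the `Agg` backend is chosen only so it runs without a display. It compares the numeric x-values that matplotlib ends up with on each axis.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# ax1 data: plain datetimes handed straight to matplotlib.
x1 = [dt.datetime(2006, 12, 17) + dt.timedelta(days=i) for i in range(10)]
y1 = list(range(10, 0, -1))

# ax2 data: a DataFrame with a regular daily DatetimeIndex.
df = pd.DataFrame({"col": [1, 2, 3, 4, 5]},
                  index=pd.date_range("2006-12-17", periods=5, freq="D"))

fig, ax1 = plt.subplots()
ax1.plot(x1, y1, color="r")
ax2 = ax1.twinx()
df.plot(ax=ax2, x_compat=True, color="b")

# Numeric x-values as matplotlib sees them after unit conversion.
x_ax1 = np.asarray(ax1.lines[0].get_xdata(orig=False), dtype=float)
x_ax2 = np.asarray(ax2.lines[0].get_xdata(orig=False), dtype=float)

# Both series start on the same day, so the first x-values should coincide.
same_units = bool(np.isclose(x_ax1[0], x_ax2[0]))
plt.close(fig)
```

With `x_compat=True` both lines should end up in the same numeric date units, so `same_units` is expected to be true and both lines stay in view.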

@cygenb0ck
Author

@jorisvandenbossche
thank you very much for the workaround with x_compat=True. I can finally plot my data and continue my project.

@jorisvandenbossche
Member

This was only partly closed by #14330: this example still does not work when the irregular series is plotted first (#14330 added the test but commented it out).
