automatic inversion of x axis by pandas.plot(...) #10118

Closed
iaimf opened this issue May 13, 2015 · 6 comments

@iaimf

iaimf commented May 13, 2015

The x axis is inverted automatically and unexpectedly when plotting one series of data against another using pandas.
The example code below creates three plots; some of them, but not all, show an inverted x axis. I think this behavior is very confusing for users even if there is some rationale behind it. IMHO, automatic inversion of the x axis is unnecessary, because a user who wants it can call invert_xaxis(). On Stack Overflow, a workaround was suggested, but no direct solution.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Three random columns, plus a copy scaled by 1.1 with renamed columns.
df2 = pd.DataFrame(np.random.randn(10, 3), columns=["a", "b", "c"])
df3 = df2 * 1.1

df3.rename(columns={"a": "a*1.1", "b": "b*1.1", "c": "c*1.1"}, inplace=True)
df23 = df2.join(df3)

fig, ax_list = plt.subplots(1, 3)

# Each subplot plots a scaled column against its original column.
ax = ax_list[0]
df23[["a", "a*1.1"]].plot(ax=ax, x="a")
ax.axis('equal')
ax.set_title("(x,y)=(a,a*1.1)")
print(ax.get_xlim())  # added for clarity

ax = ax_list[1]
df23[["b", "b*1.1"]].plot(ax=ax, x="b")
ax.axis('equal')
ax.set_title("(x,y)=(b,b*1.1)")
print(ax.get_xlim())  # added for clarity

ax = ax_list[2]
df23[["c", "c*1.1"]].plot(ax=ax, x="c")
ax.axis('equal')
ax.set_title("(x,y)=(c,c*1.1)")
print(ax.get_xlim())  # added for clarity

@TomAugspurger
Contributor

A cleaner example: Note the xticklabels on ax2.

In [6]: df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 2, 1], 'c': [10, 20, 30]})

In [7]: fig, (ax1, ax2) = plt.subplots(ncols=2)

In [8]: df.plot(x='a', y='c', ax=ax1)
Out[8]: <matplotlib.axes._subplots.AxesSubplot at 0x1123fe2b0>

In [9]: df.plot(x='b', y='c', ax=ax2)
Out[9]: <matplotlib.axes._subplots.AxesSubplot at 0x10a32b978>

In [10]: plt.savefig('/Users/tom.augspurger/Desktop/gh.png')

[Figure gh.png: the two subplots produced above, ax1 (x='a') and ax2 (x='b')]

So what's going on here is that we call set_index to move the x column into the index before calling ax.plot. So df.plot(x='a', y='b') should be equivalent to df.set_index('a').b.plot(). The tricky / confusing thing here is that row ordering has a meaning in pandas. I think it would be equally surprising if

In [32]: df.set_index('b').plot(kind='bar')
Out[32]: <matplotlib.axes._subplots.AxesSubplot at 0x114609438>

did sort the x-axis to go small to large. Thoughts?
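
The equivalence described above can be checked without drawing anything. This is a minimal sketch, assuming only the `df` from the example; the names `s`, `index_order`, and `values` are illustrative. It shows that `set_index` keeps the original row order, so the index holds `b` in descending order:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 2, 1], 'c': [10, 20, 30]})

# set_index moves column 'b' into the index without reordering the rows.
s = df.set_index('b')['c']

index_order = s.index.tolist()  # 'b' values in their original row order
values = s.tolist()             # 'c' values, still attached to the same rows
print(index_order, values)
```

Whatever the plotting code does with this Series, it starts from an index that is not sorted.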

@TomAugspurger TomAugspurger added the Visualization plotting label May 13, 2015
@iaimf
Author

iaimf commented May 13, 2015

By looking at your example of a bar plot, I now understand the source of confusion. You want the horizontal locations of your bars specified by index (i.e. row label), and each bar labeled by the value of column 'b'.

In a scatter plot, I meant the horizontal location of points specified by the value of column 'b', and x axis labeled by the value of column 'b'.

Determination of horizontal position needs to be handled separately from the choice of x tick labels. Apparently they are confused since API reference says "x : label or position, default None".

If you need to specify position by the index, you can use plot(x=None, y='b'). In my opinion, it is natural to assume that the position on the x axis is determined by the value of column 'a' when writing a command like plot(x='a', y='b'). Anyway, pandas.plot(...) already has a way to distinguish the two cases, so sorting the x axis should not cause bad side effects. (For some types of plots, like 'pie', parameter x may be ignored and forced to x=None, I guess.)

To clearly separate the way to set positions from the way to set labels, I think a new way to specify x tick labels is necessary. For example, plot(x=None, y='b', xticklabels='a') could mean that the index specifies the position on the x axis and column 'a' specifies the x axis labels. (The xticks parameter is available, but it sets the positions of the x ticks, not their labels.)
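
The separation proposed above (position from the index, labels from a column) can already be approximated with plain matplotlib calls. The sketch below is an illustration only: `xticklabels` is not an existing `plot` parameter, and the small frame and the names `positions` and `label_texts` are assumptions made for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data, not taken from the thread.
df = pd.DataFrame({'a': [10, 40, 30], 'b': [5, 7, 1]})

fig, ax = plt.subplots()

# Position on the x axis: row position (0, 1, 2). Height: column 'b'.
positions = list(range(len(df)))
ax.plot(positions, df['b'].tolist())

# Tick labels: values of column 'a', placed at those same positions.
ax.set_xticks(positions)
labels = ax.set_xticklabels(df['a'].astype(str).tolist())

label_texts = [t.get_text() for t in labels]
print(label_texts)
```

Here the points stay in row order while the axis shows the values of `a`, which is the bar-plot-like behavior; plotting `df['a']` against `df['b']` directly gives the scatter-like behavior instead.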

@TomAugspurger
Contributor

> Determination of horizontal position needs to be handled separately from the choice of x tick labels. Apparently they are confused since API reference says "x : label or position, default None".

The "x: label or position" refers to what you actually pass in, i.e. a label ('b') or position (1), not what the effect is. For me it's cleaner to say that row ordering always matters, and just sort before plotting.

df.sort_values('b').plot(x='b', y='c', ax=ax2)
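
A short sketch of what sorting before plotting does to the data, using `sort_values` (the method for sorting by column in current pandas). The names `sorted_df`, `b_order`, and `c_order` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 2, 1], 'c': [10, 20, 30]})

# Sorting by 'b' reorders whole rows, so each 'c' value travels with its 'b'.
sorted_df = df.sort_values('b')

b_order = sorted_df['b'].tolist()  # ascending after the sort
c_order = sorted_df['c'].tolist()  # same rows, new order
print(b_order, c_order)
```

Calling `sorted_df.plot(x='b', y='c')` then hands the plotting code rows whose x values already increase from first to last.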

@iaimf
Author

iaimf commented May 13, 2015

Since I don't know the entire design philosophy of pandas and dataframe, your way is probably more suitable. I can use plt.plot(...) directly. I liked dataframe.plot(...) though: it provides parameters like x, y, and ax, and automatically adds axis labels and a grid. However, let me point out a few more things.

What "label or position" means.

By "x: label or position", you mean:

  • label of a column of a dataframe.
  • position of a column of a dataframe.

What I thought is:

  • "position" means coordinate of a point created on a plot.
  • "label" means the label shown on an axis of a plot.

I think, when visualizing, you care more about how it is displayed, i.e. its effect.

How coordinate of a point is specified.
  1. By the row index.
  2. By the value of a given column.

With the current design, x coordinates are specified by the row index whereas y coordinates are specified by the value of a column when you use plot(x='a', y='b'). This could be a pitfall for people who don't know the internals of dataframe.plot. It would be best if the user could choose explicitly between these two methods.

Sorting is another thing.

By sorting, you change the mapping between the row index and the value of a column. Thus, if the row index is used as the x coordinate, the resulting graph can look different when the plotted points are connected by lines. See the figure below and compare the second and third rows of subplots.

Range of x axis

The topmost two subplots show that the range of the x axis is not chosen well when the values of a column are not monotonic. I think this is a separate issue that needs to be fixed.

[Figure: 3x2 grid of subplots produced by the code below]

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'a': [1, 4, 3], 'b': [5, 7, 1], 'c': [10, 8, 12]})

fig, ax = plt.subplots(ncols=2, nrows=3)
# as is
df.plot(x='a', y='c', ax=ax[0,0])
df.plot(x='b', y='c', ax=ax[0,1])

# sorted
df.sort_values('a').plot(x='a', y='c', ax=ax[1,0])
df.sort_values('b').plot(x='b', y='c', ax=ax[1,1])

# x coordinates specified by the values of a column in the dataframe
ax[2,0].plot(df['a'], df['c'])
ax[2,1].plot(df['b'], df['c'])

@nmartensen
Contributor

This is fixed in v0.21.0

@TomAugspurger
Contributor

Thanks, #16600 was the fix.
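
A quick way to check whether an installed pandas still shows the inversion is to read the x limits back after plotting a descending column. This is a sketch under the assumption that a non-interactive matplotlib backend is acceptable; the names `left`, `right`, and `x_axis_inverted` are illustrative. Per the comments above, 0.21.0 and later should report increasing limits.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the check runs headless
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 2, 1], 'c': [10, 20, 30]})

fig, ax = plt.subplots()
df.plot(x='b', y='c', ax=ax)  # 'b' is descending in row order

# An inverted axis has its left limit greater than its right limit.
left, right = ax.get_xlim()
x_axis_inverted = left > right
print(pd.__version__, (left, right), "inverted" if x_axis_inverted else "not inverted")
```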

@jorisvandenbossche jorisvandenbossche added this to the 0.21.0 milestone Nov 9, 2017