-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Inconsistent indexes for tick label plotting #28733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inconsistent indexes for tick label plotting #28733
Conversation
So one of the example of the doc revealed a bug in this PR. |
This PR solved the bug related to index, but handling MultiIndex in the same way requires more work and I consider it outside the scope of this PR, so I expressely did not handle it. The best way to handle MultiIndex plot would be via label grouping and multi-level axis label, as described in this issue on matplotlib matplotlib/matplotlib#6321, and this one on pandas #2088. I may try to tackle this one day, but definitly not in this PR. |
pandas/tests/plotting/test_frame.py
Outdated
@@ -1129,14 +1129,18 @@ def test_bar_categorical(self): | |||
for df in [df1, df2]: | |||
ax = df.plot.bar() | |||
ticks = ax.xaxis.get_ticklocs() | |||
tm.assert_numpy_array_equal(ticks, np.array([0, 1, 2, 3, 4, 5])) | |||
tm.assert_numpy_array_equal( | |||
ticks, np.array([0, 1, 2, 3, 4, 5], dtype=np.float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did this need to change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ticks pos are handled by matplotlib fonction convert_xunits, which return a float array rather than the int array previously used.
I would argue that having float array for tick position make more sense to have the same type for all plot.
I do not think that the point of the test was to ensure the type of the tick position, so I changed it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you help me understand what part of this PR would have changed this? Still not sure on this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before the PR, the tick position where defined by using self.tick_pos = np.arange(len(data))
.
In this PR, I let matplotlib handle the position by calling
ax.xaxis.update_units(self.ax_index)
self.tick_pos = ax.convert_xunits(self.ax_index)
This use the matplotlib Conversion Interface. For convertiong of string category, this use numpy float array (see https://matplotlib.org/_modules/matplotlib/category.html#StrCategoryConverter), so the tick position are now float instead of int.
I could cast this back to an int array, but it made more sense IMO to change the test here, that trying to force the tick position to be integer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should stay as integer to not break backwards compatibility - how hard would it be to accomplish that?
38d5392
to
2b8373b
Compare
can you merge master and will look again |
pandas/tests/plotting/test_frame.py
Outdated
@@ -1129,14 +1129,18 @@ def test_bar_categorical(self): | |||
for df in [df1, df2]: | |||
ax = df.plot.bar() | |||
ticks = ax.xaxis.get_ticklocs() | |||
tm.assert_numpy_array_equal(ticks, np.array([0, 1, 2, 3, 4, 5])) | |||
tm.assert_numpy_array_equal( | |||
ticks, np.array([0, 1, 2, 3, 4, 5], dtype=np.float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should stay as integer to not break backwards compatibility - how hard would it be to accomplish that?
doc/source/whatsnew/v1.0.0.rst
Outdated
@@ -1090,6 +1090,7 @@ Plotting | |||
- Bug in color validation incorrectly raising for non-color styles (:issue:`29122`). | |||
- Allow :meth:`DataFrame.plot.scatter` to plot ``objects`` and ``datetime`` type data (:issue:`18755`, :issue:`30391`) | |||
- Bug in :meth:`DataFrame.hist`, ``xrot=0`` does not work with ``by`` and subplots (:issue:`30288`). | |||
- Bug in BarPlot. Tick position where assigned by value order instead of using the value for numeric, or a smart order for sting. (:issue:`26186` and issue:`11465`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you move this to 1.1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
bf6fefc
to
82ef684
Compare
I made the dtype for tick array as int as pointed by @WillAyd, see last commit change. |
22608a2
to
bcb47bb
Compare
bcb47bb
to
57e09cc
Compare
@nrebena I think this should be good. Can you fix the merge conflict? |
57e09cc
to
02a52a6
Compare
Rebased and green 🍏 @WillAyd |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm - @jreback care to look?
doc/source/whatsnew/v1.1.0.rst
Outdated
@@ -345,6 +345,7 @@ Plotting | |||
- :func:`.plot` for line/bar now accepts color by dictonary (:issue:`8193`). | |||
- | |||
- Bug in :meth:`DataFrame.boxplot` and :meth:`DataFrame.plot.boxplot` lost color attributes of ``medianprops``, ``whiskerprops``, ``capprops`` and ``medianprops`` (:issue:`30346`) | |||
- Bug in BarPlot. Tick position where assigned by value order instead of using the value for numeric, or a smart order for sting. (:issue:`26186` and issue:`11465`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if u can add the proper reference here to plot.bar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the notes are duplicated (2nd one looks good)
@@ -1345,6 +1348,16 @@ def _make_plot(self): | |||
|
|||
for i, (label, y) in enumerate(self._iter_data(fillna=0)): | |||
ax = self._get_ax(i) | |||
|
|||
if self.orientation == "vertical": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is there any assert in orientation that it takes on only certain values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No assert. Orientation is never a parameter. It is defined by the BarPlot class as "vertical", and overloaded by the Barhplot class as "horizontal". It is also defined in LinePlot class.
Globaly, it is use by the _post_plot_logic_common method from the base class MPLPlot.
c6c23ac
to
5b40eed
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @nrebena
nice change and I think it's very close to get merged,
just two nits since there is conflicts if you are still interested in.
pandas/plotting/_matplotlib/core.py
Outdated
elif self.orientation == "horizontal": | ||
ax.yaxis.update_units(self.ax_index) | ||
self.tick_pos = ax.convert_yunits(self.ax_index).astype(np.int) | ||
self.ax_pos = self.tick_pos - self.tickoffset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this self.ax_pos = self.tick_pos - self.tickoffset
can be taken out from if.. else..
clause, and no need to use it twice. the main purpose of this clause is to get self.tick_pos
for bar and barh separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure thing.
pandas/tests/plotting/test_frame.py
Outdated
errors = gp3.std() | ||
|
||
# No assertion we just ensure that we can plot a MultiIndex bar plot | ||
means.plot.bar(yerr=errors, capsize=4) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't we have user warning here? shall catch it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Done.
Hi @nrebena - sorry to chase you up, just wanted to ask if this is still active |
Hi. No worries. I may come around to finish it, but if someone want to take it over no problem. |
5b40eed
to
c66af8e
Compare
0355c8a
to
23635d4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @nrebena , very nice PR! thanks @MarcoGorelli for resolving the conflicts! |
This reverts commit fb379d8.
this PR was reverted, but I guess we don't have a tracking issue. @nrebena can you open one. |
…ing (#28733)" (#39252) Co-authored-by: Simon Hawkins <[email protected]>
The tick position for BarPlot can be define using the convert tool from matplotlib.
The main advantage is that it will reuse the same position when you have text as axis values. Previously, the tick position was determined by the order of the given element, so that ['A', 'B'] where given label [0, 1], and if updating the plot with order ['B', 'A'], you will not draw at the right position.
The resulting plot for each issue would be:


black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff