BUG: Fixes #36918 boxplots with matplotlib #37107

francibm97 · 2020-10-13T19:42:22Z

I believe that the @TomAugspurger in #35393 found out that, when sharex or sharey are enabled, matplotlib counts the number of ticks of an axis in a subplot as the total number of ticks of the shared axis in all subplots. Since matplotlib became stricter with set_xticks, the proposed workaround was replicating the labels n times where n is "total number of ticks"/"number of ticks of this subplot"="number of subplots".

However, that fix was implemented only for vertical boxplots and not for horizontal boxplots, hence why the issue.
Also, that fix causes the labels of the shared axis to be drawn n times (where n is the number of subplots) in some cases, see below (matplotlib 3.3.2, code adapted from the one posted by the author of issue 36918)

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib

df1 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

f, axs = plt.subplots(nrows=2, ncols=1, figsize=(10,7), sharex=True)
df1.boxplot(ax=axs[0], vert=True)
df2.boxplot(ax=axs[1], vert=True)

plt.show()

In this fix I used set_major_formatter with a FixedFormatter which fixes the issue and does not overdraw the labels:

charlesdong1991

will need a test and a whatnew note

francibm97 · 2020-10-15T20:23:59Z

Sorry I've now added a test and a note in the 1.2.0 rst

charlesdong1991

thanks, the change looks very nice!!

btw, just a question to clarify, are you basically solving two issues in this PR: one is error raised when vert=False, and the other one is overlapping/duplicate labels when sharex/y=True

if so, maybe clarify it in the whatsnew, and better to make two tests, one to have df and a single plot with vert=False to prove it works, the other test which is similar to what you have now to show the duplicate/overlapping labels issue is gone?

charlesdong1991 · 2020-10-16T08:52:41Z

doc/source/whatsnew/v1.2.0.rst

@@ -437,6 +437,7 @@ Plotting
 - Bug in :meth:`DataFrame.plot` was rotating xticklabels when ``subplots=True``, even if the x-axis wasn't an irregular time series (:issue:`29460`)
 - Bug in :meth:`DataFrame.plot` where a marker letter in the ``style`` keyword sometimes causes a ``ValueError`` (:issue:`21003`)
 - Twinned axes were losing their tick labels which should only happen to all but the last row or column of 'externally' shared axes (:issue:`33819`)
+- Bug in :meth:`DataFrame.boxplot` was raising ``ValueError`` when plotting with ``vert=True`` on a subplot with shared axes (:issue:`36918`)


ehh, isn't the original issue about erroring when vert=False?

Yes, sorry my bad I will correct it

charlesdong1991 · 2020-10-16T08:53:13Z

pandas/plotting/_matplotlib/boxplot.py

+        axis.set_major_formatter(FixedFormatter(keys))
+        ax.tick_params(axis=axis.axis_name, which="major", rotation=rot)


nice clean-up!

charlesdong1991 · 2020-10-16T08:53:32Z

pandas/tests/plotting/test_boxplot_method.py

+    def test_boxplot_shared_axis(self, vert):
+        # GH 37107
+        df1 = DataFrame(np.random.random((100, 5)), columns=["A", "B", "C", "D", "E"])
+        df2 = DataFrame(np.random.random((100, 5)), columns=["A", "B", "C", "D", "E"])


Suggested change

df2 = DataFrame(np.random.random((100, 5)), columns=["A", "B", "C", "D", "E"])

df2 = df1.copy()

ahh, if it's to generate two dfs with different random numbers, ignore, but still, probably one df is enough

charlesdong1991 · 2020-10-16T08:55:48Z

pandas/tests/plotting/test_boxplot_method.py

+        # In order for the ticklabels to be placed, the plot has to be drawn
+        fig.canvas.draw()
+
+        for ax in axes:


i might not fully understand, why needs two dfs and two plots (in each of axes)? can we just use one?

charlesdong1991 · 2020-10-16T08:59:25Z

pandas/tests/plotting/test_boxplot_method.py

+            assert len(labels) % 5 == 0
+            assert len(labels) // (len(labels) // 5) == 5
+            assert labels[:5] == ["A", "B", "C", "D", "E"]


can you maybe simply it a bit? seems we want to test there are 10 labels while the first 5 is those column names, and last 5 are empty?

francibm97 · 2020-10-16T17:45:06Z

Thanks for the feedback about the tests, I looked better into the issue of the duplicated and empty ticks and checked better if it was really related to the shared axis or not.

Turns out that it was not.

The problem is that when matplotlib's boxplot is called multiple times on the same axis (or on different shared axes) with manage_ticks=True (which is the default), it will append a new list of locations to the previous one of the FixedLocator without removing duplicates. See https://github.com/matplotlib/matplotlib/blob/c8203fb8ed1a24fb999942340e55ea4a7a9fe6e8/lib/matplotlib/axes/_axes.py#L4066

This is really why there are double ticks when boxplot is called twice (on two shared axes or on the same axis). Therefore the previous workaround would overdraw the labels both in the case of using boxplot on two shared axes and in the case of using boxplot on the same axis twice, because, for what concernes the ticklabels, the two situation are the same.
I leave a sample code to check that behaviour:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.random((100, 5)), columns=["A", "B", "C", "D", "E"])

fig, ax = plt.subplots(1, 1)
df.boxplot(ax=ax, vert=True)
df.boxplot(ax=ax, vert=True)
plt.show()

So now I think that the duplicated ticks are not a feature of the shared axis as I initially thought, but rather a "feature" of matplotlib's boxplot which does not handle well multiple calls on the same axis and which also incidentally is problematic on shared axes. I think that I will rise an issue in the matplotlib repo because, for other plots such as scatter, this problem of double ticks does not arise:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
_ = ax.scatter(['a'], [1])
_ = ax.scatter(['a'], [2])
fig.canvas.draw()

print(ax.xaxis.get_ticklabels())
[Text(0, 0, 'a')]


import matplotlib.pyplot as plt

fig, ax = plt.subplots(1, 1)
_ = ax.boxplot([1,2,3])
_ = ax.boxplot([5,6,7])
fig.canvas.draw()

print(ax.xaxis.get_ticklabels())
[Text(1, 0, '1'), Text(1, 0, '1')]

The problem can be solved by preventing this redundant ticks to show up when multiple call of boxplot are made.
This can be done by setting a new FixedLocator along with the FixedFormatter so that there are no duplicated ticks.
This in turn solves the two things you mentioned:

The problem with the horizontal boxplots
The overdrawing of the labels

However I have got some questions:

If the user specifies manage_ticks=False, should pandas add the labels anyways or not?
Is the user supposed to run boxplot multiple times on the same axis? If yes, then the proposed workaround cannot be accepted because something like the code reported below wouldn't work

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.random((100, 5)), columns=["A", "B", "C", "D", "E"])

fig, ax = plt.subplots(1, 1)
df.boxplot(ax=ax)
df.boxplot(ax=ax, positions=[10, 11, 12, 13, 14])
plt.show()
-> Only the second boxplot is labels, the first one has no labels

There could be other solutions (such as removing the duplicates in the FixedLocator)

I do not know if a more complex solution is acceptable if it can work in a wider range of cases,
or if this simple solution is preferred

Sorry for the wall of text and for changing the reviewed code

(TODO: still to update the whatsnew and test)

github-actions · 2020-11-16T00:11:15Z

This pull request is stale because it has been open for thirty days with no activity. Please update or respond to this comment if you're still interested in working on this.

jreback · 2020-11-17T01:28:20Z

@francibm97 if you can merge master and update we can get this reviewed and in

jreback · 2021-02-11T01:33:28Z

closing as stale. if you want to continue, pls ping and can re-open.

BUG: Fixes pandas-dev#36918 boxplots with matplotlib

f98033b

charlesdong1991 suggested changes Oct 14, 2020

View reviewed changes

francibm97 added 2 commits October 15, 2020 21:22

Added test

dd5df89

Added whatsnew

bc1d31e

charlesdong1991 suggested changes Oct 16, 2020

View reviewed changes

Update fix

1bbe662

github-actions bot added the Stale label Nov 16, 2020

jreback closed this Feb 11, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Fixes #36918 boxplots with matplotlib #37107

BUG: Fixes #36918 boxplots with matplotlib #37107

francibm97 commented Oct 13, 2020

charlesdong1991 left a comment

francibm97 commented Oct 15, 2020

charlesdong1991 left a comment

charlesdong1991 Oct 16, 2020

francibm97 Oct 16, 2020

charlesdong1991 Oct 16, 2020

charlesdong1991 Oct 16, 2020 •

edited

Loading

charlesdong1991 Oct 16, 2020

charlesdong1991 Oct 16, 2020

francibm97 commented Oct 16, 2020

github-actions bot commented Nov 16, 2020

jreback commented Nov 17, 2020

jreback commented Feb 11, 2021

		axis.set_major_formatter(FixedFormatter(keys))
		ax.tick_params(axis=axis.axis_name, which="major", rotation=rot)

	df2 = DataFrame(np.random.random((100, 5)), columns=["A", "B", "C", "D", "E"])
	df2 = df1.copy()

BUG: Fixes #36918 boxplots with matplotlib #37107

BUG: Fixes #36918 boxplots with matplotlib #37107

Conversation

francibm97 commented Oct 13, 2020

charlesdong1991 left a comment

Choose a reason for hiding this comment

francibm97 commented Oct 15, 2020

charlesdong1991 left a comment

Choose a reason for hiding this comment

charlesdong1991 Oct 16, 2020

Choose a reason for hiding this comment

francibm97 Oct 16, 2020

Choose a reason for hiding this comment

charlesdong1991 Oct 16, 2020

Choose a reason for hiding this comment

charlesdong1991 Oct 16, 2020 • edited Loading

Choose a reason for hiding this comment

charlesdong1991 Oct 16, 2020

Choose a reason for hiding this comment

charlesdong1991 Oct 16, 2020

Choose a reason for hiding this comment

francibm97 commented Oct 16, 2020

github-actions bot commented Nov 16, 2020

jreback commented Nov 17, 2020

jreback commented Feb 11, 2021

charlesdong1991 Oct 16, 2020 •

edited

Loading