
BUG: Horizontal boxplots on subplots throws ValueError #36918


Closed
3 tasks done
H-SG opened this issue Oct 6, 2020 · 5 comments · Fixed by #45465

H-SG commented Oct 6, 2020

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

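import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # needed for plt.subplots below

# Two random 100x4 integer frames with columns A-D
df1 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))

# Horizontal boxplots on two subplots that share a y-axis
f, axs = plt.subplots(1, 2, figsize=(10, 7), sharey=True)
df1.boxplot(ax=axs[0], showfliers=False, vert=False)
df2.boxplot(ax=axs[1], showfliers=False, vert=False)  # raises ValueError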

Problem description

The code snippet above raises ValueError: The number of FixedLocator locations (8), usually from a call to set_ticks, does not match the number of ticklabels (4). This is due to changes introduced in matplotlib 3.3.0.

It looks like fixes were implemented in boxplot.py as part of #35393 for vert=True but not for vert=False. Applying the same len(ticks) != len(keys) check to ticks = ax.get_yticks(), together with the associated snippet that repeats the keys list, fixes the issue.

Not sure if that is the right fix, but works for my use case.
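
The label-repetition idea described above can be sketched in isolation. This is only an illustration of the technique, not the code in pandas' boxplot.py; the helper name match_labels_to_ticks and its error handling are assumptions made for this example.

```python
def match_labels_to_ticks(labels, n_ticks):
    """Repeat `labels` until there is one label per tick.

    Hypothetical helper for illustration; not the pandas implementation.
    """
    labels = list(labels)
    if not labels:
        raise ValueError("labels must not be empty")
    if len(labels) == n_ticks:
        return labels
    # With shared axes the tick count is expected to be a whole
    # multiple of the label count (e.g. 8 ticks for 4 columns).
    repeats, remainder = divmod(n_ticks, len(labels))
    if remainder != 0:
        raise ValueError(
            f"{n_ticks} ticks cannot be covered evenly by {len(labels)} labels"
        )
    return labels * repeats


# 4 column labels, 8 tick locations as in the reported error message
print(match_labels_to_ticks(list("ABCD"), 8))
# ['A', 'B', 'C', 'D', 'A', 'B', 'C', 'D']
```

For a horizontal boxplot, n_ticks would presumably come from len(ax.get_yticks()) and the result would be passed to ax.set_yticklabels.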

Expected Output

Two sets of horizontal boxplot charts

Output of pd.show_versions()

INSTALLED VERSIONS

commit : 2a7d332
python : 3.7.3.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-48-generic
Version : #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.1.2
numpy : 1.19.2
pytz : 2020.1
dateutil : 2.8.1
pip : 19.1.1
setuptools : 41.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.5.2
html5lib : None
pymysql : None
psycopg2 : 2.8.6 (dt dec pq3 ext lo64)
jinja2 : 2.11.2
IPython : 7.18.1
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.2
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pyxlsb : None
s3fs : None
scipy : 1.5.2
sqlalchemy : 1.3.19
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

H-SG added the Bug and Needs Triage labels Oct 6, 2020
charlesdong1991 added the Visualization label and removed the Needs Triage label Oct 7, 2020
charlesdong1991 (Member) commented

@H-SG PR is welcome!

charlesdong1991 added this to the Contributions Welcome milestone Oct 7, 2020

francibm97 commented Oct 12, 2020

Hello, I have also looked into this.

The problem arises when sharex is set to True: I think matplotlib then treats the number of ticks on a particular subplot's x-axis as equal to the total number of ticks across all x-axes in the figure.
The same happens for the y-axis if sharey is set to True.

So, for example, if I have two subplots with 3 vertical boxplots each and sharex=True, each subplot has 6 ticks on its x-axis even though only 3 ticks are shown on each of the two x-axes.

I do not know whether this is a bug or intended behaviour, but matplotlib now requires the exact number of tick labels, so if we pass a list of 3 items to set_xticklabels on either subplot's axes, an error is raised.

The author of #35393 already ran into this and, for the x-axis, worked around it by repeating the labels once per subplot. This was not done for the y-axis, which is why the case reported in this issue fails.

However, if we take a slightly modified version of the example he posted, matplotlib's behaviour still looks buggy to me:

import pandas as pd
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.ticker
import numpy as np

fig, (ax1, ax2) = plt.subplots(ncols=2, sharex=True)
ax1.boxplot([
    np.array([1, 2, 3, 4]),
    np.array([1, 2, 3, 4]),
    np.array([1, 2, 3, 4])
])
ax2.boxplot([
    np.array([1, 2, 2, 3]),
    np.array([1, 2, 3, 4]),
    np.array([1, 2, 3, 4])
])

ax1.set_xticklabels(['A', 'B', 'C', 'D', 'E', 'F'])

Matplotlib 3.1.1:

[Screenshot: plot produced with Matplotlib 3.1.1]

Matplotlib 3.3.2:

[Screenshot: plot produced with Matplotlib 3.3.2]

Note that in matplotlib 3.3.2, the same letters are drawn twice.

However, if we use ax1.xaxis.set_major_formatter(matplotlib.ticker.FixedFormatter(['A', 'B', 'C'])), the result looks as intended: there are no errors and the letters do not appear to be drawn twice.

So I think that, in order to solve this issue in pandas, one of these two things could be done:

  • implement the workaround of repeating the labels once per subplot for the y-axis as well (currently it is done only for the x-axis); however, at least in some cases, this fix may cause the same letters to be drawn twice
  • use ax.xaxis.set_major_formatter(matplotlib.ticker.FixedFormatter(...)) (or the ax.yaxis equivalent), which works with the correct number of labels and does not cause overdrawing; however, matplotlib.ticker also has to be imported
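
As a rough sketch of the second option applied to the horizontal case from the original report: the helper name label_boxes and the Agg backend choice are assumptions made for this example, and this is not necessarily what the eventual fix does.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as mticker
import numpy as np


def label_boxes(axis, labels):
    """Label an Axis (ax.xaxis or ax.yaxis) through a FixedFormatter.

    Hypothetical helper for illustration. Unlike set_yticklabels, this does
    not require the label count to match the number of tick locations.
    """
    axis.set_major_formatter(mticker.FixedFormatter(list(labels)))


fig, (ax1, ax2) = plt.subplots(ncols=2, sharey=True)
data = [np.array([1, 2, 3, 4])] * 3

# Horizontal boxplots, as in the original report (vert=False)
ax1.boxplot(data, vert=False)
ax2.boxplot(data, vert=False)

# Three labels for three boxes per subplot
label_boxes(ax1.yaxis, ["A", "B", "C"])
label_boxes(ax2.yaxis, ["A", "B", "C"])
```

The formatter is set on the Axis object rather than through set_yticklabels, so the length check that triggers the ValueError is never reached.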

Let me know what you think, I can do a PR if needed

Edit: spelling

charlesdong1991 (Member) commented

Sure! @francibm97, the issue hasn't been taken, so feel free to take it. A PR is very welcome!

francibm97 commented

@charlesdong1991 ok nice I'll implement one of the two proposed solutions and create a PR

francibm97 commented

take
