BUG: bar and line plots are not aligned on the x-axis/xticks #56460
Comments
Thanks for the report, confirmed on main. Further investigations and PRs to fix are welcome!
I'm new to the codebase, but this caught my interest: I've never encountered it, and it would be really annoying if I did. I'm still trying to understand the code, but while reading it I tried to get a sense of where to look by plotting a few graphs myself.
But if I change the x-axis range for all 3 to 3-7, then the lines shift by 3 instead (starting at 6 instead of 3). Code to reproduce this, to save some editing:
Annnnnd, consistent with expectations, we get a shift of -1 for the lines if the axes all start with -1.
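The three observations (shift of 1, 3 and -1) fit a simple reading, sketched below as a toy model rather than pandas' actual code: bars are assumed to be drawn at slots 0..n-1 and only labelled with the index values, while the line is assumed to be drawn at the index values themselves. All function names here are hypothetical.

```python
def bar_positions(index):
    """Assumed model: bars sit at 0, 1, ..., n-1; index values are only tick labels."""
    return list(range(len(index)))


def line_positions(index):
    """Assumed model: the line uses the numeric index values as x coordinates."""
    return list(index)


def line_shift(index):
    """Offset, in bar slots, between the first line point and the first bar."""
    return line_positions(index)[0] - bar_positions(index)[0]


def label_under_first_line_point(index):
    """Tick label of the bar slot where the line starts, or None if off the bars."""
    labels = list(index)
    slot = line_positions(index)[0]
    if 0 <= slot < len(labels):
        return labels[slot]
    return None


for start in (1, 3, -1):
    index = range(start, start + 5)
    print(start, line_shift(index), label_under_first_line_point(index))
```

Under this model an index of 3-7 puts the first line point at slot 3, which carries the label 6, and an index starting at -1 puts it one slot to the left of the first bar.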
Opened an incomplete PR (logic only, I need to fix CI) for further discussion - this would fix the issue, but should we fix it in the first place? Tldr, BarPlot uses
take |
Thanks for the investigations here. We also get incorrect results if all the Series used for the plots do not have the same index, e.g.
It does seem reasonable to me for users to expect the order of the bars is preserved, even for numeric indexes:
The bars currently appear with x-ticks 1, 10, 3. It seems difficult to determine appropriate results when there are different indexes in the data in general. I wonder if when plotting and
I tried, and Matplotlib does this chart correctly:
However, for the out-of-order series, the PR implementation is similar to what Matplotlib currently does (i.e. bars are out of order):
I haven't been able to find documentation on this, but as far as I can tell matplotlib's behavior is:
The third bullet point above extends to when a series of non-numeric values is added to the plot: if there is a symbol not yet seen, it is added as the next bar. If it's already been seen, it is stacked. From the matplotlib docs, I think we may also need to consider "if x has units (e.g. datetime)" as a separate case. I've yet to see how that interacts with the logic above. It seems to me we can replicate this behavior (MultiIndex entries treated as tuples and hence non-numeric). One thing I'm still wondering is why we don't just offload this logic to matplotlib's default behavior.
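The placement rule described in this comment can be sketched as a small model. This is an illustration under assumptions, not matplotlib's or pandas' code: numeric x values are taken as axis positions directly, and non-numeric values (strings, or tuples standing in for MultiIndex entries) get the next free slot the first time they are seen and reuse that slot afterwards. The class name `XAxisModel` is hypothetical, and mixing numeric and non-numeric values on one axis or x values with units such as datetimes is not modelled.

```python
from numbers import Number


class XAxisModel:
    """Toy model of where a bar lands on a shared x-axis."""

    def __init__(self):
        # Non-numeric label -> slot number, assigned in first-seen order.
        self._slots = {}

    def position(self, x):
        # Assumption: numeric values are used as positions as-is.
        if isinstance(x, Number) and not isinstance(x, bool):
            return x
        # Assumption: an unseen symbol becomes the next bar; a seen one
        # reuses its slot, so a later series stacks on the earlier bar.
        if x not in self._slots:
            self._slots[x] = len(self._slots)
        return self._slots[x]


axis = XAxisModel()
print([axis.position(x) for x in [1, 10, 3]])        # numeric: values are positions
print([axis.position(x) for x in ["1", "10", "3"]])  # strings: first-seen order
print([axis.position(x) for x in ["3", "7"]])        # "3" reused, "7" appended
print(axis.position(("a", 1)))                       # tuple treated as non-numeric
```

In this model the numeric index 1, 10, 3 ends up ordered 1, 3, 10 along the axis, while the same values as strings keep the order in which they were given.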
Sorry, been a bit busy and under the weather lately (both concurrently sometimes with an ill toddler). If I understand correctly, the scope of this bug fix has been expanded in order to align the pandas wrapper with what matplotlib does, except where bars are out of order (preserve out of order, which would differ from the matplotlib implementation)?
The matplotlib behavior certainly seems very reasonable, better than the status quo for pandas, and currently has my support. That said, if there are issues with it, we do not necessarily have to align with it.
I don't think so. While a user might want to present 1, 10, 3 in that order for bar charts, it seems more valuable to prefer using numeric dtypes as indications of where on the x-axis to place the bar. This can support e.g. multiple bar charts stacked when they don't have the same exact index. In order for the user to then get 1, 10, 3 in that order, they just need to convert the integers into strings and that will give the desired behavior.
Looks like I'll have some time starting from next Monday to take a serious crack at this.
take |
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
Issue Description
Bar and line plots are not aligned on the x-axis when plotting with pandas. I saw some somewhat related issues, but they were not exactly this type of plot.
This is the plot generated from the sample code above:
Expected Behavior
The line also starts at index=1, not at index=2 as in the plot above.
Installed Versions
INSTALLED VERSIONS
commit : a671b5a
python : 3.11.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : Intel64 Family 6 Model 158 Stepping 13, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.1252
pandas : 2.1.4
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 65.5.0
pip : 23.2.1
Cython : None
pytest : 7.4.3
hypothesis : None
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.17.2
pandas_datareader : None
bs4 : None
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 14.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.4
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None