Skip to content

BUG: bar and line plots are not aligned on the x-axis/xticks #56653

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

sharonwoo
Copy link
Contributor

@sharonwoo sharonwoo commented Dec 28, 2023

Added a test which will fail on main's self.tick_pos = np.arange(len(data)), which in this test will result in actual_bar_x of [0,1,2,3,4].

def test_bar_plot_x_axis(self, plot_data):
    df = DataFrame(plot_data)
    ax_bar = df["bars"].plot(kind="bar")
    df["pct"].plot(kind="line")
    actual_bar_x = [bar.get_x() + bar.get_width() / 2.0 for bar in ax_bar.patches]
    expected_x = [-1, 0, 1, 2, 3]
    assert actual_bar_x == expected_x
        ```

@sharonwoo sharonwoo force-pushed the BUG-56460-bar-line-plots branch from 8ac6b19 to c8eb0b4 Compare December 29, 2023 05:22
@sharonwoo
Copy link
Contributor Author

@rhshadrach or anyone interested in reviewing, welcoming comments

@sharonwoo sharonwoo force-pushed the BUG-56460-bar-line-plots branch from c8eb0b4 to cf55e0c Compare January 2, 2024 03:42
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for submitting a PR

df = DataFrame(plot_data)
ax_bar = df["bars"].plot(kind="bar")
df["pct"].plot(kind="line")
actual_bar_x = [ax.get_x() + ax.get_width() / 2.0 for ax in ax_bar.patches]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part of the code base is not my area of expertise but what does ax.get_x() return here? I'm a bit surprised we have to do this much work to get the labels

@@ -1805,6 +1805,18 @@ def _kind(self) -> Literal["bar", "barh"]:
def orientation(self) -> PlottingOrientation:
return "vertical"

# GH56460 checks similar to LinePlot's _is_ts_plot and _use_dynamic_x
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the OP it seems like the issue was LinePlot not having the proper labels - is that not the case? If so I think we need to change the logic in that class or maybe even remove it.

Copy link
Contributor Author

@sharonwoo sharonwoo Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, thanks for the comment.

I think the related issue here explains it better than I possibly can: #55508

This is my attempt to explain:

When a BarPlot and a LinePlot are combined, because BarPlot sets sequential indices starting from 0 (the line of code: self.tick_pos = np.arange(len(data))) while LinePlot uses what it gets, the combined plot uses the correct labels on BarPlot's incorrect xticks.

If the index starts from 3 instead, the line plot is "shifted" by 3 and partially cut off: #56460 (comment)

(I think the displayed labels are reconciled in ticker._compute_offset with self.locs containing the entire range when it enters, and then not when it exits, but I am not 100% sure as it took quite a bit of stepping through the code to get there and I was interrupted by Christmas festivities around that time.)

@WillAyd WillAyd added the Visualization plotting label Jan 2, 2024
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! It appears to me that if the xticks are not numeric, the current approach here will fail.

Comment on lines 1815 to 1812
if self._is_series and not self._is_ts_plot(data):
return np.array(self._get_xticks(), dtype=int)
else:
return np.arange(len(data))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this would fail on the input:

df = pd.DataFrame(
    {
        "bars": {
            "a": 0.5,
            "b": 1.0,
        },
        "pct": {
            "a": 4.0,
            "b": 2.0,
        },
    }
)

Is that correct?

Copy link
Contributor Author

@sharonwoo sharonwoo Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, thanks for catching this. That would fail completely.

In the current code, that dataframe would return a graph with xticks of [0,1], since BarPlot currently sets a numpy integer array for xticks with self.tick_pos = np.arange(len(data)).

In the context of this bug, using is_integer_dtype(data.index) should handle the above case within the confines of the integer array BarPlot expects:

def _set_tick_pos(self, data) -> np.ndarray:
    # only set the tick_pos to the index, if index is an integer series
    # else return a data-length array of integers 
    if is_integer_dtype(data.index) and self._is_series:
        return data.index
    else:
        return np.arange(len(data))

Would it, however, be a good idea to expand the scope of this bug fix, to encompass also

  1. When the index is a timestamp
  2. When the index is a set of floats

This would not so much be a bug fix as a change in the way BarPlot handles plotting, though. As a user, I have always just reset_index() in order to handle these cases. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw your comments on the bug report, I also need to address them.

patch.get_bbox().ymax
for patch in sorted(ax.patches, key=lambda patch: patch.get_bbox().xmax)
]
result = [patch.get_bbox().ymax for patch in ax.patches]
Copy link
Contributor Author

@sharonwoo sharonwoo Jan 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as is, this test expects [1,2,3,4] and subsequently [10, 50, 20, 30]. it actually passes on both current BarPlot and PR implementations.

but on the actual displayed graph, the xticks are sorted in ascending order. to investigate what's causing this behaviour and to fix the test (or create a new one to test fixed logic)

@sharonwoo sharonwoo force-pushed the BUG-56460-bar-line-plots branch from aa546e4 to 44b69e9 Compare January 4, 2024 13:38
Copy link
Contributor

github-actions bot commented Feb 4, 2024

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

@github-actions github-actions bot added the Stale label Feb 4, 2024
@sharonwoo sharonwoo force-pushed the BUG-56460-bar-line-plots branch from 44b69e9 to 152374a Compare February 4, 2024 07:42
@sharonwoo
Copy link
Contributor Author

I think this fix is actually irrelevant now because of the discussion in the issue.

@rhshadrach
Copy link
Member

@sharonwoo - no problem, will close for now. If you'd like to take up the fix discussed in the issue, feel free to either reuse this PR (we can reopen) or open a new one.

@rhshadrach rhshadrach closed this Feb 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: bar a line plots are not aligned on the x-axis/xticks
3 participants