Inconsistent treatment of pandas Index when dtype is object #4483

avm19 · 2024-01-17T21:36:25Z

Each of the following lines is expected to produce a bar plot with three bars. However, in 3 out of 4 cases a two-bar plot is produced. Plotly seems to discard non-numeric objects when at least one numeric object is present.

import pandas as pd
import plotly.express as px
px.bar(pd.Series([11, 22, 33], [0, 1, 'A']), height=200)  # 2 bars
px.bar(pd.Series([11, 22, 33], ['0', 1, 'A']), height=200)  # 2 bars
px.bar(pd.Series([11, 22, 33], [0, '1', 'A']), height=200)  # 2 bars
px.bar(pd.Series([11, 22, 33], ['0', '1', 'A']), height=200)  # 3 bars

I suppose this could be an issue in plotly.js rather than plotly.py, because the figures' JSON representations differ only in the presence or absence of quotation marks " around values. But because this unexpected behaviour can be fixed in plotly.py, I am opening the issue here.

Suggested solutions:

Raise a warning when numeric and non-numeric (string) objects are mixed.
Alternatively, coerce everything to strings when at least one string is present.
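
The two suggestions above could be sketched roughly as follows. This is not plotly.py code: has_mixed_labels and check_labels are hypothetical helper names, and the sketch assumes that a label counts as numeric only if it is a Python number, so the string '1' counts as a string.

```python
import warnings
from numbers import Number


def has_mixed_labels(labels):
    """Return True if labels contain at least one number and one string."""
    labels = list(labels)
    has_number = any(isinstance(v, Number) for v in labels)
    has_string = any(isinstance(v, str) for v in labels)
    return has_number and has_string


def check_labels(labels, coerce=False):
    """Hypothetical pre-plot check: warn on mixed labels, or stringify them.

    With coerce=False the labels are returned unchanged and a warning is
    emitted; with coerce=True every label is converted with str().
    """
    labels = list(labels)
    if not has_mixed_labels(labels):
        return labels
    if coerce:
        return [str(v) for v in labels]
    warnings.warn(
        "Axis labels mix numbers and strings; some may be dropped "
        "when the axis type is inferred.",
        UserWarning,
        stacklevel=2,
    )
    return labels
```

Under these assumptions, the first three index lists from the report would be flagged as mixed and the last one (['0', '1', 'A']) would pass through untouched.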


alexcjohnson · 2024-01-18T02:21:50Z

Definitely confusing. What's happening is that plotly.py is not providing a type for the x axis, so plotly.js tries to infer the axis type from the data provided, and these cases are kind of on the edge: in all cases two of the three values are consistent with numeric data (including numbers as strings), but either one (first case), two (second and third cases) or three (last case) are consistent with categories. So in the first three cases we infer a numeric axis and the non-numeric value "A" is treated as null. But if you explicitly tell plotly.js this is a category axis (.update_layout(xaxis_type="category")), it will oblige and stringify each number into a category.

I don't think we want to alter the behavior of plotly.js here - it's meant to automatically ignore things like "N/A" when mostly you have numbers in your data, so a lot of charts would break if we mucked with the defaults. I could imagine a check inside plotly.py that raises a warning on potentially ambiguous data with autotyped axes, though this may have more impact on performance than we really want.

avm19 · 2024-01-18T06:13:02Z

> But if you explicitly tell plotly.js this is a category axis (.update_layout(xaxis_type="category")), it will oblige and stringify each number into a category.

This is a very useful remark, which almost solves my problem!

Unfortunately, plotly.js does not regard null as a numeric type. That is, we now have:

px.bar(pd.Series([11, 22, 33], [0, 1, 'A']), height=200).update_layout(xaxis_type="category")  # 3 bars
px.bar(pd.Series([11, 22, 33], ['0', 1, 'A']), height=200).update_layout(xaxis_type="category")  # 3 bars
px.bar(pd.Series([11, 22, 33], [0, '1', 'A']), height=200).update_layout(xaxis_type="category")  # 3 bars
px.bar(pd.Series([11, 22, 33], ['0', '1', 'A']), height=200).update_layout(xaxis_type="category")  # 3 bars
px.bar(pd.Series([11, 22, 33], ['0', '1', pd.NA]), height=200).update_layout(xaxis_type="category")  # 2 bars

and null is dropped regardless of update_layout(xaxis_type="category").

Why is this relevant? The original problem I was trying to solve was plotting a histogram with NaNs counted as a separate category. Since px.histogram does not do this out of the box, I was doing something like:

s = pd.Series([0, 0, 1, 1, 1, pd.NA])
vc = s.value_counts(dropna=False)
px.bar(vc)

which did not work as I expected, and some digging led to this issue.

As a workaround, I would have to add something like vc.index = vc.index.map(lambda v: str(v)). But now with update_layout(xaxis_type="category"), I can also do vc.index = vc.index.fillna('missing!').
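
For illustration, the same relabelling idea can be sketched without pandas. The helper name label_missing and its placeholder argument are hypothetical, and the sketch only recognises None and float NaN as missing; a pd.NA value would not be detected by this check and would simply be stringified.

```python
import math


def label_missing(values, placeholder="missing"):
    """Turn values into string category labels.

    None and float NaN are replaced by `placeholder`; everything else
    is converted with str().
    """
    labels = []
    for v in values:
        is_nan = isinstance(v, float) and math.isnan(v)
        labels.append(placeholder if v is None or is_nan else str(v))
    return labels


print(label_missing([0, 1, None]))         # ['0', '1', 'missing']
print(label_missing([0.5, float("nan")]))  # ['0.5', 'missing']
```

Because every label comes out as a string, no null is left for the axis to drop, which is the same effect the two index workarounds above aim for.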

Thank you for the clarification, @alexcjohnson, and feel free to close this issue if you don't think code or docs should be changed.

alexcjohnson · 2024-01-19T03:06:38Z

> Unfortunately, plotly.js does not regard null as a numeric type.

Perhaps more importantly, if the axis is numeric there's nowhere we can put a null. But in fact a null is ignored by all axis types, including category axes. (We try to turn most of the various "NaN"-type values into null during JSON conversion. While testing this with orjson installed I ran into #3253, but it works fine with the slower std-lib json.)

Anyway I'm glad this helped. I'll close the issue but @LiamConnors if you think there's an obvious way to improve documentation about this please reopen.

alexcjohnson closed this as completed Jan 19, 2024
