Inconsistent treatment of pandas Index when dtype is object #4483

Closed
avm19 opened this issue Jan 17, 2024 · 3 comments

@avm19

avm19 commented Jan 17, 2024

Each of the following lines is expected to produce a bar plot with three bars. However, in 3 out of 4 cases a two-bar plot is produced. Plotly seems to discard non-numeric objects when at least one numeric object is present.

```python
import pandas as pd
import plotly.express as px

px.bar(pd.Series([11, 22, 33], [0, 1, 'A']), height=200)  # 2 bars
px.bar(pd.Series([11, 22, 33], ['0', 1, 'A']), height=200)  # 2 bars
px.bar(pd.Series([11, 22, 33], [0, '1', 'A']), height=200)  # 2 bars
px.bar(pd.Series([11, 22, 33], ['0', '1', 'A']), height=200)  # 3 bars
```

I suppose this could be an issue in plotly.js rather than plotly.py, because the figures' JSON representations differ only in whether the values are wrapped in quotation marks ("). But because this unexpected behaviour can be fixed in plotly.py, I am opening the issue here.
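To make the "only the quotation marks differ" observation concrete, here is a small sketch that serializes the four indexes from the report. It uses the standard-library json module as a stand-in; plotly.py's own serialization path may differ in detail (for example when orjson is installed), so treat this as an illustration of the data, not of plotly.py internals.

```python
import json

import pandas as pd

cases = [[0, 1, "A"], ["0", 1, "A"], [0, "1", "A"], ["0", "1", "A"]]

# Serialize each Series index: numbers come out bare, strings come out quoted,
# and that is the only difference between the four cases.
encoded = [json.dumps(list(pd.Series([11, 22, 33], labels).index)) for labels in cases]
for line in encoded:
    print(line)
```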

Suggested solutions:

  • raise a warning when numeric and non-numeric (string) objects are mixed
  • alternatively, coerce everything to strings when at least one string is present.
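Both suggestions can be tried on the user side before calling px.bar. The sketch below is a hypothetical helper (check_mixed_labels is not part of plotly or pandas): it warns when an index mixes real numbers with strings and, with coerce=True, converts every label to a string whenever any string is present.

```python
import numbers
import warnings

import pandas as pd


def check_mixed_labels(index: pd.Index, coerce: bool = False) -> pd.Index:
    """Hypothetical helper sketching the two suggestions above.

    Warns when labels mix numbers (bools excluded) with strings; if `coerce`
    is True and any label is a string, returns all labels converted to str.
    """
    values = list(index)
    has_number = any(
        isinstance(v, numbers.Number) and not isinstance(v, bool) for v in values
    )
    has_string = any(isinstance(v, str) for v in values)
    if has_number and has_string:
        warnings.warn(
            "Index mixes numeric and string labels; an autotyped axis may "
            "treat the strings as missing.",
            UserWarning,
            stacklevel=2,
        )
    if coerce and has_string:
        return index.map(str)
    return index
```

With coerce=True, the first three examples become the fourth (all-string labels), which is the case the report shows rendering three bars.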
@alexcjohnson
Collaborator

Definitely confusing. What's happening is that plotly.py does not provide a type for the x axis, so plotly.js infers the axis type from the data, and these cases are right on the edge: in all four cases, two of the three values are consistent with numeric data (including numbers stored as strings), but either one (first case), two (second and third cases) or three (last case) are consistent with categories. So in the first three cases plotly.js infers a numeric axis and treats the non-numeric value "A" as null. But if you explicitly tell plotly.js this is a category axis (.update_layout(xaxis_type="category")), it will oblige and stringify each number into a category.

I don't think we want to alter the behavior of plotly.js here: it's meant to automatically ignore things like "N/A" when your data is mostly numbers, so a lot of charts would break if we mucked with the defaults. I could imagine a check inside plotly.py that raises a warning on potentially ambiguous data with autotyped axes, though this may have more impact on performance than we really want.
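A rough sketch of what such an ambiguity check might look like. It tallies values the way the explanation above describes them (numbers and numeric strings count as numeric-like, every string counts as category-like) and flags data that contains numeric-like values alongside strings that cannot be numbers. The helper names are hypothetical, and this is not the actual plotly.js autotype algorithm. Note that it visits every value in Python, which is exactly the kind of per-value cost the performance concern refers to.

```python
def looks_numeric(v) -> bool:
    """True for ints/floats (not bools) and for strings that parse as floats."""
    if isinstance(v, bool):
        return False
    if isinstance(v, (int, float)):
        return True
    if isinstance(v, str):
        try:
            float(v)
            return True
        except ValueError:
            return False
    return False


def tally(values) -> tuple:
    """Return (numeric-like count, category-like count)."""
    nums = sum(looks_numeric(v) for v in values)
    cats = sum(isinstance(v, str) for v in values)
    return nums, cats


def is_ambiguous(values) -> bool:
    """Numeric-like values mixed with strings that a numeric axis would drop."""
    has_numeric = any(looks_numeric(v) for v in values)
    has_non_numeric_str = any(isinstance(v, str) and not looks_numeric(v) for v in values)
    return has_numeric and has_non_numeric_str


for case in ([0, 1, "A"], ["0", 1, "A"], [0, "1", "A"], ["0", "1", "A"]):
    print(case, tally(case), is_ambiguous(case))
```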

@avm19
Author

avm19 commented Jan 18, 2024

> But if you explicitly tell plotly.js this is a category axis (.update_layout(xaxis_type="category")), it will oblige and stringify each number into a category.

This is a very useful remark, which almost solves my problem!

Unfortunately, plotly.js does not regard null as a numeric type. That is, now we have

```python
import pandas as pd
import plotly.express as px

px.bar(pd.Series([11, 22, 33], [0, 1, 'A']), height=200).update_layout(xaxis_type="category")  # 3 bars
px.bar(pd.Series([11, 22, 33], ['0', 1, 'A']), height=200).update_layout(xaxis_type="category")  # 3 bars
px.bar(pd.Series([11, 22, 33], [0, '1', 'A']), height=200).update_layout(xaxis_type="category")  # 3 bars
px.bar(pd.Series([11, 22, 33], ['0', '1', 'A']), height=200).update_layout(xaxis_type="category")  # 3 bars
px.bar(pd.Series([11, 22, 33], ['0', '1', pd.NA]), height=200).update_layout(xaxis_type="category")  # 2 bars
```

and null is dropped regardless of update_layout(xaxis_type="category").

Why is this relevant? Incidentally, the original problem I was trying to solve was plotting a histogram with NaNs counted as a separate category. Since px.histogram does not do this out of the box, I was doing something like:

```python
import pandas as pd
import plotly.express as px

s = pd.Series([0, 0, 1, 1, 1, pd.NA])
vc = s.value_counts(dropna=False)
px.bar(vc)
```

which did not work as I expected, and some digging led to this issue.

As a workaround, I would have had to add something like vc.index = vc.index.map(str). But with update_layout(xaxis_type="category"), I can instead do vc.index = vc.index.fillna('missing!').
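Both workarounds can be checked with pandas alone before plotting. A minimal sketch (the "missing!" label is just the placeholder from the comment; the fillna variant still leaves mixed labels, so it relies on xaxis_type="category" when plotted):

```python
import pandas as pd

s = pd.Series([0, 0, 1, 1, 1, pd.NA])
vc = s.value_counts(dropna=False)  # keeps the missing value as its own row

# Workaround 1: stringify every label, so nothing looks numeric-only.
as_str = vc.copy()
as_str.index = as_str.index.map(str)

# Workaround 2: replace the missing label; pair with xaxis_type="category".
filled = vc.copy()
filled.index = filled.index.fillna("missing!")

print(as_str)
print(filled)
```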

Thank you for the clarification, @alexcjohnson, and feel free to close this issue if you don't think the code or docs should be changed.

@alexcjohnson
Collaborator

> Unfortunately, plotly.js does not regard null as a numeric type.

Perhaps more importantly, if the axis is numeric there's nowhere we can put a null. But in fact a null is ignored by all axis types, including category axes. (During JSON conversion we try to turn most of the various "NaN"-type values into null. While testing this with orjson installed I ran into #3253, but it works fine with the slower standard-library json.)

Anyway, I'm glad this helped. I'll close the issue, but @LiamConnors, if you think there's an obvious way to improve the documentation about this, please reopen.
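To see why NaN-like values need converting at all, here is a sketch using only the standard-library json module and pandas.isna. The list comprehension is a hypothetical stand-in for the conversion described above, not plotly.py's actual encoder.

```python
import json

import pandas as pd

values = [0, 1, float("nan"), pd.NA, None]

# The standard library emits the non-standard token NaN for float("nan"),
# which strict JSON parsers reject.
print(json.dumps(float("nan")))

# Mapping every NaN-like value to None yields JSON null, which plotly.js
# then ignores on every axis type.
cleaned = [None if pd.isna(v) else v for v in values]
print(json.dumps(cleaned))
```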
