ENH: Display IntEnums by name rather than value #36124
Comments
Can you provide an example along with your expected output? |
I updated my comment with a simple example. |
Seems that the issue isn't in the repr; it's that pandas converts the IntEnum to integers:
In [67]: int_df.columns
Out[67]: Int64Index([1, 2], dtype='int64')
In [70]: pd.Index([IntColors.red])
Out[70]: Int64Index([1], dtype='int64') |
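For anyone following along, the reason this conversion is possible at all is that IntEnum members are genuine int instances, unlike plain Enum members. A minimal stdlib-only sketch (the `Colors` class is added here purely for comparison and is not from the thread):

```python
from enum import Enum, IntEnum


class Colors(Enum):
    red = 1
    blue = 2


class IntColors(IntEnum):
    red = 1
    blue = 2


# IntEnum members are real int instances; plain Enum members are not.
is_int_plain = isinstance(Colors.red, int)
is_int_intenum = isinstance(IntColors.red, int)

# Converting to a builtin int drops the enum identity, and with it the name.
as_int = int(IntColors.red)

print(is_int_plain, is_int_intenum, type(as_int).__name__, IntColors.red.name)
# prints: False True int red
```

Any code path that asks "is this an integer?" with an isinstance-style check will therefore treat IntEnum members as ordinary integers.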
@TomAugspurger, I think that is part of the issue, but they still don't print correctly even when the columns are cast as objects explicitly.
from enum import Enum, IntEnum
import pandas as pd

class IntColors(IntEnum):
    red = 1
    blue = 2

# let's create an index explicitly called out as objects
columns = pd.Index([IntColors.red, IntColors.blue], dtype=object)
print('I see colors')
print(columns)
print()
# now let's build a dataframe with those columns
df = pd.DataFrame([[1, 2], [3, 4]], columns=columns)
print('I still see colors, so the columns are the correct type')
print(df.columns)
print()
print('But not when the entire dataframe is printed')
print(df)
Which returns:
|
Thanks, that might be a bug. Can you check the formatting code in io/formats/format.py to see where it's converted? Can you also edit your original post to include just the relevant details (construct dataframe, show actual output, show expected output). |
the main issue seems to be because for IntEnums - which causes two things:
In [70]: pd.Index([IntColors.red])
Out[70]: Int64Index([1], dtype='int64')
In [41]: lib.infer_dtype([IntColors.red, IntColors.blue])
Out[41]: 'integer'
In [43]: lib.maybe_convert_objects(columns._values)
Out[43]: array([1, 2]) |
Thanks. I think let's separate the two issues as much as possible. Changing the behavior of Index seems harder since it's potentially API-breaking. Let's focus just on the output formatting here when you get an object-dtype index with these enums. |
I think the issue is in the TableFormatter. The _get_formatter() method re-casts the data as integers if is_integer() evaluates as true before passing it to the formatter for display. This explains why all that work I did to try and monkey patch the integer formatter failed. It could no longer differentiate integers from IntEnums:
def _get_formatter(self, i: Union[str, int]) -> Optional[Callable]:
    if isinstance(self.formatters, (list, tuple)):
        if is_integer(i):
            i = cast(int, i)
            return self.formatters[i]
        else:
            return None
    else:
        if is_integer(i) and i not in self.columns:
            i = self.columns[i]
        return self.formatters.get(i, None)
I don't really think the formatter should be re-casting in this manner, but in order to avoid any API-breaking changes, we could simply avoid this re-cast JUST for IntEnum types. Something like this:
def _get_formatter(self, i: Union[str, int]) -> Optional[Callable]:
    if isinstance(self.formatters, (list, tuple)):
        if is_integer(i):
            # IntEnums will be displayed differently than ints, so do not re-cast
            if not isinstance(i, IntEnum):
                i = cast(int, i)
            return self.formatters[i]
        else:
            return None
    else:
        if is_integer(i) and i not in self.columns:
            i = self.columns[i]
        return self.formatters.get(i, None)
I don't have a pandas dev environment set up to test this, but it should be easy to check if you have one with my simple example above. If no one takes that on by tomorrow I will set up a dev environment and do it myself. I do a lot of Cython work so I think I have most of the dependencies already. Would be cool to get a PR in pandas under my name, even such a mundane one. |
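One thing worth checking before patching _get_formatter(): the snippet above does not show its imports, but if `cast` there is `typing.cast` (an assumption), it is only a hint for static type checkers and returns its argument unchanged at runtime. A quick stdlib-only check, reusing the IntColors class from the example above:

```python
from enum import IntEnum
from typing import cast


class IntColors(IntEnum):
    red = 1
    blue = 2


member = IntColors.red

# typing.cast() performs no runtime conversion; it hands back the same object.
casted = cast(int, member)
same_object = casted is member

print(same_object, type(casted).__name__)
# prints: True IntColors
```

If that assumption holds, the conversion to plain ints would have to be happening somewhere other than this cast.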
I've been busy with my paycheck job, but I think I will have some time to work on this next week. |
take |
I have isolated the issue to the maybe_convert_objects() function in _libs/lib.pyx. This is used in the _format_with_head() method of Index types and converts an array of IntEnums to array of basic ints in the process of formatting. I can modify the Cython code so as NOT to convert IntEnums. This seems like the most straightforward fix. @TomAugspurger, thoughts? |
I think that maybe_convert_objects is used in other places, not just for
formatting. Changing it might break other things (or it might not).
|
Looking at the code, it looks like timedelta64 went through something similar, and thus added a "convert_timedelta" boolean which defaults to false. I will take a similar approach and add a "convert_intenum" argument which defaults to false in maybe_convert_objects(). I baselined the test results before I started making changes, so I will be able to get a good sense if I broke anything. |
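To make the opt-in flag idea concrete, here is a rough pure-Python sketch. It is not pandas' maybe_convert_objects(): the function name `convert_objects` and its logic are hypothetical stand-ins, and only the `convert_intenum` keyword defaulting to false comes from the comment above.

```python
from enum import IntEnum


class IntColors(IntEnum):
    red = 1
    blue = 2


def convert_objects(values, convert_intenum=False):
    """Hypothetical stand-in: downcast int-like objects to builtin ints.

    IntEnum members are left untouched unless convert_intenum is True.
    """
    has_intenum = any(isinstance(v, IntEnum) for v in values)
    all_ints = all(
        isinstance(v, int) and not isinstance(v, bool) for v in values
    )
    if all_ints and (convert_intenum or not has_intenum):
        return [int(v) for v in values]
    # Otherwise keep the original objects as they are.
    return list(values)


kept = convert_objects([IntColors.red, IntColors.blue])
converted = convert_objects([IntColors.red, IntColors.blue], convert_intenum=True)

print([v.name for v in kept], converted)
# prints: ['red', 'blue'] [1, 2]
```

With this shape, a caller that wants the plain-integer behavior has to pass convert_intenum=True explicitly, and a formatting path can simply rely on the default.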
Investigating the most Cythonic way to do the Enum type check. I think I have a good idea, but I posted on Stack Overflow to get the best of the best. I posted three methods (only two of which work right now) on the question. Not getting a whole lot of constructive feedback. The method I am leaning towards is:
This does not require importing Enum into Cython for the type check and is consistent with the tzinfo check already used in the function. |
OK, I completed the fix and added a test. Package still tests out. Will submit a PR and see how it goes. |
This needs a |
Is your feature request related to a problem?
When displaying dataframes that contain IntEnums in the index, columns, or data, it would be nice if they displayed like Enum types, using the Enum names that correspond to the values rather than the integer values. Showing the integers somewhat defeats the purpose of using IntEnums over just ints with alias names.
Describe the solution you'd like
I would like pandas to display IntEnums in the same way as Enums.
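As a rough illustration of the requested display, the sketch below shows a label formatter that prints enum members by name. The `format_label` helper and its `ClassName.member` output style are assumptions made for illustration; this is not how pandas formats labels internally.

```python
from enum import Enum, IntEnum


class IntColors(IntEnum):
    red = 1
    blue = 2


def format_label(label):
    """Hypothetical helper: show enum members by name, anything else via str()."""
    # IntEnum is a subclass of Enum, so this check covers both kinds.
    if isinstance(label, Enum):
        return f"{type(label).__name__}.{label.name}"
    return str(label)


labels = [format_label(x) for x in (IntColors.red, IntColors.blue, 3, "x")]
print(labels)
# prints: ['IntColors.red', 'IntColors.blue', '3', 'x']
```

The name is built from `.name` explicitly rather than from str(), because the str() of an IntEnum member is not the same across Python versions.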
API breaking implications
None that I am aware of, as this request is only for a display change.
Describe alternatives you've considered
I have considered two solutions, both of which have drawbacks. The first was to monkey patch the Int Array Formatter. This ended up not working, as pandas appears to cast IntEnums to regular ints as soon as they are put into the frame. The second was to change the IntEnums to be categorical, which makes logical sense, but to get them to display properly I need to map the data to the names of the enums, which changes the underlying data to strings.
Additional context
Here is a simple example.
Which outputs: