ENH: Display IntEnums by name rather than value #36124

Open
dzimmanck opened this issue Sep 4, 2020 · 16 comments
Labels
Enhancement Output-Formatting __repr__ of pandas objects, to_string

Comments

@dzimmanck

dzimmanck commented Sep 4, 2020

Is your feature request related to a problem?

When displaying DataFrames that contain IntEnums in the index, columns, or data, it would be nice if they were displayed the way Enum types are: by the Enum name that corresponds to each value rather than by the integer value. Showing the raw integers somewhat defeats the purpose of using IntEnums over plain ints with alias names.

Describe the solution you'd like

I would like pandas to display IntEnums in the same way as Enums.

API breaking implications

None that I am aware of, as this request is only for a display change.

Describe alternatives you've considered

I have considered two solutions, both of which have drawbacks. The first was to monkey patch IntArrayFormatter. This did not work because pandas appears to cast IntEnums to regular ints as soon as they are put into the frame. The second was to make the IntEnums categorical, which makes logical sense, but to get them to display properly I need to map the data to the names of the enums, which changes the underlying data to strings.
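The drawback of the second alternative can be seen without pandas at all. The sketch below uses a hypothetical helper, enum_label (not part of pandas or the enum module), to turn members into "ClassName.member" strings. The labels read well, but they are plain str objects, so the original members are only recoverable through a separate lookup table.

```python
from enum import IntEnum


class IntColors(IntEnum):
    red = 1
    blue = 2


def enum_label(member):
    """Hypothetical helper: build a 'ClassName.member_name' display label."""
    return f"{type(member).__name__}.{member.name}"


# Labels that could be used as column names or category values.
labels = [enum_label(member) for member in IntColors]

# The labels are strings, so going back to the enum needs an explicit mapping.
by_label = {enum_label(member): member for member in IntColors}

print(labels)
print(by_label["IntColors.blue"] is IntColors.blue)
```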

Additional context

Here is a simple example.

from enum import Enum, IntEnum
import pandas as pd

class Colors(Enum):
    red = 1
    blue = 2

class IntColors(IntEnum):
    red = 1
    blue = 2


df = pd.DataFrame({Colors.red:[1,2,3], Colors.blue:[5,6,6]})
int_df = pd.DataFrame({IntColors.red:[1,2,3], IntColors.blue:[5,6,6]})

print('This is how I want the dataframe to display')
print(df)
print()
print('This is how it is displayed')
print(int_df)

Which outputs:

This is how I want the dataframe to display
   Colors.red  Colors.blue
0           1            5
1           2            6
2           3            6

This is how it is displayed
   1  2
0  1  5
1  2  6
2  3  6
@dzimmanck dzimmanck added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 4, 2020
@dzimmanck dzimmanck changed the title ENH: ENH: Display IntEnums by name rather then value Sep 4, 2020
@TomAugspurger
Contributor

Can you provide an example along with your expected output?

@TomAugspurger TomAugspurger added Needs Info Clarification about behavior needed to assess issue and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Sep 4, 2020
@dzimmanck
Author

@TomAugspurger

I updated my comment with a simple example.

@TomAugspurger
Contributor

Seems that the issue isn't in the repr, it's that pandas converts the IntEnum to integers

In [67]: int_df.columns
Out[67]: Int64Index([1, 2], dtype='int64')

In [70]: pd.Index([IntColors.red])
Out[70]: Int64Index([1], dtype='int64')

@dzimmanck
Author

dzimmanck commented Sep 4, 2020

@TomAugspurger, I think that is part of the issue, but they still don't print correctly even when the columns are cast as objects explicitly.

from enum import Enum, IntEnum
import pandas as pd


class IntColors(IntEnum):
    red = 1
    blue = 2

# let's create an index explicitly declared as object dtype
columns = pd.Index([IntColors.red, IntColors.blue], dtype=object)

print('I see colors')
print(columns)
print()
# now let's build a dataframe with those columns
df = pd.DataFrame([[1,2], [3,4]], columns=columns)

print('I still see colors, so the columns are the correct type')
print(df.columns)
print()
print('But not when the entire dataframe is printed')
print(df)

Which returns:

I see colors
Index([IntColors.red, IntColors.blue], dtype='object')

I still see colors, so the columns are the correct type
Index([IntColors.red, IntColors.blue], dtype='object')

But not when the entire dataframe is printed
   1  2
0  1  2
1  3  4

@TomAugspurger
Contributor

TomAugspurger commented Sep 5, 2020

Thanks, that might be a bug. Can you check the formatting code in io/formats/format.py to see where it's converted?

Can you also edit your original post to include just the relevant details (construct the dataframe, show the actual output, show the expected output)?

@TomAugspurger TomAugspurger added Output-Formatting __repr__ of pandas objects, to_string and removed Needs Info Clarification about behavior needed to assess issue labels Sep 5, 2020
@asishm
Contributor

asishm commented Sep 5, 2020

The main issue seems to be that, for IntEnums, IntColors.red == 1 evaluates to True.

This causes two things:

  1. Without specifying dtype=object, pandas infers the dtype to be integer (lib.infer_dtype), which explains:
In [70]: pd.Index([IntColors.red])
Out[70]: Int64Index([1], dtype='int64')
  2. When formatting, columns.format is called, which calls lib.maybe_convert_objects(values, safe=1). That converts the values to int, presumably again because IntColors.red == 1 evaluates to True.
In [41]: lib.infer_dtype([IntColors.red, IntColors.blue])
Out[41]: 'integer'

In [43]: lib.maybe_convert_objects(columns._values)
Out[43]: array([1, 2])
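The equality noted above follows from IntEnum members being real int instances. The standard-library-only sketch below contrasts them with plain Enum members. It does not show how pandas implements its inference; it only illustrates why integer-based checks are likely to treat an IntEnum member as an ordinary int.

```python
from enum import Enum, IntEnum


class Colors(Enum):
    red = 1
    blue = 2


class IntColors(IntEnum):
    red = 1
    blue = 2


# IntEnum members subclass int; plain Enum members do not.
int_member_is_int = isinstance(IntColors.red, int)
plain_member_is_int = isinstance(Colors.red, int)

# Equality with the raw value only holds for the IntEnum member.
int_member_equals_1 = IntColors.red == 1
plain_member_equals_1 = Colors.red == 1

# Equal objects hash alike, so an IntEnum member finds an int dictionary key.
lookup = {1: "one"}[IntColors.red]

print(int_member_is_int, plain_member_is_int)
print(int_member_equals_1, plain_member_equals_1)
print(lookup)
```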

@TomAugspurger
Contributor

Thanks. I think let's separate the two issues as much as possible. Changing the behavior of Index seems harder since it's potentially API-breaking. Let's focus just on the output formatting here when you get an object-dtype index with these enums.

@dzimmanck
Author

@TomAugspurger,

I think the issue is in the TableFormatter. The _get_formatter() method re-casts the data as integers if is_integer() evaluates to True before passing it to the formatter for display. This explains why all the work I did trying to monkey patch the integer formatter failed: it could no longer differentiate integers from IntEnums.

def _get_formatter(self, i: Union[str, int]) -> Optional[Callable]:
        if isinstance(self.formatters, (list, tuple)):
            if is_integer(i):
                i = cast(int, i)
                return self.formatters[i]
            else:
                return None
        else:
            if is_integer(i) and i not in self.columns:
                i = self.columns[i]
            return self.formatters.get(i, None)

I don't really think the formatter should be re-casting in this manner, but in order to avoid any API breaking changes, we could simply avoid this re-cast JUST for IntEnum types. Something like this:

def _get_formatter(self, i: Union[str, int]) -> Optional[Callable]:
        if isinstance(self.formatters, (list, tuple)):
            if is_integer(i):

                # IntEnums will be displayed differently than ints, so do not re-cast
                if not isinstance(i, IntEnum):
                    i = cast(int, i)
                    
                return self.formatters[i]
            else:
                return None
        else:
            if is_integer(i) and i not in self.columns:
                i = self.columns[i]
            return self.formatters.get(i, None)

I don't have a pandas dev environment set up to test this, but it should be easy to check with my simple example above if you have one. If no one takes that on by tomorrow, I will set up a dev environment and do it myself. I do a lot of Cython work, so I think I have most of the dependencies already. It would be cool to get a PR in pandas under my name, even such a mundane one.
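One thing that can be checked without a pandas dev environment is what the cast call does at runtime. Assuming the cast in the snippets above is typing.cast (an assumption; the import is not shown in the thread), it is a hint for static type checkers and returns its argument unchanged, so it would not by itself turn an IntEnum member into a plain int:

```python
from enum import IntEnum
from typing import cast


class IntColors(IntEnum):
    red = 1


value = IntColors.red

# typing.cast only informs type checkers; at runtime it returns `value` itself.
result = cast(int, value)

same_object = result is value
still_enum = isinstance(result, IntEnum)

print(same_object, still_enum)
```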

@dzimmanck
Author

I've been busy with my paycheck job, but I think I will have some time to work on this next week.

@dzimmanck
Author

take

@dzimmanck
Author

I have isolated the issue to the maybe_convert_objects() function in _libs/lib.pyx. This is used in the _format_with_head() method of Index types and converts an array of IntEnums to an array of basic ints in the process of formatting.

I can modify the Cython code so as NOT to convert IntEnums. This seems like the most straightforward fix. @TomAugspurger, thoughts?

@TomAugspurger
Contributor

TomAugspurger commented Oct 2, 2020 via email

@dzimmanck
Author

@TomAugspurger

Looking at the code, it looks like timedelta64 went through something similar, and thus added a "convert_timedelta" boolean which defaults to false. I will take a similar approach and add a "convert_intenum" argument which defaults to false in maybe_convert_objects().
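The opt-in flag idea can be illustrated in plain Python. The function below, maybe_convert_sketch, is a hypothetical toy stand-in and not the Cython maybe_convert_objects() from pandas; only the flag name convert_intenum and its default of False are taken from the comment above. With the flag off, enum members pass through untouched; with it on, they are lowered to plain ints.

```python
from enum import Enum, IntEnum


class IntColors(IntEnum):
    red = 1
    blue = 2


def maybe_convert_sketch(values, convert_intenum=False):
    """Toy conversion step (hypothetical, not pandas' implementation)."""
    has_enum = any(isinstance(v, Enum) for v in values)
    all_ints = all(isinstance(v, int) and not isinstance(v, bool) for v in values)
    # Lower to plain ints only when no enums are present or the caller opts in.
    if all_ints and (convert_intenum or not has_enum):
        return [int(v) for v in values]
    return list(values)


kept = maybe_convert_sketch([IntColors.red, IntColors.blue])
converted = maybe_convert_sketch([IntColors.red, IntColors.blue], convert_intenum=True)
plain = maybe_convert_sketch([1, 2])

print([type(v).__name__ for v in kept])
print([type(v).__name__ for v in converted])
```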

I baselined the test results before I started making changes, so I will be able to get a good sense if I broke anything.

@dzimmanck
Author

dzimmanck commented Oct 2, 2020

Investigating the most Cythonic way to do the Enum type check. I think I have a good idea, but I posted on Stack Overflow to get the best of the best:

https://stackoverflow.com/questions/64176420/what-is-the-best-way-to-check-of-an-object-is-an-enum-in-cython/64176519#64176519

I posted three methods (only two of which work right now) on the question. Not getting a whole lot of constructive feedback. The method I am leaning towards is:

getattr(val, 'name', None) is not None

This does not require importing Enum into Cython for the type check and is consistent with the tzinfo check already used in the function.
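For comparison, here is a plain-Python sketch of the duck-typed check next to an isinstance check. The helper names looks_like_enum and is_enum, and the Named class, are hypothetical and exist only for illustration. The trade-off it shows: the getattr form avoids importing Enum, but it also accepts any unrelated object that happens to carry a non-None name attribute.

```python
from enum import Enum, IntEnum


class IntColors(IntEnum):
    red = 1


class Named:
    """Unrelated object that happens to expose a `name` attribute."""
    name = "not an enum"


def looks_like_enum(val):
    # Duck-typed check quoted in the comment above.
    return getattr(val, "name", None) is not None


def is_enum(val):
    # Stricter check that requires importing Enum.
    return isinstance(val, Enum)


for candidate in (IntColors.red, 1, Named()):
    print(type(candidate).__name__, looks_like_enum(candidate), is_enum(candidate))
```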

@dzimmanck
Author

OK, I completed the fix and added a test. Package still tests out. Will submit a PR and see how it goes.

@willofferfit

This needs a Bug label – it's not only an enhancement. Found this because I tried to store IntEnum members inside a pandas dataframe column of object dtype. My workaround was to work with ints and convert elsewhere, which made my code more complex and slower.


4 participants