
REF: remove ExtensionArrayFormatter #26833


Merged 1 commit on Jun 13, 2019
8 changes: 3 additions & 5 deletions pandas/core/indexes/interval.py
@@ -1061,11 +1061,9 @@ def _format_with_header(self, header, **kwargs):

     def _format_native_types(self, na_rep='NaN', quoting=None, **kwargs):
         """ actually format my specific types """
-        from pandas.io.formats.format import ExtensionArrayFormatter
-        return ExtensionArrayFormatter(values=self,
-                                       na_rep=na_rep,
-                                       justify='all',
-                                       leading_space=False).get_result()
+        from pandas.io.formats.format import format_array
+        return format_array(values=self, na_rep=na_rep, justify='all',
+                            leading_space=False)

     def _format_data(self, name=None):

38 changes: 12 additions & 26 deletions pandas/io/formats/format.py
@@ -849,7 +849,7 @@ def _get_column_name_list(self):
 # Array formatters


-def format_array(values, formatter, float_format=None, na_rep='NaN',
+def format_array(values, formatter=None, float_format=None, na_rep='NaN',
                  digits=None, space=None, justify='right', decimal='.',
                  leading_space=None):
     """
@@ -879,14 +879,23 @@ def format_array(values, formatter, float_format=None, na_rep='NaN',
     List[str]
     """

+    if is_extension_array_dtype(values.dtype):
+        if isinstance(values, (ABCIndexClass, ABCSeries)):
+            values = values._values
+
+        if is_categorical_dtype(values.dtype):

Contributor

why would we not have both of these in the below if/elif clause?

Member Author

we're deciding which values to use here in the same way as ExtensionArrayFormatter did.

the following if/else clause is selecting the Formatter class to use based on those values.
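
The two steps described in this comment (first choose which values to format, then choose a formatter from those values) can be sketched in plain Python. This is an illustrative toy only: `ToyCategorical`, `toy_format_array`, and the helper functions are hypothetical names, not pandas APIs, and the type checks are a simplification of the dtype checks in the diff.

```python
import math


class ToyCategorical:
    """Hypothetical wrapper standing in for a Categorical (not a pandas class)."""

    def __init__(self, values):
        self._values = list(values)

    def get_values(self):
        # Expose the underlying values so a type-specific formatter can be reused.
        return self._values


def _format_floats(values, na_rep):
    # Float-specific formatting: one decimal place, na_rep for NaN.
    return [na_rep if math.isnan(v) else "{:.1f}".format(v) for v in values]


def _format_generic(values, na_rep):
    # Fallback formatting: str() for everything, na_rep for None.
    return [na_rep if v is None else str(v) for v in values]


def toy_format_array(values, na_rep="NaN"):
    # Step 1: decide which values to format (unwrap the container).
    if isinstance(values, ToyCategorical):
        values = values.get_values()

    # Step 2: select the formatter based on the unwrapped values.
    if all(isinstance(v, float) for v in values):
        fmt = _format_floats
    else:
        fmt = _format_generic
    return fmt(values, na_rep)
```

Because the unwrapping happens before the dispatch, the wrapped floats reach the float formatter without a dedicated branch for the wrapper type.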

Contributor

Seems like the idea is to extract the datetime64[ns] from the Categorical, and then reuse the Datetime64Formatter by going into the if / elif below.

Contributor

yes i get the idea, trying to see if the logic can somehow be simpler

Contributor

We could potentially define Categorical._formatter, which would provide the appropriate scalar formatter based on its .dtype? Not sure if that'll work or not.

Contributor

hmm we actually define this

    def _formatter(self, boxed=False):
        # Defer to CategoricalFormatter's formatter.
        return None

Contributor

maybe we are not really using this attribute fully?

Member Author

format_array is used by to_string, to_html, _repr_html_, to_latex and for the repr of many objects.

it should in theory be simpler and more generic.

it may be beneficial to move some logic out into the objects themselves so that format_array can work with any extension array and not require this special casing. (i think that is outside of the scope of this PR)

This PR is intended to remove the call to format_array from within ExtensionArrayFormatter so that the formatter parameter of format_array can be used for custom formatters without defaults being applied.

ExtensionArrayFormatter was dispatching back to format_array to then dispatch to the appropriate (another) Formatter.

> maybe we are not really using this attribute fully?

agreed. many just return None to defer to the Formatters.
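
A minimal sketch of the intent stated above, assuming a simplified signature: when `formatter` defaults to `None`, a caller-supplied formatter is applied as given and the default is only built when nothing is passed. `toy_format_array` and `default_formatter` are hypothetical stand-ins, not the pandas functions.

```python
def default_formatter(x, na_rep="NaN"):
    # Hypothetical default: na_rep for missing values, str() otherwise.
    return na_rep if x is None else str(x)


def toy_format_array(values, formatter=None, na_rep="NaN"):
    # Only fall back to the default when the caller passed no formatter,
    # so a custom formatter is never combined with default pre-formatting.
    if formatter is None:
        def formatter(x):
            return default_formatter(x, na_rep)
    return [formatter(v) for v in values]


# Default path: na_rep is honoured.
print(toy_format_array([1, None], na_rep="-"))
# Custom path: the caller's formatter sees the raw values.
print(toy_format_array([1, None], formatter=lambda x: "<{}>".format(x)))
```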

+            # Categorical is special for now, so that we can preserve tzinfo
+            values = values.get_values()
+
+        if not is_datetime64tz_dtype(values.dtype):
+            values = np.asarray(values)
+
     if is_datetime64_dtype(values.dtype):
         fmt_klass = Datetime64Formatter
     elif is_datetime64tz_dtype(values):
         fmt_klass = Datetime64TZFormatter
     elif is_timedelta64_dtype(values.dtype):
         fmt_klass = Timedelta64Formatter
-    elif is_extension_array_dtype(values.dtype):
-        fmt_klass = ExtensionArrayFormatter
     elif is_float_dtype(values.dtype) or is_complex_dtype(values.dtype):
         fmt_klass = FloatArrayFormatter
     elif is_integer_dtype(values.dtype):
@@ -1181,29 +1190,6 @@ def _format_strings(self):
         return fmt_values.tolist()


-class ExtensionArrayFormatter(GenericArrayFormatter):
-    def _format_strings(self):
-        values = self.values
-        if isinstance(values, (ABCIndexClass, ABCSeries)):
-            values = values._values
-
-        formatter = values._formatter(boxed=True)

Member

@simonjayhawkins Is this still used in another place? (not very familiar with the formatting code, but wondering where this is called to ensure the underlying can determine the formatting)

Member

I don't see it used elsewhere in this PR, or not anymore on master. It might be that we didn't have good tests to cover behaviour where the ExtensionArray deviated from the "normal" behaviour to catch this.

Member Author

> It might be that we didn't have good tests to cover behaviour where the ExtensionArray deviated from the "normal" behaviour to catch this.

i suspect that more tests will need to be added. particularly with cases where na_rep is passed to to_html etc or

we do have a specific issue here concerning EAs #25099

categorical, sparse, period etc return None (to defer) or just str (with boxed=True), so the _formatter was not adding anything.

for datetimelike we have

    def _formatter(self, boxed=False):
        # TODO: Remove Datetime & DatetimeTZ formatters.
        return "'{}'".format

so will be removed.

for integer array, we have

    def _formatter(self, boxed=False):
        def fmt(x):
            if isna(x):
                return 'NaN'
            return str(x)
        return fmt

so this is now not being called following the removal of the pre-formatting step. but this likely contributes to the na_rep issues.

IMO the _formatter methods should be used from within the Formatter classes, not passed to them. and cannot assign 'NaN' explicitly.

this should be part of a subsequent refactor, see #26833 (comment) and #26837

i should have a follow-on ready shortly continuing the format_array cleanup. (maybe not till next week due to PyLondinium)
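
The na_rep concern raised in this comment can be illustrated with a small self-contained sketch. It assumes a simplified `isna` and uses hypothetical names (`hardcoded_formatter`, `make_formatter`, `format_values`); it is not the pandas implementation, only a demonstration of why a scalar formatter that hard-codes 'NaN' cannot honour a caller's na_rep.

```python
def isna(x):
    # Simplified stand-in for a missing-value check (assumption: None means missing).
    return x is None


def hardcoded_formatter(x):
    # Mirrors the shape of the integer-array formatter quoted above:
    # the missing-value string is fixed inside the formatter.
    if isna(x):
        return 'NaN'
    return str(x)


def make_formatter(na_rep):
    # Alternative shape: the caller's na_rep is captured by the formatter.
    def fmt(x):
        return na_rep if isna(x) else str(x)
    return fmt


def format_values(values, formatter):
    return [formatter(v) for v in values]


# The hard-coded formatter has no way to receive na_rep, so '-' can never appear.
print(format_values([1, None], hardcoded_formatter))
# The formatter built from na_rep renders missing values as requested.
print(format_values([1, None], make_formatter('-')))
```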


-        if is_categorical_dtype(values.dtype):
-            # Categorical is special for now, so that we can preserve tzinfo
-            array = values.get_values()
-        else:
-            array = np.asarray(values)
-
-        fmt_values = format_array(array,
-                                  formatter,
-                                  float_format=self.float_format,
-                                  na_rep=self.na_rep, digits=self.digits,
-                                  space=self.space, justify=self.justify,
-                                  leading_space=self.leading_space)
-        return fmt_values
-
-
 def format_percentiles(percentiles):
     """
     Outputs rounded and formatted percentiles.