REF: remove ExtensionArrayFormatter #26833
@@ -849,7 +849,7 @@ def _get_column_name_list(self):
 # Array formatters


-def format_array(values, formatter, float_format=None, na_rep='NaN',
+def format_array(values, formatter=None, float_format=None, na_rep='NaN',
                  digits=None, space=None, justify='right', decimal='.',
                  leading_space=None):
     """
@@ -879,14 +879,23 @@ def format_array(values, formatter, float_format=None, na_rep='NaN',
     List[str]
     """

+    if is_extension_array_dtype(values.dtype):
+        if isinstance(values, (ABCIndexClass, ABCSeries)):
+            values = values._values
+
+        if is_categorical_dtype(values.dtype):
+            # Categorical is special for now, so that we can preserve tzinfo
+            values = values.get_values()
+
+        if not is_datetime64tz_dtype(values.dtype):
+            values = np.asarray(values)
+
     if is_datetime64_dtype(values.dtype):
         fmt_klass = Datetime64Formatter
     elif is_datetime64tz_dtype(values):
         fmt_klass = Datetime64TZFormatter
     elif is_timedelta64_dtype(values.dtype):
         fmt_klass = Timedelta64Formatter
-    elif is_extension_array_dtype(values.dtype):
-        fmt_klass = ExtensionArrayFormatter
     elif is_float_dtype(values.dtype) or is_complex_dtype(values.dtype):
         fmt_klass = FloatArrayFormatter
     elif is_integer_dtype(values.dtype):
@@ -1181,29 +1190,6 @@ def _format_strings(self):
         return fmt_values.tolist()


-class ExtensionArrayFormatter(GenericArrayFormatter):
-    def _format_strings(self):
-        values = self.values
-        if isinstance(values, (ABCIndexClass, ABCSeries)):
-            values = values._values
-
-        formatter = values._formatter(boxed=True)
@simonjayhawkins Is this still used in another place? (not very familiar with the formatting code, but wondering where this is called to ensure the underlying array can determine the formatting)

I don't see it used elsewhere in this PR, or not anymore on master. It might be that we didn't have good tests to cover behaviour where the ExtensionArray deviated from the "normal" behaviour to catch this.

i suspect that more tests will need to be added, particularly for cases where na_rep is passed to to_html etc. we also have a specific issue here concerning EAs: #25099

categorical, sparse, period etc return None (to defer) or just str (with boxed=True), so the _formatter was not adding anything.

for datetimelike we have
so will be removed.

for integer array, we have
so this is now not being called following the removal of the pre-formatting step. but this likely contributes to the na_rep issues. IMO the _formatter methods should be used from within the Formatter classes, not passed to them, and cannot assign 'NaN' explicitly. this should be part of a subsequent refactor, see #26833 (comment) and #26837

i should have a follow-on ready shortly continuing the format_array cleanup. (maybe not till next week due to PyLondinium)
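
A minimal sketch of the na_rep concern raised above, written in plain Python rather than against pandas. `format_with_hook` and `hardcoded_nan_formatter` are hypothetical names, not pandas APIs; the only assumption is an element formatter that hard-codes its own missing-value string, which would then bypass whatever na_rep the caller asked for.

```python
from typing import Callable, List, Optional, Sequence


def hardcoded_nan_formatter(x: object) -> str:
    # Hypothetical element formatter: the missing-value string is baked in.
    return "NaN" if x is None else str(x)


def format_with_hook(
    values: Sequence[object],
    element_formatter: Optional[Callable[[object], str]] = None,
    na_rep: str = "NaN",
) -> List[str]:
    # Toy stand-in for a formatting routine (not pandas' format_array).
    if element_formatter is None:
        # Default path: the caller's na_rep is honoured.
        return [na_rep if v is None else str(v) for v in values]
    # Hook path: the element formatter decides everything, na_rep is unused.
    return [element_formatter(v) for v in values]


default_out = format_with_hook([1, None, 3], na_rep="-")
hooked_out = format_with_hook([1, None, 3], hardcoded_nan_formatter, na_rep="-")
print(default_out)  # ['1', '-', '3']
print(hooked_out)   # ['1', 'NaN', '3']
```

In this toy version the second call ignores `na_rep="-"`, which is the kind of mismatch that applying na_rep inside the Formatter classes would avoid.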
-
-        if is_categorical_dtype(values.dtype):
-            # Categorical is special for now, so that we can preserve tzinfo
-            array = values.get_values()
-        else:
-            array = np.asarray(values)
-
-        fmt_values = format_array(array,
-                                  formatter,
-                                  float_format=self.float_format,
-                                  na_rep=self.na_rep, digits=self.digits,
-                                  space=self.space, justify=self.justify,
-                                  leading_space=self.leading_space)
-        return fmt_values
-
-
 def format_percentiles(percentiles):
     """
     Outputs rounded and formatted percentiles.
why would we not have both of these in the below if/elif clause?
we're deciding which values to use here in the same way as ExtensionArrayFormatter did.
the following if/else clause is selecting the Formatter class to use based on those values.
Seems like the idea is to extract the datetime64[ns] from the Categorical, and then reuse the Datetime64Formatter by going into the if / elif below.
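
A rough sketch of that two-step flow in plain Python, not the pandas implementation: unwrap the container to its underlying values first, then pick a single formatter based on those values. `ToyCategorical`, `pick_formatter`, and `toy_format_array` are hypothetical names used only for illustration.

```python
from datetime import datetime
from typing import Callable, List, Optional, Sequence


class ToyCategorical:
    """Hypothetical stand-in for a categorical: categories plus integer codes."""

    def __init__(self, categories: Sequence[object], codes: Sequence[int]):
        self.categories = list(categories)
        self.codes = list(codes)

    def get_values(self) -> List[Optional[object]]:
        # A code of -1 marks a missing entry.
        return [self.categories[c] if c >= 0 else None for c in self.codes]


def pick_formatter(values: Sequence[object]) -> Callable[[object], str]:
    # Step 2: choose one formatter from the unwrapped values.
    if any(isinstance(v, datetime) for v in values):
        return lambda v: "NaT" if v is None else v.strftime("%Y-%m-%d")
    return lambda v: "NaN" if v is None else str(v)


def toy_format_array(values) -> List[str]:
    # Step 1: decide which values to format (unwrap the categorical).
    if isinstance(values, ToyCategorical):
        values = values.get_values()
    fmt = pick_formatter(values)
    return [fmt(v) for v in values]


cat = ToyCategorical([datetime(2019, 1, 1), datetime(2019, 1, 2)], [1, 0, -1])
print(toy_format_array(cat))  # ['2019-01-02', '2019-01-01', 'NaT']
```

The point of the sketch is that the datetime formatter is reached in one pass, without a wrapper formatter calling back into the dispatch function.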
yes i get the idea, trying to see if the logic can somehow be simpler
We could potentially define `Categorical._formatter`, which would provide the appropriate scalar formatter based on its `.dtype`? Not sure if that'll work or not.
hmm we actually define this
maybe we are not really using this attribute fully?
`format_array` is used by `to_string`, `to_html`, `_repr_html`, `to_latex` and for the `repr` of many objects.

it should in theory be simpler and more generic.

it may be beneficial to move some logic out into the objects themselves so that `format_array` can work with any extension array and not require this special casing. (i think that is outside of the scope of this PR)

This PR is intended to remove the call to `format_array` from within `ExtensionArrayFormatter` so that the `formatter` parameter of `format_array` can be used for custom formatters without defaults being applied.

`ExtensionArrayFormatter` was dispatching back to `format_array` to then dispatch to the appropriate (another) Formatter.

agreed. many just return None to defer to the Formatters.