REF: remove special-casing for internal EAs from format_array #26965

Closed
4 changes: 2 additions & 2 deletions pandas/core/arrays/base.py
@@ -857,7 +857,7 @@ def __repr__(self):
def _formatter(
self,
boxed: bool = False,
) -> Callable[[Any], Optional[str]]:
) -> Optional[Callable[[Any], Optional[str]]]:
"""Formatting function for scalar values.

This is used in the default '__repr__'. The returned formatting
@@ -881,7 +881,7 @@ def _formatter(
``boxed=True``.
"""
if boxed:
return str
return None
Member

How can None ever be a correct formatter function?

Member Author

To pass nested values through unchanged, which allows custom formatting options to be applied to the underlying floats:

>>> import scipy.sparse
>>> from pandas import SparseSeries
>>> sparse = SparseSeries.from_coo(scipy.sparse.rand(350, 18))
C:\Users\simon\OneDrive\code\pandas-simonjayhawkins\pandas\core\sparse\scipy_sparse.py:151: FutureWarning: Series.to_sparse is deprecated and will be removed in a future version
  s = s.to_sparse()  # TODO: specify kind?
>>> sparse._formatting_values()._formatting_values()
array([0.1057938 , 0.12346779, 0.66926098, 0.63135293, 0.51537009,
       0.83084751, 0.11707738, 0.94582471, 0.53306677, 0.76971943,
       0.43649237, 0.21300667, 0.36538291, 0.75462779, 0.2525123 ,
       0.97204424, 0.62077108, 0.77137921, 0.77416063, 0.06010261,
       0.0758256 , 0.81359155, 0.493831  , 0.86517892, 0.99948485,
       0.16560856, 0.10064932, 0.2234172 , 0.37112586, 0.35745351,
       0.03196741, 0.99648038, 0.39904145, 0.17162981, 0.98518366,
       0.85796259, 0.91980664, 0.78111288, 0.00913085, 0.61513879,
       0.68807133, 0.13308309, 0.99738934, 0.41772456, 0.30984901,
       0.23902977, 0.52296826, 0.90989797, 0.25194294, 0.28347783,
       0.76426202, 0.68137012, 0.00918782, 0.6749851 , 0.3858443 ,
       0.82595983, 0.72062635, 0.63630261, 0.13853372, 0.40467435,
       0.03361887, 0.90591611, 0.50676725])
>>>
>>> sparse._formatting_values()._formatter
<bound method SparseArray._formatter of [0.10579380195508936, 0.123467793554345, 0.6692609807557325, 0.631352928127322, 0.5153700914920036, 0.8308475120205755, 0.11707738027292702, 0.9458247148462586, 0.53306677
34689285, 0.7697194280043602, 0.436492370827794, 0.21300667109238547, 0.3653829053917591, 0.7546277896220214, 0.2525123038343048, 0.9720442422636566, 0.62077107713076, 0.7713792114209488, 0.7741606333773862, 0.0
60102611123280636, 0.07582560258774373, 0.8135915546849382, 0.4938310020146096, 0.865178922460138, 0.9994848548516182, 0.16560855997590118, 0.10064932274817606, 0.22341720411731247, 0.3711258583673639, 0.3574535
112638453, 0.03196740969420808, 0.9964803793350895, 0.399041447277493, 0.17162981330873828, 0.985183664900777, 0.8579625879222734, 0.9198066431764218, 0.7811128816049261, 0.009130849663441576, 0.6151387882173937
, 0.6880713270054183, 0.13308308845790984, 0.997389339850816, 0.4177245615919253, 0.3098490139048895, 0.23902976566001577, 0.5229682615072562, 0.9098979745269118, 0.25194293787078936, 0.28347782689329026, 0.7642
620222948321, 0.6813701212402914, 0.009187820750616082, 0.6749850993894757, 0.3858442970367967, 0.8259598335077263, 0.7206263455689199, 0.6363026132729022, 0.1385337245480187, 0.4046743457416371, 0.0336188668302
50727, 0.905916112003116, 0.5067672489799548]
Fill: nan
BlockIndex
Block locations: array([0])
Block lengths: array([63])>
>>>
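
A minimal sketch of why a `None` formatter can be useful, assuming a two-stage pipeline in which a later float formatter applies the display options. The names `apply_formatter` and `format_floats` are hypothetical stand-ins for illustration, not pandas functions.

```python
from typing import Any, Callable, List, Optional

Formatter = Optional[Callable[[Any], str]]


def apply_formatter(values: List[Any], formatter: Formatter) -> List[Any]:
    """Apply ``formatter`` element-wise, or pass values through if it is None."""
    if formatter is None:
        # No array-level formatter: keep the raw values for a later stage.
        return list(values)
    return [formatter(x) for x in values]


def format_floats(values: List[Any], precision: int = 2) -> List[str]:
    """Hypothetical downstream float formatter with a precision option."""
    return [
        f"{x:.{precision}f}" if isinstance(x, float) else str(x) for x in values
    ]


raw = [0.1057938, 0.12346779]

# With None the floats survive, so the precision option still applies.
passed_through = format_floats(apply_formatter(raw, None))

# With str the values are already strings and the option has no effect.
stringified = format_floats(apply_formatter(raw, str))

print(passed_through)  # ['0.11', '0.12']
print(stringified)     # ['0.1057938', '0.12346779']
```

Under these assumptions, returning `str` from the array-level hook would lock in the full float repr before any precision setting could be applied.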

Member

I don't fully understand the example you show.

But in any case, I would argue that such a fix should not be in the base array, but only in those EAs that need it? (but maybe that doesn't make sense, as I don't really understand the fix :-))

Member Author

The formatting is now recursive, so multiple formatters can be applied.

Member Author

This may be a better example of recursing until a NumPy array is returned:

>>> from pandas import period_range, Categorical
>>> idx = period_range('2011-01-01 09:00', freq='H', periods=5)
>>> c = Categorical(idx, ordered=True)
>>> exp = """[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00]
... Categories (5, period[H]): [2011-01-01 09:00 < 2011-01-01 10:00 < 2011-01-01 11:00 < 2011-01-01 12:00 <
...                     2011-01-01 13:00]"""  # noqa
>>>
>>> c
[2011-01-01 09:00, 2011-01-01 10:00, 2011-01-01 11:00, 2011-01-01 12:00, 2011-01-01 13:00]
Categories (5, period[H]): [2011-01-01 09:00 < 2011-01-01 10:00 < 2011-01-01 11:00 < 2011-01-01 12:00 <
                            2011-01-01 13:00]
>>>
>>>
>>> type(c)
<class 'pandas.core.arrays.categorical.Categorical'>
>>>
>>> c._formatting_values()
PeriodIndex(['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00',
             '2011-01-01 12:00', '2011-01-01 13:00'],
            dtype='period[H]', freq='H')
>>>
>>> type(c._formatting_values())
<class 'pandas.core.indexes.period.PeriodIndex'>
>>>
>>> c._formatting_values().dtype
period[H]
>>>
>>> type(c._formatter(boxed=True))
<class 'NoneType'>
>>>
>>> c._formatting_values()._formatting_values()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'PeriodIndex' object has no attribute '_formatting_values'
>>>
>>> type(c.get_values())
<class 'pandas.core.indexes.period.PeriodIndex'>
>>>
>>> c.get_values()
PeriodIndex(['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00',
             '2011-01-01 12:00', '2011-01-01 13:00'],
            dtype='period[H]', freq='H')
>>>
>>> c.get_values()._values
<PeriodArray>
['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00',
 '2011-01-01 12:00', '2011-01-01 13:00']
Length: 5, dtype: period[H]
>>>
>>> c.get_values()._values._formatter
<bound method PeriodArray._formatter of <PeriodArray>
['2011-01-01 09:00', '2011-01-01 10:00', '2011-01-01 11:00',
 '2011-01-01 12:00', '2011-01-01 13:00']
Length: 5, dtype: period[H]>
>>>
>>> c.get_values()._values._formatting_values()
array([Period('2011-01-01 09:00', 'H'), Period('2011-01-01 10:00', 'H'),
       Period('2011-01-01 11:00', 'H'), Period('2011-01-01 12:00', 'H'),
       Period('2011-01-01 13:00', 'H')], dtype=object)
>>>
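
The recursion shown in the session above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `Wrapper` and `unwrap_and_format` are hypothetical names, and plain lists stand in for NumPy arrays.

```python
from typing import Any, Callable, List, Optional


class Wrapper:
    """Hypothetical container exposing the two hooks discussed above."""

    def __init__(
        self, inner: Any, formatter: Optional[Callable[[Any], str]] = None
    ) -> None:
        self._inner = inner
        self._fmt = formatter

    def _formatter(self, boxed: bool = False) -> Optional[Callable[[Any], str]]:
        return self._fmt

    def _formatting_values(self) -> Any:
        return self._inner


def unwrap_and_format(values: Any) -> List[Any]:
    """Recurse through nested containers, applying each level's formatter."""
    get_formatter = getattr(values, "_formatter", None)
    formatter = get_formatter(boxed=True) if get_formatter is not None else None

    def apply(items: Any) -> List[Any]:
        if formatter is None:
            return list(items)
        return [formatter(x) for x in items]

    try:
        inner = values._formatting_values()
    except AttributeError:
        # Base case: a plain sequence with nothing further to unwrap.
        return apply(values)
    # Format the innermost values first, then apply this level's formatter.
    return apply(unwrap_and_format(inner))


# A categorical-like outer layer (formatter None) around a period-like layer.
periods = Wrapper(["2011-01-01 09:00", "2011-01-01 10:00"], "'{}'".format)
categorical = Wrapper(periods, formatter=None)
formatted = unwrap_and_format(categorical)
print(formatted)  # ["'2011-01-01 09:00'", "'2011-01-01 10:00'"]
```

The outer layer contributes no formatter of its own, so only the inner layer's quoting shows up in the result.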

return repr

def _formatting_values(self) -> np.ndarray:
3 changes: 3 additions & 0 deletions pandas/core/arrays/categorical.py
@@ -457,6 +457,9 @@ def _formatter(self, boxed=False):
# Defer to CategoricalFormatter's formatter.
return None

def _formatting_values(self):
Member

_formatting_values is currently deprecated on EAs, so why was this needed?

Member Author

To support the recursive behaviour and remove the special-casing.

return self.get_values()

def copy(self):
"""
Copy constructor.
10 changes: 8 additions & 2 deletions pandas/core/arrays/datetimelike.py
@@ -385,8 +385,14 @@ def _format_native_types(self, na_rep='NaT', date_format=None):
raise AbstractMethodError(self)

def _formatter(self, boxed=False):
# TODO: Remove Datetime & DatetimeTZ formatters.
return "'{}'".format
from pandas.io.formats.format import (
_is_dates_only, _get_format_datetime64)
if boxed:
values = self.values.astype(object)
is_dates_only = _is_dates_only(values)
return _get_format_datetime64(is_dates_only)
else:
return "'{}'".format

# ----------------------------------------------------------------
# Array-Like / EA-Interface Methods
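
As a rough illustration of the boxed/unboxed split in the hunk above, here is a standard-library sketch. The functions `is_dates_only` and `make_formatter` are simplified, hypothetical stand-ins for the private pandas helpers named in the diff; they ignore time zones and NaT entirely.

```python
from datetime import datetime
from typing import Callable, Sequence


def is_dates_only(values: Sequence[datetime]) -> bool:
    """Return True when every value falls exactly on midnight."""
    return all(
        (v.hour, v.minute, v.second, v.microsecond) == (0, 0, 0, 0)
        for v in values
    )


def make_formatter(
    values: Sequence[datetime], boxed: bool = False
) -> Callable[[datetime], str]:
    """Pick a scalar formatter depending on where the values are displayed."""
    if boxed:
        # Inside a Series or DataFrame: no quotes, and drop the time part
        # when all values are whole dates.
        fmt = "%Y-%m-%d" if is_dates_only(values) else "%Y-%m-%d %H:%M:%S"
        return lambda x: x.strftime(fmt)
    # Standalone array repr: wrap each value in single quotes.
    return "'{}'".format


dates = [datetime(2013, 1, 1), datetime(2013, 1, 2)]
print([make_formatter(dates, boxed=True)(v) for v in dates])
# ['2013-01-01', '2013-01-02']
print(make_formatter(dates)(dates[0]))
# '2013-01-01 00:00:00'  (including the quotes)
```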
2 changes: 1 addition & 1 deletion pandas/core/arrays/timedeltas.py
@@ -363,7 +363,7 @@ def astype(self, dtype, copy=True):

def _formatter(self, boxed=False):
from pandas.io.formats.format import _get_format_timedelta64
return _get_format_timedelta64(self, box=True)
return _get_format_timedelta64(self, box=not boxed)

def _format_native_types(self, na_rep='NaT', date_format=None):
from pandas.io.formats.format import _get_format_timedelta64
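
The `box=not boxed` change inverts one flag: quotes are wanted in the standalone array repr but not when the values are boxed inside a Series. A small sketch of that relationship, using `datetime.timedelta` and hypothetical helper names (`get_format_timedelta`, `make_formatter`) rather than the pandas internals:

```python
from datetime import timedelta
from typing import Callable


def get_format_timedelta(box: bool = False) -> Callable[[timedelta], str]:
    """Return a formatter; ``box=True`` wraps the result in single quotes."""

    def _formatter(x: timedelta) -> str:
        result = str(x)
        return f"'{result}'" if box else result

    return _formatter


def make_formatter(boxed: bool = False) -> Callable[[timedelta], str]:
    # "boxed" means the values sit inside a Series or DataFrame, where the
    # quotes are not wanted, hence the inversion.
    return get_format_timedelta(box=not boxed)


one_day = timedelta(days=1)
in_series = make_formatter(boxed=True)(one_day)
in_array = make_formatter(boxed=False)(one_day)
print(in_series)  # 1 day, 0:00:00
print(in_array)   # '1 day, 0:00:00'
```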
11 changes: 6 additions & 5 deletions pandas/core/indexes/interval.py
@@ -1061,11 +1061,12 @@ def _format_with_header(self, header, **kwargs):

def _format_native_types(self, na_rep='NaN', quoting=None, **kwargs):
""" actually format my specific types """
from pandas.io.formats.format import ExtensionArrayFormatter
return ExtensionArrayFormatter(values=self,
na_rep=na_rep,
justify='all',
leading_space=False).get_result()
from pandas.io.formats.format import format_array
return format_array(values=self,
formatter=None,
na_rep=na_rep,
justify='all',
leading_space=False)

def _format_data(self, name=None):

8 changes: 0 additions & 8 deletions pandas/core/internals/blocks.py
@@ -1768,16 +1768,8 @@ def _slice(self, slicer):
return self.values[slicer]

def formatting_values(self):
# Deprecating the ability to override _formatting_values.
# Do the warning here, it's only user in pandas, since we
# have to check if the subclass overrode it.
fv = getattr(type(self.values), '_formatting_values', None)
if fv and fv != ExtensionArray._formatting_values:
msg = (
"'ExtensionArray._formatting_values' is deprecated. "
"Specify 'ExtensionArray._formatter' instead."
)
warnings.warn(msg, DeprecationWarning, stacklevel=10)
return self.values._formatting_values()

return self.values
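
For readers unfamiliar with the override check in this hunk, the pattern can be sketched in isolation. The classes `BaseArray` and `OverridingArray` are hypothetical; only the `getattr(type(...))` comparison and the `warnings.warn` call mirror the lines shown above.

```python
import warnings
from typing import Any, List


class BaseArray:
    """Hypothetical base class with an overridable hook."""

    def __init__(self, data: List[Any]) -> None:
        self.data = data

    def _formatting_values(self) -> List[Any]:
        return self.data


class OverridingArray(BaseArray):
    def _formatting_values(self) -> List[Any]:
        return [str(x) for x in self.data]


def formatting_values(arr: BaseArray) -> List[Any]:
    # Look the method up on the class, not the instance, so that plain
    # functions are compared rather than freshly created bound methods.
    fv = getattr(type(arr), "_formatting_values", None)
    if fv and fv != BaseArray._formatting_values:
        warnings.warn(
            "'_formatting_values' is deprecated. Specify '_formatter' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return arr._formatting_values()
    return arr.data
```

Calling `formatting_values(OverridingArray([1, 2]))` emits a `DeprecationWarning` and returns the overridden values, while a plain `BaseArray` takes the silent path.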
91 changes: 33 additions & 58 deletions pandas/io/formats/format.py
@@ -17,17 +17,15 @@
from pandas._libs.tslibs import NaT, Timedelta, Timestamp, iNaT

from pandas.core.dtypes.common import (
is_categorical_dtype, is_complex_dtype, is_datetime64_dtype,
is_datetime64tz_dtype, is_extension_array_dtype, is_float, is_float_dtype,
is_integer, is_integer_dtype, is_list_like, is_numeric_dtype, is_scalar,
is_timedelta64_dtype)
from pandas.core.dtypes.generic import (
ABCIndexClass, ABCMultiIndex, ABCSeries, ABCSparseArray)
is_categorical_dtype, is_complex_dtype, is_datetime64_dtype, is_float,
is_float_dtype, is_integer, is_integer_dtype, is_list_like,
is_numeric_dtype, is_scalar, is_timedelta64_dtype)
from pandas.core.dtypes.generic import ABCIndexClass, ABCMultiIndex
from pandas.core.dtypes.missing import isna, notna

from pandas.core.base import PandasObject
import pandas.core.common as com
from pandas.core.index import Index, ensure_index
from pandas.core.index import ensure_index
from pandas.core.indexes.datetimes import DatetimeIndex

from pandas.io.common import _expand_user, _stringify_path
@@ -248,8 +246,8 @@ def _get_formatted_index(self):
return fmt_index, have_header

def _get_formatted_values(self):
values_to_format = self.tr_series._formatting_values()
return format_array(values_to_format, None,
values = self.tr_series
return format_array(values, formatter=None,
float_format=self.float_format, na_rep=self.na_rep)

def to_string(self):
@@ -713,10 +711,9 @@ def to_latex(self, column_format=None, longtable=False, encoding=None,
'method')

def _format_col(self, i):
frame = self.tr_frame
values = self.tr_frame.iloc[:, i]
formatter = self._get_formatter(i)
values_to_format = frame.iloc[:, i]._formatting_values()
return format_array(values_to_format, formatter,
return format_array(values, formatter=formatter,
float_format=self.float_format, na_rep=self.na_rep,
space=self.col_space, decimal=self.decimal)

@@ -883,14 +880,34 @@ def format_array(values, formatter, float_format=None, na_rep='NaN',
List[str]
"""

def _get_formatted_values(values):
Contributor

Can you add type annotations and a docstring?


if isinstance(values, ABCIndexClass):
values = values._values

try:
formatter = values._formatter(boxed=True)
except AttributeError:
formatter = None

def _format_values(values):
Contributor

I would make this a module-level function (with types and a docstring), maybe _format_as_array.

if formatter is None:
return values
else:
return np.array([formatter(x) for x in values])

try:
values = values._formatting_values()
return _format_values(_get_formatted_values(values))
Contributor

Can the try/except just be around values = values._formatting_values()?
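
One reason to narrow the try/except, sketched with hypothetical names (`Nested`, `buggy_formatter`, `broad`, `narrow`): a broad block also catches an `AttributeError` raised by unrelated code inside it, which can hide a genuine bug behind the fallback path.

```python
from typing import Any, List


class Nested:
    """Hypothetical container that does provide the hook."""

    def _formatting_values(self) -> List[int]:
        return [1, 2]


def buggy_formatter(x: Any) -> str:
    # Deliberate bug: ints have no such attribute, so this raises
    # AttributeError for reasons unrelated to the hook lookup.
    return x.missing_attribute


def broad(values: Any) -> Any:
    try:
        inner = values._formatting_values()
        return [buggy_formatter(x) for x in inner]
    except AttributeError:
        # Also reached when buggy_formatter fails, masking the bug.
        return "fallback"


def narrow(values: Any) -> Any:
    try:
        inner = values._formatting_values()
    except AttributeError:
        # Only reached when the hook itself is missing (or raises).
        return "fallback"
    return [buggy_formatter(x) for x in inner]


print(broad(Nested()))  # fallback
```

With the narrow version, `narrow(Nested())` raises the formatter's `AttributeError` instead of silently falling back, while `narrow([1, 2])` still takes the fallback path because a list has no `_formatting_values`.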

except AttributeError:
return _format_values(values)

values = _get_formatted_values(values)

if is_datetime64_dtype(values.dtype):
fmt_klass = Datetime64Formatter
elif is_datetime64tz_dtype(values):
fmt_klass = Datetime64TZFormatter
elif is_timedelta64_dtype(values.dtype):
fmt_klass = Timedelta64Formatter
elif is_extension_array_dtype(values.dtype):
fmt_klass = ExtensionArrayFormatter
elif is_float_dtype(values.dtype) or is_complex_dtype(values.dtype):
fmt_klass = FloatArrayFormatter
elif is_integer_dtype(values.dtype):
@@ -970,10 +987,6 @@ def _format(x):
return '{x}'.format(x=formatter(x))

vals = self.values
if isinstance(vals, Index):
vals = vals._values
elif isinstance(vals, ABCSparseArray):
vals = vals.values

is_float_type = lib.map_infer(vals, is_float) & notna(vals)
leading_space = self.leading_space
@@ -1185,29 +1198,6 @@ def _format_strings(self):
return fmt_values.tolist()


class ExtensionArrayFormatter(GenericArrayFormatter):
def _format_strings(self):
values = self.values
if isinstance(values, (ABCIndexClass, ABCSeries)):
values = values._values

formatter = values._formatter(boxed=True)

if is_categorical_dtype(values.dtype):
# Categorical is special for now, so that we can preserve tzinfo
array = values.get_values()
else:
array = np.asarray(values)

fmt_values = format_array(array,
formatter,
float_format=self.float_format,
na_rep=self.na_rep, digits=self.digits,
space=self.space, justify=self.justify,
leading_space=self.leading_space)
return fmt_values


def format_percentiles(percentiles):
"""
Outputs rounded and formatted percentiles.
@@ -1330,21 +1320,6 @@ def _get_format_datetime64_from_values(values, date_format):
return date_format


class Datetime64TZFormatter(Datetime64Formatter):

def _format_strings(self):
""" we by definition have a TZ """

values = self.values.astype(object)
is_dates_only = _is_dates_only(values)
formatter = (self.formatter or
_get_format_datetime64(is_dates_only,
date_format=self.date_format))
fmt_values = [formatter(x) for x in values]

return fmt_values


class Timedelta64Formatter(GenericArrayFormatter):

def __init__(self, values, nat_rep='NaT', box=False, **kwargs):
12 changes: 0 additions & 12 deletions pandas/tests/extension/decimal/test_decimal.py
@@ -384,15 +384,3 @@ def test_divmod_array(reverse, expected_div, expected_mod):

tm.assert_extension_array_equal(div, expected_div)
tm.assert_extension_array_equal(mod, expected_mod)


def test_formatting_values_deprecated():
class DecimalArray2(DecimalArray):
def _formatting_values(self):
return np.array(self)

ser = pd.Series(DecimalArray2([decimal.Decimal('1.0')]))

with tm.assert_produces_warning(DeprecationWarning,
check_stacklevel=False):
repr(ser)
8 changes: 4 additions & 4 deletions pandas/tests/frame/test_dtypes.py
@@ -1009,9 +1009,9 @@ def test_astype_str(self):

with option_context('display.max_columns', 20):
result = str(self.tzframe)
assert ('0 2013-01-01 2013-01-01 00:00:00-05:00 '
assert ('0 2013-01-01 2013-01-01 00:00:00-05:00 '
'2013-01-01 00:00:00+01:00') in result
assert ('1 2013-01-02 '
'NaT NaT') in result
assert ('2 2013-01-03 2013-01-03 00:00:00-05:00 '
assert ('1 2013-01-02 '
'NaT NaT') in result
assert ('2 2013-01-03 2013-01-03 00:00:00-05:00 '
'2013-01-03 00:00:00+01:00') in result
12 changes: 6 additions & 6 deletions pandas/tests/frame/test_repr_info.py
@@ -504,12 +504,12 @@ def test_repr_categorical_dates_periods(self):
tz='US/Eastern')
p = period_range('2011-01', freq='M', periods=5)
df = DataFrame({'dt': dt, 'p': p})
exp = """ dt p
0 2011-01-01 09:00:00-05:00 2011-01
1 2011-01-01 10:00:00-05:00 2011-02
2 2011-01-01 11:00:00-05:00 2011-03
3 2011-01-01 12:00:00-05:00 2011-04
4 2011-01-01 13:00:00-05:00 2011-05"""
exp = """ dt p
0 2011-01-01 09:00:00-05:00 2011-01
1 2011-01-01 10:00:00-05:00 2011-02
2 2011-01-01 11:00:00-05:00 2011-03
3 2011-01-01 12:00:00-05:00 2011-04
4 2011-01-01 13:00:00-05:00 2011-05"""

assert repr(df) == exp

12 changes: 6 additions & 6 deletions pandas/tests/indexes/datetimes/test_formats.py
@@ -127,14 +127,14 @@ def test_dti_representation_to_series(self):
"2 2011-01-03\n"
"dtype: datetime64[ns]")

exp5 = ("0 2011-01-01 09:00:00+09:00\n"
"1 2011-01-01 10:00:00+09:00\n"
"2 2011-01-01 11:00:00+09:00\n"
exp5 = ("0 2011-01-01 09:00:00+09:00\n"
"1 2011-01-01 10:00:00+09:00\n"
"2 2011-01-01 11:00:00+09:00\n"
"dtype: datetime64[ns, Asia/Tokyo]")

exp6 = ("0 2011-01-01 09:00:00-05:00\n"
"1 2011-01-01 10:00:00-05:00\n"
"2 NaT\n"
exp6 = ("0 2011-01-01 09:00:00-05:00\n"
"1 2011-01-01 10:00:00-05:00\n"
"2 NaT\n"
"dtype: datetime64[ns, US/Eastern]")

exp7 = ("0 2011-01-01 09:00:00\n"
24 changes: 12 additions & 12 deletions pandas/tests/io/formats/test_format.py
@@ -872,25 +872,25 @@ def test_datetimelike_frame(self):
df = pd.DataFrame({"dt": dts,
"x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
with option_context('display.max_rows', 5):
expected = (' dt x\n'
'0 2011-01-01 00:00:00-05:00 1\n'
'1 2011-01-01 00:00:00-05:00 2\n'
'.. ... ..\n'
'8 NaT 9\n'
'9 NaT 10\n\n'
expected = (' dt x\n'
'0 2011-01-01 00:00:00-05:00 1\n'
'1 2011-01-01 00:00:00-05:00 2\n'
'.. ... ..\n'
'8 NaT 9\n'
'9 NaT 10\n\n'
'[10 rows x 2 columns]')
assert repr(df) == expected

dts = [pd.NaT] * 5 + [pd.Timestamp('2011-01-01', tz='US/Eastern')] * 5
df = pd.DataFrame({"dt": dts,
"x": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})
with option_context('display.max_rows', 5):
expected = (' dt x\n'
'0 NaT 1\n'
'1 NaT 2\n'
'.. ... ..\n'
'8 2011-01-01 00:00:00-05:00 9\n'
'9 2011-01-01 00:00:00-05:00 10\n\n'
expected = (' dt x\n'
'0 NaT 1\n'
'1 NaT 2\n'
'.. ... ..\n'
'8 2011-01-01 00:00:00-05:00 9\n'
'9 2011-01-01 00:00:00-05:00 10\n\n'
'[10 rows x 2 columns]')
assert repr(df) == expected
