Commit d1b9134
TomAugspurger authored and Pingviinituutti committed
Add default repr for EAs (pandas-dev#23601)
1 parent 373fa14 commit d1b9134

26 files changed

+316
-173
lines changed

doc/source/whatsnew/v0.24.0.rst

+3
Original file line number | Diff line number | Diff line change
@@ -1002,6 +1002,7 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
10021002
- :meth:`DataFrame.stack` no longer converts to object dtype for DataFrames where each column has the same extension dtype. The output Series will have the same dtype as the columns (:issue:`23077`).
10031003
- :meth:`Series.unstack` and :meth:`DataFrame.unstack` no longer convert extension arrays to object-dtype ndarrays. Each column in the output ``DataFrame`` will now have the same dtype as the input (:issue:`23077`).
10041004
- Bug when grouping :meth:`Dataframe.groupby()` and aggregating on ``ExtensionArray`` it was not returning the actual ``ExtensionArray`` dtype (:issue:`23227`).
1005+
- A default repr for :class:`ExtensionArray` is now provided (:issue:`23601`).
10051006

10061007
.. _whatsnew_0240.api.incompatibilities:
10071008

@@ -1117,6 +1118,7 @@ Deprecations
11171118
- The methods :meth:`Series.str.partition` and :meth:`Series.str.rpartition` have deprecated the ``pat`` keyword in favor of ``sep`` (:issue:`22676`)
11181119
- Deprecated the `nthreads` keyword of :func:`pandas.read_feather` in favor of
11191120
`use_threads` to reflect the changes in pyarrow 0.11.0. (:issue:`23053`)
1121+
- :meth:`ExtensionArray._formatting_values` is deprecated. Use :attr:`ExtensionArray._formatter` instead. (:issue:`23601`)
11201122
- :func:`pandas.read_excel` has deprecated accepting ``usecols`` as an integer. Please pass in a list of ints from 0 to ``usecols`` inclusive instead (:issue:`23527`)
11211123
- Constructing a :class:`TimedeltaIndex` from data with ``datetime64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23539`)
11221124
- Constructing a :class:`DatetimeIndex` from data with ``timedelta64``-dtyped data is deprecated, will raise ``TypeError`` in a future version (:issue:`23675`)
@@ -1284,6 +1286,7 @@ Datetimelike
12841286
- Bug in rounding methods of :class:`DatetimeIndex` (:meth:`~DatetimeIndex.round`, :meth:`~DatetimeIndex.ceil`, :meth:`~DatetimeIndex.floor`) and :class:`Timestamp` (:meth:`~Timestamp.round`, :meth:`~Timestamp.ceil`, :meth:`~Timestamp.floor`) could give rise to loss of precision (:issue:`22591`)
12851287
- Bug in :func:`to_datetime` with an :class:`Index` argument that would drop the ``name`` from the result (:issue:`21697`)
12861288
- Bug in :class:`PeriodIndex` where adding or subtracting a :class:`timedelta` or :class:`Tick` object produced incorrect results (:issue:`22988`)
1289+
- Bug in the :class:`Series` repr with period-dtype data missing a space before the data (:issue:`23601`)
12871290
- Bug in :func:`date_range` when decrementing a start date to a past end date by a negative frequency (:issue:`23270`)
12881291
- Bug in :meth:`Series.min` which would return ``NaN`` instead of ``NaT`` when called on a series of ``NaT`` (:issue:`23282`)
12891292
- Bug in :func:`DataFrame.combine` with datetimelike values raising a TypeError (:issue:`23079`)

pandas/core/arrays/base.py

+59-4
@@ -47,10 +47,12 @@ class ExtensionArray(object):
4747
* copy
4848
* _concat_same_type
4949
50-
An additional method is available to satisfy pandas' internal,
51-
private block API.
50+
A default repr displaying the type, (truncated) data, length,
51+
and dtype is provided. It can be customized or replaced by
52+
overriding:
5253
53-
* _formatting_values
54+
* __repr__ : A default repr for the ExtensionArray.
55+
* _formatter : Print scalars inside a Series or DataFrame.
5456
5557
Some methods require casting the ExtensionArray to an ndarray of Python
5658
objects with ``self.astype(object)``, which may be expensive. When
@@ -676,17 +678,70 @@ def copy(self, deep=False):
676678
raise AbstractMethodError(self)
677679

678680
# ------------------------------------------------------------------------
679-
# Block-related methods
681+
# Printing
680682
# ------------------------------------------------------------------------
683+
def __repr__(self):
684+
from pandas.io.formats.printing import format_object_summary
685+
686+
template = (
687+
u'{class_name}'
688+
u'{data}\n'
689+
u'Length: {length}, dtype: {dtype}'
690+
)
691+
# the short repr has no trailing newline, while the truncated
692+
# repr does. So we include a newline in our template, and strip
693+
# any trailing newlines from format_object_summary
694+
data = format_object_summary(self, self._formatter(),
695+
indent_for_name=False).rstrip(', \n')
696+
class_name = u'<{}>\n'.format(self.__class__.__name__)
697+
return template.format(class_name=class_name, data=data,
698+
length=len(self),
699+
dtype=self.dtype)
700+
701+
def _formatter(self, boxed=False):
702+
# type: (bool) -> Callable[[Any], Optional[str]]
703+
"""Formatting function for scalar values.
704+
705+
This is used in the default '__repr__'. The returned formatting
706+
function receives instances of your scalar type.
707+
708+
Parameters
709+
----------
710+
boxed : bool, default False
711+
An indicator for whether or not your array is being printed
712+
within a Series, DataFrame, or Index (True), or just by
713+
itself (False). This may be useful if you want scalar values
714+
to appear differently within a Series versus on its own (e.g.
715+
quoted or not).
716+
717+
Returns
718+
-------
719+
Callable[[Any], str]
720+
A callable that gets instances of the scalar type and
721+
returns a string. By default, :func:`repr` is used
722+
when ``boxed=False`` and :func:`str` is used when
723+
``boxed=True``.
724+
"""
725+
if boxed:
726+
return str
727+
return repr
681728

682729
def _formatting_values(self):
683730
# type: () -> np.ndarray
684731
# At the moment, this has to be an array since we use result.dtype
685732
"""
686733
An array of values to be printed in, e.g. the Series repr
734+
735+
.. deprecated:: 0.24.0
736+
737+
Use :meth:`ExtensionArray._formatter` instead.
687738
"""
688739
return np.array(self)
689740

741+
# ------------------------------------------------------------------------
742+
# Reshaping
743+
# ------------------------------------------------------------------------
744+
690745
@classmethod
691746
def _concat_same_type(cls, to_concat):
692747
# type: (Sequence[ExtensionArray]) -> ExtensionArray
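
Note on the hunk above: the default ``__repr__`` builds a three-part string (class name, data, then length and dtype) and gets its scalar formatter from ``_formatter``. A minimal sketch of that contract follows, assuming a hypothetical ``ToyArray`` class that is not part of pandas. It skips the truncation that ``format_object_summary`` performs, so it only illustrates the template and the ``boxed`` switch.

```python
class ToyArray(object):
    """Hypothetical stand-in for an ExtensionArray subclass (not pandas code)."""

    def __init__(self, values, dtype="toy"):
        self._values = list(values)
        self.dtype = dtype

    def __len__(self):
        return len(self._values)

    def _formatter(self, boxed=False):
        # Same default as in the diff: str when printed inside a
        # Series/DataFrame/Index, repr when the array is printed alone.
        return str if boxed else repr

    def __repr__(self):
        fmt = self._formatter()
        # Simplification: no truncation, unlike format_object_summary.
        data = "[" + ", ".join(fmt(v) for v in self._values) + "]"
        return "<{}>\n{}\nLength: {}, dtype: {}".format(
            type(self).__name__, data, len(self), self.dtype)


arr = ToyArray(["a", "b"])
print(repr(arr))
```

A subclass that only wants different scalar rendering would override ``_formatter`` and leave ``__repr__`` alone, which appears to be the intent of splitting the two hooks.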

pandas/core/arrays/categorical.py

+8-3
@@ -500,6 +500,10 @@ def _constructor(self):
500500
def _from_sequence(cls, scalars, dtype=None, copy=False):
501501
return Categorical(scalars, dtype=dtype)
502502

503+
def _formatter(self, boxed=False):
504+
# Defer to CategoricalFormatter's formatter.
505+
return None
506+
503507
def copy(self):
504508
"""
505509
Copy constructor.
@@ -2036,6 +2040,10 @@ def __unicode__(self):
20362040

20372041
return result
20382042

2043+
def __repr__(self):
2044+
# We want PandasObject.__repr__, which dispatches to __unicode__
2045+
return super(ExtensionArray, self).__repr__()
2046+
20392047
def _maybe_coerce_indexer(self, indexer):
20402048
"""
20412049
return an indexer coerced to the codes dtype
@@ -2392,9 +2400,6 @@ def _concat_same_type(self, to_concat):
23922400

23932401
return _concat_categorical(to_concat)
23942402

2395-
def _formatting_values(self):
2396-
return self
2397-
23982403
def isin(self, values):
23992404
"""
24002405
Check whether `values` are contained in Categorical.

pandas/core/arrays/integer.py

+8-27
@@ -5,7 +5,7 @@
55
import numpy as np
66

77
from pandas._libs import lib
8-
from pandas.compat import range, set_function_name, string_types, u
8+
from pandas.compat import range, set_function_name, string_types
99
from pandas.util._decorators import cache_readonly
1010

1111
from pandas.core.dtypes.base import ExtensionDtype
@@ -20,9 +20,6 @@
2020
from pandas.core import nanops
2121
from pandas.core.arrays import ExtensionArray, ExtensionOpsMixin
2222

23-
from pandas.io.formats.printing import (
24-
default_pprint, format_object_attrs, format_object_summary)
25-
2623

2724
class _IntegerDtype(ExtensionDtype):
2825
"""
@@ -268,6 +265,13 @@ def _from_sequence(cls, scalars, dtype=None, copy=False):
268265
def _from_factorized(cls, values, original):
269266
return integer_array(values, dtype=original.dtype)
270267

268+
def _formatter(self, boxed=False):
269+
def fmt(x):
270+
if isna(x):
271+
return 'NaN'
272+
return str(x)
273+
return fmt
274+
271275
def __getitem__(self, item):
272276
if is_integer(item):
273277
if self._mask[item]:
@@ -301,10 +305,6 @@ def __iter__(self):
301305
else:
302306
yield self._data[i]
303307

304-
def _formatting_values(self):
305-
# type: () -> np.ndarray
306-
return self._coerce_to_ndarray()
307-
308308
def take(self, indexer, allow_fill=False, fill_value=None):
309309
from pandas.api.extensions import take
310310

@@ -354,25 +354,6 @@ def __setitem__(self, key, value):
354354
def __len__(self):
355355
return len(self._data)
356356

357-
def __repr__(self):
358-
"""
359-
Return a string representation for this object.
360-
361-
Invoked by unicode(df) in py2 only. Yields a Unicode String in both
362-
py2/py3.
363-
"""
364-
klass = self.__class__.__name__
365-
data = format_object_summary(self, default_pprint, False)
366-
attrs = format_object_attrs(self)
367-
space = " "
368-
369-
prepr = (u(",%s") %
370-
space).join(u("%s=%s") % (k, v) for k, v in attrs)
371-
372-
res = u("%s(%s%s)") % (klass, data, prepr)
373-
374-
return res
375-
376357
@property
377358
def nbytes(self):
378359
return self._data.nbytes + self._mask.nbytes
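
Note on the ``_formatter`` added to the integer array above: it returns a closure that prints missing values as ``'NaN'`` and everything else through ``str``. The sketch below reproduces that shape without pandas. It assumes missing values are float NaN and tests them with ``math.isnan``, whereas the diff uses pandas' ``isna``. ``make_formatter`` is a hypothetical name.

```python
import math


def make_formatter():
    """Return a scalar formatter that renders missing values as 'NaN'."""
    def fmt(x):
        # Assumption: a missing value is a float NaN. The diff calls
        # pandas' isna(x) here instead.
        if isinstance(x, float) and math.isnan(x):
            return 'NaN'
        return str(x)
    return fmt


fmt = make_formatter()
formatted = [fmt(v) for v in [1, float('nan'), 3]]
print(formatted)
```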

pandas/core/arrays/interval.py

-3
@@ -690,9 +690,6 @@ def copy(self, deep=False):
690690
# TODO: Could skip verify_integrity here.
691691
return type(self).from_arrays(left, right, closed=closed)
692692

693-
def _formatting_values(self):
694-
return np.asarray(self)
695-
696693
def isna(self):
697694
return isna(self.left)
698695

pandas/core/arrays/period.py

+4-7
@@ -341,13 +341,10 @@ def to_timestamp(self, freq=None, how='start'):
341341
# --------------------------------------------------------------------
342342
# Array-like / EA-Interface Methods
343343

344-
def __repr__(self):
345-
return '<{}>\n{}\nLength: {}, dtype: {}'.format(
346-
self.__class__.__name__,
347-
[str(s) for s in self],
348-
len(self),
349-
self.dtype
350-
)
344+
def _formatter(self, boxed=False):
345+
if boxed:
346+
return str
347+
return "'{}'".format
351348

352349
def __setitem__(
353350
self,
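
Note on the period ``_formatter`` above: the unboxed branch returns the bound method ``"'{}'".format``, which works as a one-argument callable that wraps its input in single quotes. A small sketch, using plain strings in place of ``Period`` scalars (an assumption made to keep it free of pandas):

```python
def period_formatter(boxed=False):
    """Mirror of the two branches in the diff, as a free function."""
    if boxed:
        # Inside a Series/DataFrame/Index: plain text, no quotes.
        return str
    # Standalone array repr: a bound str.format that adds quotes.
    return "'{}'".format


unboxed = period_formatter()("2000-01")
boxed = period_formatter(boxed=True)("2000-01")
print(unboxed, boxed)
```

Returning the bound method avoids defining a lambda, and it calls ``format()`` on the argument, so for most scalars it yields the same text as ``str`` surrounded by quotes.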

pandas/core/arrays/sparse.py

+5
@@ -1746,6 +1746,11 @@ def __unicode__(self):
17461746
fill=printing.pprint_thing(self.fill_value),
17471747
index=printing.pprint_thing(self.sp_index))
17481748

1749+
def _formatter(self, boxed=False):
1750+
# Defer to the formatter from the GenericArrayFormatter calling us.
1751+
# This will infer the correct formatter from the dtype of the values.
1752+
return None
1753+
17491754

17501755
SparseArray._add_arithmetic_ops()
17511756
SparseArray._add_comparison_ops()

pandas/core/indexes/period.py

+1-1
@@ -503,7 +503,7 @@ def __array_wrap__(self, result, context=None):
503503

504504
@property
505505
def _formatter_func(self):
506-
return lambda x: "'%s'" % x
506+
return self.array._formatter(boxed=False)
507507

508508
def asof_locs(self, where, mask):
509509
"""

pandas/core/internals/blocks.py

+14-2
@@ -33,7 +33,7 @@
3333
_isna_compat, array_equivalent, is_null_datelike_scalar, isna, notna)
3434

3535
import pandas.core.algorithms as algos
36-
from pandas.core.arrays import Categorical
36+
from pandas.core.arrays import Categorical, ExtensionArray
3737
from pandas.core.base import PandasObject
3838
import pandas.core.common as com
3939
from pandas.core.indexes.datetimes import DatetimeIndex
@@ -1915,7 +1915,19 @@ def _slice(self, slicer):
19151915
return self.values[slicer]
19161916

19171917
def formatting_values(self):
1918-
return self.values._formatting_values()
1918+
# Deprecating the ability to override _formatting_values.
1919+
# Do the warning here, its only user in pandas, since we
1920+
# have to check if the subclass overrode it.
1921+
fv = getattr(type(self.values), '_formatting_values', None)
1922+
if fv and fv != ExtensionArray._formatting_values:
1923+
msg = (
1924+
"'ExtensionArray._formatting_values' is deprecated. "
1925+
"Specify 'ExtensionArray._formatter' instead."
1926+
)
1927+
warnings.warn(msg, DeprecationWarning, stacklevel=10)
1928+
return self.values._formatting_values()
1929+
1930+
return self.values
19191931

19201932
def concat_same_type(self, to_concat, placement=None):
19211933
"""
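
Note on ``formatting_values`` above: the deprecation is detected by comparing the method found on the array's type with the one defined on the base class, and warning only when a subclass has replaced it. The sketch below shows that override check on Python 3, assuming hypothetical ``Base``, ``Plain`` and ``Legacy`` classes in place of the pandas types. It uses ``stacklevel=2`` rather than the diff's ``stacklevel=10``, since there is no deep call stack here.

```python
import warnings


class Base(object):
    """Hypothetical base class standing in for ExtensionArray."""

    def __init__(self, data):
        self.data = list(data)

    def _formatting_values(self):
        return self.data


class Plain(Base):
    """Does not override the deprecated hook."""


class Legacy(Base):
    """Overrides the deprecated hook, so it should trigger the warning."""

    def _formatting_values(self):
        return [str(x) for x in self.data]


def formatting_values(values):
    # Look the method up on the type, so instances are not consulted.
    fv = getattr(type(values), '_formatting_values', None)
    if fv and fv != Base._formatting_values:
        warnings.warn("'_formatting_values' is deprecated. "
                      "Specify '_formatter' instead.",
                      DeprecationWarning, stacklevel=2)
        return values._formatting_values()
    return values.data


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    plain_result = formatting_values(Plain([1, 2]))
    legacy_result = formatting_values(Legacy([1, 2]))

print(len(caught), plain_result, legacy_result)
```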

pandas/io/formats/format.py

+23-44
@@ -16,11 +16,12 @@
1616
from pandas.compat import StringIO, lzip, map, u, zip
1717

1818
from pandas.core.dtypes.common import (
19-
is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype, is_float,
20-
is_float_dtype, is_integer, is_integer_dtype, is_interval_dtype,
21-
is_list_like, is_numeric_dtype, is_period_arraylike, is_scalar,
19+
is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype,
20+
is_extension_array_dtype, is_float, is_float_dtype, is_integer,
21+
is_integer_dtype, is_list_like, is_numeric_dtype, is_scalar,
2222
is_timedelta64_dtype)
23-
from pandas.core.dtypes.generic import ABCMultiIndex, ABCSparseArray
23+
from pandas.core.dtypes.generic import (
24+
ABCIndexClass, ABCMultiIndex, ABCSeries, ABCSparseArray)
2425
from pandas.core.dtypes.missing import isna, notna
2526

2627
from pandas import compat
@@ -29,7 +30,6 @@
2930
from pandas.core.config import get_option, set_option
3031
from pandas.core.index import Index, ensure_index
3132
from pandas.core.indexes.datetimes import DatetimeIndex
32-
from pandas.core.indexes.period import PeriodIndex
3333

3434
from pandas.io.common import _expand_user, _stringify_path
3535
from pandas.io.formats.printing import adjoin, justify, pprint_thing
@@ -842,22 +842,18 @@ def _get_column_name_list(self):
842842
def format_array(values, formatter, float_format=None, na_rep='NaN',
843843
digits=None, space=None, justify='right', decimal='.'):
844844

845-
if is_categorical_dtype(values):
846-
fmt_klass = CategoricalArrayFormatter
847-
elif is_interval_dtype(values):
848-
fmt_klass = IntervalArrayFormatter
845+
if is_datetime64_dtype(values.dtype):
846+
fmt_klass = Datetime64Formatter
847+
elif is_timedelta64_dtype(values.dtype):
848+
fmt_klass = Timedelta64Formatter
849+
elif is_extension_array_dtype(values.dtype):
850+
fmt_klass = ExtensionArrayFormatter
849851
elif is_float_dtype(values.dtype):
850852
fmt_klass = FloatArrayFormatter
851-
elif is_period_arraylike(values):
852-
fmt_klass = PeriodArrayFormatter
853853
elif is_integer_dtype(values.dtype):
854854
fmt_klass = IntArrayFormatter
855855
elif is_datetime64tz_dtype(values):
856856
fmt_klass = Datetime64TZFormatter
857-
elif is_datetime64_dtype(values.dtype):
858-
fmt_klass = Datetime64Formatter
859-
elif is_timedelta64_dtype(values.dtype):
860-
fmt_klass = Timedelta64Formatter
861857
else:
862858
fmt_klass = GenericArrayFormatter
863859

@@ -1121,39 +1117,22 @@ def _format_strings(self):
11211117
return fmt_values.tolist()
11221118

11231119

1124-
class IntervalArrayFormatter(GenericArrayFormatter):
1125-
1126-
def __init__(self, values, *args, **kwargs):
1127-
GenericArrayFormatter.__init__(self, values, *args, **kwargs)
1128-
1129-
def _format_strings(self):
1130-
formatter = self.formatter or str
1131-
fmt_values = np.array([formatter(x) for x in self.values])
1132-
return fmt_values
1133-
1134-
1135-
class PeriodArrayFormatter(IntArrayFormatter):
1136-
1120+
class ExtensionArrayFormatter(GenericArrayFormatter):
11371121
def _format_strings(self):
1138-
from pandas.core.indexes.period import IncompatibleFrequency
1139-
try:
1140-
values = PeriodIndex(self.values).to_native_types()
1141-
except IncompatibleFrequency:
1142-
# periods may contains different freq
1143-
values = Index(self.values, dtype='object').to_native_types()
1144-
1145-
formatter = self.formatter or (lambda x: '{x}'.format(x=x))
1146-
fmt_values = [formatter(x) for x in values]
1147-
return fmt_values
1148-
1122+
values = self.values
1123+
if isinstance(values, (ABCIndexClass, ABCSeries)):
1124+
values = values._values
11491125

1150-
class CategoricalArrayFormatter(GenericArrayFormatter):
1126+
formatter = values._formatter(boxed=True)
11511127

1152-
def __init__(self, values, *args, **kwargs):
1153-
GenericArrayFormatter.__init__(self, values, *args, **kwargs)
1128+
if is_categorical_dtype(values.dtype):
1129+
# Categorical is special for now, so that we can preserve tzinfo
1130+
array = values.get_values()
1131+
else:
1132+
array = np.asarray(values)
11541133

1155-
def _format_strings(self):
1156-
fmt_values = format_array(self.values.get_values(), self.formatter,
1134+
fmt_values = format_array(array,
1135+
formatter,
11571136
float_format=self.float_format,
11581137
na_rep=self.na_rep, digits=self.digits,
11591138
space=self.space, justify=self.justify)
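
Note on ``ExtensionArrayFormatter`` above: it asks the array for ``_formatter(boxed=True)`` and passes the result to ``format_array``. ``Categorical`` and ``SparseArray`` return ``None`` from ``_formatter`` to defer to whichever formatter is picked downstream. The sketch below illustrates only that "None means defer" convention. ``Quoted``, ``Deferring``, ``format_boxed`` and the ``default`` parameter are hypothetical names, and the real fallback in pandas is more involved than a single callable.

```python
class Quoted(object):
    """Hypothetical array that supplies its own scalar formatter."""

    def __init__(self, values):
        self.values = list(values)

    def _formatter(self, boxed=False):
        return str if boxed else repr


class Deferring(Quoted):
    """Hypothetical array that defers, as Categorical and SparseArray do."""

    def _formatter(self, boxed=False):
        return None


def format_boxed(array, default=str):
    # Ask the array first; fall back to the caller's default on None.
    formatter = array._formatter(boxed=True)
    if formatter is None:
        formatter = default
    return [formatter(v) for v in array.values]


own = format_boxed(Quoted(["a", "b"]))
deferred = format_boxed(Deferring(["a", "b"]), default=lambda v: v.upper())
print(own, deferred)
```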
