Add default repr for EAs #23601

Merged 54 commits on Dec 4, 2018
Changes from 1 commit
Commits
0fdbfd3
wip
TomAugspurger Nov 9, 2018
ace62aa
Deprecate formatting_values
TomAugspurger Nov 9, 2018
6e76b51
test for warning
TomAugspurger Nov 9, 2018
fef04e6
compat
TomAugspurger Nov 9, 2018
1885a97
na formatter
TomAugspurger Nov 9, 2018
ecfcd72
clean
TomAugspurger Nov 9, 2018
4e0d91f
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 9, 2018
37638cc
wip
TomAugspurger Nov 9, 2018
6e64b7b
more cleanup
TomAugspurger Nov 9, 2018
193747e
update docs, type
TomAugspurger Nov 9, 2018
5a2e1e4
format
TomAugspurger Nov 9, 2018
1635b73
try this
TomAugspurger Nov 9, 2018
e2b1941
updates
TomAugspurger Nov 9, 2018
48e55cc
fixup interval
TomAugspurger Nov 10, 2018
d8e7ba4
py2 compat
TomAugspurger Nov 10, 2018
b312fe4
revert interval
TomAugspurger Nov 10, 2018
445736d
unicode, bytes
TomAugspurger Nov 10, 2018
60e0d02
isort
TomAugspurger Nov 10, 2018
5b07906
py3 fixup
TomAugspurger Nov 10, 2018
ff0c998
fixup
TomAugspurger Nov 10, 2018
2fd3d5d
unicode
TomAugspurger Nov 10, 2018
5d8d2fc
unicode
TomAugspurger Nov 10, 2018
baee6b2
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 10, 2018
4d343ea
unicode
TomAugspurger Nov 10, 2018
5b291d5
lint
TomAugspurger Nov 10, 2018
1b93bf0
update repr tests
TomAugspurger Nov 11, 2018
708dd75
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 12, 2018
0f4083e
remove periodarray
TomAugspurger Nov 12, 2018
9116930
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 12, 2018
ebadf6f
FutureWarning -> DeprecationWarning
TomAugspurger Nov 12, 2018
e5f6976
wip
TomAugspurger Nov 12, 2018
221cee9
use repr
TomAugspurger Nov 12, 2018
439f2f8
fixup! use repr
TomAugspurger Nov 12, 2018
2364546
fixup! fixup! use repr
TomAugspurger Nov 12, 2018
62b1e2f
remove bytes
TomAugspurger Nov 12, 2018
a926dca
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 14, 2018
fc4279d
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 15, 2018
27db397
simplify formatter
TomAugspurger Nov 15, 2018
5c253a4
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 19, 2018
ef390fc
Updates: misc
TomAugspurger Nov 19, 2018
2b5fe25
BUG: Fixed SparseArray formatter
TomAugspurger Nov 19, 2018
d84cc02
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 20, 2018
d9df6bf
correct boxing
TomAugspurger Nov 20, 2018
a35399e
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 20, 2018
740f9e5
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 28, 2018
e7cc2ac
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Nov 28, 2018
c79ba0b
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 2, 2018
3825aeb
Use Array formatter in PeriodIndex
TomAugspurger Dec 2, 2018
2a60c15
Use repr / str
TomAugspurger Dec 2, 2018
bccf40d
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 3, 2018
a7ef104
Update for review
TomAugspurger Dec 3, 2018
a3b1c92
REF: removed trailing_comma argument
TomAugspurger Dec 3, 2018
e080023
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 3, 2018
6ad113b
Merge remote-tracking branch 'upstream/master' into ea-repr
TomAugspurger Dec 3, 2018
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -857,6 +857,7 @@ update the ``ExtensionDtype._metadata`` tuple to match the signature of your
- :meth:`DataFrame.stack` no longer converts to object dtype for DataFrames where each column has the same extension dtype. The output Series will have the same dtype as the columns (:issue:`23077`).
- :meth:`Series.unstack` and :meth:`DataFrame.unstack` no longer convert extension arrays to object-dtype ndarrays. Each column in the output ``DataFrame`` will now have the same dtype as the input (:issue:`23077`).
- Bug when grouping :meth:`Dataframe.groupby()` and aggregating on ``ExtensionArray`` it was not returning the actual ``ExtensionArray`` dtype (:issue:`23227`).
- A default repr for ``ExtensionArray`` is now provided (:issue:`23601`).

.. _whatsnew_0240.api.incompatibilities:

40 changes: 39 additions & 1 deletion pandas/core/arrays/base.py
@@ -49,6 +49,13 @@ class ExtensionArray(object):

* _formatting_values

A default repr displaying the type, (truncated) data, length,
and dtype is provided. It can be customized or replaced by
overriding:

* _formatter
* __repr__

Some methods require casting the ExtensionArray to an ndarray of Python
objects with ``self.astype(object)``, which may be expensive. When
performance is a concern, we highly recommend overriding the following
@@ -653,15 +660,46 @@ def copy(self, deep=False):
raise AbstractMethodError(self)

# ------------------------------------------------------------------------
-# Block-related methods
+# Printing
# ------------------------------------------------------------------------
    def __repr__(self):
        from pandas.io.formats.printing import format_object_summary

        template = (
            '<{class_name}>\n'
            '{data}\n'
            'Length: {length}, dtype: {dtype}'
Contributor

do we need to define `__unicode__`?
we do this in Base for all pandas objects

Contributor

you are writing new code here but this should be consistent as well (it’s ok to change that too)
but to have a completely different impl is odd

Contributor Author

Fixed... I left the implementation in `__repr__`, and then encoded / decoded as needed in `__unicode__` and `__bytes__`, if that's OK.

        )
        # the short repr has no trailing newline, while the truncated
        # repr does. So we include a newline in our template, and strip
        # any trailing newlines from format_object_summary
        data = format_object_summary(self, self._formatter, name=False,
                                     trailing_comma=False).rstrip('\n')
        name = self.__class__.__name__
        return template.format(class_name=name, data=data,
                               length=len(self),
                               dtype=self.dtype)

    @property
    def _formatter(self):
        # type: () -> Callable[[Any], str]
        """Formatting function for scalar values.

        This is used in the default '__repr__'. The formatting function
        receives instances of your scalar type.
        """
        return str

    def _formatting_values(self):
        # type: () -> np.ndarray
        # At the moment, this has to be an array since we use result.dtype
        """An array of values to be printed in, e.g. the Series repr"""
        return np.array(self)

# ------------------------------------------------------------------------
# Reshaping
# ------------------------------------------------------------------------

@classmethod
def _concat_same_type(cls, to_concat):
# type: (Sequence[ExtensionArray]) -> ExtensionArray
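
Editor's sketch: the default repr added in this file reduces to a three-line template (class name, data, then length and dtype) plus a per-scalar `_formatter` hook. The following is a minimal, pandas-free illustration of that shape. `ToyArray`, `HexArray`, and the `'toy'` dtype string are hypothetical stand-ins, and the data line is built with a plain join, whereas the code in the diff delegates wrapping and truncation to `format_object_summary`.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray subclass (not pandas API)."""

    def __init__(self, values, dtype):
        self._values = list(values)
        self.dtype = dtype

    def __len__(self):
        return len(self._values)

    @property
    def _formatter(self):
        # Same default as in the diff: format each scalar with str().
        return str

    def __repr__(self):
        template = (
            '<{class_name}>\n'
            '{data}\n'
            'Length: {length}, dtype: {dtype}'
        )
        # Simplification: no line wrapping or truncation is attempted here.
        data = '[' + ', '.join(self._formatter(v) for v in self._values) + ']'
        return template.format(class_name=type(self).__name__, data=data,
                               length=len(self), dtype=self.dtype)


class HexArray(ToyArray):
    """Overrides only the scalar formatter; __repr__ is inherited."""

    @property
    def _formatter(self):
        return hex


print(repr(ToyArray([1, 2], 'toy')))
print(repr(HexArray([10, 255], 'toy')))
```

Overriding `_formatter` changes how each element is rendered while keeping the shared layout; overriding `__repr__` replaces the layout entirely, which is the split the class docstring describes.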
2 changes: 2 additions & 0 deletions pandas/core/arrays/categorical.py
@@ -1986,6 +1986,8 @@ def __unicode__(self):

return result

__repr__ = __unicode__

def _maybe_coerce_indexer(self, indexer):
""" return an indexer coerced to the codes dtype """
if isinstance(indexer, np.ndarray) and indexer.dtype.kind == 'i':
24 changes: 1 addition & 23 deletions pandas/core/arrays/integer.py
@@ -6,7 +6,7 @@

from pandas._libs import lib
from pandas.util._decorators import cache_readonly
-from pandas.compat import u, range, string_types
+from pandas.compat import range, string_types
from pandas.compat import set_function_name

from pandas.core import nanops
@@ -24,9 +24,6 @@
from pandas.core.dtypes.dtypes import register_extension_dtype
from pandas.core.dtypes.missing import isna, notna

from pandas.io.formats.printing import (
format_object_summary, format_object_attrs, default_pprint)


class _IntegerDtype(ExtensionDtype):
"""
@@ -353,25 +350,6 @@ def __setitem__(self, key, value):
def __len__(self):
return len(self._data)

def __repr__(self):
"""
Return a string representation for this object.

Invoked by unicode(df) in py2 only. Yields a Unicode String in both
py2/py3.
"""
klass = self.__class__.__name__
data = format_object_summary(self, default_pprint, False)
attrs = format_object_attrs(self)
space = " "

prepr = (u(",%s") %
space).join(u("%s=%s") % (k, v) for k, v in attrs)

res = u("%s(%s%s)") % (klass, data, prepr)

return res

@property
def nbytes(self):
return self._data.nbytes + self._mask.nbytes
10 changes: 0 additions & 10 deletions pandas/core/arrays/interval.py
@@ -835,16 +835,6 @@ def _format_data(self):

return summary

def __repr__(self):
tpl = textwrap.dedent("""\
{cls}({data},
{lead}closed='{closed}',
{lead}dtype='{dtype}')""")
return tpl.format(cls=self.__class__.__name__,
data=self._format_data(),
lead=' ' * len(self.__class__.__name__) + ' ',
closed=self.closed, dtype=self.dtype)

def _format_space(self):
space = ' ' * (len(self.__class__.__name__) + 1)
return "\n{space}".format(space=space)
8 changes: 0 additions & 8 deletions pandas/core/arrays/period.py
@@ -330,14 +330,6 @@ def start_time(self):
def end_time(self):
return self.to_timestamp(how='end')

def __repr__(self):
return '<{}>\n{}\nLength: {}, dtype: {}'.format(
self.__class__.__name__,
[str(s) for s in self],
len(self),
self.dtype
)

def __setitem__(
self,
key, # type: Union[int, Sequence[int], Sequence[bool]]
37 changes: 29 additions & 8 deletions pandas/io/formats/printing.py
@@ -271,7 +271,9 @@ class TableSchemaFormatter(BaseFormatter):
max_seq_items=max_seq_items)


-def format_object_summary(obj, formatter, is_justify=True, name=None):
+def format_object_summary(obj, formatter, is_justify=True, name=None,
+                          trailing_comma=True,
+                          truncated_trailing_newline=True):
"""
Return the formatted obj as a unicode string

@@ -283,9 +285,14 @@ def format_object_summary(obj, formatter, is_justify=True, name=None):
string formatter for an element
is_justify : boolean
should justify the display
-name : name, optiona
+name : name, optional
defaults to the class name of the obj

Pass ``False`` to indicate that subsequent lines should
Contributor

what calls this with False? IOW what has a name that u don’t want to print

Contributor Author

This is needed to re-use for both Index and EA-style formatters. It disables indentation on subsequent lines.

print(pd.io.formats.printing.format_object_summary(arr, str))
[2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01,
             2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01,
             2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01,
             2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01],

vs.

In [6]: print(pd.io.formats.printing.format_object_summary(arr, str, name=False))
[2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01,
 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01,
 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01,
 2000-01-01, 2001-01-01],

Contributor

can this be another parameter then? it seems like it is used for 2 purposes

Contributor Author

Done. Calling it indent_for_name.
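
Editor's sketch: the indentation behaviour discussed in this thread comes down to which prefix is placed in front of each continuation line. The snippet below is a simplified restatement of the `space1` / `space2` branch from this commit, where `name=False` switches the alignment off; `continuation_prefixes` is a hypothetical helper name, and the later rename to a separate parameter mentioned in the thread is not modelled.

```python
def continuation_prefixes(name):
    """Return (space1, space2), the prefixes used for wrapped lines.

    `name` is a string for Index-style reprs, or False for the
    ExtensionArray-style repr that should not align with a name.
    """
    if name is False:
        space1 = "\n"
        space2 = "\n "  # one space, to sit under the opening '['
    else:
        name_len = len(name)
        space1 = "\n" + " " * (name_len + 1)
        space2 = "\n" + " " * (name_len + 2)
    return space1, space2


# Wrapped items line up under the first element in both styles.
items = ["2000-01-01", "2001-01-01", "2000-01-01"]
_, ea_prefix = continuation_prefixes(False)
_, index_prefix = continuation_prefixes("PeriodIndex")
print("[" + ("," + ea_prefix).join(items) + "]")
print("PeriodIndex([" + ("," + index_prefix).join(items) + "]")
```

With a name, continuation lines are pushed right by the length of the name plus the opening `(` and `[`; with `False`, only the single column for `[` remains, which matches the two outputs pasted in the thread.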

not be indented to align with the name.
trailing_comma : bool, default True
Whether to include a comma after the closing ']'

Returns
-------
summary string
@@ -300,8 +307,13 @@ def format_object_summary(obj, formatter, is_justify=True, name=None):
if name is None:
name = obj.__class__.__name__

space1 = "\n%s" % (' ' * (len(name) + 1))
space2 = "\n%s" % (' ' * (len(name) + 2))
if name is False:
space1 = "\n"
space2 = "\n " # space for the opening '['
else:
name_len = len(name)
space1 = "\n%s" % (' ' * (name_len + 1))
space2 = "\n%s" % (' ' * (name_len + 2))

n = len(obj)
sep = ','
@@ -328,15 +340,20 @@ def best_len(values):
else:
return 0

if trailing_comma:
close = ', '
else:
close = ''

if n == 0:
summary = '[], '
summary = '[]{}'.format(close)
elif n == 1:
first = formatter(obj[0])
summary = '[%s], ' % first
summary = '[{}]{}'.format(first, close)
elif n == 2:
first = formatter(obj[0])
last = formatter(obj[-1])
summary = '[%s, %s], ' % (first, last)
summary = '[{}, {}]{}'.format(first, last, close)
else:

if n > max_seq_items:
@@ -381,7 +398,11 @@ def best_len(values):
summary, line = _extend_line(summary, line, tail[-1],
display_width - 2, space2)
summary += line
summary += '],'

# right now close is either '' or ', '
# Now we want to include the ']', but not the maybe space.
Contributor

why is this needed?

Contributor Author

Without it, you'd get something like

In [2]: pd.core.arrays.period_array(['2000', '2001'], freq='D')
Out[2]:
<PeriodArray>
['2000-01-01', '2001-01-01'], 
Length: 2, dtype: period[D]

(notice the trailing comma after the closing `]` in the data section).

That's what we want for index classes:

In [3]: pd.Index([1, 2, 3])
Out[3]: Int64Index([1, 2, 3], dtype='int64')

but not for EAs since we don't know if they're valid code:

Contributor

i think this is a very strange repr
where is this based off of? why not use parens?
it’s also multi line

this is reinventing the wheel again compared to what we do for index; and somewhat arbitrary

repr is very important for consistency and this is off

i would avoid the special casing and have it look a whole lot more like what we have now

Contributor

also none of this looks tested (meaning the special casing and so on)

Contributor

the valid code argument does not work here either

we already have not valid code in repr eg MultiIndex and IntervalIndex

it’s very hard to guarantee this

but using parens and no angle brackets and commas between items would be a major improvement

Contributor Author

> also none of this looks tested (meaning the special casing and so on)

What special casing? I have 100% coverage for the diff when running it on the base repr tests.

Contributor

the very fact that you need a special option is the strange part.

Contributor Author

Why's that? Don't we want to reuse the common formatting code?

Contributor

so another difference this is highlighting is that EAs have the attributes on another line, while the Index does not (as they are args).

close = ']' + close.rstrip(' ')
summary += close

if len(summary) > (display_width):
summary += space1
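
Editor's sketch: the `trailing_comma` handling debated in the thread above can be isolated into a few lines. This is a simplified restatement of the diff, not the pandas function: `closing_bracket` and `short_summary` are hypothetical names, and only the short one-line case is formatted (the wrapped path is represented by the `wrapped` flag alone).

```python
def closing_bracket(trailing_comma, wrapped):
    """Return the text that closes the data section of a summary."""
    close = ', ' if trailing_comma else ''
    if wrapped:
        # Wrapped summaries keep the comma but drop the space after it.
        return ']' + close.rstrip(' ')
    return ']' + close


def short_summary(values, formatter=str, trailing_comma=True):
    """One-line summary for a short sequence (no wrapping, no truncation)."""
    body = ', '.join(formatter(v) for v in values)
    return '[' + body + closing_bracket(trailing_comma, wrapped=False)


# Index-style: the comma separates the data from following keyword arguments.
print('Int64Index(' + short_summary([1, 2]) + "dtype='int64')")
# EA-style: the data stands alone on its own line, so no comma is wanted.
print(short_summary(['2000-01-01', '2001-01-01'], trailing_comma=False))
```

The design question in the review is whether this switch belongs in the shared helper at all; the sketch only shows why one helper can serve both callers once the closing text is parameterized.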
32 changes: 25 additions & 7 deletions pandas/tests/arrays/interval/test_interval.py
@@ -1,10 +1,10 @@
# -*- coding: utf-8 -*-
import numpy as np
import pytest

-from pandas import Index, IntervalIndex, date_range, timedelta_range
+from pandas import Index, date_range, option_context, timedelta_range
from pandas.core.arrays import IntervalArray
import pandas.util.testing as tm
import pytest


@pytest.fixture(params=[
@@ -65,8 +65,26 @@ def test_set_na(self, left_right_dtypes):
tm.assert_extension_array_equal(result, expected)


def test_repr_matches():
idx = IntervalIndex.from_breaks([1, 2, 3])
a = repr(idx)
b = repr(idx.values)
assert a.replace("Index", "Array") == b
def test_repr_small():
arr = IntervalArray.from_breaks([1, 2, 3])
result = repr(arr)
expected = (
'<IntervalArray>\n'
'[(1, 2], (2, 3]]\n'
'Length: 2, dtype: interval[int64]'
)
assert result == expected


def test_repr_large():
arr = IntervalArray.from_breaks([1, 2, 3, 4, 5, 6])
with option_context('display.max_seq_items', 2):
result = repr(arr)
expected = (
'<IntervalArray>\n'
'[(1, 2],\n'
' ...\n'
' (5, 6]] \n'
'Length: 5, dtype: interval[int64]'
)
assert result == expected
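
Editor's sketch: the truncated output expected by `test_repr_large` can be pictured as a head/tail split around a `...` marker. The code below is a simplified stand-in rather than the pandas helper. `truncate_items` is a hypothetical name, and the rule of keeping `min(max_seq_items // 2, 10)` items from each end is an assumption, chosen because it agrees with the expected strings in this PR's tests (one item per side with `display.max_seq_items` set to 2, ten per side for a 1000-element array).

```python
def truncate_items(values, max_seq_items, formatter=str):
    """Format values, eliding the middle when there are too many.

    Assumed rule: keep min(max_seq_items // 2, 10) items from each end.
    """
    values = list(values)
    n = len(values)
    if n <= max_seq_items:
        return [formatter(v) for v in values]
    keep = min(max_seq_items // 2, 10)
    head = [formatter(v) for v in values[:keep]]
    # Slice from n - keep rather than -keep so that keep == 0 yields no tail.
    tail = [formatter(v) for v in values[n - keep:]]
    return head + ['...'] + tail


intervals = ['(1, 2]', '(2, 3]', '(3, 4]', '(4, 5]', '(5, 6]']
print(truncate_items(intervals, max_seq_items=2))
# 100 is used here as the display.max_seq_items value for the long case.
print(len(truncate_items(range(1000), max_seq_items=100)))
```

Note that the `Length:` line of the repr still reports the full length (5 in the interval test), since truncation affects only the data section.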
29 changes: 14 additions & 15 deletions pandas/tests/arrays/test_integer.py
@@ -1,6 +1,5 @@
# -*- coding: utf-8 -*-
import numpy as np
import pytest

from pandas.core.dtypes.generic import ABCIndexClass

@@ -12,6 +11,7 @@
UInt32Dtype, UInt64Dtype)
from pandas.tests.extension.base import BaseOpsUtil
import pandas.util.testing as tm
import pytest


def make_data():
@@ -57,24 +57,23 @@ def test_dtypes(dtype):
assert dtype.name is not None


class TestInterface(object):
def test_repr_array(data):
result = repr(data)
assert '<IntegerArray>' in result

def test_repr_array(self, data):
result = repr(data)
# not long
assert '...' not in result
assert 'Length: ' in result
assert 'dtype: ' in result

# not long
assert '...' not in result

assert 'dtype=' in result
assert 'IntegerArray' in result

def test_repr_array_long(self, data):
# some arrays may be able to assert a ... in the repr
with pd.option_context('display.max_seq_items', 1):
result = repr(data)
def test_repr_array_long(data):
# some arrays may be able to assert a ... in the repr
with pd.option_context('display.max_seq_items', 1):
result = repr(data)

assert '...' in result
assert 'length' in result
assert '...' in result
assert 'Length' in result


class TestConstructors(object):
33 changes: 32 additions & 1 deletion pandas/tests/arrays/test_period.py
@@ -1,5 +1,4 @@
import numpy as np
import pytest

from pandas._libs.tslibs import iNaT
from pandas._libs.tslibs.period import IncompatibleFrequency
@@ -10,6 +9,7 @@
import pandas as pd
from pandas.core.arrays import PeriodArray, period_array
import pandas.util.testing as tm
import pytest

# ----------------------------------------------------------------------------
# Constructors
@@ -195,3 +195,34 @@ def tet_sub_period():
other = pd.Period("2000", freq="M")
with tm.assert_raises_regex(IncompatibleFrequency, "freq"):
arr - other


# ----------------------------------------------------------------------------
# Printing

def test_repr_small():
arr = period_array(['2000', '2001'], freq='D')
result = str(arr)
expected = (
'<PeriodArray>\n'
'[2000-01-01, 2001-01-01]\n'
'Length: 2, dtype: period[D]'
)
assert result == expected


def test_repr_large():
arr = period_array(['2000', '2001'] * 500, freq='D')
result = str(arr)
expected = (
'<PeriodArray>\n'
'[2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, '
'2001-01-01,\n' # continuation
' 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01,\n'
' ...\n'
' 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01, 2000-01-01, '
'2001-01-01,\n' # continuation
' 2000-01-01, 2001-01-01, 2000-01-01, 2001-01-01]\n'
'Length: 1000, dtype: period[D]'
)
assert result == expected
1 change: 1 addition & 0 deletions pandas/tests/extension/base/__init__.py
@@ -48,6 +48,7 @@ class TestMyDtype(BaseDtypeTests):
from .interface import BaseInterfaceTests # noqa
from .methods import BaseMethodsTests # noqa
from .ops import BaseArithmeticOpsTests, BaseComparisonOpsTests, BaseOpsUtil # noqa
from .printing import BasePrintingTests # noqa
from .reduce import BaseNoReduceTests, BaseNumericReduceTests, BaseBooleanReduceTests # noqa
from .missing import BaseMissingTests # noqa
from .reshaping import BaseReshapingTests # noqa