Fix repr of DataFrame with IntervalIndex #24134

jorisvandenbossche · 2018-12-06T22:10:22Z

@TomAugspurger after the repr PR, the docs build catched an error: the repr of a DataFrame with an IntervalIndex started failing:

In [1]: df = pd.DataFrame({'A': [1, 2, 3, 4]}, 
   ...:                   index=pd.IntervalIndex.from_breaks([0, 1, 2, 3, 4]))                                                                                                  

In [2]: df                                                                                                                                                                      
Out[2]: ---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
~/miniconda3/envs/dev/lib/python3.5/site-packages/IPython/core/formatters.py in __call__(self, obj)
    700                 type_pprinters=self.type_printers,
    701                 deferred_pprinters=self.deferred_printers)
--> 702             printer.pretty(obj)
    703             printer.flush()
    704             return stream.getvalue()

~/miniconda3/envs/dev/lib/python3.5/site-packages/IPython/lib/pretty.py in pretty(self, obj)
    400                         if cls is not object \
    401                                 and callable(cls.__dict__.get('__repr__')):
--> 402                             return _repr_pprint(obj, self, cycle)
    403 
    404             return _default_pprint(obj, self, cycle)

~/miniconda3/envs/dev/lib/python3.5/site-packages/IPython/lib/pretty.py in _repr_pprint(obj, p, cycle)
    695     """A pprint that just redirects to the normal repr function."""
    696     # Find newlines and replace them with p.break_()
--> 697     output = repr(obj)
    698     for idx,output_line in enumerate(output.splitlines()):
    699         if idx:

~/scipy/pandas/pandas/core/base.py in __repr__(self)
     75         Yields Bytestring in Py2, Unicode String in py3.
     76         """
---> 77         return str(self)
     78 
     79 

~/scipy/pandas/pandas/core/base.py in __str__(self)
     54 
     55         if compat.PY3:
---> 56             return self.__unicode__()
     57         return self.__bytes__()
     58 

~/scipy/pandas/pandas/core/frame.py in __unicode__(self)
    626             width = None
    627         self.to_string(buf=buf, max_rows=max_rows, max_cols=max_cols,
--> 628                        line_width=width, show_dimensions=show_dimensions)
    629 
    630         return buf.getvalue()

~/scipy/pandas/pandas/core/frame.py in to_string(self, buf, columns, col_space, header, index, na_rep, formatters, float_format, sparsify, index_names, justify, max_rows, max_cols, show_dimensions, decimal, line_width)
    707                                            decimal=decimal,
    708                                            line_width=line_width)
--> 709         formatter.to_string()
    710 
    711         if buf is None:

~/scipy/pandas/pandas/io/formats/format.py in to_string(self)
    601         else:
    602 
--> 603             strcols = self._to_str_columns()
    604             if self.line_width is None:  # no need to wrap around just print
    605                 # the whole frame

~/scipy/pandas/pandas/io/formats/format.py in _to_str_columns(self)
    510         # may include levels names also
    511 
--> 512         str_index = self._get_formatted_index(frame)
    513 
    514         if not is_list_like(self.header) and not self.header:

~/scipy/pandas/pandas/io/formats/format.py in _get_formatted_index(self, frame)
    807                                      names=show_index_names, formatter=fmt)
    808         else:
--> 809             fmt_index = [index.format(name=show_index_names, formatter=fmt)]
    810         fmt_index = [tuple(_make_fixed_width(list(x), justify='left',
    811                                              minimum=(self.col_space or 0),

~/scipy/pandas/pandas/core/indexes/base.py in format(self, name, formatter, **kwargs)
    993             return header + list(self.map(formatter))
    994 
--> 995         return self._format_with_header(header, **kwargs)
    996 
    997     def _format_with_header(self, header, na_rep='NaN', **kwargs):

~/scipy/pandas/pandas/core/indexes/interval.py in _format_with_header(self, header, **kwargs)
   1012 
   1013     def _format_with_header(self, header, **kwargs):
-> 1014         return header + list(self._format_native_types(**kwargs))
   1015 
   1016     def _format_native_types(self, na_rep='', quoting=None, **kwargs):

~/scipy/pandas/pandas/core/indexes/interval.py in _format_native_types(self, na_rep, quoting, **kwargs)
   1016     def _format_native_types(self, na_rep='', quoting=None, **kwargs):
   1017         """ actually format my specific types """
-> 1018         from pandas.io.formats.format import IntervalArrayFormatter
   1019         return IntervalArrayFormatter(values=self,
   1020                                       na_rep=na_rep,

ImportError: cannot import name 'IntervalArrayFormatter'

What is in this PR "fixes" the immediate error, but, I see a difference with what was before:

On 0.23.4:

In [2]: i = pd.io.formats.format.IntervalArrayFormatter(pd.interval_range(1, 5))

In [3]: i.get_result()
Out[3]: ['(1, 2]', '(2, 3]', '(3, 4]', '(4, 5]']

On master:

In [22]: i = pd.io.formats.format.ExtensionArrayFormatter(pd.interval_range(1, 5))                                                                                              

In [23]: i.get_result()                                                                                                                                                         
Out[23]: [' (1, 2]', ' (2, 3]', ' (3, 4]', ' (4, 5]']

So there is now an extra space. So which means also the DataFrame repr starts with a space.

(still need to add tests, and it might this also breaks existing tests due to the whitespace change)

pep8speaks · 2018-12-06T22:10:24Z

Hello @jorisvandenbossche! Thanks for submitting the PR.

There are no PEP8 issues in the file pandas/core/indexes/interval.py !

codecov · 2018-12-06T22:47:02Z

Codecov Report

Merging #24134 into master will not change coverage.
The diff coverage is 0%.

@@           Coverage Diff           @@
##           master   #24134   +/-   ##
=======================================
  Coverage    92.2%    92.2%           
=======================================
  Files         162      162           
  Lines       51701    51701           
=======================================
  Hits        47671    47671           
  Misses       4030     4030

Flag	Coverage Δ
#multiple	`90.6% <0%> (ø)`	⬆️
#single	`43.02% <0%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/core/indexes/interval.py	`94.73% <0%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 03134cb...ec88f66. Read the comment docs.

codecov · 2018-12-06T22:47:03Z

Codecov Report

Merging #24134 into master will increase coverage by 0.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #24134      +/-   ##
==========================================
+ Coverage    92.2%   92.22%   +0.01%     
==========================================
  Files         162      162              
  Lines       51700    51769      +69     
==========================================
+ Hits        47669    47742      +73     
+ Misses       4031     4027       -4

Flag	Coverage Δ
#multiple	`90.62% <100%> (+0.01%)`	⬆️
#single	`43.01% <70%> (-0.02%)`	⬇️

Impacted Files	Coverage Δ
pandas/io/formats/format.py	`97.97% <100%> (+0.01%)`	⬆️
pandas/core/indexes/interval.py	`95.26% <100%> (+0.52%)`	⬆️
pandas/core/arrays/categorical.py	`95.3% <0%> (-0.1%)`	⬇️
pandas/core/frame.py	`96.91% <0%> (ø)`	⬆️
pandas/core/indexes/base.py	`96.27% <0%> (ø)`	⬆️
pandas/core/indexes/category.py	`97.9% <0%> (ø)`	⬆️
pandas/core/arrays/datetimes.py	`98.56% <0%> (ø)`	⬆️
pandas/core/generic.py	`96.65% <0%> (ø)`	⬆️
pandas/io/pytables.py	`92.31% <0%> (ø)`	⬆️
pandas/core/internals/managers.py	`95.93% <0%> (+0.01%)`	⬆️
... and 6 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update c0b9eea...f95ae76. Read the comment docs.

TomAugspurger · 2018-12-08T13:37:54Z

I'll look into this later today.

It seems that we're now hitting the else in

pandas/pandas/io/formats/format.py

Lines 933 to 939 in c911151

    
           for i, v in enumerate(vals): 
        
               if not is_float_type[i] and leading_space: 
        
                   fmt_values.append(u' {v}'.format(v=_format(v))) 
        
               elif is_float_type[i]: 
        
                   fmt_values.append(float_format(v)) 
        
               else: 
        
                   fmt_values.append(u' {v}'.format(v=_format(v)))

.

…e-interval-df-repr

TomAugspurger · 2018-12-08T20:49:40Z

Updated to pass through a leading_space argument and tests for the formatters that we removed.

jreback · 2018-12-09T14:11:35Z

pandas/core/indexes/interval.py

+        from pandas.io.formats.format import ExtensionArrayFormatter
+        return ExtensionArrayFormatter(values=self,
+                                       na_rep=na_rep,
+                                       justify='all',


why do we need this new arg? just change the output tests, which are incorrect

In 0.23.4, we didn't have the leading space for indexes.

In [2]: df = pd.Series(1, index=pd.IntervalIndex.from_breaks([1, 2, 3, 4])).to_frame() In [3]: df Out[3]: 0 (1, 2] 1 (2, 3] 1 (3, 4] 1

wasn’t this the same issue you recently adjusted for DTi? this keywords just promote inconsistency

No I don't think so.

AFAICT, this is happing because IntervalIndex and Series[Interval] are now both using GenericArrayFormatter to format the values. Series need a leading space, but indexes don't. So I think things should be more consistent. If you want I can remove the keyword and go back to the old implementation which just did the formatting on its own, but I suspect you don't want that :)

jreback · 2018-12-09T14:11:53Z

pandas/io/formats/format.py

@@ -936,7 +941,13 @@ def _format(x):
            elif is_float_type[i]:
                fmt_values.append(float_format(v))
            else:
-                fmt_values.append(u' {v}'.format(v=_format(v)))
+                if leading_space is False:


this is so hacky, we should not be doing this

jreback · 2018-12-10T17:57:17Z

pandas/io/formats/format.py

@@ -927,7 +930,9 @@ def _format(x):
            vals = vals.values

        is_float_type = lib.map_infer(vals, is_float) & notna(vals)
-        leading_space = is_float_type.any()
+        leading_space = self.leading_space


can you document when this is set.

jreback

see comments

jorisvandenbossche · 2018-12-11T00:02:36Z

this is happing because IntervalIndex and Series[Interval] are now both using GenericArrayFormatter to format the values. Series need a leading space, but indexes don't.

I don't really know the formatting code, but would it make sense to move this "adding a leading space or not" to the code that actually combines the list of strings into the actual repr? As it is only when concatting them with ',' in the middle for the Index repr that you need those spaces?
(just wondering, not for this PR, but might be an idea to clean this up)

TomAugspurger · 2018-12-11T02:55:15Z

I'm not too familiar with it either, but that seems sensible to me. It'd be nice to focus the array formatter on formatting the values, and a higher-level formatter for joining together those formatted arrays.

…

On Mon, Dec 10, 2018 at 6:02 PM Joris Van den Bossche < ***@***.***> wrote: this is happing because IntervalIndex and Series[Interval] are now both using GenericArrayFormatter to format the values. Series need a leading space, but indexes don't. I don't really know the formatting code, but would it make sense to move this "adding a leading space or not" to the code that actually combines the list of strings into the actual repr? As it is only when concatting them with ',' in the middle for the Index repr that you need those spaces? (just wondering, not for this PR, but might be an idea to clean this up) — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#24134 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABQHIrYCCgEJd4vD1B22GuQBggTQqy6xks5u3vYhgaJpZM4ZHaSi> .

jreback · 2018-12-13T00:50:54Z

ok, merging this though @TomAugspurger can you open an issue on #24134 (comment) which sounds good.

Fix repr of DataFrame with interval values

ec88f66

jorisvandenbossche added Bug Output-Formatting __repr__ of pandas objects, to_string Blocker Blocking issue or pull request for an upcoming release Interval Interval data type labels Dec 6, 2018

jorisvandenbossche added this to the 0.24.0 milestone Dec 6, 2018

TomAugspurger added 2 commits December 8, 2018 14:32

Merge remote-tracking branch 'upstream/master' into jorisvandenbossch…

137cba7

…e-interval-df-repr

Pass through leading space argument

5b0479f

jreback requested changes Dec 9, 2018

View reviewed changes

TomAugspurger mentioned this pull request Dec 10, 2018

DOC: Fixed implicit imports for whatsnew (v >= version 20.0) #24199

Merged

4 tasks

jreback reviewed Dec 10, 2018

View reviewed changes

jreback requested changes Dec 10, 2018

View reviewed changes

update

f95ae76

jreback approved these changes Dec 13, 2018

View reviewed changes

jreback merged commit 33ca356 into pandas-dev:master Dec 13, 2018

jorisvandenbossche deleted the interval-df-repr branch December 16, 2018 07:54

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Fix repr of DataFrame with IntervalIndex (pandas-dev#24134)

c13fcd1

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

Fix repr of DataFrame with IntervalIndex (pandas-dev#24134)

c2f3284

simonjayhawkins mentioned this pull request Apr 5, 2019

to_html formatter not called for float values in a mixed-type column (2) #26000

Closed

4 tasks

Uh oh!

Fix repr of DataFrame with IntervalIndex #24134

Fix repr of DataFrame with IntervalIndex #24134

Uh oh!

Conversation

jorisvandenbossche commented Dec 6, 2018

Uh oh!

pep8speaks commented Dec 6, 2018

Uh oh!

codecov bot commented Dec 6, 2018

Codecov Report

Uh oh!

codecov bot commented Dec 6, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

TomAugspurger commented Dec 8, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

TomAugspurger commented Dec 8, 2018

Uh oh!

jreback Dec 9, 2018

Choose a reason for hiding this comment

Uh oh!

TomAugspurger Dec 9, 2018

Choose a reason for hiding this comment

Uh oh!

jreback Dec 9, 2018

Choose a reason for hiding this comment

Uh oh!

TomAugspurger Dec 10, 2018

Choose a reason for hiding this comment

Uh oh!

jreback Dec 10, 2018

Choose a reason for hiding this comment

Uh oh!

jreback Dec 9, 2018

Choose a reason for hiding this comment

Uh oh!

jreback Dec 10, 2018

Choose a reason for hiding this comment

Uh oh!

jreback left a comment

Choose a reason for hiding this comment

Uh oh!

jorisvandenbossche commented Dec 11, 2018

Uh oh!

TomAugspurger commented Dec 11, 2018 via email

Uh oh!

jreback commented Dec 13, 2018

Uh oh!

Uh oh!

codecov bot commented Dec 6, 2018 •

edited

Loading

TomAugspurger commented Dec 8, 2018 •

edited

Loading