-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Fix repr of DataFrame with IntervalIndex #24134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix repr of DataFrame with IntervalIndex #24134
Conversation
Hello @jorisvandenbossche! Thanks for submitting the PR.
|
Codecov Report
@@ Coverage Diff @@
## master #24134 +/- ##
=======================================
Coverage 92.2% 92.2%
=======================================
Files 162 162
Lines 51701 51701
=======================================
Hits 47671 47671
Misses 4030 4030
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## master #24134 +/- ##
==========================================
+ Coverage 92.2% 92.22% +0.01%
==========================================
Files 162 162
Lines 51700 51769 +69
==========================================
+ Hits 47669 47742 +73
+ Misses 4031 4027 -4
Continue to review full report at Codecov.
|
I'll look into this later today. It seems that we're now hitting the else in pandas/pandas/io/formats/format.py Lines 933 to 939 in c911151
|
Updated to pass through a |
from pandas.io.formats.format import ExtensionArrayFormatter | ||
return ExtensionArrayFormatter(values=self, | ||
na_rep=na_rep, | ||
justify='all', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we need this new arg? just change the output tests, which are incorrect
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In 0.23.4, we didn't have the leading space for indexes.
In [2]: df = pd.Series(1, index=pd.IntervalIndex.from_breaks([1, 2, 3, 4])).to_frame()
In [3]: df
Out[3]:
0
(1, 2] 1
(2, 3] 1
(3, 4] 1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wasn’t this the same issue you recently adjusted for DTi? this keywords just promote inconsistency
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No I don't think so.
AFAICT, this is happing because IntervalIndex
and Series[Interval]
are now both using GenericArrayFormatter
to format the values. Series need a leading space, but indexes don't. So I think things should be more consistent. If you want I can remove the keyword and go back to the old implementation which just did the formatting on its own, but I suspect you don't want that :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok i guess
@@ -936,7 +941,13 @@ def _format(x): | |||
elif is_float_type[i]: | |||
fmt_values.append(float_format(v)) | |||
else: | |||
fmt_values.append(u' {v}'.format(v=_format(v))) | |||
if leading_space is False: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is so hacky, we should not be doing this
@@ -927,7 +930,9 @@ def _format(x): | |||
vals = vals.values | |||
|
|||
is_float_type = lib.map_infer(vals, is_float) & notna(vals) | |||
leading_space = is_float_type.any() | |||
leading_space = self.leading_space |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you document when this is set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comments
I don't really know the formatting code, but would it make sense to move this "adding a leading space or not" to the code that actually combines the list of strings into the actual repr? As it is only when concatting them with ',' in the middle for the Index repr that you need those spaces? |
I'm not too familiar with it either, but that seems sensible to me. It'd be
nice to focus the array formatter on formatting
the values, and a higher-level formatter for joining together those
formatted arrays.
…On Mon, Dec 10, 2018 at 6:02 PM Joris Van den Bossche < ***@***.***> wrote:
this is happing because IntervalIndex and Series[Interval] are now both
using GenericArrayFormatter to format the values. Series need a leading
space, but indexes don't.
I don't really know the formatting code, but would it make sense to move
this "adding a leading space or not" to the code that actually combines the
list of strings into the actual repr? As it is only when concatting them
with ',' in the middle for the Index repr that you need those spaces?
(just wondering, not for this PR, but might be an idea to clean this up)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#24134 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHIrYCCgEJd4vD1B22GuQBggTQqy6xks5u3vYhgaJpZM4ZHaSi>
.
|
ok, merging this though @TomAugspurger can you open an issue on #24134 (comment) which sounds good. |
@TomAugspurger after the repr PR, the docs build catched an error: the repr of a DataFrame with an IntervalIndex started failing:
What is in this PR "fixes" the immediate error, but, I see a difference with what was before:
On 0.23.4:
On master:
So there is now an extra space. So which means also the DataFrame repr starts with a space.
(still need to add tests, and it might this also breaks existing tests due to the whitespace change)