-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
Index repr changes to make them consistent #9901
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Looks like a great idea to me! Would be nice to squeeze name into repr for datetimeindex, too. |
5a79e96
to
9ff163e
Compare
I think adding the name is certainly OK! However, I don't know if I like the stacking and indenting under each other. For short indices, this makes your output in the console a lot longer. If we change the Index repr in such a way, I would do at once a more thourough clean-up, and also look if we can/want to make the DatetimeIndex more consistent (and I would then do this for 0.17 and not in a minor release) |
Ah, I see that you updated the initial message :-) (my response was based on the mail I got) So you also reformatted the datetime-like indices. I think that is a good idea, but to repeat above: maybe we should leave this for 0.17? And I would also ping the mailing list. And maybe we should also think about if we want a difference between repr and str. Small remark on the actual output: for DatetimeIndex, the dates should also be quoted I think to have this evalable (like it is for TimedeltaIndex and objext Index) |
I do agree with @jorisvandenbossche that it's nice to have the output on one line than spread over many. As for larger unification of index repr, this is also a great idea! Given that this is unlikely to break any existing code, I think it would probably be OK in an point release (especially given how loosely pandas does semantic versioning already -- people already expect new features and changes in point releases). My vote would be to include more than the first and last items in string output for all indexes. Perhaps the first 30 and last 30 items, like |
ok, this was pretty trivial (and now the short format works for everything).
so will print short format if its a small sequence, and long format otherwise (trivial to be either one)
This is not really much of a change per-se as a) you normally don't print indexes themselves that often. and its just a display fix (and really a bug fix at that). |
starting to like these (I changed it to always print on the same line). MultiIndex (and CategoricalIndex) are the odd one out here, but they have quite a bit more info to communicate
|
ok, updated the top of the PR with new/previous behavior. This unifies all Index repr (including MultiIndex, though that is unchanged and multi-line). I switched back to a single line, with the The number to display in the short-repr is 2 on the head and 2 on the tail and is hard coded, but I suppose could be 'computed', e.g. in theory you could show say more integers that you can datetimes, but will leave that for later. |
OK, I would probably default to showing 3 items rather than 2, but this is a step in the right direction. As long as we're changing this, let's make sure that all index types can be run through eval, at least if they aren't truncated for length. So I think that means the |
I would vote for on a single line. The terminal wraps it anyways, and otherwise you can get strange effects like:
But should check how this in a notebook. Some other feedback:
|
By the way, as this directly touches the output users see (and that can be delicate), I would certainly also bring this to the mailing list. Plus, just having more eyes looking at this as the three of us would be good. I can do it tomorrow, if it is not yet done by then. |
we could have an argument say I just put If you want to put on the mailing list ok by me. I don't really think this is that big of a deal. The basic formats are pretty much unchanged and its just cleaning up things. (the current PR is for 1 line btw). |
top-section is updated |
241d2bf
to
66f5e12
Compare
I updated the top of the PR to show the repr of |
Wouldn't we keep the new |
@jorisvandenbossche you mean 1 line? I think things get lost, e.g.
Which IMHO is more informative, but it would be by-definition multi-line repr and |
@jorisvandenbossche your proposal looks like adding a line feed after the 'data' portion, then putting all the rest on the next line, yes? |
going once, going twice........ |
OK I'll test this out right now... |
I'm not a big fan of the way truncating looks with multiple lines right now. For example:
My vote would probably be for showing not quite as many elements in the truncated version (3?) and putting everything on line (at least the array elements). Closer to the one of the earlier versions:
I know this is arguably less consistent with the logic for displaying series and frames, but I think it looks better. |
But to be clear, I would not hold up the release for my objections. Any of these changes are better than what we have now. |
|
@shoyer @jorisvandenbossche
|
b784464
to
2746c13
Compare
ok, updated on the top of the PR. here are basically the rules:
The last point is a slight deviation as it then doesn't justify say integers for a categorical (but make the code a bit simpler, and though it looked a bit better - though my categorical examples of 1000 categories makes it look worse...oh well). |
move tests to generically tests for index generify __unicode__ for Index adjust index display to max_seq_items
increase limits for max_seq_items & printing for Index add extended repr for datetimelike indexes fix tseries/test_base for repr adjust docs for repr-name use new format_data on all Index types
@jreback While on the train I made an alternative implementation to allow an unequal number of elements on one row, based on this branch (jorisvandenbossche@8ff7998). It is partly based on the array2string function of numpy. The advantage is that the number of elements on all rows don't have to be equal, it just fills the row up to display width. As a consequence, this (all 3 elements on each row because of one larger string):
now looks as:
which looks a bit better in my opinion (but you can give more extreme examples where the difference is larger). For the rest, everything looks exactly the same as before (the latest update of this PR). The only difference is that the justifying or not (depending on string/categorical or not) is now the same for truncated and non-truncated (as I do this with the same code), while in the latest update you made it not justify for non-truncated ones (but I think doing it the same is a bit more consistent). |
ok let me incorporate this and I'll update the top section |
Conflicts: pandas/tseries/base.py use new format_data updates Fix detection of good width more fixes Change [ Conflicts: pandas/core/index.py more fixes revsised according to comments
Inspired by numpy's array2string
@jorisvandenbossche @shoyer ok, updated with joris commit. I had to change around slightly the code to avoid justification in certain cases, e.g. here's you don't want to justify otherwise the NaT would have a very long string (basically if its going to be 1 line and its non-truncated, then you don't justify)
|
Ah, yes, the case of 1 line not needing justification, I forgot, thanks! There are some more details we could improve, but I think this is looking good for now! For later (follow-up PR), I was thinking of:
|
@jorisvandenbossche yes, I was trying to avoid the wrap-around with a small number of elements, in fact thats why the example used 100 for display.width; if its too short you get that kind of effect. ok, bombs aways. And will add a follow-up issue. |
Index repr changes to make them consistent
With this PR, makes repr of all Index types consistent.
closes #6482
closes #6295
replaces #9897
Previous Behavior
New Behavior
Note that
MultiIndex
are multi-line repr and do no truncate sequences (of e.g. labels), this is consistent with previous versions. (easy to change this though)