Index repr changes to make them consistent #9901

jreback · 2015-04-14T18:51:17Z

This PR makes the repr of all Index types consistent.
closes #6482
closes #6295
replaces #9897

Previous Behavior

In [1]: pd.get_option('max_seq_items')
Out[1]: 100

In [2]: pd.Index(range(4),name='foo')
Out[2]: Int64Index([0, 1, 2, 3], dtype='int64')

In [3]: pd.Index(range(104),name='foo')
Out[3]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, ...], dtype='int64')

In [4]: pd.date_range('20130101',periods=4,name='foo',tz='US/Eastern')
Out[4]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00-05:00, ..., 2013-01-04 00:00:00-05:00]
Length: 4, Freq: D, Timezone: US/Eastern

In [5]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
Out[5]: 
<class 'pandas.tseries.index.DatetimeIndex'>
[2013-01-01 00:00:00-05:00, ..., 2013-04-14 00:00:00-04:00]
Length: 104, Freq: D, Timezone: US/Eastern

New Behavior


# this is here just to have a consistent display; it will normally get the value from the console width
In [1]:   pd.set_option('display.width',100)

In [2]:    pd.Index(range(4),name='foo')
Out[2]: Int64Index([0, 1, 2, 3], dtype='int64', name=u'foo')

In [3]:    pd.Index(range(25),name='foo')
Out[3]: 
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23,
            24],
           dtype='int64', name=u'foo')

In [4]:    pd.Index(range(104),name='foo')
Out[4]: 
Int64Index([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9, 
            ...
             94,  95,  96,  97,  98,  99, 100, 101, 102, 103],
           dtype='int64', name=u'foo', length=104)

In [5]:    pd.Index(['datetime', 'sA', 'sB', 'sC', 'flow', 'error', 'temp', 'ref', 'a_bit_a_longer_one']*2)
Out[5]: 
Index([u'datetime', u'sA', u'sB', u'sC', u'flow', u'error', u'temp', u'ref', u'a_bit_a_longer_one',
       u'datetime', u'sA', u'sB', u'sC', u'flow', u'error', u'temp', u'ref',
       u'a_bit_a_longer_one'],
      dtype='object')

In [6]:    pd.CategoricalIndex(['a','bb','ccc','dddd'],ordered=True,name='foobar')
Out[6]: CategoricalIndex([u'a', u'bb', u'ccc', u'dddd'], categories=[u'a', u'bb', u'ccc', u'dddd'], ordered=True, name=u'foobar', dtype='category')

In [7]:    pd.CategoricalIndex(['a','bb','ccc','dddd']*10,ordered=True,name='foobar')
Out[7]: 
CategoricalIndex([u'a', u'bb', u'ccc', u'dddd', u'a', u'bb', u'ccc', u'dddd', u'a', u'bb', u'ccc',
                  u'dddd', u'a', u'bb', u'ccc', u'dddd', u'a', u'bb', u'ccc', u'dddd', u'a', u'bb',
                  u'ccc', u'dddd', u'a', u'bb', u'ccc', u'dddd', u'a', u'bb', u'ccc', u'dddd',
                  u'a', u'bb', u'ccc', u'dddd', u'a', u'bb', u'ccc', u'dddd'],
                 categories=[u'a', u'bb', u'ccc', u'dddd'], ordered=True, name=u'foobar', dtype='category')

In [8]:    pd.CategoricalIndex(['a','bb','ccc','dddd']*100,ordered=True,name='foobar')
Out[8]: 
CategoricalIndex([u'a', u'bb', u'ccc', u'dddd', u'a', u'bb', u'ccc', u'dddd', u'a', u'bb', 
                  ...
                  u'ccc', u'dddd', u'a', u'bb', u'ccc', u'dddd', u'a', u'bb', u'ccc', u'dddd'],
                 categories=[u'a', u'bb', u'ccc', u'dddd'], ordered=True, name=u'foobar', dtype='category', length=400)

In [9]:    pd.CategoricalIndex(np.arange(1000),ordered=True,name='foobar')
Out[9]: 
CategoricalIndex([  0,   1,   2,   3,   4,   5,   6,   7,   8,   9, 
                  ...
                  990, 991, 992, 993, 994, 995, 996, 997, 998, 999],
                 categories=[0, 1, 2, 3, 4, 5, 6, 7, ...], ordered=True, name=u'foobar', dtype='category', length=1000)

In [10]:    pd.date_range('20130101',periods=4,name='foo',tz='US/Eastern')
Out[10]: 
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
               '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00'],
              dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')

In [11]:    pd.date_range('20130101',periods=25,name='foo',tz='US/Eastern')
Out[11]: 
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
               '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00',
               '2013-01-05 00:00:00-05:00', '2013-01-06 00:00:00-05:00',
               '2013-01-07 00:00:00-05:00', '2013-01-08 00:00:00-05:00',
               '2013-01-09 00:00:00-05:00', '2013-01-10 00:00:00-05:00',
               '2013-01-11 00:00:00-05:00', '2013-01-12 00:00:00-05:00',
               '2013-01-13 00:00:00-05:00', '2013-01-14 00:00:00-05:00',
               '2013-01-15 00:00:00-05:00', '2013-01-16 00:00:00-05:00',
               '2013-01-17 00:00:00-05:00', '2013-01-18 00:00:00-05:00',
               '2013-01-19 00:00:00-05:00', '2013-01-20 00:00:00-05:00',
               '2013-01-21 00:00:00-05:00', '2013-01-22 00:00:00-05:00',
               '2013-01-23 00:00:00-05:00', '2013-01-24 00:00:00-05:00',
               '2013-01-25 00:00:00-05:00'],
              dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')

In [12]:    pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
Out[12]: 
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
               '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00',
               '2013-01-05 00:00:00-05:00', '2013-01-06 00:00:00-05:00',
               '2013-01-07 00:00:00-05:00', '2013-01-08 00:00:00-05:00',
               '2013-01-09 00:00:00-05:00', '2013-01-10 00:00:00-05:00', 
               ...
               '2013-04-05 00:00:00-04:00', '2013-04-06 00:00:00-04:00',
               '2013-04-07 00:00:00-04:00', '2013-04-08 00:00:00-04:00',
               '2013-04-09 00:00:00-04:00', '2013-04-10 00:00:00-04:00',
               '2013-04-11 00:00:00-04:00', '2013-04-12 00:00:00-04:00',
               '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'],
              dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')

Note that MultiIndex reprs are multi-line and do not truncate sequences (e.g. of labels); this is consistent with previous versions (though it would be easy to change).

In [1]: MultiIndex.from_product([list('abcdefg'),range(10)],names=['first','second'])
Out[1]: 
MultiIndex(levels=[[u'a', u'b', u'c', u'd', u'e', u'f', u'g'], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           labels=[[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 6], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]],
           names=[u'first', u'second'])

jreback · 2015-04-14T18:52:00Z

@jorisvandenbossche
@shoyer

cc @hsperr

shoyer · 2015-04-14T19:06:07Z

Looks like a great idea to me! Would be nice to squeeze name into repr for datetimeindex, too.

jorisvandenbossche · 2015-04-15T10:21:47Z

I think adding the name is certainly OK!

However, I don't know if I like the stacking and indenting under each other. For short indices, this makes your output in the console a lot longer.

If we change the Index repr in such a way, I would do a more thorough clean-up at once, and also look at whether we can/want to make the DatetimeIndex more consistent (and I would then do this for 0.17 and not in a minor release)

jorisvandenbossche · 2015-04-15T10:25:10Z

Ah, I see that you updated the initial message :-) (my response was based on the mail I got)

So you also reformatted the datetime-like indices. I think that is a good idea, but to repeat above: maybe we should leave this for 0.17? And I would also ping the mailing list.

And maybe we should also think about if we want a difference between repr and str.

Small remark on the actual output: for DatetimeIndex, I think the dates should also be quoted to make this evalable (like it is for TimedeltaIndex and object Index)

shoyer · 2015-04-15T17:16:02Z

I do agree with @jorisvandenbossche that it's nicer to have the output on one line than spread over many.

As for larger unification of index repr, this is also a great idea! Given that this is unlikely to break any existing code, I think it would probably be OK in a point release (especially given how loosely pandas does semantic versioning already -- people already expect new features and changes in point releases).

My vote would be to include more than the first and last items in string output for all indexes. Perhaps the first 30 and last 30 items, like Series? I'd be happy even with just 5/5. This is actually a bit of a pet-peeve of mine with the old DatetimeIndex repr -- I would often end up converting the index to a series just to see more than 2 values.

jreback · 2015-04-15T22:34:34Z

ok, this was pretty trivial (and now the short format works for everything).

max_seq_items is for this purpose, but I think we need to make it much smaller (its default is 100 now), or maybe adjust it by type, e.g. for integers it's ok, but for datetimes it's way too big.

so this will print the short format if it's a small sequence, and the long format otherwise (trivial to make it either one)

In [1]: pd.set_option('max_seq_items',8)

In [2]: pd.date_range('20130101',freq='s',periods=2,tz='US/Eastern',name='foo')
Out[2]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-01 00:00:01-05:00'], name=u'foo', dtype='datetime64[ns]', length=2, freq='S', tz='US/Eastern')

In [3]: pd.date_range('20130101',freq='s',periods=4,tz='US/Eastern',name='foo')
Out[3]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-01 00:00:01-05:00', '2013-01-01 00:00:02-05:00', '2013-01-01 00:00:03-05:00'], name=u'foo', dtype='datetime64[ns]', length=4, freq='S', tz='US/Eastern')

In [4]: pd.date_range('20130101',freq='s',periods=6,tz='US/Eastern',name='foo')
Out[4]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-01 00:00:01-05:00', '2013-01-01 00:00:02-05:00', '2013-01-01 00:00:03-05:00', '2013-01-01 00:00:04-05:00', '2013-01-01 00:00:05-05:00'], name=u'foo', dtype='datetime64[ns]', length=6, freq='S', tz='US/Eastern')

In [5]: pd.date_range('20130101',freq='s',periods=8,tz='US/Eastern',name='foo')
Out[5]: DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-01 00:00:01-05:00', '2013-01-01 00:00:02-05:00', '2013-01-01 00:00:03-05:00', '2013-01-01 00:00:04-05:00', '2013-01-01 00:00:05-05:00', '2013-01-01 00:00:06-05:00', '2013-01-01 00:00:07-05:00'], name=u'foo', dtype='datetime64[ns]', length=8, freq='S', tz='US/Eastern')

In [6]: pd.date_range('20130101',freq='s',periods=10,tz='US/Eastern',name='foo')
Out[6]: 
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-01 00:00:01-05:00', ..., '2013-01-01 00:00:08-05:00', '2013-01-01 00:00:09-05:00'],
              name=u'foo',
              dtype='datetime64[ns]',
              length=10,
              freq='S',
              tz='US/Eastern')

In [9]: Index(range(8),name='foo')
Out[9]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7], name=u'foo', dtype='int64')

In [10]: Index(range(10),name='foo')
Out[10]: 
Int64Index([0, 1, ..., 8, 9],
           name=u'foo',
           dtype='int64')

This is not really much of a change per se, as a) you normally don't print indexes themselves that often, and b) it's just a display fix (and really a bug fix at that).

jreback · 2015-04-15T22:44:58Z

starting to like these (I changed it to always print on the same line). MultiIndex (and CategoricalIndex) are the odd one out here, but they have quite a bit more info to communicate

In [2]: pd.set_option('max_seq_items',8)

In [3]: pd.date_range('20130101',periods=80)
Out[3]: DatetimeIndex(['2013-01-01', '2013-01-02', ..., '2013-03-20', '2013-03-21'], dtype='datetime64[ns]', length=80, freq='D', tz=None)

In [4]: Index(range(8))
Out[4]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64')

In [5]: Index(range(10))
Out[5]: Int64Index([0, 1, ..., 8, 9], dtype='int64')

In [6]: Index(range(10),name='foo')
Out[6]: Int64Index([0, 1, ..., 8, 9], name=u'foo', dtype='int64')

jreback · 2015-04-16T12:05:13Z

ok, updated the top of the PR with new/previous behavior.

This unifies all Index repr (including MultiIndex, though that is unchanged and multi-line). I switched back to a single line, with the display.max_seq_items controlling if it does a short or long-format. All datetimelike are quoted (so actually a short-repr works, a long-one won't work).

The number to display in the short-repr is 2 on the head and 2 on the tail and is hard coded, but I suppose it could be 'computed', e.g. in theory you could show more integers than you can datetimes, but I will leave that for later.
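A minimal sketch of the head/tail rule described above, in plain Python rather than the actual pandas code. The function name `format_short` and its parameters are assumptions for illustration only; the cut-off (`len > max_seq_items`) is chosen to match the `range(8)` / `range(10)` examples shown earlier with `max_seq_items` set to 8.

```python
def format_short(values, max_seq_items=100, n_head=2, n_tail=2):
    """Render values as a bracketed list, eliding the middle when too long.

    Hypothetical helper: the name and signature are not part of pandas.
    """
    values = list(values)
    if len(values) <= max_seq_items:
        shown = [repr(v) for v in values]
    else:
        # Keep the first n_head and last n_tail items with '...' in between.
        head = [repr(v) for v in values[:n_head]]
        tail = [repr(v) for v in values[len(values) - n_tail:]]
        shown = head + ["..."] + tail
    return "[" + ", ".join(shown) + "]"


print(format_short(range(8), max_seq_items=8))   # [0, 1, 2, 3, 4, 5, 6, 7]
print(format_short(range(10), max_seq_items=8))  # [0, 1, ..., 8, 9]
```

Making `n_head` and `n_tail` parameters is one way the 'computed' variant could later be plugged in without changing callers.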

shoyer · 2015-04-16T16:40:03Z

OK, I would probably default to showing 3 items rather than 2, but this is a step in the right direction.

As long as we're changing this, let's make sure that all index types can be run through eval, at least if they aren't truncated for length. So I think that means the length field needs to go.
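To illustrate why a display-only field such as `length` gets in the way of `eval`, here is a rough sketch using a toy class. `MiniIndex` is a hypothetical stand-in, not a pandas type; the point is only that a repr round-trips when every field it prints is also a constructor argument.

```python
class MiniIndex:
    """Hypothetical stand-in for an Index; not a pandas class."""

    def __init__(self, data, dtype="int64", name=None):
        self.data = list(data)
        self.dtype = dtype
        self.name = name

    def __eq__(self, other):
        return (isinstance(other, MiniIndex)
                and (self.data, self.dtype, self.name)
                == (other.data, other.dtype, other.name))

    def __repr__(self):
        # Only emit things the constructor accepts, so eval(repr(x)) works.
        attrs = [repr(self.data), "dtype=%r" % self.dtype]
        if self.name is not None:
            attrs.append("name=%r" % self.name)
        return "MiniIndex(%s)" % ", ".join(attrs)


idx = MiniIndex(range(4), name="foo")
print(repr(idx))  # MiniIndex([0, 1, 2, 3], dtype='int64', name='foo')
assert eval(repr(idx)) == idx

# A display-only field such as length= is not a constructor argument,
# so a repr containing it cannot be evaluated.
try:
    eval("MiniIndex([0, 1, 2, 3], dtype='int64', length=4)")
except TypeError as exc:
    print("not evalable:", exc)
```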

jorisvandenbossche · 2015-04-16T19:52:36Z

I would vote for on a single line. The terminal wraps it anyways, and otherwise you can get strange effects like:

In [3]: pd.Index(range(21))
Out[3]:
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19
, 20],
           dtype='int64')

But we should check how this looks in a notebook.

Some other feedback:

- I'm not sure that 'name' should go before dtype. It seems more logical to put optional attributes last.
- I agree about the length field. I know it is there in the current repr, but this is inconsistent with the others, and it is also not a real kwarg.
- It feels strange to me that as you approach max_seq_items you can have a very long output, and then suddenly, once you reach it, only a very short output. It seems more logical to truncate the output at max_seq_items (e.g. if max_seq_items is set at 30, show the first 15 and last 15). But that can be an argument to change the default of max_seq_items to a much lower one (or to make this a new option specific to Index).
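The jump in output size around max_seq_items can be made concrete with a small sketch. Both functions below are hypothetical and only count how many elements would be displayed under each rule; they assume truncation starts once the length exceeds max_seq_items.

```python
def shown_fixed(n, max_seq_items, n_edge=2):
    """Items displayed when truncation always keeps n_edge per side."""
    return n if n <= max_seq_items else 2 * n_edge


def shown_half(n, max_seq_items):
    """Items displayed when truncation keeps max_seq_items // 2 per side."""
    return n if n <= max_seq_items else 2 * (max_seq_items // 2)


for n in (29, 30, 31):
    print(n, shown_fixed(n, 30), shown_half(n, 30))
# 29 29 29
# 30 30 30
# 31 4 30
```

With a fixed head and tail, one extra element drops the display from 30 items to 4, while the half-and-half rule keeps the amount of output roughly constant.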

jorisvandenbossche · 2015-04-16T19:54:59Z

By the way, as this directly touches the output users see (and that can be delicate), I would certainly also bring this to the mailing list. Plus, just having more eyes looking at this than the three of us would be good. I can do it tomorrow, if it is not yet done by then.

jreback · 2015-04-16T22:49:54Z

we could have an argument say max_seq_visible to control the head/tail ones. But honestly I would do that later, see if its really needed.

I just put name/dtype (and length only shows up if you are truncating) first. but could easily change that order (you want dtype then name, ok)

If you want to put on the mailing list ok by me. I don't really think this is that big of a deal. The basic formats are pretty much unchanged and its just cleaning up things. (the current PR is for 1 line btw).

jreback · 2015-04-16T22:54:01Z

top-section is updated

jreback · 2015-04-20T12:08:50Z

I updated the top of the PR to show the repr of MultiIndex/CategoricalIndex (unchanged for MultiIndex from previous versions)

jorisvandenbossche · 2015-04-20T12:10:40Z

Shouldn't we keep the new CategoricalIndex consistent with the other Index types rather than with MultiIndex?

jreback · 2015-04-20T12:25:18Z

@jorisvandenbossche you mean 1 line?

I think things get lost, e.g. ordered/name, as you have both codes/categories. Now codes themselves are not really necessary (and similarly for MultiIndex), e.g. in my example a MultiIndex is really just

In [7]: MultiIndex.from_product([list('abcdefg'),range(10)],names=['first','second']).levels[0]
Out[7]: Index([u'a', u'b', u'c', u'd', u'e', u'f', u'g'], dtype='object', name=u'first')

In [8]: MultiIndex.from_product([list('abcdefg'),range(10)],names=['first','second']).levels[1]
Out[8]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64', name=u'second')

Which IMHO is more informative, but it would be by-definition multi-line repr

and CategoricalIndex is closer to MultiIndex I think.

jreback · 2015-04-28T11:23:30Z

In [12]: pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
Out[12]:
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00', ...,
               '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'],
              dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')

@jorisvandenbossche your proposal looks like adding a line feed after the 'data' portion, then putting all the rest on the next line, yes?

jreback · 2015-05-08T00:17:47Z

going once, going twice........

shoyer · 2015-05-08T00:19:28Z

OK I'll test this out right now...

shoyer · 2015-05-08T04:54:01Z

I'm not a big fan of the way truncating looks with multiple lines right now. For example:

In [29]: pd.Index(['asd', 'zx', 'a'] * 10000)
Out[29]:
Index([u'asd',  u'zx',   u'a', u'asd',  u'zx',   u'a', u'asd',  u'zx',   u'a',
       u'asd',
       ...
         u'a', u'asd',  u'zx',   u'a', u'asd',  u'zx',   u'a', u'asd',  u'zx',
         u'a'],
      dtype='object', length=30000)

In [40]: pd.Index(np.arange(2000) * 100)
Out[40]:
Int64Index([     0,    100,    200,    300,    400,    500,    600,    700,
               800,    900,
            ...
            199000, 199100, 199200, 199300, 199400, 199500, 199600, 199700,
            199800, 199900],
           dtype='int64', length=2000)

My vote would probably be for showing not quite as many elements in the truncated version (3?) and putting everything on one line (at least the array elements). Closer to one of the earlier versions:

In [3]: pd.date_range('20130101',periods=80)
Out[3]: DatetimeIndex(['2013-01-01', '2013-01-02', ..., '2013-03-20', '2013-03-21'], dtype='datetime64[ns]', length=80, freq='D', tz=None)

In [4]: Index(range(8))
Out[4]: Int64Index([0, 1, 2, 3, 4, 5, 6, 7], dtype='int64')

In [5]: Index(range(10))
Out[5]: Int64Index([0, 1, ..., 8, 9], dtype='int64')

In [6]: Index(range(10),name='foo')
Out[6]: Int64Index([0, 1, ..., 8, 9], name=u'foo', dtype='int64')

I know this is arguably less consistent with the logic for displaying series and frames, but I think it looks better.

shoyer · 2015-05-08T04:54:50Z

But to be clear, I would not hold up the release for my objections. Any of these changes are better than what we have now.

jorisvandenbossche · 2015-05-08T07:43:49Z

the truncating: it looks off in this case because the length of the strings is such that there is only one element on the second line. I think there are some options:
- we could detect that and fill the rest of the line (but we don't know the length of the following elements in advance, so that may break the aligning)
- we could also use not a fixed number of values, but just one full row, the dots, and a full row with the last elements. So for datetimes it is then maybe the first and last 2 elements, for integers it can be more?
- or we could decide to return max_seq_items items, so the first and last max_seq_items/2. Then the line in the middle will not be that prominent
- or the suggestion of @shoyer (this is also how numpy does it)

Something else I realized: this aligning is not really ideal for column names. Example:

In [10]: pd.Index(['datetime', 'sA', 'sB', 'sC', 'flow', 'error', 'temp', 'ref',
'and_a_longer_one'])
Out[10]:
Index([        u'datetime',               u'sA',               u'sB',
                 u'sC',             u'flow',            u'error',
               u'temp',              u'ref', u'and_a_longer_one'],
  dtype='object')

And apparently, numpy does not align arrays of object dtype (which has some logic, as these have a higher chance of being of different length). Maybe we should follow that?

In [12]: pd.Index(['datetime', 'sA', 'sB', 'sC', 'flow', 'error', 'temp', 'ref',
'and_a_longer_one']*3).values
Out[12]:
array(['datetime', 'sA', 'sB', 'sC', 'flow', 'error', 'temp', 'ref',
       'and_a_longer_one', 'datetime', 'sA', 'sB', 'sC', 'flow', 'error',
       'temp', 'ref', 'and_a_longer_one', 'datetime', 'sA', 'sB', 'sC',
       'flow', 'error', 'temp', 'ref', 'and_a_longer_one'], dtype=object)

jreback · 2015-05-08T13:40:24Z

@shoyer @jorisvandenbossche
slight modification; I print now 2 head and 2 tail on the truncated display; each line is justified itself (but maxlen are the same). Doesn't affect fixed width types like datetimelikes, but integers look better, mixed width strings are somewhat better I think

In [1]:    pd.get_option('max_seq_items')
Out[1]: 100

In [2]:    pd.Index(range(4),name='foo')
Out[2]: Int64Index([0, 1, 2, 3], dtype='int64', name=u'foo')

In [3]:    pd.Index(range(25),name='foo')
Out[3]: 
Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24],
           dtype='int64', name=u'foo')

In [4]:    pd.Index(range(104),name='foo')
Out[4]: 
Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
            10, 11, 12, 13, 14, 15, 16, 17, 18, 19,
            ...
            84, 85, 86, 87, 88, 89, 90, 91, 92, 93,
            94, 95, 96, 97, 98, 99, 100, 101, 102, 103],
           dtype='int64', name=u'foo', length=104)

In [5]:    pd.CategoricalIndex(['a','bb','ccc','dddd'],ordered=True,name='foobar')
Out[5]: CategoricalIndex([u'a', u'bb', u'ccc', u'dddd'], categories=[u'a', u'bb', u'ccc', u'dddd'], ordered=True, name=u'foobar', dtype='category')

In [6]:    pd.CategoricalIndex(['a','bb','ccc','dddd']*10,ordered=True,name='foobar')
Out[6]: 
CategoricalIndex([   u'a',   u'bb',  u'ccc', u'dddd',    u'a',   u'bb',
                   u'ccc', u'dddd',    u'a',   u'bb',  u'ccc', u'dddd',
                     u'a',   u'bb',  u'ccc', u'dddd',    u'a',   u'bb',
                   u'ccc', u'dddd',    u'a',   u'bb',  u'ccc', u'dddd',
                     u'a',   u'bb',  u'ccc', u'dddd',    u'a',   u'bb',
                   u'ccc', u'dddd',    u'a',   u'bb',  u'ccc', u'dddd',
                     u'a',   u'bb',  u'ccc', u'dddd'],
                 categories=[u'a', u'bb', u'ccc', u'dddd'], ordered=True, name=u'foobar', dtype='category')

In [7]:    pd.CategoricalIndex(['a','bb','ccc','dddd']*100,ordered=True,name='foobar')
Out[7]: 
CategoricalIndex([   u'a',   u'bb',  u'ccc', u'dddd',    u'a',   u'bb',
                   u'ccc', u'dddd',    u'a',   u'bb',
                   u'ccc', u'dddd',    u'a',   u'bb',  u'ccc', u'dddd',
                     u'a',   u'bb',  u'ccc', u'dddd',
                  ...
                     u'a',   u'bb',  u'ccc', u'dddd',    u'a',   u'bb',
                   u'ccc', u'dddd',    u'a',   u'bb',
                   u'ccc', u'dddd',    u'a',   u'bb',  u'ccc', u'dddd',
                     u'a',   u'bb',  u'ccc', u'dddd'],
                 categories=[u'a', u'bb', u'ccc', u'dddd'], ordered=True, name=u'foobar', dtype='category', length=400)

In [8]:    pd.date_range('20130101',periods=4,name='foo',tz='US/Eastern')
Out[8]: 
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
               '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00'],
              dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')

In [9]:    pd.date_range('20130101',periods=25,name='foo',tz='US/Eastern')
Out[9]: 
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
               '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00',
               '2013-01-05 00:00:00-05:00', '2013-01-06 00:00:00-05:00',
               '2013-01-07 00:00:00-05:00', '2013-01-08 00:00:00-05:00',
               '2013-01-09 00:00:00-05:00', '2013-01-10 00:00:00-05:00',
               '2013-01-11 00:00:00-05:00', '2013-01-12 00:00:00-05:00',
               '2013-01-13 00:00:00-05:00', '2013-01-14 00:00:00-05:00',
               '2013-01-15 00:00:00-05:00', '2013-01-16 00:00:00-05:00',
               '2013-01-17 00:00:00-05:00', '2013-01-18 00:00:00-05:00',
               '2013-01-19 00:00:00-05:00', '2013-01-20 00:00:00-05:00',
               '2013-01-21 00:00:00-05:00', '2013-01-22 00:00:00-05:00',
               '2013-01-23 00:00:00-05:00', '2013-01-24 00:00:00-05:00',
               '2013-01-25 00:00:00-05:00'],
              dtype='datetime64[ns]', name=u'foo', freq='D', tz='US/Eastern')

In [10]:    pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
Out[10]: 
DatetimeIndex(['2013-01-01 00:00:00-05:00', '2013-01-02 00:00:00-05:00',
               '2013-01-03 00:00:00-05:00', '2013-01-04 00:00:00-05:00',
               '2013-01-05 00:00:00-05:00', '2013-01-06 00:00:00-05:00',
               '2013-01-07 00:00:00-05:00', '2013-01-08 00:00:00-05:00',
               '2013-01-09 00:00:00-05:00', '2013-01-10 00:00:00-05:00',
               '2013-01-11 00:00:00-05:00', '2013-01-12 00:00:00-05:00',
               '2013-01-13 00:00:00-05:00', '2013-01-14 00:00:00-05:00',
               '2013-01-15 00:00:00-05:00', '2013-01-16 00:00:00-05:00',
               '2013-01-17 00:00:00-05:00', '2013-01-18 00:00:00-05:00',
               '2013-01-19 00:00:00-05:00', '2013-01-20 00:00:00-05:00',
               ...
               '2013-03-26 00:00:00-04:00', '2013-03-27 00:00:00-04:00',
               '2013-03-28 00:00:00-04:00', '2013-03-29 00:00:00-04:00',
               '2013-03-30 00:00:00-04:00', '2013-03-31 00:00:00-04:00',
               '2013-04-01 00:00:00-04:00', '2013-04-02 00:00:00-04:00',
               '2013-04-03 00:00:00-04:00', '2013-04-04 00:00:00-04:00',
               '2013-04-05 00:00:00-04:00', '2013-04-06 00:00:00-04:00',
               '2013-04-07 00:00:00-04:00', '2013-04-08 00:00:00-04:00',
               '2013-04-09 00:00:00-04:00', '2013-04-10 00:00:00-04:00',
               '2013-04-11 00:00:00-04:00', '2013-04-12 00:00:00-04:00',
               '2013-04-13 00:00:00-04:00', '2013-04-14 00:00:00-04:00'],
              dtype='datetime64[ns]', name=u'foo', length=104, freq='D', tz='US/Eastern')

jreback · 2015-05-08T20:22:10Z

ok, updated on the top of the PR. here are basically the rules:

- 0, 1, or 2 values: print on a single line, no justification.
- If we are < max_seq_items: wrap the output to make it multi-line; no justification (datetimelikes are 'naturally' justified though, they just line up).
- If we are >= max_seq_items: truncate. Justify anything that is not string or categorical, and print head/tail lines. These will wrap (i.e. take more than one line) for datetimelikes, but for strings and integers they will not wrap at all (i.e. only 1 line).

The last point is a slight deviation as it then doesn't justify, say, integers for a categorical (but it makes the code a bit simpler, and I thought it looked a bit better, though my categorical example with 1000 categories makes it look worse... oh well).
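The three rules can be read as a small decision function. This is only a sketch of the rules as stated in this comment, not the code in the PR; `choose_layout` and the `string_like` flag are hypothetical names.

```python
def choose_layout(n_values, max_seq_items=100, string_like=False):
    """Return (layout, justify) for an index with n_values elements.

    Hypothetical helper mirroring the rules listed above.
    """
    if n_values <= 2:
        return "single-line", False
    if n_values < max_seq_items:
        # May still fit on one line; "wrapped" means wrapping is allowed.
        return "wrapped", False
    # Truncated output: pad elements to a common width unless they are
    # strings or categoricals.
    return "truncated", not string_like


print(choose_layout(2))                      # ('single-line', False)
print(choose_layout(25))                     # ('wrapped', False)
print(choose_layout(104))                    # ('truncated', True)
print(choose_layout(400, string_like=True))  # ('truncated', False)
```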

jorisvandenbossche · 2015-05-09T17:00:10Z

@jreback While on the train I made an alternative implementation to allow an unequal number of elements on one row, based on this branch (jorisvandenbossche@8ff7998). It is partly based on the array2string function of numpy.

The advantage is that the number of elements on all rows don't have to be equal, it just fills the row up to display width. As a consequence, this (all 3 elements on each row because of one larger string):

In [8]: pd.Index(['datetime', 'sA', 'sB', 'sC', 'flow', 'error', 'temp', 'ref', 'a_bit_a_longer_one']*2)
Index([u'datetime', u'sA', u'sB',
       u'sC', u'flow', u'error',
       u'temp', u'ref', u'and_a_longer_one',
       u'datetime', u'sA', u'sB',
       u'sC', u'flow', u'error',
       u'temp', u'ref', u'and_a_longer_one'],
      dtype='object')

now looks like:

In [5]: pd.Index(['datetime', 'sA', 'sB', 'sC', 'flow', 'error', 'temp', 'ref','a_bit_a_longer_one']*2)
Out[5]:
Index([u'datetime', u'sA', u'sB', u'sC', u'flow', u'error', u'temp', u'ref',
       u'a_bit_a_longer_one', u'datetime', u'sA', u'sB', u'sC', u'flow',
       u'error', u'temp', u'ref', u'a_bit_a_longer_one'],
      dtype='object')

which looks a bit better in my opinion (but you can give more extreme examples where the difference is larger).

For the rest, everything looks exactly the same as before (the latest update of this PR). The only difference is that the justifying or not (depending on string/categorical or not) is now the same for truncated and non-truncated (as I do this with the same code), while in the latest update you made it not justify for non-truncated ones (but I think doing it the same is a bit more consistent).
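For readers who want the gist of the row-filling idea, here is a rough sketch in plain Python. It is not the code from the linked commit and not numpy's array2string; `wrap_items` is a hypothetical helper that greedily packs element reprs into rows up to a given width, so rows may hold different numbers of elements.

```python
def wrap_items(items, prefix="Index([", width=80):
    """Greedily pack repr(item) strings into rows no wider than width."""
    indent = " " * len(prefix)
    rows, row, row_len = [], [], len(prefix)
    for item in items:
        token = repr(item)
        # +2 reserves room for the ", " separator (or the trailing ",").
        needed = len(token) + 2
        if row and row_len + needed > width:
            rows.append(row)
            row, row_len = [], len(indent)
        row.append(token)
        row_len += needed
    if row:
        rows.append(row)
    body = (",\n" + indent).join(", ".join(r) for r in rows)
    return prefix + body + "]"


cols = ['datetime', 'sA', 'sB', 'sC', 'flow', 'error', 'temp', 'ref',
        'a_bit_a_longer_one'] * 2
print(wrap_items(cols))
```

On Python 3 the reprs have no `u` prefix, so the exact line breaks differ from the output quoted above, but the behaviour is the same: one long string no longer forces every row down to three elements.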

jreback · 2015-05-09T17:02:23Z

ok let me incorporate this and I'll update the top section

jreback · 2015-05-09T18:30:55Z

@jorisvandenbossche @shoyer ok, updated with joris's commit. I had to change the code around slightly to avoid justification in certain cases, e.g. here you don't want to justify, otherwise the NaT would have a very long string (basically if it's going to be 1 line and it's non-truncated, then you don't justify)

In [13]: PeriodIndex(['2011-01-01 09:00', '2012-02-01 10:00', 'NaT'], freq='H')
Out[13]: PeriodIndex(['2011-01-01 09:00', '2012-02-01 10:00', 'NaT'], dtype='int64', freq='H')
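A quick sketch of what justification would do to this case, using plain `str.rjust`. The `justify` helper is hypothetical and is only meant to show the padding effect on the short 'NaT' entry.

```python
def justify(strings):
    """Right-justify every string to the width of the longest one."""
    width = max(len(s) for s in strings)
    return [s.rjust(width) for s in strings]


values = ['2011-01-01 09:00', '2012-02-01 10:00', 'NaT']
print(justify(values))
# ['2011-01-01 09:00', '2012-02-01 10:00', '             NaT']
```

On a single, non-truncated line that padding adds nothing but whitespace, which is why it is skipped there.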

jorisvandenbossche · 2015-05-09T21:47:14Z

Ah, yes, the case of 1 line not needing justification, I forgot, thanks!

There are some more details we could improve, but I think this is looking good for now!

For later (follow-up PR), I was thinking of:

- also splitting the attributes part over multiple lines if it is very long
- for truncated output: this is now hard-coded at 10 elements, but it could maybe also be 1 line? (possibly with a minimum number of elements for longer strings) Because now, small integers don't fill the full line with 10 elements, and e.g. for floats there is one element on the second line:
```
In [7]: pd.Index(np.arange(400.))
Out[7]:
Float64Index([  0.0,   1.0,   2.0,   3.0,   4.0,   5.0,   6.0,   7.0,   8.0,
                9.0,
              ...
              390.0, 391.0, 392.0, 393.0, 394.0, 395.0, 396.0, 397.0, 398.0,
              399.0],
             dtype='float64', length=400)
```
This could eg easily be restricted to two times one line.

jreback · 2015-05-09T21:51:57Z

@jorisvandenbossche yes, I was trying to avoid the wrap-around with a small number of elements, in fact thats why the example used 100 for display.width; if its too short you get that kind of effect.

ok, bombs away. And I will add a follow-up issue.

jreback added the Indexing and Output-Formatting labels Apr 14, 2015

jreback added this to the 0.16.1 milestone Apr 14, 2015

jreback mentioned this pull request Apr 14, 2015

ENH: repr now shows index name #6482 #9897

Closed

jreback force-pushed the repr-name branch 2 times, most recently from 5a79e96 to 9ff163e Compare April 15, 2015 00:19

jreback force-pushed the repr-name branch from 9ff163e to 4b7e5c9 Compare April 15, 2015 22:13

jreback force-pushed the repr-name branch from a318bb6 to 82f097d Compare April 16, 2015 22:51

jorisvandenbossche changed the title ~~Index to show name~~ Index repr changes to make them consistent Apr 17, 2015

jreback force-pushed the repr-name branch 2 times, most recently from 241d2bf to 66f5e12 Compare April 20, 2015 12:05

jreback force-pushed the repr-name branch from 66f5e12 to 4b42791 Compare April 28, 2015 11:21

jreback force-pushed the repr-name branch from 4b42791 to c6a5a4a Compare May 4, 2015 12:47

jreback force-pushed the repr-name branch 3 times, most recently from b784464 to 2746c13 Compare May 8, 2015 20:15

jreback force-pushed the repr-name branch from 2746c13 to a80ae5e Compare May 8, 2015 20:26

hsperr and others added 3 commits May 9, 2015 11:50

ENH: repr now shows index name pandas-dev#6482

aa66e30

move tests to generically tests for index generify __unicode__ for Index adjust index display to max_seq_items

formatting MultiIndex

a818882

fixup for CategoricalIndex merge

a3c52d1

increase limits for max_seq_items & printing for Index add extended repr for datetimelike indexes fix tseries/test_base for repr adjust docs for repr-name use new format_data on all Index types

jorisvandenbossche added 2 commits May 9, 2015 14:23

Change Index repr to adjust to string length

b190a9d

Conflicts: pandas/tseries/base.py use new format_data updates Fix detection of good width more fixes Change [ Conflicts: pandas/core/index.py more fixes revsised according to comments

Index repr: allow unequal number of elements on one line

e17e2b8

Inspired by numpy's array2string

jreback force-pushed the repr-name branch from a80ae5e to 4fc84a2 Compare May 9, 2015 18:23

more fixups

ac5aa58

jreback force-pushed the repr-name branch from 4fc84a2 to ac5aa58 Compare May 9, 2015 18:26

jorisvandenbossche mentioned this pull request May 9, 2015

DOC: update docstrings with new Index repr #10094

Closed

jreback added a commit that referenced this pull request May 9, 2015

Merge pull request #9901 from jreback/repr-name

d5fdbc6

Index repr changes to make them consistent

jreback merged commit d5fdbc6 into pandas-dev:master May 9, 2015

jreback mentioned this pull request May 9, 2015

ENH: follow up on indx-repr #10095

Closed

jreback mentioned this pull request Feb 23, 2016

Abbreviate MultiIndex representation #12423

Closed

jreback mentioned this pull request Apr 11, 2016

Large MultiIndex objects are not truncated when printing #12872

Closed
