FIX: Fix problems with Series text representation. #9182

bjonen · 2015-01-02T03:08:18Z

This PR harmonizes the way DataFrame and Series are printed.
closes #8532
closes #7508
Before

In [1]: import pandas as pd

In [2]: pd.options.display.max_rows = 10

In [3]: s = pd.Series([1,1,1,1,1,1,1,1,1,1,0.9999,1,1]*10)

In [4]: s
Out[4]:
0    1
1    1
2    1
...
127    0.9999
128    1.0000
129    1.0000
Length: 130, dtype: float64

Now

0      1.0000
1      1.0000
2      1.0000
3      1.0000
4      1.0000
        ...  
125    1.0000
126    1.0000
127    0.9999
128    1.0000
129    1.0000
dtype: float64

shoyer · 2015-01-04T20:14:14Z

pandas/core/series.py

-                                    dtype=True)
-        elif self.name is None:
-            result = u('Series([], dtype: %s)') % (self.dtype)
+        # if max_rows and len(self.index) > max_rows:


please remove commented out code

shoyer · 2015-01-04T20:18:30Z

Looks like this still needs tests to verify that the mentioned issues are fixed?

jreback · 2015-01-18T21:03:31Z

@bjonen hows this coming?

bjonen · 2015-01-19T13:46:17Z

If you want I can make the adjustments proposed by @shoyer (thanks by the way). I will not have time to do major re-factoring in the next week or two.

jreback · 2015-03-05T23:38:09Z

pandas/core/series.py

+    #     """
+
+    #     Internal function, should always return unicode string
+    #     """


if this is not needed, pls just remove it (rather than comment out). If you want to keep the commented out for some reason, then put a comment why its here.

jreback · 2015-03-05T23:38:40Z

@bjonen can you put in the top-section the before and after?

jreback · 2015-03-07T23:16:18Z

pandas/tests/test_categorical.py

+        # a = pd.Series(pd.Categorical(["a","b"] *25, name="a"))
+        # exp = u("".join(["%s    a\n%s    b\n"%(i,i+1) for i in range(0,10,2)]) + "...\n" +
+        #         "".join(["%s    a\n%s    b\n"%(i,i+1) for i in range(40,50,2)]) +
+        #         "Name: a, Length: 50, dtype: category\n" +


hmm, can't take this out. assume the cat repr needs fixing? or is this just a dupe test

Checking on this one...

jreback · 2015-03-08T14:25:14Z

@bjonen also pls add a release note. I think this is pretty straight forward so don't even need an example, just put a sentence or 2 and refernce the issues. Can open another issue to address the TODO you mentioned (which can be in 0.16.0 or next version).

jreback · 2015-03-10T23:20:25Z

@bjonen I'ld like to get this in 0.16.0

can you rebase / squash and add an example?

bjonen · 2015-03-11T01:34:07Z

@jreback Example is at the top of the PR, or do you mean into the docs? I canl get this ready to go tomorrow. Hope that's early enough.

shoyer · 2015-03-11T01:36:01Z

@bjonen Yes, would be great to have this in the docs :)

bjonen · 2015-03-11T01:37:04Z

ok ;). Will add the example to the docs and squash tomorrow. Glad to have this in 0.16!

bjonen · 2015-03-12T00:15:56Z

Added example and rebased

shoyer · 2015-03-12T01:21:47Z

pandas/core/format.py

-                if not com.is_categorical_dtype(self.series.dtype):
-                    footer += ', '
+        if self.name is not False and name is not None:
+            # # categories have already a comma + linebreak


can you just delete these commented out lines?

shoyer · 2015-03-12T01:31:16Z

@bjonen Looks like your test changes for Categorical broke in the rebase because of #9622.

I would still love to see a unit test that verifies that the aligned display for the Series representation in the presence of truncation. This should be pretty easy to cook up, either with some explicit test cases for which you verify the entire result of __repr__, or with some sort of synthetic test that verifies line length, e.g.,

result = repr(s)
lines = result.split('\n')
fixed_width_lines = result[:5] + result[6:11]
assert len(set(len(line.strip()) for line in fixed_width_lines)) == 1

bjonen · 2015-03-12T08:33:07Z

Sure I will add this test and take out the print that is failing the 3.x build.

bjonen · 2015-03-13T08:13:12Z

Anything else you would like to add?

shoyer · 2015-03-13T08:19:32Z

pandas/core/format.py

+        series = self.series
+        if truncate_v:
+            if max_rows == 0:
+                row_num = len(series)


This line of code is unreachable given how you defined truncate_v above

shoyer · 2015-03-13T08:30:02Z

pandas/core/series.py

@@ -968,6 +949,13 @@ def to_string(self, buf=None, na_rep='NaN', float_format=None,
            Add the Series dtype
        name : boolean, default False
            Add the Series name (which may be None)
+        max_rows : int, optional


all of these new options need tests

shoyer · 2015-03-13T08:37:58Z

I would still be a little happier with some explicit tests, too, just to nail down that this looks right. I would probably include cases where the string width of the series values (a) is 1, (b) is 2, (c) changes and (d) changes in descending ordering.

Also, for the width consistency test, I would cook up a bunch of synthetic series and toss them all through the same checks.

shoyer · 2015-03-13T08:38:50Z

pandas/core/format.py

+                row_num = len(series)
+            if max_rows == 1:
+                row_num = max_rows
+                series = series.iloc[:max_rows]


Is max_rows == 1 covered by tests? If not, then it need some...

shoyer · 2015-03-13T08:42:22Z

Sorry to be so insistent about testing -- but covering all your bases makes a big difference :). You don't necessarily need every last test I've suggested (use your judgment), but it is almost impossible to have too many tests, and we would like to confident that everything works properly. Generally, it's good to verify every edge case and ensure that tests exercise every line of code.

shoyer · 2015-03-13T08:43:14Z

I think I've finally done that full thorough read now... :)

bjonen · 2015-03-13T08:51:40Z

Thanks @shoyer for the comments. I can do this on the weekend. What is the deadline for 0.16?

shoyer · 2015-03-13T08:59:59Z

I think @jreback wants to release a release candidate for 0.16 any day now.....

jreback · 2015-03-13T13:05:41Z

@bjonen I am going to do the rc today, but this still for 0.16.0, ping when you have updated (weekend ok)

bjonen · 2015-03-15T22:15:18Z

@jreback This is ready from my side. @shoyer Thanks again for the detailed comments and no worries for insisting on adding more tests.

jreback · 2015-03-15T22:21:08Z

pandas/core/format.py

        self.series = series
        self.buf = buf if buf is not None else StringIO()
        self.name = name
        self.na_rep = na_rep
        self.length = length
-        self.header = header


header not necessary i guess?

header in this case of series only concerns the index name i think. I removed header because it was unused in SeriesFormatter previously and it was also not accessible through the Series.to_string. But I'm ok to add it back. That should be quick.

maybe just for consistency add back
I don't know what it does

jreback · 2015-03-15T22:25:21Z

lgtm, @shoyer ?

bjonen · 2015-03-15T23:59:14Z

Added header to Series.to_string(), added test.

jreback · 2015-03-16T00:07:43Z

@bjonen thanks!

pls ping when green.

shoyer · 2015-03-16T02:53:34Z

pandas/tests/test_format.py

+    def test_format_explicit(self):
+        test_sers = self.gen_test_series()
+        with option_context("display.max_rows", 4):
+            res = test_sers['onel'].__repr__()


Can you test repr(test_sers['onel']) instead of .__repr__()? I prefer to test the end user API, if possible.

Same goes below...

shoyer · 2015-03-16T02:53:54Z

I have one style nit, but otherwise looks good to me.

jorisvandenbossche · 2015-03-16T08:33:17Z

@bjonen I just gave this a quick test. And in any case, this solves the original issues I reported (the wrong number of rows when truncated, and the floating point things, and the large max_rows value). Thanks for that!

But, I noticed something else now. Previously, the ... were in the index column, now in the values. Is this change deliberate?

With this PR:

In [10]:  s4 = pd.Series(range(10))

In [12]: pd.options.display.max_rows = 8

In [13]: s4
Out[13]:
0    0
1    1
2    2
3    3
    ..
6    6
7    7
8    8
9    9
dtype: int64

In 0.15.2:

In [10]: s4 = pd.Series(range(10))

In [16]: pd.options.display.max_rows = 8

In [17]: s4
Out[17]:
0    0
1    1
...
8    8
9    9
Length: 10, dtype: int64

It is just an observations, I am not yet sure what I like most. For dataframes with put dots in both the index column and values:

In [14]: s4.to_frame()
Out[14]:
    0
0   0
1   1
2   2
3   3
.. ..
6   6
7   7
8   8
9   9

[10 rows x 1 columns]

jorisvandenbossche · 2015-03-16T08:37:37Z

And a last thing: could you also add a test that tests the second bullet point in #8532: when setting pd.options.display.max_rows at something larger than 60 had no effect (but that is fixed now with your PR!)

bjonen · 2015-03-16T08:49:46Z

That's a fair point. While consistent with DataFrame, having the dots in both index and data for a Series looked like overkill to me. That's why it is only in one. I don't have a strong preference where to put it (index or values). You prefer to keep it in the index as in 0.15?

FIX: Fix problems with Series text representation.

jreback · 2015-03-17T00:19:21Z

@bjonen thanks for this! you are formatting expert!

as usual, pls look at the built docs (give it a couple of hours) and confirm that they look look ok.!

jreback added Bug Output-Formatting __repr__ of pandas objects, to_string labels Jan 2, 2015

jreback added this to the 0.16.0 milestone Jan 2, 2015

shoyer reviewed Jan 4, 2015
View reviewed changes

jreback reviewed Mar 5, 2015
View reviewed changes

bjonen force-pushed the fix_series branch from 70c126f to 1485d0a Compare March 7, 2015 17:21

jreback reviewed Mar 7, 2015
View reviewed changes

bjonen force-pushed the fix_series branch 2 times, most recently from fc74971 to 00409b6 Compare March 12, 2015 00:15

shoyer reviewed Mar 12, 2015
View reviewed changes

bjonen force-pushed the fix_series branch 2 times, most recently from dc0c55f to 0441088 Compare March 13, 2015 00:34

shoyer reviewed Mar 13, 2015
View reviewed changes

bjonen force-pushed the fix_series branch 3 times, most recently from 183299c to e74b381 Compare March 15, 2015 20:34

jreback reviewed Mar 15, 2015
View reviewed changes

bjonen force-pushed the fix_series branch 2 times, most recently from bedf6ee to c2bb7ed Compare March 15, 2015 23:57

shoyer reviewed Mar 16, 2015
View reviewed changes

FIX: Series text representation

7d715fa

bjonen force-pushed the fix_series branch from c2bb7ed to 7d715fa Compare March 16, 2015 08:05

jreback added a commit that referenced this pull request Mar 17, 2015

Merge pull request #9182 from bjonen/fix_series

e106df7

FIX: Fix problems with Series text representation.

jreback merged commit e106df7 into pandas-dev:master Mar 17, 2015

FIX: Fix problems with Series text representation. #9182

FIX: Fix problems with Series text representation. #9182

Conversation

bjonen commented Jan 2, 2015

Choose a reason for hiding this comment

shoyer commented Jan 4, 2015

jreback commented Jan 18, 2015

bjonen commented Jan 19, 2015

Choose a reason for hiding this comment

jreback commented Mar 5, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Mar 8, 2015

jreback commented Mar 10, 2015

bjonen commented Mar 11, 2015

shoyer commented Mar 11, 2015

bjonen commented Mar 11, 2015

bjonen commented Mar 12, 2015

Choose a reason for hiding this comment

shoyer commented Mar 12, 2015

bjonen commented Mar 12, 2015

bjonen commented Mar 13, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

shoyer commented Mar 13, 2015

Choose a reason for hiding this comment

shoyer commented Mar 13, 2015

shoyer commented Mar 13, 2015

bjonen commented Mar 13, 2015

shoyer commented Mar 13, 2015

jreback commented Mar 13, 2015

bjonen commented Mar 15, 2015

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Mar 15, 2015

bjonen commented Mar 15, 2015

jreback commented Mar 16, 2015

Choose a reason for hiding this comment

shoyer commented Mar 16, 2015

jorisvandenbossche commented Mar 16, 2015

jorisvandenbossche commented Mar 16, 2015

bjonen commented Mar 16, 2015

jreback commented Mar 17, 2015