Skip to content

FIX: Fix problems with Series text representation. #9182

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 17, 2015

Conversation

bjonen
Copy link
Contributor

@bjonen bjonen commented Jan 2, 2015

This PR harmonizes the way DataFrame and Series are printed.
closes #8532
closes #7508
Before

In [1]: import pandas as pd

In [2]: pd.options.display.max_rows = 10

In [3]: s = pd.Series([1,1,1,1,1,1,1,1,1,1,0.9999,1,1]*10)

In [4]: s
Out[4]:
0    1
1    1
2    1
...
127    0.9999
128    1.0000
129    1.0000
Length: 130, dtype: float64

Now

0      1.0000
1      1.0000
2      1.0000
3      1.0000
4      1.0000
        ...  
125    1.0000
126    1.0000
127    0.9999
128    1.0000
129    1.0000
dtype: float64

@jreback jreback added Bug Output-Formatting __repr__ of pandas objects, to_string labels Jan 2, 2015
@jreback jreback added this to the 0.16.0 milestone Jan 2, 2015
dtype=True)
elif self.name is None:
result = u('Series([], dtype: %s)') % (self.dtype)
# if max_rows and len(self.index) > max_rows:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove commented out code

@shoyer
Copy link
Member

shoyer commented Jan 4, 2015

Looks like this still needs tests to verify that the mentioned issues are fixed?

@jreback
Copy link
Contributor

jreback commented Jan 18, 2015

@bjonen hows this coming?

@bjonen
Copy link
Contributor Author

bjonen commented Jan 19, 2015

If you want I can make the adjustments proposed by @shoyer (thanks by the way). I will not have time to do major re-factoring in the next week or two.

# """

# Internal function, should always return unicode string
# """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this is not needed, pls just remove it (rather than comment out). If you want to keep the commented out for some reason, then put a comment why its here.

@jreback
Copy link
Contributor

jreback commented Mar 5, 2015

@bjonen can you put in the top-section the before and after?

# a = pd.Series(pd.Categorical(["a","b"] *25, name="a"))
# exp = u("".join(["%s a\n%s b\n"%(i,i+1) for i in range(0,10,2)]) + "...\n" +
# "".join(["%s a\n%s b\n"%(i,i+1) for i in range(40,50,2)]) +
# "Name: a, Length: 50, dtype: category\n" +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, can't take this out. assume the cat repr needs fixing? or is this just a dupe test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Checking on this one...

@jreback
Copy link
Contributor

jreback commented Mar 8, 2015

@bjonen also pls add a release note. I think this is pretty straight forward so don't even need an example, just put a sentence or 2 and refernce the issues. Can open another issue to address the TODO you mentioned (which can be in 0.16.0 or next version).

@jreback
Copy link
Contributor

jreback commented Mar 10, 2015

@bjonen I'ld like to get this in 0.16.0

can you rebase / squash and add an example?

@bjonen
Copy link
Contributor Author

bjonen commented Mar 11, 2015

@jreback Example is at the top of the PR, or do you mean into the docs? I canl get this ready to go tomorrow. Hope that's early enough.

@shoyer
Copy link
Member

shoyer commented Mar 11, 2015

@bjonen Yes, would be great to have this in the docs :)

@bjonen
Copy link
Contributor Author

bjonen commented Mar 11, 2015

ok ;). Will add the example to the docs and squash tomorrow. Glad to have this in 0.16!

@bjonen bjonen force-pushed the fix_series branch 2 times, most recently from fc74971 to 00409b6 Compare March 12, 2015 00:15
@bjonen
Copy link
Contributor Author

bjonen commented Mar 12, 2015

Added example and rebased

if not com.is_categorical_dtype(self.series.dtype):
footer += ', '
if self.name is not False and name is not None:
# # categories have already a comma + linebreak
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you just delete these commented out lines?

@shoyer
Copy link
Member

shoyer commented Mar 12, 2015

@bjonen Looks like your test changes for Categorical broke in the rebase because of #9622.

I would still love to see a unit test that verifies that the aligned display for the Series representation in the presence of truncation. This should be pretty easy to cook up, either with some explicit test cases for which you verify the entire result of __repr__, or with some sort of synthetic test that verifies line length, e.g.,

result = repr(s)
lines = result.split('\n')
fixed_width_lines = result[:5] + result[6:11]
assert len(set(len(line.strip()) for line in fixed_width_lines)) == 1 

@bjonen
Copy link
Contributor Author

bjonen commented Mar 12, 2015

Sure I will add this test and take out the print that is failing the 3.x build.

@bjonen bjonen force-pushed the fix_series branch 2 times, most recently from dc0c55f to 0441088 Compare March 13, 2015 00:34
@bjonen
Copy link
Contributor Author

bjonen commented Mar 13, 2015

Anything else you would like to add?

series = self.series
if truncate_v:
if max_rows == 0:
row_num = len(series)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line of code is unreachable given how you defined truncate_v above

@@ -968,6 +949,13 @@ def to_string(self, buf=None, na_rep='NaN', float_format=None,
Add the Series dtype
name : boolean, default False
Add the Series name (which may be None)
max_rows : int, optional
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all of these new options need tests

@shoyer
Copy link
Member

shoyer commented Mar 13, 2015

I would still be a little happier with some explicit tests, too, just to nail down that this looks right. I would probably include cases where the string width of the series values (a) is 1, (b) is 2, (c) changes and (d) changes in descending ordering.

Also, for the width consistency test, I would cook up a bunch of synthetic series and toss them all through the same checks.

row_num = len(series)
if max_rows == 1:
row_num = max_rows
series = series.iloc[:max_rows]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is max_rows == 1 covered by tests? If not, then it need some...

@shoyer
Copy link
Member

shoyer commented Mar 13, 2015

Sorry to be so insistent about testing -- but covering all your bases makes a big difference :). You don't necessarily need every last test I've suggested (use your judgment), but it is almost impossible to have too many tests, and we would like to confident that everything works properly. Generally, it's good to verify every edge case and ensure that tests exercise every line of code.

@shoyer
Copy link
Member

shoyer commented Mar 13, 2015

I think I've finally done that full thorough read now... :)

@bjonen
Copy link
Contributor Author

bjonen commented Mar 13, 2015

Thanks @shoyer for the comments. I can do this on the weekend. What is the deadline for 0.16?

@shoyer
Copy link
Member

shoyer commented Mar 13, 2015

I think @jreback wants to release a release candidate for 0.16 any day now.....

@jreback
Copy link
Contributor

jreback commented Mar 13, 2015

@bjonen I am going to do the rc today, but this still for 0.16.0, ping when you have updated (weekend ok)

@bjonen bjonen force-pushed the fix_series branch 3 times, most recently from 183299c to e74b381 Compare March 15, 2015 20:34
@bjonen
Copy link
Contributor Author

bjonen commented Mar 15, 2015

@jreback This is ready from my side. @shoyer Thanks again for the detailed comments and no worries for insisting on adding more tests.

self.series = series
self.buf = buf if buf is not None else StringIO()
self.name = name
self.na_rep = na_rep
self.length = length
self.header = header
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

header not necessary i guess?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

header in this case of series only concerns the index name i think. I removed header because it was unused in SeriesFormatter previously and it was also not accessible through the Series.to_string. But I'm ok to add it back. That should be quick.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe just for consistency add back
I don't know what it does

@jreback
Copy link
Contributor

jreback commented Mar 15, 2015

lgtm, @shoyer ?

@bjonen bjonen force-pushed the fix_series branch 2 times, most recently from bedf6ee to c2bb7ed Compare March 15, 2015 23:57
@bjonen
Copy link
Contributor Author

bjonen commented Mar 15, 2015

Added header to Series.to_string(), added test.

@jreback
Copy link
Contributor

jreback commented Mar 16, 2015

@bjonen thanks!

pls ping when green.

def test_format_explicit(self):
test_sers = self.gen_test_series()
with option_context("display.max_rows", 4):
res = test_sers['onel'].__repr__()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test repr(test_sers['onel']) instead of .__repr__()? I prefer to test the end user API, if possible.

Same goes below...

@shoyer
Copy link
Member

shoyer commented Mar 16, 2015

I have one style nit, but otherwise looks good to me.

@jorisvandenbossche
Copy link
Member

@bjonen I just gave this a quick test. And in any case, this solves the original issues I reported (the wrong number of rows when truncated, and the floating point things, and the large max_rows value). Thanks for that!

But, I noticed something else now. Previously, the ... were in the index column, now in the values. Is this change deliberate?

With this PR:

In [10]:  s4 = pd.Series(range(10))

In [12]: pd.options.display.max_rows = 8

In [13]: s4
Out[13]:
0    0
1    1
2    2
3    3
    ..
6    6
7    7
8    8
9    9
dtype: int64

In 0.15.2:

In [10]: s4 = pd.Series(range(10))

In [16]: pd.options.display.max_rows = 8

In [17]: s4
Out[17]:
0    0
1    1
...
8    8
9    9
Length: 10, dtype: int64

It is just an observations, I am not yet sure what I like most. For dataframes with put dots in both the index column and values:

In [14]: s4.to_frame()
Out[14]:
    0
0   0
1   1
2   2
3   3
.. ..
6   6
7   7
8   8
9   9

[10 rows x 1 columns]

@jorisvandenbossche
Copy link
Member

And a last thing: could you also add a test that tests the second bullet point in #8532: when setting pd.options.display.max_rows at something larger than 60 had no effect (but that is fixed now with your PR!)

@bjonen
Copy link
Contributor Author

bjonen commented Mar 16, 2015

That's a fair point. While consistent with DataFrame, having the dots in both index and data for a Series looked like overkill to me. That's why it is only in one. I don't have a strong preference where to put it (index or values). You prefer to keep it in the index as in 0.15?

jreback added a commit that referenced this pull request Mar 17, 2015
FIX: Fix problems with Series text representation.
@jreback jreback merged commit e106df7 into pandas-dev:master Mar 17, 2015
@jreback
Copy link
Contributor

jreback commented Mar 17, 2015

@bjonen thanks for this! you are formatting expert!

as usual, pls look at the built docs (give it a couple of hours) and confirm that they look look ok.!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: Series not correctly formatted in truncated output Incorrect truncating of Series according to max_rows
4 participants