-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
FIX: Fix problems with Series text representation. #9182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
dtype=True) | ||
elif self.name is None: | ||
result = u('Series([], dtype: %s)') % (self.dtype) | ||
# if max_rows and len(self.index) > max_rows: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please remove commented out code
Looks like this still needs tests to verify that the mentioned issues are fixed? |
@bjonen hows this coming? |
If you want I can make the adjustments proposed by @shoyer (thanks by the way). I will not have time to do major re-factoring in the next week or two. |
# """ | ||
|
||
# Internal function, should always return unicode string | ||
# """ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is not needed, pls just remove it (rather than comment out). If you want to keep the commented out for some reason, then put a comment why its here.
@bjonen can you put in the top-section the before and after? |
# a = pd.Series(pd.Categorical(["a","b"] *25, name="a")) | ||
# exp = u("".join(["%s a\n%s b\n"%(i,i+1) for i in range(0,10,2)]) + "...\n" + | ||
# "".join(["%s a\n%s b\n"%(i,i+1) for i in range(40,50,2)]) + | ||
# "Name: a, Length: 50, dtype: category\n" + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, can't take this out. assume the cat repr needs fixing? or is this just a dupe test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking on this one...
@bjonen also pls add a release note. I think this is pretty straight forward so don't even need an example, just put a sentence or 2 and refernce the issues. Can open another issue to address the TODO you mentioned (which can be in 0.16.0 or next version). |
@bjonen I'ld like to get this in 0.16.0 can you rebase / squash and add an example? |
@jreback Example is at the top of the PR, or do you mean into the docs? I canl get this ready to go tomorrow. Hope that's early enough. |
@bjonen Yes, would be great to have this in the docs :) |
ok ;). Will add the example to the docs and squash tomorrow. Glad to have this in 0.16! |
fc74971
to
00409b6
Compare
Added example and rebased |
if not com.is_categorical_dtype(self.series.dtype): | ||
footer += ', ' | ||
if self.name is not False and name is not None: | ||
# # categories have already a comma + linebreak |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just delete these commented out lines?
@bjonen Looks like your test changes for Categorical broke in the rebase because of #9622. I would still love to see a unit test that verifies that the aligned display for the Series representation in the presence of truncation. This should be pretty easy to cook up, either with some explicit test cases for which you verify the entire result of result = repr(s)
lines = result.split('\n')
fixed_width_lines = result[:5] + result[6:11]
assert len(set(len(line.strip()) for line in fixed_width_lines)) == 1 |
Sure I will add this test and take out the |
dc0c55f
to
0441088
Compare
Anything else you would like to add? |
series = self.series | ||
if truncate_v: | ||
if max_rows == 0: | ||
row_num = len(series) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line of code is unreachable given how you defined truncate_v
above
@@ -968,6 +949,13 @@ def to_string(self, buf=None, na_rep='NaN', float_format=None, | |||
Add the Series dtype | |||
name : boolean, default False | |||
Add the Series name (which may be None) | |||
max_rows : int, optional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all of these new options need tests
I would still be a little happier with some explicit tests, too, just to nail down that this looks right. I would probably include cases where the string width of the series values (a) is 1, (b) is 2, (c) changes and (d) changes in descending ordering. Also, for the width consistency test, I would cook up a bunch of synthetic series and toss them all through the same checks. |
row_num = len(series) | ||
if max_rows == 1: | ||
row_num = max_rows | ||
series = series.iloc[:max_rows] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is max_rows == 1
covered by tests? If not, then it need some...
Sorry to be so insistent about testing -- but covering all your bases makes a big difference :). You don't necessarily need every last test I've suggested (use your judgment), but it is almost impossible to have too many tests, and we would like to confident that everything works properly. Generally, it's good to verify every edge case and ensure that tests exercise every line of code. |
I think I've finally done that full thorough read now... :) |
Thanks @shoyer for the comments. I can do this on the weekend. What is the deadline for 0.16? |
I think @jreback wants to release a release candidate for 0.16 any day now..... |
@bjonen I am going to do the rc today, but this still for 0.16.0, ping when you have updated (weekend ok) |
183299c
to
e74b381
Compare
self.series = series | ||
self.buf = buf if buf is not None else StringIO() | ||
self.name = name | ||
self.na_rep = na_rep | ||
self.length = length | ||
self.header = header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
header not necessary i guess?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
header in this case of series only concerns the index name i think. I removed header
because it was unused in SeriesFormatter previously and it was also not accessible through the Series.to_string. But I'm ok to add it back. That should be quick.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just for consistency add back
I don't know what it does
lgtm, @shoyer ? |
bedf6ee
to
c2bb7ed
Compare
Added |
@bjonen thanks! pls ping when green. |
def test_format_explicit(self): | ||
test_sers = self.gen_test_series() | ||
with option_context("display.max_rows", 4): | ||
res = test_sers['onel'].__repr__() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you test repr(test_sers['onel'])
instead of .__repr__()
? I prefer to test the end user API, if possible.
Same goes below...
I have one style nit, but otherwise looks good to me. |
@bjonen I just gave this a quick test. And in any case, this solves the original issues I reported (the wrong number of rows when truncated, and the floating point things, and the large max_rows value). Thanks for that! But, I noticed something else now. Previously, the With this PR:
In 0.15.2:
It is just an observations, I am not yet sure what I like most. For dataframes with put dots in both the index column and values:
|
And a last thing: could you also add a test that tests the second bullet point in #8532: when setting |
That's a fair point. While consistent with |
FIX: Fix problems with Series text representation.
@bjonen thanks for this! you are formatting expert! as usual, pls look at the built docs (give it a couple of hours) and confirm that they look look ok.! |
This PR harmonizes the way DataFrame and Series are printed.
closes #8532
closes #7508
Before
Now