BUG: DataFrame.to_string with formatters, header and index False #13350

Closed
wants to merge 360 commits into from
Conversation

chiroptical

@chiroptical chiroptical commented Jun 2, 2016

closes #13032

  • tests added / passed - added test specific to format bug
  • passes pep8radius master --diff
  • whatsnew entry - not needed

Found this bug experimenting with formatters. First pull request to pandas, but I believe guidelines are quite clear. I can explain what was happening in more detail if that is necessary.

@jreback
Contributor

jreback commented Jun 2, 2016

is this related to #13032 ?

can you show an example of before / after

@jreback jreback added the Output-Formatting __repr__ of pandas objects, to_string label Jun 2, 2016
@@ -9,6 +9,17 @@
_multiprocess_can_split_ = True


def test_to_string_formatters_index_header():
    from pandas import DataFrame
Contributor

add the issue number (this PR since there is no number) as a comment

if you need the import put at the top of the file

@chiroptical
Author

Turns out this is exactly related to #13032. I did see that issue, but didn't make the connection initially. In my testing, the lines should be removed, not modified. To be honest, I didn't understand what these lines accomplished when I was reviewing the code originally.

Minimal code to reproduce (stripped down from #13032, easier to see with formatters):

>>> import pandas as pd
>>> frame = pd.DataFrame(data={0: 0, 1: 0}, index=[0])
>>> formatter = lambda x: '{:10.3f}'.format(x)
>>> print(frame.to_string(index=False, header=False, formatters=[formatter, formatter]))

Before (adding slashes to make space counting easier):
\ \ \ \ 0.000\ \ \ \ \ \ 0.000
After:
\ \ \ \ \ 0.000\ \ \ \ \ \ 0.000

Output from PR for #13032 (without formatter):

>>> print(df.to_string(index=False))
      one       two     three
 1.722364  0.846757  0.094394
-0.578834  0.836656  0.665414
 0.345460  1.782786  1.760175

Output from PR for #13032 (with formatter 10.3f, slashes added once for column count):

>>> print(df.to_string(index=False,formatters=[formatter,formatter,formatter]))
       one        two      three
\ \ \ \ \ 1.722      0.847      0.094
    -0.579      0.837      0.665
     0.345      1.783      1.760

I will add the PR and Issues numbers plus make corrections shortly.
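
The slash trick used above for counting leading spaces can be automated. The following is a minimal sketch in plain Python (no pandas needed); the helper name `mark_leading_spaces` is hypothetical and not part of this PR. It also shows that `'{:10.3f}'` by itself yields a 10-character field, which matches the five leading spaces in the "After" output.

```python
def mark_leading_spaces(text, marker="\\ "):
    """Replace each leading space on every line with a visible marker.

    Hypothetical helper for eyeballing column alignment; not pandas code.
    """
    marked = []
    for line in text.split("\n"):
        stripped = line.lstrip(" ")
        pad = len(line) - len(stripped)  # number of leading spaces
        marked.append(marker * pad + stripped)
    return "\n".join(marked)


# The formatter from the comment above, applied to a single value.
cell = "{:10.3f}".format(0)
print(len(cell))                  # 10
print(mark_leading_spaces(cell))  # \ \ \ \ \ 0.000
```

Passing the result of `to_string(...)` through such a helper makes a one-character shift in justification easy to spot.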

@codecov-io

codecov-io commented Jun 3, 2016

Current coverage is 84.23% (diff: 100%)

Merging #13350 into master will increase coverage by <.01%

@@             master     #13350   diff @@
==========================================
  Files           138        138          
  Lines         50713      50721     +8   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42715      42723     +8   
  Misses         7998       7998          
  Partials          0          0          

Powered by Codecov. Last update ce56542...73d7b7e

@jreback jreback added the Bug label Jun 3, 2016
@chiroptical
Author

Additionally, the failed Travis CI checks don't appear to be related to my subtractions/additions. Lastly, the unit test must use a plain assert because of its location in the code. Should it be moved?

@jreback
Contributor

jreback commented Jun 3, 2016

@barrymoo can you also read thru the comments on that issue (and the referenced one) and see if this covers all the bases.

@chiroptical
Author

@jreback Of course,

From comment #13032 (comment):
Before:

Name    Value
                                  Short        1
                                 Longer  9374518
Much Longer name to the Max -----------    32432

After (desired output, correct?):

                                    Name    Value
                                   Short        1
                                  Longer  9374518
 Much Longer name to the Max -----------    32432

I think the other comments would be unaffected. To be fair though, I am very new to pandas.

@chiroptical
Author

chiroptical commented Jun 3, 2016

I see what you mean now. From #11833:

Code:

>>> import pandas as pd
>>> df = pd.DataFrame({'a':range(5)})
>>> df.to_string(index=False)
u' a\n 0\n 1\n 2\n 3\n 4'
>>> formatter = lambda x: '{:1d}'.format(x)
>>> df.to_string(formatters=[formatter], index=False)
u'a\n0\n1\n2\n3\n4'

I should be able to fix this issue quickly.

@chiroptical
Author

A different perspective: where are those spaces coming from in the first place? (I will try to track this down)

@chiroptical
Author

chiroptical commented Jun 3, 2016

Code:

#!/usr/bin/env python
import pandas as pd
import numpy as np

# Test 1
frame = pd.DataFrame(data={0: 0, 1: 0}, index=[0])
formatter = lambda x: '{:10.3f}'.format(x)
string = frame.to_string(index=False, header=False)
print('--> Begin Test 1 <--')
print(string)
print('-->  End Test 1  <--')

# Test 2
df = pd.DataFrame(np.random.randn(3, 3), columns=['one', 'two', 'three'])
string = df.to_string(index=False)
print('--> Begin Test 2 <--')
print(string)
print('-->  End Test 2  <--')

# Test 3
df = pd.DataFrame({'a':range(5)})
string = df.to_string(index=False)
print('--> Begin Test 3 <--')
print(string)
print('-->  End Test 3  <--')

# Test 4
NAMES = ['Short', 'Longer', 'Much Longer name to the Max -----------']
VALUES = [1, 9374518, 32432]
d = pd.DataFrame({'Name': NAMES, 'Value': VALUES})
string = d.to_string(index=False)
print('--> Begin Test 4 <--')
print(string)
print('-->  End Test 4  <--')

Produces:

$ python quick-test.py
--> Begin Test 1 <--
0 0
-->  End Test 1  <--
--> Begin Test 2 <--
      one       two     three
-0.117275 -0.410192 -2.170441
 0.194766  0.521318  0.936951
-0.923841 -1.829388  1.078478
-->  End Test 2  <--
--> Begin Test 3 <--
a
0
1
2
3
4
-->  End Test 3  <--
--> Begin Test 4 <--
                                   Name   Value
                                  Short       1
                                 Longer 9374518
Much Longer name to the Max -----------   32432
-->  End Test 4  <--

I believe this is the desired output. Unfortunately, this code fails about 20 tests. Hopefully, it is because the spacing in the expected output has changed slightly.
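
When expected strings differ only in spacing, a diff with visible spaces helps confirm that nothing else changed. Here is a rough sketch using the standard library's `difflib`; the function name `whitespace_diff` is an assumption for illustration, and the two strings are the `to_string(index=False)` outputs quoted from #11833 above.

```python
import difflib


def whitespace_diff(expected, actual):
    """Return unified-diff lines with every space shown as a middle dot.

    Hypothetical helper, not part of pandas or its test suite.
    """
    def visible(text):
        return [line.replace(" ", "\u00b7") for line in text.split("\n")]

    return list(difflib.unified_diff(
        visible(expected), visible(actual),
        fromfile="expected", tofile="actual", lineterm=""))


# Old expected string (leading space per line) vs. output after the change.
old_expected = " a\n 0\n 1\n 2\n 3\n 4"
new_actual = "a\n0\n1\n2\n3\n4"

for line in whitespace_diff(old_expected, new_actual):
    print(line)
```

An empty result means the two strings are identical, so a failing test with an empty diff points at something other than spacing.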

@chiroptical
Author

About 30 checks fail when running nosetests pandas/tests/formats/test_format.py. I will be checking the expected outputs this weekend. Note, I edited #13350 (comment) to include the new code edits.

@chiroptical
Author

chiroptical commented Jun 3, 2016

I have created barrymoo/pandas-pr-13350-supplement to document the test failures (and eventually generate new expected strings). This is a work-in-progress.

@chiroptical
Author

I have added some more tests to the supplement. There is one test I am having some difficulty with, which starts at https://github.com/barrymoo/pandas-pr-13350-supplement/blob/master/tests.py#L474. I think the frame is supposed to exceed the terminal size, but it doesn't.

@chiroptical
Author

Hey @jreback I finished my supplement but I need some opinions about output formatting especially concerning the test_*_east_asian* tests (I don't know what these should look like). Is there anyone else we could pull in to review this? That way I can fix the rest of the formatting concerns and fix everything with one PR.

For the supplement, clone it, activate the dev environment (tested with 2 & 3, but have not examined diffs of the output), run python tests.py, and review formatting. Submit issues to the other repo with concerns of specific tests.

@jreback
Contributor

jreback commented Jun 16, 2016

@barrymoo not sure what you mean by supplement. simply update this PR, comments can just be done here.
cc @sinhrks

@sinhrks
Member

sinhrks commented Jun 16, 2016

@barrymoo I live in Japan and am willing to check the output format and provide test cases :)

@chiroptical
Author

@jreback I ripped out the failing tests so one can easily print the results out on the command line. That way I can get some input from the community on how people want things formatted and make additional changes.

@chiroptical
Author

chiroptical commented Jun 21, 2016

Here's a great example of why I need the supplement. For test_datetimelike_frame, my changes lead to the following output:

                          dt  x
0  2011-01-01 00:00:00-05:00  1
1  2011-01-01 00:00:00-05:00  2
..                       ... ..
8                        NaT  9
9                        NaT 10

But, do you like:

                          dt  x
 0 2011-01-01 00:00:00-05:00  1
 1 2011-01-01 00:00:00-05:00  2
..                       ... ..
 8                       NaT  9
 9                       NaT 10

or...

                         dt  x
0 2011-01-01 00:00:00-05:00  1
1 2011-01-01 00:00:00-05:00  2
.                       ... ..
8                       NaT  9
9                       NaT 10

I can easily generate all of these outputs. Or would you rather I pick what I like and get all of the tests working?
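
The first two alternatives above differ only in how the index column is justified. The sketch below reproduces that difference with `str.ljust` and `str.rjust` on a cut-down frame; `render` is a hypothetical stand-in and does not reflect how pandas builds its output internally.

```python
def render(index_labels, rows, justify_index="left"):
    """Join an index column and pre-formatted row bodies, one row per line.

    Illustration only: the index is padded to the widest label and
    separated from the body by a single space.
    """
    width = max(len(label) for label in index_labels)
    if justify_index == "left":
        index_col = [label.ljust(width) for label in index_labels]
    else:
        index_col = [label.rjust(width) for label in index_labels]
    return "\n".join(i + " " + r for i, r in zip(index_col, rows))


labels = ["0", "1", "..", "8", "9"]
body = [" 1", " 2", "..", " 9", "10"]  # stands in for the 'x' column

print(render(labels, body, justify_index="left"))
print(render(labels, body, justify_index="right"))
```

The third alternative additionally shortens the `..` truncation marker in the index to `.`, which is a separate decision from justification and is not covered here.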

@evanpw
Contributor

evanpw commented Jun 26, 2016

I also worked on this; getting all of the tests to pass afterward is a nightmare. It looks like this change removes the leading space on integers but leaves it on floats. Is that true?

@chiroptical
Author

@evanpw it's very time consuming. I am still working through all of the tests, but I don't have a ton of free time. If you're looking at all positive numbers there is an extra space for the nonexistent "-" sign. There is an example in one of the above comments.
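
As an analogy for the reserved sign slot, Python's own format mini-language has a space sign option that does the same thing. This is only an illustration in plain Python, with values taken from the example output earlier in the thread; it is not the code path pandas uses.

```python
values = [1.722364, 0.846757, 0.094394]

# ' ' sign option: positives get a leading space where '-' would go.
signed = ["{: .3f}".format(v) for v in values]
# Default sign handling: no slot is reserved for positive numbers.
plain = ["{:.3f}".format(v) for v in values]

print(signed)  # [' 1.722', ' 0.847', ' 0.094']
print(plain)   # ['1.722', '0.847', '0.094']

# With mixed signs the reserved slot keeps the digits aligned.
mixed = ["{: .3f}".format(v) for v in (0.345460, -0.578834)]
print(mixed)   # [' 0.345', '-0.579']
```

For an all-positive column the reserved slot shows up as one extra leading space, which is the effect being discussed here.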

@evanpw
Contributor

evanpw commented Jun 26, 2016

After this change there won't be an extra leading space for a column of positive integers, but there will still be one for a column of positive floats, right?

@chiroptical
Author

That's correct, but I can likely fix that too. Again the majority of this work is fixing the tests.

sinhrks and others added 5 commits July 18, 2016 18:08
Author: sinhrks <[email protected]>

Closes #13677 from sinhrks/append_series and squashes the following commits:

4bc7b54 [sinhrks] ENH: Series.append now has ignore_index kw
closes #13598

Author: wcwagner <[email protected]>

Closes #13690 from wcwagner/bug/13598 and squashes the following commits:

9669f3f [wcwagner] BUG: "Replaced isinstance with is_integer, and changed test_pad_width to use getattr"
40a3188 [wcwagner] BUG: "Switched to single test method asserting functions that use pad raise correctly."
06795db [wcwagner] BUG: "Added tests for width parameter on center, ljust, rjust, zfill."
468df3a [wcwagner] BUG: Add  type check for width parameter in str.pad method GH13598
closes #13603

Author: yui-knk <[email protected]>

Closes #13687 from yui-knk/fix_13603 and squashes the following commits:

0960395 [yui-knk] BUG: Cast a key to NaT before get loc from Index
…st entry

closes #13695

Author: Jeff Reback <[email protected]>

Closes #13698 from jreback/merge_asof and squashes the following commits:

c46dcfa [Jeff Reback] BUG: merge_asof not handling allow_exact_matches and tolerance on first entry
jreback and others added 19 commits September 5, 2016 18:02
closes #12995     flake8-ed *.pyx files and fixed errors.    Removed
the E226 check because that inhibits pointers (e.g. char*).

Author: gfyoung <[email protected]>

Closes #14147 from gfyoung/pyx-flake8 and squashes the following commits:

386ed58 [gfyoung] MAINT: flake8 *.pyx files
…#14164)

API/DEPR: Remove +/- as setops for DatetimeIndex/PeriodIndex (GH9630)

xref #13777, deprecations put in place in #9630
* MAINT: Replace datetools import in tests

* MAINT: Replace datetools import internally

* DOC: Replace datetools import in docs

* MAINT: Remove datetool imports from scripts

* DEPR: Deprecate pandas.core.datetools

Closes gh-14094.
Concatting categoricals with non-matching categories will now return object dtype instead of raising an error.

* ENH: concat and append now can handle unordered categories

* remove union_categoricals kw from concat
* DOC: remove examples on Panel4D (caused warnings) and refer to older docs

* DOC: fix build warnings

* resolve comments
* DOC: clean-up 0.19.0 whatsnew file
* further clean-up
* Update highlights
* consistent use of behaviour/behavior
* s/favour/favor
closes #14088

Author: John Liekezer <[email protected]>

Closes #14090 from conquistador1492/issue_14088 and squashes the following commits:

c91425b [John Liekezer] BUG: fix tz-aware datetime convert to DatetimeIndex (GH 14088)
closes #14190

Author: Chris <[email protected]>

Closes #14191 from chris-b1/cat-ctor and squashes the following commits:

4cad147 [Chris] add some nulls to tests
da865e2 [Chris] BUG: Categorical constructor not idempotent with ext dtype
closes #14171

Author: Josh Howes <[email protected]>

Closes #14182 from josh-howes/bugfix/14171-series-str-contains-only-nan-values and squashes the following commits:

c7e9721 [Josh Howes] BUG: fix str.contains for series containing only nan values
@jreback
Contributor

jreback commented Sep 9, 2016

can you rebase / update?

@chiroptical
Author

chiroptical commented Sep 10, 2016

I didn't do this correctly; sorry, I have not mastered this bit of git yet. I will submit a different pull request.

@chiroptical chiroptical deleted the dataframe-to_string-minor-bug-fix branch September 10, 2016 02:25
@chiroptical chiroptical mentioned this pull request Sep 10, 2016
4 tasks
@jorisvandenbossche jorisvandenbossche added this to the No action milestone Sep 14, 2016
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Justification is broken with to_string(index=False)