Test long string formatting in to_string and to_html #1852

Closed
httassadar opened this issue Sep 7, 2012 · 12 comments
Labels: IO HTML (read_html, to_html, Styler.apply, Styler.applymap); Output-Formatting (__repr__ of pandas objects, to_string); Testing (pandas testing functions or related to the test suite)


@httassadar

Start with a DataFrame that has a large-ish number of rows:

from pandas import DataFrame

df = DataFrame({'a': ['a1']*100 + ['a2']*100, 'b': ['b']*200})
dfg = df.groupby(['a'])
dfg.aggregate(len)  # gives the right answer: every entry has length 100
dfg.aggregate(lambda x: ', '.join(x.values))  # wrong: the output string is somehow truncated?

It works fine if the string is not very long, e.g. starting with:
df = DataFrame({'a': ['a1']*10 + ['a2']*10, 'b': ['b']*20})

@lodagro
Contributor

lodagro commented Sep 7, 2012

This is not a bug but a matter of how repr works; the values inside are as expected. pandas truncates columns in the repr to 50 characters. This can be controlled with pandas.set_printoptions(), or you can force a full print by using to_string():

>>> print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
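
In later pandas releases the display settings are reached through the options API rather than set_printoptions(). A minimal sketch, assuming pandas 1.0 or later (where `display.max_colwidth` accepts `None`), of how the stored value, the default repr, and the full text relate:

```python
import pandas as pd

df = pd.DataFrame({"a": ["a1"] * 100 + ["a2"] * 100, "b": ["b"] * 200})
joined = df.groupby("a").aggregate(lambda x: ", ".join(x.values))

# The stored value is complete: 100 "b" entries plus 99 ", " separators.
full_value = joined.loc["a1", "b"]
print(len(full_value))  # 298

# The default repr shortens wide cells for display only.
short_repr = repr(joined)

# Assumption: pandas >= 1.0, where None means "no column width limit".
with pd.option_context("display.max_colwidth", None):
    full_text = joined.to_string()

print(full_value in short_repr)
print(full_value in full_text)
```

Using `option_context` keeps the change local to the `with` block, so the console default is restored afterwards.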

@lodagro lodagro closed this as completed Sep 7, 2012
@wesm
Member

wesm commented Sep 7, 2012

Indeed, pandas tries to prevent flooding the screen in the console. I wonder if adding something like ... at the end of truncated strings would be useful to indicate truncation?
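
A rough sketch of that idea in plain Python. The helper name `truncate_cell`, the 50-character default, and the choice to count the marker toward the width are assumptions for illustration, not pandas' actual implementation:

```python
def truncate_cell(text, max_width=50, marker="..."):
    """Shorten text to max_width characters, ending in marker if it was cut."""
    if len(text) <= max_width:
        return text
    # Reserve room for the marker (assumes max_width >= len(marker)).
    keep = max(max_width - len(marker), 0)
    return text[:keep] + marker


long_value = ", ".join(["b"] * 100)
shown = truncate_cell(long_value)
print(shown)
print(len(shown), shown.endswith("..."))
```

Short values pass through unchanged, so the marker only appears where something was actually cut.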

@httassadar
Author

OK, yes, the value is there.

But df.to_string() does not retrieve it; it stops at the truncation. I guess it calls repr and goes from there?

Same for df.to_html().

@lodagro
Contributor

lodagro commented Sep 7, 2012

Only __repr__ truncates; to_string() and to_html() do not.

In [16]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
                                                                                                                                                                                                                                                                                                             b
a                                                                                                                                                                                                                                                                                                             
a1  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b
a2  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b
In [19]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_html()
b
a
a1 b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b
a2 b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b

@httassadar
Author

Hmm, I'm on version 0.8.1 and it does truncate for to_string() and to_html()

But anyway, Lodagro's pandas.set_printoptions works, so problem solved :)

Thanks all.

@wesm
Member

wesm commented Sep 7, 2012

@lodagro I'm going to reopen this and change the title so we add a test on the to_string and to_html behavior

@wesm wesm reopened this Sep 7, 2012
@lodagro
Contributor

lodagro commented Sep 7, 2012

I assume you want to check that to_string() and to_html() never truncate, irrespective of the printoptions settings?

@wesm
Member

wesm commented Sep 7, 2012

I guess that makes the most sense
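
Such a check might look like the sketch below. It covers only to_string() and assumes pandas 1.0 or later, where, to my understanding, to_string() takes its own max_colwidth argument and no longer follows the display option by default. The function name is made up for illustration:

```python
import pandas as pd


def check_to_string_ignores_display_width():
    long_value = ", ".join(["b"] * 100)
    df = pd.DataFrame({"b": [long_value]}, index=["a1"])

    # Narrow the display option on purpose; the repr should shorten the cell.
    with pd.option_context("display.max_colwidth", 20):
        shown = repr(df)
        text = df.to_string()

    assert long_value not in shown  # repr is shortened for display
    assert long_value in text       # to_string() keeps the full value
    return text


result_text = check_to_string_ignores_display_width()
```

A matching check for to_html() is left out here because its behavior under a narrowed display option is not something this sketch assumes.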

@jreback
Contributor

jreback commented Sep 26, 2013

@lodagro keep this open?

@lodagro
Contributor

lodagro commented Sep 30, 2013

@jreback this issue has moved away from the original question. The idea was to keep it open and make sure that to_string() and to_html() do not truncate. I just gave 0.12.0-651-gc8ab2dd a spin with the code above, and both still truncate. If the idea of no truncation still stands, this issue cannot be closed yet.

In [4]: from pandas import *

In [5]: cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:df = DataFrame({'a':['a1']*100+['a2']*100, 'b':['b']*200})
:dfg = df.groupby(['a'])
:dfg.aggregate(len) ## this gives right answer, every entry is of length 100
:dfg.aggregate(lambda x: ', '.join(x.values))
:--
Out[5]: 
                                                    b
a                                                    
a1  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...
a2  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...

In [6]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
                                                    b
a                                                    
a1  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...
a2  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...

In [7]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_html()
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>b</th>
    </tr>
    <tr>
      <th>a</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>a1</th>
      <td> b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...</td>
    </tr>
    <tr>
      <th>a2</th>
      <td> b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...</td>
    </tr>
  </tbody>
</table>

In [8]: 

@jreback
Contributor

jreback commented Sep 30, 2013

I think #5012 will go toward solving this (it may need some additional options); reopening and moving to 0.14.

Thanks. A PR is always welcome!

@jreback jreback modified the milestones: 0.15.0, 0.14.0 Mar 11, 2014
@jreback jreback modified the milestones: 0.16.0, 0.17.0 Jan 26, 2015
@jorisvandenbossche jorisvandenbossche modified the milestones: Contributions Welcome, No action Jul 6, 2018
@jorisvandenbossche
Member

Closing as this is working as expected (and a matter of the repr, as noted above)
