Test long string formatting in to_string and to_html #1852

httassadar · 2012-09-07T13:11:08Z

Start with a data frame -- large-ish number of rows

df = DataFrame({'a':['a1']*100+['a2']*100, 'b':['b']*200})
dfg = df.groupby(['a'])
dfg.aggregate(len) ## this gives right answer, every entry is of length 100
dfg.aggregate(lambda x: ', '.join(x.values)) ## this is wrong, the output string is somehow truncated?

It works fine if the string is not very long, eg start with
df = DataFrame({'a':['a1']*10+['a2']*10, 'b':['b']*20})


lodagro · 2012-09-07T13:23:07Z

This is not a bug but a consequence of how repr works; the values inside are as expected. pandas truncates columns in the repr to 50 characters. This can be controlled with pandas.set_printoptions(), or you can force a full print by using to_string():

>>> print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
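
For readers on a current pandas release, the same point can be sketched as follows. This is a minimal sketch under assumptions, not the API of the 2012 release discussed in this thread: it assumes a pandas version where the `display.max_colwidth` option accepts `None` (no limit) and uses `pd.option_context` in place of `set_printoptions`. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a1"] * 100 + ["a2"] * 100, "b": ["b"] * 200})

# Join every 'b' value within each group into one long string.
result = df.groupby("a")["b"].agg(lambda s: ", ".join(s)).to_frame()
long_value = result.loc["a1", "b"]  # the full value as stored in the frame

# With a 50-character column limit, the repr shortens the cell for display only.
with pd.option_context("display.max_colwidth", 50):
    truncated_repr = repr(result)

# Assumption: recent pandas accepts None here, meaning "no column width limit".
with pd.option_context("display.max_colwidth", None):
    full_text = result.to_string()
    full_html = result.to_html()

print(len(long_value))
print(long_value in truncated_repr)
print(long_value in full_text, long_value in full_html)
```

The stored value is never altered; only the rendered text differs between the two option settings.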

wesm · 2012-09-07T13:29:16Z

Indeed, pandas tries to prevent flooding the screen in the console. I wonder if adding something like ... at the end of truncated strings would be useful to indicate truncation?
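
The idea of marking truncation can be illustrated in plain Python. This is only a sketch of the concept, not pandas' actual implementation: `truncate_cell` is a hypothetical helper, and the 50-character default simply mirrors the limit mentioned earlier in the thread.

```python
def truncate_cell(text: str, max_width: int = 50, marker: str = "...") -> str:
    """Hypothetical helper: cut text to max_width characters, ending in marker if cut."""
    if max_width <= len(marker):
        raise ValueError("max_width must be larger than the marker")
    if len(text) <= max_width:
        return text  # short enough, shown unchanged
    # Reserve room for the marker so the result is exactly max_width long.
    return text[: max_width - len(marker)] + marker


long_value = ", ".join(["b"] * 100)
short_value = ", ".join(["b"] * 10)

print(truncate_cell(long_value))
print(truncate_cell(short_value))
```

A reader who sees the trailing marker knows the displayed cell is shorter than the underlying value, which is the ambiguity that prompted this report.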

httassadar · 2012-09-07T14:09:47Z

OK, yes, the value is there.

But df.to_string() does not retrieve it; it stops at the truncation -- I guess it calls repr and goes from there?

Same for df.to_html()

lodagro · 2012-09-07T14:49:40Z

Only __repr__ truncates; neither to_string() nor to_html() does.

In [16]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
                                                                                                                                                                                                                                                                                                             b
a                                                                                                                                                                                                                                                                                                             
a1  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b
a2  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b

In [19]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_html()

	b
a
a1	b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b
a2	b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b

httassadar · 2012-09-07T15:05:08Z

Hmm, I'm on version 0.8.1 and it does truncate for to_string() and to_html()

But anyway, Lodagro's pandas.set_printoptions works, so problem solved :)

Thanks all.

wesm · 2012-09-07T17:58:21Z

@lodagro I'm going to reopen this and change the title so we add a test on the to_string and to_html behavior

lodagro · 2012-09-07T18:33:51Z

I assume you want to check that to_string() and to_html() never truncate, irrespective of the printoptions settings?

wesm · 2012-09-07T18:37:21Z

I guess that makes the most sense

jreback · 2013-09-26T00:24:40Z

@lodagro keep this open?

lodagro · 2013-09-30T10:57:05Z

@jreback this issue moved away from the original question. The idea was to keep it open and make sure that to_string() and to_html() do not truncate. I just gave 0.12.0-651-gc8ab2dd a spin with the code above, and both still truncate. If the idea of no truncation still stands, this issue cannot be closed yet.

In [4]: from pandas import *

In [5]: cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:df = DataFrame({'a':['a1']*100+['a2']*100, 'b':['b']*200})
:dfg = df.groupby(['a'])
:dfg.aggregate(len) ## this gives right answer, every entry is of length 100
:dfg.aggregate(lambda x: ', '.join(x.values))
:--
Out[5]: 
                                                    b
a                                                    
a1  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...
a2  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...

In [6]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_string()
                                                    b
a                                                    
a1  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...
a2  b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...

In [7]: print dfg.aggregate(lambda x: ', '.join(x.values)).to_html()                                                                                                                                                                                                      
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>b</th>
    </tr>
    <tr>
      <th>a</th>
      <th></th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>a1</th>
      <td> b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...</td>
    </tr>
    <tr>
      <th>a2</th>
      <td> b, b, b, b, b, b, b, b, b, b, b, b, b, b, b, b...</td>
    </tr>
  </tbody>
</table>

In [8]:

jreback · 2013-09-30T11:34:11Z

I think #5012 will go toward solving this (it may need some additional options). Reopening and moving to 0.14.

thanks....always welcome a PR!

jorisvandenbossche · 2018-07-06T22:02:41Z

Closing as this is working as expected (and a matter of the repr, as noted above)

lodagro closed this as completed Sep 7, 2012

lodagro mentioned this issue Sep 7, 2012

indicate __repr__ truncation by ... #1854

Closed

wesm reopened this Sep 7, 2012

jreback modified the milestones: 0.15.0, 0.14.0 Mar 11, 2014

jreback modified the milestones: 0.16.0, 0.17.0 Jan 26, 2015

jorisvandenbossche mentioned this issue Aug 10, 2017

DataFrame.to_string truncates long strings #9784

Closed

jorisvandenbossche closed this as completed Jul 6, 2018

jorisvandenbossche modified the milestones: Contributions Welcome, No action Jul 6, 2018
