DataFrame output too wide / not truncated properly #32461

emsems · 2020-03-05T12:29:15Z

Problem description

It seems to me that DataFrames are not always correctly truncated to fit the terminal width (using pandas 1.0.1).
My pandas config is set so that it auto-detects the terminal width, so the representation of the DataFrame should fit within it.

Relevant pandas config settings

display.width : int                                                           
    Width of the display in characters. In case python/IPython is running in  
    a terminal this can be set to None and pandas will correctly auto-detect  
    the width.                                                                
    Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a
    terminal and hence it is not possible to correctly detect the width.      
    [default: 80] [currently: 80]                                             
display.max_columns : int                                                     
    If max_cols is exceeded, switch to truncate view. Depending on            
    `large_repr`, objects are either centrally truncated or printed as        
    a summary view. 'None' value means unlimited.                             
                                                                              
    In case python/IPython is running in a terminal and `large_repr`          
    equals 'truncate' this can be set to 0 and pandas will auto-detect        
    the width of the terminal and print a truncated object which fits         
    the screen width. The IPython notebook, IPython qtconsole, or IDLE        
    do not run in a terminal and hence it is not possible to do               
    correct auto-detection.                                                   
    [default: 0] [currently: 0]                                               
display.max_colwidth : int or None                                            
    The maximum width in characters of a column in the repr of                
    a pandas data structure. When the column overflows, a "..."               
    placeholder is embedded in the output. A 'None' value means unlimited.    
    [default: 50] [currently: 50]                                             
display.expand_frame_repr : boolean                                           
    Whether to print out the full DataFrame repr for wide DataFrames across   
    multiple lines, `max_columns` is still respected, but the output will     
    wrap-around across multiple "pages" if its width exceeds `display.width`. 
    [default: True] [currently: True]                                         
display.large_repr : 'truncate'/'info'                                        
    For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can  
    show a truncated table (the default from 0.13), or switch to the view from
    df.info() (the behaviour in earlier versions of pandas).                  
    [default: truncate] [currently: truncate]                                 
display.column_space No description available.                                
    [default: 12] [currently: 12]
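
For anyone who wants to check or change these settings while reproducing, here is a minimal sketch using `pd.get_option` and `pd.option_context`. The value 127 is just the terminal width from this report, and the printed values will depend on your environment.

```python
import pandas as pd

# Options quoted in the listing above.
names = ("display.width", "display.max_columns", "display.expand_frame_repr")

# Read the current values (they depend on the environment).
before = {name: pd.get_option(name) for name in names}
print(before)

# Override two of them for the duration of the block only.
# 127 is the terminal width used in this report; per the option text above,
# max_columns=0 asks pandas to auto-detect the width when run in a terminal.
with pd.option_context("display.width", 127, "display.max_columns", 0):
    inside = {name: pd.get_option(name) for name in names}

# Once the block exits, the previous values are restored.
after = {name: pd.get_option(name) for name in names}
```

Using `option_context` rather than `pd.set_option` keeps the experiment from leaking into the rest of the session.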

Code Sample

To reproduce, resize the terminal to a width of 127 characters.
You can check the width with shutil.get_terminal_size().

import pandas as pd
from io import StringIO
import shutil

# terminal width (in my case: columns=127, lines=40)
print('terminal width: {} characters'.format(shutil.get_terminal_size()[0]))

s = 'id,tstamp,00aaaaa,01a,02aaaaa,03a,04aaaaa,05,06aaaa,07aaaaaaaaaaaaa,08aaaaaa,09aaaaaaa,10aa,11aaaaaa,12aaaaaaaaa,13aaaaaaaa,14aaaaaaa,15aaaaaaaa,16a,17aaaaa,18aa,19aaaaaa,20,21,22aaaaaa,23aaaaaa,24aaaaa,25aa,26aaaaaa,27a,28aaaaa,29aaaa\r\n779491690,2019-02-01 00:00:02+00:00,,161.38538188324176,297.461393148902,,,,0.466667,False,False,3,,0.007,53.8849,,0.0323102,-0.4,,0.008,,0.0,,,,17.1,-1e-06,,0.024,,0.045,159.72512756708358\r\n779491691,2019-02-01 00:05:02+00:00,,162.2999981618803,299.5123814547798,,,,0.553571,True,False,3,,0.007,-85.1749,,0.0969305,-0.5,,0.008,,0.0,,,,17.0,-3e-06,,0.031,,0.049,160.6961983114413\r\n779491692,2019-02-01 00:10:02+00:00,,163.1754248277306,301.7498948568431,,,,0.530612,False,False,3,,0.007,,,-0.0646204,-0.4,,0.007,,0.0,,,,17.0,2e-06,,0.026,,0.049,161.6468595698913\r\n779491693,2019-02-01 00:15:02+00:00,,164.00520705009447,304.19960616946184,,,,0.466667,False,False,3,,0.007,,,-0.0323102,-0.4,,0.008,,0.0,,,,17.0,1e-06,,0.024,,0.045,162.5736942291059\r\n779491694,2019-02-01 00:20:02+00:00,,164.78185830034352,306.8905124087089,,,,0.511792,True,False,3,,0.007,128.438,,-0.0323102,-0.4,,0.008,,0.0,,,,17.0,1e-06,,0.031,,0.053,163.47261824257413\r\n'
f = StringIO()
f.write(s)
f.seek(0)
df = pd.read_csv(f, index_col=[0, 1])
print(df)

The lines of the string representation of the DataFrame are too long, so each line wraps onto a second terminal line (depending on the terminal width; with the given settings the output needs 129 characters instead of the available 127). To me this looks like a bug in the DataFrameFormatter, probably in the write_result method (line 839 ff.), as far as I understand it.
Could that be?

By the way, I also don't quite understand the display.expand_frame_repr setting. It is currently set to True. When I set it to False, the DataFrame does not get truncated; instead, the full representation is printed across multiple lines. Shouldn't that be exactly the other way round?
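
One way to look at the expand_frame_repr question outside a terminal is to pin `display.width` and disable column truncation, then compare the two reprs. This is a rough sketch under those assumptions (the 30-column frame and the `repr_with` helper are made up for illustration), and it does not reproduce the terminal auto-detection path used in the report.

```python
import pandas as pd

# A deliberately wide frame: 3 rows, 30 float columns (illustrative data only).
df = pd.DataFrame(
    [[float(r * 30 + c) for c in range(30)] for r in range(3)],
    columns=[f"col{c:02d}" for c in range(30)],
)

def repr_with(expand):
    """Return repr(df) with a fixed width of 80 and no column limit."""
    with pd.option_context(
        "display.width", 80,
        "display.max_columns", None,
        "display.expand_frame_repr", expand,
    ):
        return repr(df)

wrapped = repr_with(True)   # pandas splits the columns into several blocks
flat = repr_with(False)     # one long line per row; pandas does not wrap

print(len(wrapped.splitlines()), "lines when expand_frame_repr=True")
print(max(len(line) for line in flat.splitlines()), "chars per line when False")
```

With the option set to False, pandas itself emits one long line per row, so any "multiple lines" seen on screen would come from the terminal soft-wrapping those lines, which may be what is described above.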

The following issue seems to be related, but I did not find the exact same problem. Hope I didn't miss anything: #16911

Expected Output

A DataFrame representation truncated so that each line fits the terminal width.
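
The expectation can be turned into a small check. The helper below is a sketch, not part of pandas: `overflowing_lines` is a hypothetical name, and it treats a line as fitting when its length is at most the terminal width.

```python
import shutil

def overflowing_lines(text, width=None):
    """Return (line_number, length) pairs for lines longer than `width`.

    If `width` is None, the current terminal width is used.
    """
    if width is None:
        width = shutil.get_terminal_size().columns
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > width
    ]

# Stand-in text: one 129-character line and one 100-character line,
# checked against the 127-column terminal from this report.
sample = "a" * 129 + "\n" + "b" * 100
print(overflowing_lines(sample, width=127))

# In a real session one would pass the frame's repr instead:
#     overflowing_lines(repr(df))
```

An empty list means every line fits. For the sample above, only the first line is reported.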

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200209
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : 2.4.3
blosc : None
feather : 0.4.0
xlsxwriter : 1.2.8
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
lxml.etree : None
matplotlib : 3.2.0
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : 0.15.0
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.8
numba : 0.48.0

alexklapheke · 2023-01-26T01:56:12Z

I'm having this problem too (can confirm it's still happening on main branch). The width of the DataFrame's repr exceeds terminal width by up to 4 characters. I think this function is not accounting for the added width of the ... that stands in for the missing columns:

pandas/pandas/io/formats/string.py

Lines 159 to 192 in 2e218d1

    
    def _fit_strcols_to_terminal_width(self, strcols: list[list[str]]) -> str:
        from pandas import Series

        lines = self.adj.adjoin(1, *strcols).split("\n")
        max_len = Series(lines).str.len().max()
        # plus truncate dot col
        width, _ = get_terminal_size()
        dif = max_len - width
        # '+ 1' to avoid too wide repr (GH PR #17023)
        adj_dif = dif + 1
        col_lens = Series([Series(ele).apply(len).max() for ele in strcols])
        n_cols = len(col_lens)
        counter = 0
        while adj_dif > 0 and n_cols > 1:
            counter += 1
            mid = round(n_cols / 2)
            mid_ix = col_lens.index[mid]
            col_len = col_lens[mid_ix]
            # adjoin adds one
            adj_dif -= col_len + 1
            col_lens = col_lens.drop(mid_ix)
            n_cols = len(col_lens)

        # subtract index column
        max_cols_fitted = n_cols - self.fmt.index
        # GH-21180. Ensure that we print at least two.
        max_cols_fitted = max(max_cols_fitted, 2)
        self.fmt.max_cols_fitted = max_cols_fitted

        # Call again _truncate to cut frame appropriately
        # and then generate string representation
        self.fmt.truncate()
        strcols = self._get_strcols()
        return self.adj.adjoin(1, *strcols)
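
The hypothesis about the `...` column can be explored with a simplified, pandas-free model of the loop. This is only a sketch: `fit_columns`, `rendered_width`, and the `reserve_for_dots` parameter are hypothetical names, not pandas API. The model assumes one separator character between columns and a 3-character placeholder, and it ignores how pandas re-truncates the frame afterwards.

```python
DOTS = "..."  # placeholder column shown in place of the dropped columns

def fit_columns(col_widths, terminal_width, reserve_for_dots=0):
    """Drop middle columns until the modelled width fits; return kept widths."""
    widths = list(col_widths)
    # One separator character between adjacent columns.
    total = sum(widths) + (len(widths) - 1)
    # '+ 1' mirrors the safety margin in the quoted function;
    # reserve_for_dots is the hypothetical extra room for the placeholder.
    adj_dif = total - terminal_width + 1 + reserve_for_dots
    while adj_dif > 0 and len(widths) > 1:
        mid = round(len(widths) / 2)
        adj_dif -= widths.pop(mid) + 1  # column width plus its separator
    return widths

def rendered_width(kept, n_original):
    """Width of one output line, including the placeholder if columns were dropped."""
    n_cols = len(kept)
    width = sum(kept)
    if n_cols < n_original:
        width += len(DOTS)
        n_cols += 1
    return width + (n_cols - 1)

widths = [6, 10, 10, 10, 10, 10]  # made-up column widths, index column first
terminal = 40

plain = fit_columns(widths, terminal)
print(rendered_width(plain, len(widths)))     # 43: three characters too wide

reserved = fit_columns(widths, terminal, reserve_for_dots=len(DOTS) + 1)
print(rendered_width(reserved, len(widths)))  # 32: fits within 40
```

In this model, reserving room for the placeholder and its separator before the loop runs is enough to keep the line inside the terminal, at the cost of sometimes dropping one more column than strictly necessary.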

billziss-gh · 2024-01-22T13:48:33Z

Minimal example that reproduces this problem with Pandas 2.1.4:

>>> import os, pandas
>>> os.get_terminal_size()
os.terminal_size(columns=93, lines=47)
>>> df = pandas.DataFrame({"Date": "2023-08-31 00:00:00-04:00", "Open": 187.839996, "High": 189.119995, "Low": 187.479996, "Close": 187.869995, "Volume": 60735600, "Dividends": 0.0, "Stock Splits": 0.0}, index=[10769])
>>> df
                            Date        Open        High  ...    Volume  Dividends  Stock Spl
its
10769  2023-08-31 00:00:00-04:00  187.839996  189.119995  ...  60735600        0.0
0.0

[1 rows x 8 columns]

jbrockmendel added the Output-Formatting __repr__ of pandas objects, to_string label Mar 17, 2020

mroeschke added the Bug label May 16, 2020
