DataFrame output too wide / not truncated properly #32461

Open
emsems opened this issue Mar 5, 2020 · 2 comments
Labels
Bug Output-Formatting __repr__ of pandas objects, to_string

Comments

emsems commented Mar 5, 2020

Problem description

It seems to me that DataFrames are not always correctly truncated to fit the terminal width (using pandas 1.0.1).
My pandas config is set so that it auto-detects the terminal width, and the representation of the DataFrame should fit within it.

Relevant pandas config settings

display.width : int                                                           
    Width of the display in characters. In case python/IPython is running in  
    a terminal this can be set to None and pandas will correctly auto-detect  
    the width.                                                                
    Note that the IPython notebook, IPython qtconsole, or IDLE do not run in a
    terminal and hence it is not possible to correctly detect the width.      
    [default: 80] [currently: 80]                                             
display.max_columns : int                                                     
    If max_cols is exceeded, switch to truncate view. Depending on            
    `large_repr`, objects are either centrally truncated or printed as        
    a summary view. 'None' value means unlimited.                             
                                                                              
    In case python/IPython is running in a terminal and `large_repr`          
    equals 'truncate' this can be set to 0 and pandas will auto-detect        
    the width of the terminal and print a truncated object which fits         
    the screen width. The IPython notebook, IPython qtconsole, or IDLE        
    do not run in a terminal and hence it is not possible to do               
    correct auto-detection.                                                   
    [default: 0] [currently: 0]                                               
display.max_colwidth : int or None                                            
    The maximum width in characters of a column in the repr of                
    a pandas data structure. When the column overflows, a "..."               
    placeholder is embedded in the output. A 'None' value means unlimited.    
    [default: 50] [currently: 50]                                             
display.expand_frame_repr : boolean                                           
    Whether to print out the full DataFrame repr for wide DataFrames across   
    multiple lines, `max_columns` is still respected, but the output will     
    wrap-around across multiple "pages" if its width exceeds `display.width`. 
    [default: True] [currently: True]                                         
display.large_repr : 'truncate'/'info'                                        
    For DataFrames exceeding max_rows/max_cols, the repr (and HTML repr) can  
    show a truncated table (the default from 0.13), or switch to the view from
    df.info() (the behaviour in earlier versions of pandas).                  
    [default: truncate] [currently: truncate]                                 
display.column_space No description available.                                
    [default: 12] [currently: 12]                                             
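
For anyone reproducing this, the settings listed above can be read and temporarily overridden from Python. The following is a minimal sketch using `pd.get_option` and `pd.option_context`; the values 100 and 5 are arbitrary illustrations, not the settings from this report.

```python
import pandas as pd

# Remember the current value so we can confirm it is restored afterwards.
before = pd.get_option("display.width")

# option_context applies the overrides only inside the with-block.
with pd.option_context("display.width", 100, "display.max_columns", 5):
    inside = (
        pd.get_option("display.width"),
        pd.get_option("display.max_columns"),
    )

after = pd.get_option("display.width")
print(before, inside, after)
```

Using a context manager keeps the experiment from leaking into the rest of the session.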

Code Sample

Please resize the terminal to a width of 127 to reproduce.
Check with shutil.get_terminal_size().

import pandas as pd
from io import StringIO
import shutil

# terminal width (in my case: columns=127, lines=40)
print('terminal width: {} characters'.format(shutil.get_terminal_size()[0]))

s = 'id,tstamp,00aaaaa,01a,02aaaaa,03a,04aaaaa,05,06aaaa,07aaaaaaaaaaaaa,08aaaaaa,09aaaaaaa,10aa,11aaaaaa,12aaaaaaaaa,13aaaaaaaa,14aaaaaaa,15aaaaaaaa,16a,17aaaaa,18aa,19aaaaaa,20,21,22aaaaaa,23aaaaaa,24aaaaa,25aa,26aaaaaa,27a,28aaaaa,29aaaa\r\n779491690,2019-02-01 00:00:02+00:00,,161.38538188324176,297.461393148902,,,,0.466667,False,False,3,,0.007,53.8849,,0.0323102,-0.4,,0.008,,0.0,,,,17.1,-1e-06,,0.024,,0.045,159.72512756708358\r\n779491691,2019-02-01 00:05:02+00:00,,162.2999981618803,299.5123814547798,,,,0.553571,True,False,3,,0.007,-85.1749,,0.0969305,-0.5,,0.008,,0.0,,,,17.0,-3e-06,,0.031,,0.049,160.6961983114413\r\n779491692,2019-02-01 00:10:02+00:00,,163.1754248277306,301.7498948568431,,,,0.530612,False,False,3,,0.007,,,-0.0646204,-0.4,,0.007,,0.0,,,,17.0,2e-06,,0.026,,0.049,161.6468595698913\r\n779491693,2019-02-01 00:15:02+00:00,,164.00520705009447,304.19960616946184,,,,0.466667,False,False,3,,0.007,,,-0.0323102,-0.4,,0.008,,0.0,,,,17.0,1e-06,,0.024,,0.045,162.5736942291059\r\n779491694,2019-02-01 00:20:02+00:00,,164.78185830034352,306.8905124087089,,,,0.511792,True,False,3,,0.007,128.438,,-0.0323102,-0.4,,0.008,,0.0,,,,17.0,1e-06,,0.031,,0.053,163.47261824257413\r\n'
f = StringIO()
f.write(s)
f.seek(0)
df = pd.read_csv(f, index_col=[0, 1])
print(df)

The lines of the string representation of the DataFrame are too long, so each line wraps onto a second line (depending on the terminal width; with the given settings the output needs 129 characters instead of the available 127). To me this looks like a bug in the DataFrameFormatter, as far as I understand probably in the write_result method, line 839ff.
Could that be?
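
One way to quantify the overflow described above is to compare the widest line of the rendered text with the terminal width. This is a small sketch using only the standard library; `repr_overflow` is a hypothetical helper name, not part of pandas.

```python
import shutil


def repr_overflow(text, width=None):
    """Return how many characters the widest line of `text` exceeds `width` by.

    A result <= 0 means every line fits. `width` defaults to the detected
    terminal width.
    """
    if width is None:
        width = shutil.get_terminal_size().columns
    widest = max((len(line) for line in text.splitlines()), default=0)
    return widest - width


# Fixed width so the result does not depend on the current terminal:
sample = "a" * 129 + "\n" + "b" * 100
print(repr_overflow(sample, width=127))  # 2
```

With a DataFrame `df` in a real terminal, `repr_overflow(repr(df))` would report the excess directly.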

By the way, I also don't quite understand the display.expand_frame_repr setting. It is currently set to True. When I set it to False, the DataFrame is not truncated; instead, the full representation is printed across multiple lines. Shouldn't it be exactly the other way round?

The following issue seems to be related, but I did not find the exact same problem. Hope I didn't miss anything: #16911

Expected Output

A DataFrame representation truncated so that each line fits the terminal width.

Output of pd.show_versions()

INSTALLED VERSIONS

commit : None
python : 3.7.6.final.0
python-bits : 64
OS : Windows
OS-release : 10
machine : AMD64
processor : Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : None.None

pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.2.0.post20200209
Cython : 0.29.15
pytest : None
hypothesis : None
sphinx : 2.4.3
blosc : None
feather : 0.4.0
xlsxwriter : 1.2.8
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.11.1
IPython : 7.13.0
pandas_datareader: None
bs4 : None
bottleneck : 1.3.2
fastparquet : 0.3.3
gcsfs : None
lxml.etree : None
matplotlib : 3.2.0
numexpr : 2.7.1
odfpy : None
openpyxl : 3.0.3
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : 1.3.13
tables : 3.6.1
tabulate : None
xarray : 0.15.0
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.8
numba : 0.48.0

@jbrockmendel jbrockmendel added the Output-Formatting __repr__ of pandas objects, to_string label Mar 17, 2020
@mroeschke mroeschke added the Bug label May 16, 2020
@alexklapheke

I'm having this problem too (can confirm it's still happening on main branch). The width of the DataFrame's repr exceeds terminal width by up to 4 characters. I think this function is not accounting for the added width of the ... that stands in for the missing columns:

def _fit_strcols_to_terminal_width(self, strcols: list[list[str]]) -> str:
    from pandas import Series

    lines = self.adj.adjoin(1, *strcols).split("\n")
    max_len = Series(lines).str.len().max()
    # plus truncate dot col
    width, _ = get_terminal_size()
    dif = max_len - width
    # '+ 1' to avoid too wide repr (GH PR #17023)
    adj_dif = dif + 1
    col_lens = Series([Series(ele).apply(len).max() for ele in strcols])
    n_cols = len(col_lens)
    counter = 0
    while adj_dif > 0 and n_cols > 1:
        counter += 1
        mid = round(n_cols / 2)
        mid_ix = col_lens.index[mid]
        col_len = col_lens[mid_ix]
        # adjoin adds one
        adj_dif -= col_len + 1
        col_lens = col_lens.drop(mid_ix)
        n_cols = len(col_lens)

    # subtract index column
    max_cols_fitted = n_cols - self.fmt.index
    # GH-21180. Ensure that we print at least two.
    max_cols_fitted = max(max_cols_fitted, 2)
    self.fmt.max_cols_fitted = max_cols_fitted

    # Call again _truncate to cut frame appropriately
    # and then generate string representation
    self.fmt.truncate()
    strcols = self._get_strcols()
    return self.adj.adjoin(1, *strcols)
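
To make the suspected arithmetic concrete, here is a simplified pure-Python model of the column-dropping loop quoted above. It is a sketch only: it works on a plain list of column widths rather than on pandas objects, `fit_columns` and `joined_width` are hypothetical names, and the 3-character placeholder width is an assumption (the exact padding pandas gives the `...` column may differ).

```python
def joined_width(widths, space=1):
    """Width of columns laid side by side with `space` characters between them."""
    return sum(widths) + space * (len(widths) - 1) if widths else 0


def fit_columns(col_widths, terminal_width):
    """Drop middle columns until the remaining ones fit, as in the quoted loop."""
    widths = list(col_widths)
    # Same budget as the quoted code: overflow plus one.
    adj_dif = joined_width(widths) - terminal_width + 1
    while adj_dif > 0 and len(widths) > 1:
        mid = round(len(widths) / 2)
        # Removing a column also removes its one-character separator.
        adj_dif -= widths.pop(mid) + 1
    return widths


PLACEHOLDER = 3  # assumed width of the "..." column

widths = [6, 10, 10, 10, 10, 10]  # index column first, then five data columns
kept = fit_columns(widths, terminal_width=51)
before = joined_width(kept)                 # width the loop budgeted for
after = joined_width(kept + [PLACEHOLDER])  # width once "..." is added
print(kept, before, after)
```

In this model the loop stops at 50 characters, which fits a 51-character terminal, but adding the placeholder column afterwards brings the line to 54. That matches the kind of small overflow reported here, under the stated assumptions.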

@billziss-gh

Minimal example that reproduces this problem with Pandas 2.1.4:

>>> import os, pandas
>>> os.get_terminal_size()
os.terminal_size(columns=93, lines=47)
>>> df=pandas.DataFrame({"Date": "2023-08-31 00:00:00-04:00", "Open": 187.839996, "High": 189
.119995, "Low": 187.479996, "Close": 187.869995, "Volume": 60735600, "Dividends": 0.0, "Stock
 Splits": 0.0}, index=[10769])
>>> df
                            Date        Open        High  ...    Volume  Dividends  Stock Spl
its
10769  2023-08-31 00:00:00-04:00  187.839996  189.119995  ...  60735600        0.0
0.0

[1 rows x 8 columns]
