pd.io.formats.format.DataFrameFormatter issue: columns not truncating properly #16911

Open
bsolomon1124 opened this issue Jul 13, 2017 · 9 comments
Labels: Enhancement, Output-Formatting (__repr__ of pandas objects, to_string)

Comments

@bsolomon1124

bsolomon1124 commented Jul 13, 2017

In several IDEs, DataFrames with long column names (not necessarily a large number of columns) do not seem to truncate properly. I originally posted this on SO thinking it was a pd.set_option setting that I had overlooked, but one answer pointed out that it may be an issue of pd.io.formats.format.DataFrameFormatter checking max_cols against the number of columns, not the total width of the columns, when deciding whether to truncate.
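The count-versus-width distinction can be checked with a short script. This is only a sketch: `repr_width` is a hypothetical helper (not part of pandas), and `display.max_columns` is pinned to 20 so the result does not depend on whether the code runs in a terminal.

```python
import numpy as np
import pandas as pd


def repr_width(frame, max_columns=20):
    """Hypothetical helper: length of the longest line of repr(frame), wrapping disabled."""
    with pd.option_context("display.expand_frame_repr", False,
                           "display.max_columns", max_columns):
        return max(len(line) for line in repr(frame).splitlines())


# Same number of columns in both frames; only the column names differ in length.
df = pd.DataFrame(np.random.randn(5, 1000),
                  columns=["col" + str(i) for i in range(1000)])
wide = df.add_prefix("really_long_column_name")

narrow_width = repr_width(df)
wide_width = repr_width(wide)
print(narrow_width, wide_width)
```

If truncation is driven by column count alone, both frames show the same number of columns, so the second repr should come out much wider than the first.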

Problem description

I would like to keep
pd.set_option('expand_frame_repr', False)
but still truncate the view of DataFrames as shown in "Expected output" below. I've noticed that this seems to depend on the length of the column names rather than the number of columns. For instance, this df displays in a readable way:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(1000, 1000),
                  columns=['col' + str(i) for i in range(1000)])

but this one is unreadable:

df.add_prefix('really_long_column_name')

producing the "Messy output" below. For users wanting to keep pd.set_option('expand_frame_repr', False) but still have a truncated view, shouldn't pd.io.formats.format.DataFrameFormatter check the total width of all columns? (Or somehow consider the effect of both column width and number of columns.)
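One way to approximate a width-aware check from user code is to estimate how many columns fit in a character budget and pass that number to `display.max_columns`. The sketch below is a workaround, not how DataFrameFormatter works internally: `columns_that_fit` is a hypothetical helper, and the 80-character budget and two-space padding are assumptions.

```python
import numpy as np
import pandas as pd


def columns_that_fit(frame, width=80, sample_rows=5):
    """Hypothetical helper: count leading columns whose rendered text fits in `width`."""
    sample = frame.head(sample_rows)
    # Rough width of the index column, taken from the longest label in the sample.
    used = max((len(str(label)) for label in sample.index), default=0)
    count = 0
    for i in range(sample.shape[1]):
        # Render one column on its own so pandas' own value formatting is used.
        lines = sample.iloc[:, [i]].to_string(index=False).splitlines()
        col_width = max(len(line) for line in lines) + 2  # assumed padding between columns
        if used + col_width > width:
            break
        used += col_width
        count += 1
    return max(count, 1)  # always show at least one column


df = pd.DataFrame(np.random.randn(5, 1000),
                  columns=["col" + str(i) for i in range(1000)])
wide = df.add_prefix("really_long_column_name")

n_short = columns_that_fit(df)
n_long = columns_that_fit(wide)

# Use the estimate as the column limit while keeping wrapping disabled.
with pd.option_context("display.expand_frame_repr", False,
                       "display.max_columns", n_long):
    print(wide)
```

The estimate only looks at the first few rows, so it can be off when later rows hold wider values.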

Messy output
https://i.stack.imgur.com/yzZUI.png

Expected output
https://i.stack.imgur.com/arvRm.png

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.6.0.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.20.1
nose: 1.3.7
pip: 9.0.1
setuptools: 27.2.0
Cython: 0.25.2
numpy: 1.11.3
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.5.1
patsy: 0.4.1
dateutil: 2.6.0
pytz: 2016.10
blosc: None
bottleneck: 1.2.0
tables: 3.2.2
numexpr: 2.6.1
matplotlib: 2.0.0
openpyxl: 2.4.1
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.6
lxml: 3.7.2
bs4: 4.5.3
html5lib: 0.9999999
httplib2: None
apiclient: None
sqlalchemy: 1.1.5
pymysql: None
psycopg2: None
jinja2: 2.9.4
boto: 2.45.0
pandas_datareader: 0.4.0

gfyoung added the Output-Formatting (__repr__ of pandas objects, to_string) and Usage Question labels Jul 14, 2017
@gfyoung
Member

gfyoung commented Jul 14, 2017

@bsolomon1124: I notice that you're using 0.19.2, and the code for that section pointed out in your SO post has been expanded since then (see current implementation here).

Try installing 0.20.3 first and see if that fixes your issue. If not, try installing master. Regardless, your question seems pretty reasonable, but as I don't use DataFrame printing all too often, I would want to defer to others who have used it more than I have.

@bsolomon1124
Author

Thanks for pointing that out. I'm on 0.19 for compatibility with some other packages, but I've tried updating manually to 0.20 and had the same issue.

@gfyoung
Member

gfyoung commented Jul 14, 2017

Okay, good to know. Could you update your pandas version in the issue? I would wait for a day or two (or maybe until Monday since it's the weekend) to see if there's any other feedback. Otherwise, you are more than welcome to take a shot at implementing this if you're interested.

@jreback
Contributor

jreback commented Jul 14, 2017

this is a duplicate issue IIRC. if someone would have a look.

@gfyoung
Member

gfyoung commented Jul 14, 2017

Is it #7059? We can close that and then move forward with this one then.

@jreback
Contributor

jreback commented Jul 14, 2017

it looks similar. @bsolomon1124 if you'd have a look as well.

@bsolomon1124
Author

Sorry I am late getting back to this, @jreback. I'm working in 0.20.3 and still seeing the same issue. (See the second link ("messy output") in my initial comment.) I do agree this is similar to #7059 and that they are talking about the same bug, and I think an ellipsis either in the middle or at the end of long col names would be a nice solution.
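As a rough illustration of the ellipsis idea, long names can be shortened on a display copy of the frame before printing. This is a sketch only: `shorten` is a hypothetical helper, the 12-character limit is arbitrary, and nothing here reflects an existing pandas option.

```python
import pandas as pd


def shorten(name, max_len=12, placeholder="..."):
    """Hypothetical helper: keep the start and end of `name`, eliding the middle."""
    name = str(name)
    if len(name) <= max_len:
        return name
    keep = max_len - len(placeholder)  # characters left for the real name
    head = (keep + 1) // 2
    tail = keep // 2
    return name[:head] + placeholder + (name[-tail:] if tail else "")


df = pd.DataFrame([[1, 2]],
                  columns=["really_long_column_namecol0",
                           "really_long_column_namecol1"])

# rename returns a new frame, so the original column names are untouched.
short = df.rename(columns=shorten)
print(short)
```

Shortened names can collide when the distinguishing part of a name falls in the elided middle, so a copy like this is suited to display rather than further processing.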

@jreback
Contributor

jreback commented Jul 24, 2017

> I'm working in 0.20.3 and still seeing the same issue.

why would you think this is fixed? this is an open issue.

@bsolomon1124
Author

@jreback I was just addressing @gfyoung's comment ("Try installing 0.20.3 first and see if that fixes your issue") from early in the thread.
