BUG: Weird console formatting defaults #22524
This is on git master.
Update: this seems to happen only inside a Docker container, where pandas fails to detect the console dimensions. I guess the "headless IPython" use case is pretty unusual.
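Whether any console size can be detected at all is easy to check from Python itself. The sketch below uses only the standard library; `detected_terminal_width` is a hypothetical helper, and it does not reproduce pandas' own detection code, whose internals may differ.

```python
import os
import shutil
from typing import Optional


def detected_terminal_width() -> Optional[int]:
    """Return the width of the attached terminal, or None if there is none.

    Hypothetical helper: when stdout is not a TTY (e.g. IPython running
    headless inside a Docker container), os.get_terminal_size() raises
    OSError, so no real width can be reported.
    """
    try:
        return os.get_terminal_size().columns
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    print("detected width:", detected_terminal_width())
    # shutil.get_terminal_size() never raises: it checks the COLUMNS and
    # LINES environment variables, then the terminal, then the fallback.
    print("shutil size:", shutil.get_terminal_size(fallback=(80, 24)))
```

Run inside a container started without a pseudo-TTY (`docker run` without `-t`), the first line would be expected to print `None`.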
The workaround is to set
Here's another one:
-In [163]: info[:5]
-Out[163]:
- description group id \
-0 Cheese, caraway Dairy and Egg Products 1008
-1 Cheese, cheddar Dairy and Egg Products 1009
-2 Cheese, edam Dairy and Egg Products 1018
-3 Cheese, feta Dairy and Egg Products 1019
-4 Cheese, mozzarella, part skim milk Dairy and Egg Products 1028
- manufacturer
-0
-1
-2
-3
-4
+In [164]: info[:5]
+Out[164]:
+ description ... manufacturer
+0 Cheese, caraway ...
+1 Cheese, cheddar ...
+2 Cheese, edam ...
+3 Cheese, feta ...
+4 Cheese, mozzarella, part skim milk ...
What is the option to get the old behavior (printing the overflowing columns on the next line)?
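A hedged sketch using pandas' documented display options: `display.max_columns` (20 is the old limit discussed in this thread) and `display.expand_frame_repr`, which wraps a wide frame across several lines instead of eliding columns. The example frame is made up for illustration, and `pd.option_context` confines the change to the `with` block.

```python
import pandas as pd

# Made-up frame with 12 columns, wider than an 80-character line.
df = pd.DataFrame({f"column_{i}": range(3) for i in range(12)})

# Temporarily emulate the old display: show up to 20 columns and wrap the
# overflowing ones onto following lines rather than hiding them with "...".
with pd.option_context(
    "display.max_columns", 20,
    "display.expand_frame_repr", True,
    "display.width", 80,
):
    text = repr(df)

print(text)
```

Outside the `with` block, the session's previous option values apply again.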
It appears I can make more of the problems go away with this (in case anyone stumbles on this thread later):
Another example of odd formatting, even though the console width is set to 80:
     .....: columns='smoker', margins=True)
-Out[132]:
- size tip_pct
-smoker No Yes All No Yes All
-time day
-Dinner Fri 2.000000 2.222222 2.166667 0.139622 0.165347 0.158916
- Sat 2.555556 2.476190 2.517241 0.158048 0.147906 0.153152
- Sun 2.929825 2.578947 2.842105 0.160113 0.187250 0.166897
- Thur 2.000000 NaN 2.000000 0.159744 NaN 0.159744
-Lunch Fri 3.000000 1.833333 2.000000 0.187735 0.188937 0.188765
- Thur 2.500000 2.352941 2.459016 0.160311 0.163863 0.161301
-All 2.668874 2.408602 2.569672 0.159328 0.163196 0.160803
+Out[133]:
+ size ... tip_pct
+smoker No Yes ... Yes All
+time day ...
+Dinner Fri 2.000000 2.222222 ... 0.165347 0.158916
+ Sat 2.555556 2.476190 ... 0.147906 0.153152
+ Sun 2.929825 2.578947 ... 0.187250 0.166897
+ Thur 2.000000 NaN ... NaN 0.159744
+Lunch Fri 3.000000 1.833333 ... 0.188937 0.188765
+ Thur 2.500000 2.352941 ... 0.163863 0.161301
+All 2.668874 2.408602 ... 0.163196 0.160803
+[7 rows x 6 columns]
Another example of a dataset that looks fine in 0.20.x but not in 0.23.x:
-In [74]: data
-Out[74]:
- user_id movie_id rating timestamp gender age occupation zip \
-0 1 1193 5 978300760 F 1 10 48067
-1 2 1193 5 978298413 M 56 16 70072
-2 12 1193 4 978220179 M 25 12 32793
-3 15 1193 4 978199279 M 25 7 22903
-4 17 1193 5 978158471 M 50 1 95350
-... ... ... ... ... ... ... ... ...
-1000204 5949 2198 5 958846401 M 18 17 47901
-1000205 5675 2703 3 976029116 M 35 14 30030
-1000206 5780 2845 1 958153068 M 18 17 92886
-1000207 5851 3607 5 957756608 F 18 20 55410
-1000208 5938 2909 4 957273353 M 25 1 35401
- title genres
-0 One Flew Over the Cuckoo's Nest (1975) Drama
-1 One Flew Over the Cuckoo's Nest (1975) Drama
-2 One Flew Over the Cuckoo's Nest (1975) Drama
-3 One Flew Over the Cuckoo's Nest (1975) Drama
-4 One Flew Over the Cuckoo's Nest (1975) Drama
-... ... ...
-1000204 Modulations (1998) Documentary
-1000205 Broken Vessels (1998) Drama
-1000206 White Boys (1999) Drama
-1000207 One Little Indian (1973) Comedy|Drama|Western
-1000208 Five Wives, Three Secretaries and Me (1998) Documentary
-[1000209 rows x 10 columns]
+In [74]: data = pd.merge(pd.merge(ratings, users), movies)
-In [75]: data.iloc[0]
+In [75]: data
Out[75]:
+ user_id ... genres
+0 1 ... Drama
+1 2 ... Drama
+2 12 ... Drama
+3 15 ... Drama
+4 17 ... Drama
+... ... ... ...
+1000204 5949 ... Documentary
+1000205 5675 ... Drama
+1000206 5780 ... Drama
+1000207 5851 ... Comedy|Drama|Western
+1000208 5938 ... Documentary
+[1000209 rows x 10 columns]
cc @jreback
I can try to dig into this, but if there's a temporary workaround, it would be helpful to know. This is the last item blocking me from releasing the accumulated errata fixes for publication.
Does doing
Yes, doing that (
That said, there have been several times when I personally found the new truncating behaviour somewhat annoying.
I don't think this should influence that.
FWIW, I think the new behavior is worse. The old behavior is more consistent with what other projects (like R) do, IIUC.
I also think the assertion that it is "better", or that the old behavior "was relatively difficult to read", is truly in the eye of the beholder and very subjective.
Just today a reader was confused by the default options.
To your earlier point, I think this is a rather subjective call. From personal experience, I found the new display options worse at first but changed my stance over time. Not sure how reasonable this is, but does it make sense to introduce the display options somewhere in the book, and maybe set the option to 20 globally to reduce the number of changes?
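Setting the option once, at the top of a chapter's setup code or a notebook, makes every later output use the old limit. A minimal sketch, assuming "the option" here is `display.max_columns`, the setting whose old default of 20 is discussed in this thread:

```python
import pandas as pd

# Restore the old fixed column limit for the rest of the session,
# e.g. in the setup code at the top of a book chapter or notebook.
pd.set_option("display.max_columns", 20)

print(pd.get_option("display.max_columns"))
```

`pd.options.display.max_columns = 20` is an equivalent attribute-style spelling, and `pd.reset_option("display.max_columns")` returns to pandas' own default.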
That's what I'm planning to do. In general, I would be hesitant to describe a subjective change such as this as "better"; it would be more accurate to say that the pandas developers' consensus was to change the formatting. Instead of saying that something is "difficult" or "easy", it would be better to say that "many users feel that..." or "some people have said that...". If other people feel differently than I do, I respect their opinion. Notice that I say "I think the behavior is worse" (my opinion) and not "The behavior is worse".
I personally often run into specific situations where the new default style is, IMO, quite a bit worse than before. E.g., compare this:
with this (the old default):
This might be quite specific to this use case (the 'geometry' column is often a wide, truncated column, so you quickly get a case where the columns don't fit on one line). I would also say that the current behaviour is a bug: the output is too wide for the specific console size I was using (and this size was correctly detected), so you get an overflow, even though there is a lot of unneeded whitespace between the two columns.
@pandas-dev/pandas-core @cbrnr Now that the new default has been out there for a while, what are the opinions about it in general? Mostly an improvement? (It's quite possible that the specific cases I am working with, like the one above, are not very representative, and that in many other cases it is an improvement.)
I'd be in favor of reverting.
For me the new behavior has worked quite well (standard terminal on macOS). However, there still seem to be some rough edges in the way the number of columns is detected (and in some environments detection may even be impossible). All counter-examples where the new behavior is worse are actually cases where automatic detection is not working correctly (or so it seems), because lines shouldn't overflow and the available columns should be used as efficiently as possible. In cases where the number of columns cannot be detected, pandas should fall back to the old behavior of 20 columns. If the headless IPython case can be detected somehow, it should also fall back to the old setting.
@wesm I agree that this is a subjective decision, and it is especially bad that the new behavior doesn't work as expected in many cases. I still think that, when the new behavior works, it gives a more convenient overview of a data frame. Note that one motivation for this change is that it is the default behavior in the Tidyverse: whereas you are right that base R behaves like the old setting, Tidyverse packages (tibble) behave like the new setting (although it seems to work more consistently there, because there is really only one IDE that people use, namely RStudio). I still think a
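The fallback rule proposed above fits in a tiny decision function. This is a hypothetical model of the proposal, not pandas' implementation; it relies only on the documented meaning of `display.max_columns = 0` (fit the columns to the detected terminal width) versus a fixed limit such as 20.

```python
from typing import Optional

OLD_FIXED_LIMIT = 20  # the old default column limit discussed in this thread
AUTO_DETECT = 0       # display.max_columns = 0: fit columns to the terminal width


def proposed_max_columns(detected_width: Optional[int]) -> int:
    """Hypothetical model of the proposed fallback rule (not pandas code).

    Auto-detection is only trusted when a real, positive console width was
    found; otherwise (headless IPython, Docker without a TTY, ...) the old
    fixed limit of 20 columns is used.
    """
    if detected_width is None or detected_width <= 0:
        return OLD_FIXED_LIMIT
    return AUTO_DETECT


for width in (None, 0, 80, 120):
    print(width, "->", proposed_max_columns(width))
```

In a real session the result could be passed to `pd.set_option("display.max_columns", ...)`, with the width coming from a terminal query such as `os.get_terminal_size()`.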
Note: my example above is a case where the console width is correctly detected. But apparently there are cases where we generate a repr with the new default that is still wider than that.
This is of course limited to IPython-based environments (console, notebook), but you can already put that in a startup script.
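As a hedged sketch of such a startup script: IPython runs the `.py` files in a profile's `startup` directory when it starts. The file name below is hypothetical, and the script still imports pandas on every IPython start; the `ImportError` guard only keeps it from failing where pandas is not installed.

```python
# Hypothetical file: ~/.ipython/profile_default/startup/50-pandas-display.py
# IPython runs the *.py files in this directory at startup, in name order.
try:
    import pandas as pd
except ImportError:
    pd = None  # pandas not installed in this environment; nothing to configure

if pd is not None:
    # The old fixed column limit discussed in this thread.
    pd.set_option("display.max_columns", 20)
```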
I just don't want to import pandas whenever I start IPython/Jupyter because I don't always use it.
100% agreed. If the old setting works better for some corner cases, then the new setting should either be fixed (meaning that the repr should never exceed the console width; this looks like it could be related to how the column widths are determined), or pandas should revert to the old setting.
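The criterion in this comment (the repr should never exceed the console width) is easy to check mechanically. A minimal sketch in plain Python; `widest_line` and `overflows` are hypothetical helpers, and in practice the text would be `repr(df)` and the width the detected console size.

```python
def widest_line(text: str) -> int:
    """Length of the longest line in a multi-line string such as repr(df)."""
    return max((len(line) for line in text.splitlines()), default=0)


def overflows(text: str, width: int) -> bool:
    """True if any line of `text` is wider than `width` characters."""
    return widest_line(text) > width


# Small hand-written stand-in for a DataFrame repr.
sample = "   a  b\n0  1  2\n1  3  4"
print(widest_line(sample), overflows(sample, 80))
```

Applied to the outputs reported in this thread, `overflows(repr(df), width)` returning `True` for a correctly detected `width` would be the bug described above.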
Another example that might be a bug, in a DataCamp integrated console: This one is very strange, though, as I don't see it locally; even if I reduce my terminal width to 80, it still shows 6 out of 10 columns (since that is what fits in an 80-character width).
I'm not sure. The behavior depends on In the DataCamp shell,
Yep, that seems to be the case. +1
In addition, even if we switch back to 20 columns in such cases, we still need a value for the console width (the number of character columns). Is this currently 80? This might also lead to problems even with the old behavior if we guess the wrong value.
Normally, we fall back to
Working on book updates and was surprised to see pandas do this:
before
after
Why is the middle column being hidden in a small 3x3 data frame?