BUG: regression in max_info_columns behaviour? #6939
Comments
So it was added in #5974: rationale was Because now, if you have eg a dataframe with 1000 columns, |
This is the intended behaviour of that PR (as far as I understand):
|
ok so why don't we hijack max_info_columns to determine info_verbose rather than a separate parm? |
That was how it was in 0.13 and before, and that seems more logical to me personally. |
although it is still a little bit different. The columns argument sets the default for a threshold when using long and when short info output, while a EDIT: I suppose a user can then just set the |
@cpcloud OK with this? (you were also involved a bit in the info view rework in 0.13.1, not exactly this but the showing of the dtypes) |
|
Added two to do's at the top of the PR:
|
cc @bjonen you are working on this one? |
cc @sinhrks want to dig in on this? would be really helpful |
I can take this one. |
@TomAugspurger that would be awesome! (I only think it's really necessary to address @jorisvandenbossche's first point in 0.14.0; the rest can wait) - though I think there IS a kw for that now anyhow |
Just to make sure. Current behavior:
In [9]: df = pd.DataFrame(np.random.randn(10, 5))
In [10]: pd.set_option('max_info_columns', 4)
In [11]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 5 columns):
0 float64
1 float64
2 float64
3 float64
4 float64
dtypes: float64(5)
But that should be
In [12]: df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Columns: 5 entries, 0 to 4
dtypes: float64(5)
Which is the output you get from |
I think that is right, it closes #6568 with an automatic passing of no tests currently for either |
@jorisvandenbossche the change is fine by me |
There's going to be a bit of a conflict between the options and the
In [1]: df = pd.DataFrame(np.random.randn(10, 5))
# This stuff is fine. It doesn't exceed the limit
In [3]: df.info(verbose=False)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Columns: 5 entries, 0 to 4
dtypes: float64(5)
In [4]: df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Data columns (total 5 columns):
0 10 non-null float64
1 10 non-null float64
2 10 non-null float64
3 10 non-null float64
4 10 non-null float64
dtypes: float64(5)
In [5]: pd.set_option('display.large_repr', 'info',
'display.max_info_columns', 4,
'display.max_columns', 2)
# Now we have too many cols, so we want to truncate the info repr:
In [6]: df
Out[6]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Columns: 5 entries, 0 to 4
dtypes: float64(5)
In [7]: df.info(verbose=False) # same
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Columns: 5 entries, 0 to 4
dtypes: float64(5)
In [8]: df.info(verbose=True) # No way to get all the columns
<class 'pandas.core.frame.DataFrame'>
Int64Index: 10 entries, 0 to 9
Columns: 5 entries, 0 to 4
dtypes: float64(5)
Pretty much, should the verbose keyword override the |
Yes, if a user actually specifies a 'local' keyword, it should override the 'global' setting, I think. Then you can see if the user actually passed anything |
Ok... I think that to implement this, I'll need to change the default of
Cases
|
that looks right |
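For readers following along, the rule agreed on above (an explicitly passed verbose keyword beats the global option; otherwise the column count is compared against max_info_columns) can be sketched roughly as below. This is only an illustration: `use_verbose_info` is a hypothetical helper, not the code that went into pandas, and it assumes "no keyword passed" is represented by `None`.

```python
from typing import Optional


def use_verbose_info(
    n_columns: int,
    max_info_columns: int,
    verbose: Optional[bool] = None,
) -> bool:
    """Decide between the per-column listing and the short summary.

    Hypothetical helper for illustration only; not pandas' implementation.
    """
    # A 'local' keyword that was actually passed overrides the 'global' option.
    if verbose is not None:
        return verbose
    # Otherwise flip to the short summary once the column limit is exceeded.
    return n_columns <= max_info_columns


# 5 columns with a limit of 4, as in the session above.
print(use_verbose_info(5, 4))                # short summary
print(use_verbose_info(5, 4, verbose=True))  # keyword forces the full listing
```

Using `None` as the default is what makes it possible to tell "the user passed nothing" apart from "the user passed False".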
notebook showing this http://nbviewer.ipython.org/gist/TomAugspurger/58838d627194a9113f66 Not sure if it's intentional, but the html info_repr doesn't have the
at the top of the repr. |
hmm seems like they should be the same (except for the basic/table repr) |
Yeah they probably should be. It's the same way on master. I'll see if I can track it down |
@TomAugspurger About the conflict between the options and the verbose keyword to info: this conflict was already there in 0.13. But I agree with the proposed way to solve this (this is then even an improvement over 0.13, not only fixing the regression). Your overview of cases indeed seems right, only "don't print column count summary" is not fully correct. It should rather be "don't print summary (count/dtype) of all columns separately", as whether the counts are shown or not is determined by
Thanks for taking this up! |
@TomAugspurger progress on this? |
Update:
- `max_info_columns` should not behave the same as `max_info_rows`. If `max_info_columns` is exceeded, it should flip to short summary (as `verbose=False`)
- `show_counts` (or another name) to specify if you want to show the non-null counts (to be able to override the `max_info_rows` option for a specific info call)

When you have more columns than specified in `max_info_columns`, `df.info()` will now still show all columns, but just without the information about the number of null values:

while previously in 0.13 this gave the same behaviour as for `info(verbose=False)`:

which seems much more logical to me.

I suppose this is related to #5682, which added the behaviour to also show the dtype per column.

Update: it was deliberately added in #5974
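
As a rough illustration of the second to-do item, a per-call keyword for the non-null counts could be resolved against the row limit along these lines. This is a sketch under assumptions: `show_non_null_counts` is a hypothetical helper, `show_counts` is only the name proposed in the to-do, and the `<=` comparison is a guess at where the threshold sits.

```python
from typing import Optional


def show_non_null_counts(
    n_rows: int,
    max_info_rows: int,
    show_counts: Optional[bool] = None,
) -> bool:
    """Decide whether the non-null counts are printed for each column.

    Hypothetical helper for illustration only; not pandas' implementation.
    """
    # A per-call keyword overrides the max_info_rows option.
    if show_counts is not None:
        return show_counts
    # Assumption: counts are skipped only once the row limit is exceeded.
    return n_rows <= max_info_rows


# A frame with 10 rows and a deliberately small row limit of 5.
print(show_non_null_counts(10, 5))                    # counts are skipped
print(show_non_null_counts(10, 5, show_counts=True))  # keyword forces counts
```

The point of keeping this separate from the column check is that the row limit only drops the counts, while the column limit changes the whole layout of the output.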