0.13.1: info(verbose=True) does not return non-null counts for large DataFrames #6940

Closed
bluefir opened this issue Apr 23, 2014 · 7 comments
Labels: Duplicate Report, Output-Formatting

bluefir commented Apr 23, 2014

>>> df.info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 9063396 entries, (2011-10-19 00:00:00, 000312) to (2014-03-31 00:00:00, Y8565N10)
Data columns (total 5 columns):
weight_benchmark    float64
weight_portfolio    float64
weight_active       float64
weight_msci_acwi    float64
investable          bool
dtypes: bool(1), float64(4)

>>> df.iloc[:1000].info(verbose=True)
<class 'pandas.core.frame.DataFrame'>
MultiIndex: 1000 entries, (2011-10-19 00:00:00, 000312) to (2011-10-19 00:00:00, 17275510)
Data columns (total 5 columns):
weight_benchmark    1000 non-null float64
weight_portfolio    1000 non-null float64
weight_active       1000 non-null float64
weight_msci_acwi    1000 non-null float64
investable          1000 non-null bool
dtypes: bool(1), float64(4)

jreback added this to the 0.14.0 milestone Apr 23, 2014

jreback commented Apr 23, 2014

thanks

noted (and being fixed) in #6939 / #6890

jreback closed this as completed Apr 23, 2014

jorisvandenbossche commented

@bluefir Note that once this is fixed (see the issue @jreback linked to), info will still not show the non-null counts for large DataFrames. It will show the shorter summary instead (the one you get with info(verbose=False)).
You can always raise the threshold by setting pd.options.display.max_info_rows or pd.options.display.max_info_columns to a higher value.
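
To illustrate the threshold option mentioned above, here is a minimal sketch. The small frame and the helper `info_text` are made up for this example, and the thresholds are chosen only so that the effect shows up with four rows.

```python
import io

import pandas as pd

# Illustrative data only; not the frame from the report.
df = pd.DataFrame(
    {"weight": [0.1, None, 0.3, 0.4], "investable": [True, False, True, True]}
)


def info_text(frame):
    """Hypothetical helper: capture what DataFrame.info() would print."""
    buf = io.StringIO()
    frame.info(verbose=True, buf=buf)
    return buf.getvalue()


# Threshold below the row count: only the dtypes are listed.
with pd.option_context("display.max_info_rows", 2):
    summary_without_counts = info_text(df)

# Threshold at or above the row count: non-null counts are included.
with pd.option_context("display.max_info_rows", 10):
    summary_with_counts = info_text(df)

print(summary_without_counts)
print(summary_with_counts)
```

Using `pd.option_context` keeps the change local to the `with` block; assigning to `pd.options.display.max_info_rows` directly changes it for the whole session.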

jorisvandenbossche commented

Actually, what you reported is something else. The other issues are related but not the same (they are about the behaviour of max_info_columns). You just have a very large number of rows. This is a change in behaviour in 0.13.1 (see http://pandas.pydata.org/pandas-docs/stable/whatsnew.html#output-formatting-enhancements, the second point, and #5974).

This was added because computing the non-null counts for very large DataFrames can be quite expensive. DataFrames with more rows than pd.options.display.max_info_rows therefore show only the dtypes and not the counts. The default value is 1690785.
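
The rule described above can be checked against a given frame before calling info. This is only a sketch: `exceeds_info_rows` is a hypothetical helper written for this example, not a pandas function, and it mirrors the row-count comparison as described in this comment.

```python
import pandas as pd


def exceeds_info_rows(frame):
    """Hypothetical helper: True if the frame has more rows than the threshold."""
    limit = pd.get_option("display.max_info_rows")
    return limit is not None and len(frame) > limit


small = pd.DataFrame({"x": range(10)})

# Current threshold for this session.
print(pd.get_option("display.max_info_rows"))

# Whether info() would be expected to drop the counts for this frame.
print(exceeds_info_rows(small))
```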

jorisvandenbossche commented

@jreback Thinking about it: should info also take a keyword argument to control whether the counts are shown (like it does for verbose and max_cols)?
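
For readers on a recent pandas release, a keyword of this kind is available as `show_counts` on `DataFrame.info`. That name does not come from this thread, and it is not available in 0.13.1, so treat the following as a sketch under that assumption. The frame and the helper `info_text` are illustrative.

```python
import io

import pandas as pd

# Illustrative data only.
df = pd.DataFrame({"x": [1.0, None, 3.0]})


def info_text(frame, **kwargs):
    """Hypothetical helper: capture what DataFrame.info() would print."""
    buf = io.StringIO()
    frame.info(buf=buf, **kwargs)
    return buf.getvalue()


# Force the frame over the row threshold so the default omits the counts.
with pd.option_context("display.max_info_rows", 1):
    default_text = info_text(df)
    # Explicitly request the counts regardless of the threshold.
    forced_text = info_text(df, show_counts=True)

print(default_text)
print(forced_text)
```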

bluefir commented Apr 23, 2014

@jorisvandenbossche Thanks a lot for the explanations. I missed that change in behavior.

jreback commented Apr 23, 2014

@jorisvandenbossche yes, that makes sense. Can you annotate on the other issue, please?

jorisvandenbossche commented

done
