Slow printing of large data frames #2807
Set DataFrame._verbose_info to be False. This should be made configurable, or maybe even just disable the null-counts over 1 million rows |
@wesm To make this configurable, could it be done by registering an option such as:
# necessary imports
def check_verbose_info(x):
    try:
        bool(x)
    except Exception:
        raise ValueError('invalid value for frame.verbose_info')

pd.config.register_option('frame.verbose_info', True, check_verbose_info)
If so, where would this need to go? |
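A minimal, self-contained sketch of the register-with-validator pattern proposed above. It does not use pandas' internal config module: `register_option`, `set_option`, `get_option`, and the `_registry` dict are hypothetical stand-ins written only for illustration. Since `bool(x)` succeeds for almost any object, the validator in this sketch checks the type instead.

```python
# Hypothetical stand-in for an option registry; NOT pandas' actual config API.
_registry = {}


def register_option(key, default, validator=None):
    """Register an option with a default value and an optional validator."""
    if validator is not None:
        validator(default)  # reject a bad default up front
    _registry[key] = {"value": default, "validator": validator}


def set_option(key, value):
    """Validate and store a new value for a registered option."""
    entry = _registry[key]  # raises KeyError for unknown options
    if entry["validator"] is not None:
        entry["validator"](value)
    entry["value"] = value


def get_option(key):
    """Return the current value of a registered option."""
    return _registry[key]["value"]


def check_verbose_info(x):
    # bool(x) almost never raises, so test the type explicitly.
    if not isinstance(x, bool):
        raise ValueError("invalid value for frame.verbose_info: %r" % (x,))


register_option("frame.verbose_info", True, check_verbose_info)
set_option("frame.verbose_info", False)
```

With this shape, a bad value such as `set_option("frame.verbose_info", "yes")` is rejected with a `ValueError` and the stored value is left unchanged.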
@cpcloud, have a look at |
With the latest release this is slow even after setting _verbose_info. |
that's a bug then (probably it is checking whether the dataframe is "too wide" before reaching that code). if no one fixes the bug before i have a chance i will get to it in the next couple of weeks |
The delay comes from the dtype count, which I intended to put up as a fix for review but ended up pushing to master by mistake |
I changed the way get_dtype_counts works in 0.11. simple fix for this |
@cpcloud do u have a test example? |
@y-p saw your fix thanks.....this obviously shouldn't have been using as_blocks |
|
I'll call this closed. |
I'm always running the latest version :). @jreback You could do something like:
df = DataFrame(rand(int(1e7), 16))
%timeit repr(df) |
Interestingly, |
Would you guys still take a pull request for the config option of omitting non null info for frames with > 1e6 rows? |
Looks like the fix was only partial. |
I'm confused. maybe I'm just hitting memory pressure on my machine. |
before fix:
after fix:
@cpcloud , go ahead and open a pull request for an option setting a threshold for |
@y-p I can confirm that I get similar results to yours. |
I have a pull request ready minus tests. I'm not sure how to go about testing a display configuration option in any kind of non-kludgy way. My first thought was to assert that the string 'null' is not in the repr, but that seems like a very fragile way to test it. |
go for an assertion on line count |
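One way to picture the line-count assertion, as a sketch on a toy stand-in and not on a real DataFrame: `summarize` and its `max_info_rows` parameter are hypothetical names invented for this example. The test checks how many lines the summary has, so it does not depend on the exact wording of the output.

```python
def summarize(columns, n_rows, max_info_rows):
    """Return an info-style summary for a toy table.

    `columns` maps column name -> list of values (None marks a missing value).
    Per-column null counts are only computed when n_rows <= max_info_rows.
    """
    lines = ["%d rows, %d columns" % (n_rows, len(columns))]
    if n_rows <= max_info_rows:
        for name, values in columns.items():
            non_null = sum(v is not None for v in values)
            lines.append("%s    %d non-null values" % (name, non_null))
    return "\n".join(lines)


def test_null_counts_omitted_above_threshold():
    cols = {"a": [1, None, 3], "b": [None, None, 6]}
    verbose = summarize(cols, n_rows=3, max_info_rows=10)
    terse = summarize(cols, n_rows=3, max_info_rows=2)
    # One header line plus one line per column when counts are shown.
    assert len(verbose.splitlines()) == 1 + len(cols)
    # Only the header line when the table exceeds the threshold.
    assert len(terse.splitlines()) == 1


test_null_counts_omitted_above_threshold()
```

Counting lines keeps the test stable if the per-column text is reworded later, which was the concern with searching for the string 'null'.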
Hmm. Does the thresholding option obviate the need for |
seems reasonable, just as long as the threshold logic allows an "infinite" value. |
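A small sketch of threshold logic that accepts an "infinite" value. The function name `should_count_nulls`, the parameter `max_info_rows`, and the choice to treat `None` as "no limit" are assumptions made for illustration; the 1 million default only mirrors the figure suggested earlier in this thread.

```python
import math


def should_count_nulls(n_rows, max_info_rows=1000000):
    """Decide whether to compute per-column null counts.

    Passing math.inf (or None, treated the same way in this sketch)
    disables the cutoff, so counts are always computed.
    """
    if max_info_rows is None:
        max_info_rows = math.inf
    return n_rows <= max_info_rows
```

Because any finite row count compares as less than or equal to `math.inf`, no special-case branch is needed for the "always verbose" setting.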
BTW, is it okay to include pep8 cleanups in pull requests? I'm using flake8 + flymake to get the pep8 violations. |
If it's just around the area you touched that's fine. If it's the whole file and |
Okay. Sounds good. Thanks. |
half the fix is in master, and there's a pending PR #2918 for the other half of controlling verbose_info |
I have DataFrames with about 14 million rows and 16 columns and I have to wait at least 3-4 seconds for it to repr in an IPython session. Is there anything that can be done about this?