df.dtypes.values is not O(1) and repr(df) is therefore slow for large frames #5968
Comments
do you have a link to the dataset...can't seem to find mine |
ftp://ftp.fec.gov/FEC/Presidential_Map/2012/P00000001/P00000001-ALL.zip |
I believe that this is the problem. It is trying to see if there are floats in an object array. I would simply not do this at all
|
It's probably there to support the |
for an object dtype you could warn if it 'looks' like float, but otherwise skip it |
isn't |
It doesn't need to do this for values not displayed in the output. that's it. |
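To make the point above concrete, here is a minimal pure-Python sketch of checking only the rows a truncated repr would show. It is not the pandas implementation: `displayed_positions`, `has_float`, and the `max_rows=60` cutoff are hypothetical names and values chosen for illustration, and a plain list stands in for an object-dtype column.

```python
def displayed_positions(n_rows, max_rows):
    """Positions of the rows a truncated repr would show (head and tail)."""
    if n_rows <= max_rows:
        return list(range(n_rows))
    head = max_rows // 2
    tail = max_rows - head
    return list(range(head)) + list(range(n_rows - tail, n_rows))


def has_float(values, positions):
    """Check only the given positions of an object column for floats."""
    return any(isinstance(values[i], float) for i in positions)


# Hypothetical object column: one million strings with a float in the middle.
column = ["x"] * 1_000_000
column[500_000] = 1.5

shown = displayed_positions(len(column), max_rows=60)

# Scanning the whole column costs O(n) in the worst case.
full_scan = has_float(column, range(len(column)))

# Scanning only the displayed rows touches at most max_rows elements.
shown_scan = has_float(column, shown)
```

The two scans can disagree (the float here sits outside the displayed rows), which is fine for formatting purposes: only the values that actually get printed need to influence how they are printed.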
right! |
That's not where the bottleneck is.
aren't dtypes just a lookup? |
this issue addresses this, but it needs reworking to make it more internal, as I have indicated: #5740 |
Related (djeavu): 3cb6961, #2807 (comment) |
I have got a PR...give me a few |
There's an off chance this might be the cause of a lot of the slowdowns we saw in 0.13 |
anything with |
frame_get_dtype_counts | 0.1843 | 0.1113 | 1.6552 |
Unrelated in fact. |
@jreback: dsm->unutbu. Can't take credit for that one. :^) |
@dsm054 sorry....you are right!! morning confusion |
That only cuts it in half. Is this expected?
Isn't slicing supposed to be cheap? |
let me look |
easy enough....
|
2 secs -> 50 ms for repr(df). hellz yeah. |
For the FEC dataset, it takes about 1.5 sec to get a repr, and prun puts it all in infer_dtype.