BUG: Column-major DataFrames stored in HDFStore are returned as row-major #22073
Comments
I wonder if it has to do with how we store the values. Tracing the code leads me to somewhere around here: Lines 4163 to 4165 in f433061
I think this problem could be even more fundamental: pandas copy and groupby-sum aggregations (and maybe other operations) change the major order of the underlying data in the returned object. This has a huge impact on aggregation performance. pandas should not do that implicitly.
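To make the point about copies concrete, here is a minimal NumPy-only sketch. It does not reproduce the pandas code paths discussed in this thread; it only shows that, at the array level, `ndarray.copy()` defaults to `order="C"` and so turns a column-major array into a row-major one, while `np.copy()` defaults to `order="K"` and keeps the layout. The array shape and variable names are made up for illustration.

```python
import numpy as np

# Build a small column-major (Fortran-ordered) 2-D array.
col_major = np.asfortranarray(np.arange(12, dtype=np.float64).reshape(4, 3))

# ndarray.copy() defaults to order="C", so this copy is row-major.
method_copy = col_major.copy()

# np.copy() defaults to order="K", which keeps the source layout.
function_copy = np.copy(col_major)

# Passing order="K" explicitly makes the method preserve the layout too.
preserving_copy = col_major.copy(order="K")

for name, arr in [
    ("col_major", col_major),
    ("method_copy", method_copy),
    ("function_copy", function_copy),
    ("preserving_copy", preserving_copy),
]:
    # Print the contiguity flags so the layout change is visible.
    print(name, "C:", arr.flags["C_CONTIGUOUS"], "F:", arr.flags["F_CONTIGUOUS"])
```

The values are identical in all four arrays; only the memory order differs, which is why a change like this is easy to miss.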
This looks correct on master. I suppose it could use a test.
Are we looking for a unit test or a performance test here?
Unit test.
take
I think the core issue isn't actually solved. The only reason we now get back a proper column-major dataframe is because the Lines 3207 to 3212 in a0071f9
But at this point, before the
Code Sample, a copy-pastable example if possible
Problem description
I ran across this when doing some benchmarking. This has some rather serious performance implications for large DataFrames. Is this the result of an underlying limitation in HDF5?
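The original code sample is not present in this copy of the issue. As a stand-in, the following sketch shows one way to check the memory order of an array and why it matters for column access. It uses plain NumPy and simulates the round trip with `np.ascontiguousarray`; it does not call `HDFStore`. The helper `memory_layout` and all variable names are hypothetical.

```python
import numpy as np


def memory_layout(arr):
    """Return a short label for the memory order of a NumPy array."""
    c_order = arr.flags["C_CONTIGUOUS"]
    f_order = arr.flags["F_CONTIGUOUS"]
    if c_order and f_order:
        return "both"  # e.g. 1-D arrays or arrays with a single row/column
    if f_order:
        return "column-major"
    if c_order:
        return "row-major"
    return "non-contiguous"


# A column-major block of values, as one might want for per-column work.
original = np.asfortranarray(np.random.default_rng(0).random((1000, 8)))

# Assumption: stand-in for a storage round trip that hands back row-major data.
round_tripped = np.ascontiguousarray(original)

# Converting back restores the column-major layout at the cost of a copy.
restored = np.asfortranarray(round_tripped)

print(memory_layout(original))       # column-major
print(memory_layout(round_tripped))  # row-major
print(memory_layout(restored))       # column-major

# In column-major order a single column is one contiguous run of memory;
# in row-major order the same column is strided across every row.
print(original[:, 0].flags["C_CONTIGUOUS"])       # True
print(round_tripped[:, 0].flags["C_CONTIGUOUS"])  # False
```

The strided column access in the row-major case is the kind of effect that can show up in benchmarks on large frames, since per-column operations no longer read contiguous memory.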