-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
C Functionality for Accessing DataFrame Values in Column Order #32343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Hello @WillAyd! Thanks for opening this PR. We checked the lines you've touched for PEP 8 issues, and found:
|
goto LOOP_ERROR; | ||
} | ||
|
||
// TODO: We should actually be grabbing an ndarray reference here |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is rather verbose, I should point out that this comment might be the most helpful in understanding what is going on here (or at least what should be)
Is doing this in cython not an option? i guess including it in the json code would be troublesome? Do you envision this being used outside of the json code? e.g. core.internals? FWIW |
Truthfully didn't think much about that option, but maybe. I personally find this easier and think going the Cython route would still require a lot of C "bridging" (particularly to manage memory) though I'm probably biased by the amount of time I've spent in the JSON extension recently
For sure; would still have to evolve and wouldn't be something tackled at the outset, but I think giving third party developers a C API to the DataFrame could open up a lot of things |
Yah, ive never gotten this to work myself |
I'll reopen later when more polished |
This is very rough and was a little hesitant to even push here, but figured worth opening up to feedback.
The main motivation for this PR is to get block management out of the JSON serializer, but it could potentially be extended beyond that for external uses
The gist of this is that we can hold a struct
PdOrderedArrays
which gives the length of the DataFrame and sequential access to Numpy arrays that mirror what the user sees when dealing with a DataFrame, all by deconstructing the blocks that already exist down in some C functions