-
Notifications
You must be signed in to change notification settings - Fork 21
PR: Add allow_copy
flag to interchange protocol
#44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
To summarize the options for naming this keyword that were suggested yesterday:
I like the positive Regarding the default value (now
I believe that there will be enough cases that will always require a copy (object-strings, columns backed by non-contiguous arrays, non-columnar data storage like for Koalas and Ibis) that using |
allow_copy
flag to interchange protocol
Adding the following to the requirements docs:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some more testing and review, and found some issues. Also this PR is not against main
, so I'll close it. Resubmitted as gh-51.
|
||
``allow_copy`` is a keyword that defines if the given implementation | ||
is going to support striding buffers. It is optional, and the libraries | ||
do not need to implement it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrasing this a bit:
``allow_copy`` is a keyword that defines whether or not the library is
allowed to make a copy of the data. This can for example happen if a
library supports strided buffers (those would need a copy, because this
protocol specifies contiguous buffers).
|
||
``allow_copy`` is a keyword that defines if the given implementation | ||
is going to support striding buffers. It is optional, and the libraries | ||
do not need to implement it. Currently, if the flag is set to ``True`` it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"False
and a copy is needed"
# complexity for libraries that don't support it (e.g. Arrow), | ||
# but would help with numpy-based libraries like Pandas. | ||
raise RuntimeError("Design needs fixing - non-contiguous buffer") | ||
if not allow_copy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This removes the actual check for striding (not x.strides == (x.dtype.itemsize,):
), that needs to be put back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Logic:
if not x.strides == (x.dtype.itemsize,):
# The protocol does not support strided buffers, so a copy is
# necessary. If that's not allowed, we need to raise an exception.
if allow_copy:
x = x.copy()
else:
raise RuntimeError("Exports cannot be zero-copy in the case "
"of a non-contiguous buffer")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And then adding a test case shows there's a problem - we don't hold on to the memory.
This PR adds a flag to throw an exception if the export cannot be zero-copy. (e.g. for pandas, possible due to block manager where rows are contiguous and columns are not) .
allow_zero_copy
flag to the DataFrame class.RuntimeError
when it is truetest_noncontiguous_columns