This PR
- adds variable-length string support to the dataframe interchange protocol.
- modifies the protocol to return a dictionary containing all possible data buffers, rather than exposing separate methods for retrieving the data, validity, and offsets buffers, as discussed in consortium meetings.
- modifies the protocol to return both the buffers and their associated dtypes. This is necessary in order to interpret, e.g., the `offsets` and `mask` buffers.
- updates source code documentation (punctuation, typos, and spelling).
- manually copies string data to and from byte arrays, as it assumes that strings are stored as `object` dtype. The implementation will need to be updated to accommodate pandas' string extension dtype, which is based on Arrow. Currently, the string extension dtype is considered experimental and subject to change, and `object` dtype remains the default string dtype for backward compatibility.
- uses a byte array mask for indicating missing values in string data buffers. This can be updated to use a bit array for space efficiency, at the cost of additional code complexity.
- currently does not handle the scenario where a non-pandas dataframe provided to `__dataframe__` uses a bit array to indicate missing values.
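
To illustrate the buffer layout described above, here is a minimal sketch of how variable-length strings might be encoded into a data buffer, an offsets buffer, and a byte-array mask, each returned in a dictionary alongside its dtype. This is not the code in this PR: the helper names (`encode_strings`, `decode_strings`, `pack_mask`), the dictionary keys, the plain-string dtype labels, and the "1 means valid" mask convention are all assumptions made for illustration.

```python
import numpy as np


def encode_strings(values):
    """Hypothetical helper: encode a list of str/None into three buffers."""
    data = bytearray()
    # offsets[i] and offsets[i + 1] delimit the bytes of the i-th string.
    offsets = np.zeros(len(values) + 1, dtype=np.int64)
    # Byte-array mask, one byte per element (assumed: 1 = valid, 0 = missing).
    mask = np.zeros(len(values), dtype=np.uint8)
    for i, value in enumerate(values):
        if value is not None:
            data.extend(value.encode("utf-8"))
            mask[i] = 1
        offsets[i + 1] = len(data)
    # Each entry pairs a buffer with the dtype needed to interpret it.
    return {
        "data": (np.frombuffer(bytes(data), dtype=np.uint8), "uint8"),
        "offsets": (offsets, "int64"),
        "validity": (mask, "uint8"),
    }


def decode_strings(buffers):
    """Hypothetical inverse of encode_strings."""
    raw = buffers["data"][0].tobytes()
    offsets = buffers["offsets"][0]
    mask = buffers["validity"][0]
    return [
        raw[offsets[i]:offsets[i + 1]].decode("utf-8") if mask[i] else None
        for i in range(len(mask))
    ]


def pack_mask(mask):
    """Pack a byte-array mask into a bit array (8 elements per byte)."""
    return np.packbits(mask, bitorder="little")


buffers = encode_strings(["a", None, "bcd"])
roundtrip = decode_strings(buffers)
```

The `pack_mask` helper shows the space-efficient alternative mentioned above: the same validity information occupies one bit per element instead of one byte, but a consumer then has to unpack or bit-test it before use.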