@@ -100,6 +100,8 @@ this is a consequence, and that that should be acceptable to them.
4. Must allow the consumer to access the following "metadata" of the dataframe:
   number of rows, number of columns, column names, column data types.
   _Note: this implies that a data type specification needs to be created._
+  _Note: column names are required. If a dataframe doesn't have them, dummy
+  ones like `'0', '1', ...` can be used._
5. Must include device support.
6. Must avoid device transfers by default (e.g. copy data from GPU to CPU),
   and provide an explicit way to force such transfers (e.g. a `force=` or
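The metadata requirement above (requirement 4) could be sketched as follows. This is a hypothetical illustration, not the actual protocol API: the class and method names (`InterchangeFrame`, `num_rows`, `column_names`, etc.) are assumptions, chosen only to show the metadata surface and the dummy-column-name fallback.

```python
class InterchangeFrame:
    """Illustrative wrapper over columnar data (names are hypothetical)."""

    def __init__(self, columns, names=None):
        # `columns` is a list of equal-length value lists, one per column.
        if names is None:
            # Column names are required by the protocol; fall back to
            # dummy names '0', '1', ... when the producer has none.
            names = [str(i) for i in range(len(columns))]
        self._columns = columns
        self._names = names

    def num_rows(self):
        return len(self._columns[0]) if self._columns else 0

    def num_columns(self):
        return len(self._columns)

    def column_names(self):
        return list(self._names)

    def dtypes(self):
        # Placeholder: a real data type specification would go here.
        return [type(col[0]).__name__ for col in self._columns]


df = InterchangeFrame([[1, 2, 3], [0.5, 1.5, 2.5]])
assert df.column_names() == ["0", "1"]
assert (df.num_rows(), df.num_columns()) == (3, 2)
```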
@@ -121,6 +123,8 @@ this is a consequence, and that that should be acceptable to them.
   _Rationale: prescribing a single in-memory representation in this
   protocol would lead to unnecessary copies being made if that representation
   isn't the native one a library uses._
+  _Note: the memory layout is columnar. Row-major dataframes can use this
+  protocol, but not in a zero-copy fashion (see requirement 2 above)._
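The columnar-vs-row-major distinction in the note above can be made concrete with NumPy (used here only for illustration): a column of a row-major (C-order) array is a strided view, so a consumer needing contiguous memory must copy it, whereas column-major (Fortran-order) storage keeps each column contiguous.

```python
import numpy as np

# Row-major (C-order): each *row* is contiguous in memory.
row_major = np.arange(6, dtype=np.int64).reshape(2, 3)

# A column of a row-major array is a strided view, not contiguous,
# so zero-copy handoff of the column's buffer is not possible.
col = row_major[:, 0]
assert not col.flags["C_CONTIGUOUS"]

# Column-major (Fortran-order) storage keeps each *column* contiguous,
# which is what allows zero-copy column access.
col_major = np.asfortranarray(row_major)
assert col_major[:, 0].flags["C_CONTIGUOUS"]
```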
12. Must support chunking, i.e. accessing the data in "batches" of rows.
    There must be metadata the consumer can access to learn in how many
    chunks the data is stored. The consumer may also convert the data in
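The chunking requirement (requirement 12) might look roughly like the sketch below. The names (`ChunkedColumn`, `num_chunks`, `get_chunks`) are hypothetical, chosen to show the two pieces the text asks for: metadata exposing the chunk count, and access to the data in row batches.

```python
class ChunkedColumn:
    """Illustrative chunked container; not the actual protocol API."""

    def __init__(self, values, chunk_size):
        # Split the data into "batches" of rows of at most `chunk_size`.
        self._chunks = [
            values[i:i + chunk_size] for i in range(0, len(values), chunk_size)
        ]

    def num_chunks(self):
        # Metadata the consumer can query before consuming the data.
        return len(self._chunks)

    def get_chunks(self):
        # Yield one batch of rows at a time.
        yield from self._chunks


col = ChunkedColumn(list(range(10)), chunk_size=4)
assert col.num_chunks() == 3
assert [len(c) for c in col.get_chunks()] == [4, 4, 2]
```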
@@ -148,7 +152,9 @@ We'll also list some things that were discussed but are not requirements:
### To be decided

_The connection between dataframe and array interchange protocols_. If we
-treat a dataframe as a set of 1-D arrays, it may be expected that there is a
+treat a dataframe as a set of columns, each of which is a set of 1-D arrays
+(there may be more than one in the case of using masks for missing data, or
+in the future for nested dtypes), it may be expected that there is a
connection to be made with the array data interchange method. The array
interchange is based on DLPack; its major limitation from the point of view
of dataframes is the lack of support of all required data types (string,
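The idea of a column as "a set of 1-D arrays" can be illustrated with a values buffer plus a boolean validity mask for missing data. This is only a sketch under assumed names (`MaskedColumn`, `buffers`); in a real protocol each buffer could be described to an array interchange mechanism such as DLPack separately.

```python
import array

class MaskedColumn:
    """Illustrative column made of two 1-D buffers: data and validity mask."""

    def __init__(self, values, valid):
        # Values buffer (float64) and mask buffer (signed char, 1 = valid).
        # A nested dtype could add further buffers to this set.
        self.values = array.array("d", values)
        self.valid = array.array("b", valid)

    def buffers(self):
        # Each 1-D buffer could be handed to an array interchange
        # protocol (e.g. DLPack) on its own.
        return {"values": self.values, "mask": self.valid}


col = MaskedColumn([1.0, 2.0, 3.0], [1, 0, 1])
assert len(col.buffers()) == 2
assert list(col.valid) == [1, 0, 1]  # second element is "missing"
```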