Commit 5f278b3

Address review comments
1 parent c7575c1 commit 5f278b3

1 file changed: +7 -1 lines changed

protocol/dataframe_protocol_summary.md

Lines changed: 7 additions & 1 deletion
@@ -100,6 +100,8 @@ this is a consequence, and that that should be acceptable to them.
 4. Must allow the consumer to access the following "metadata" of the dataframe:
    number of rows, number of columns, column names, column data types.
    _Note: this implies that a data type specification needs to be created._
+   _Note: column names are required. If a dataframe doesn't have them, dummy
+   ones like `'0', '1', ...` can be used._
 5. Must include device support.
 6. Must avoid device transfers by default (e.g. copy data from GPU to CPU),
    and provide an explicit way to force such transfers (e.g. a `force=` or
@@ -121,6 +123,8 @@ this is a consequence, and that that should be acceptable to them.
     _Rationale: prescribing a single in-memory representation in this
     protocol would lead to unnecessary copies being made if that representation
     isn't the native one a library uses._
+    _Note: the memory layout is columnar. Row-major dataframes can use this
+    protocol, but not in a zero-copy fashion (see requirement 2 above)._
 12. Must support chunking, i.e. accessing the data in "batches" of rows.
     There must be metadata the consumer can access to learn in how many
     chunks the data is stored. The consumer may also convert the data in
@@ -148,7 +152,9 @@ We'll also list some things that were discussed but are not requirements:
 ### To be decided

 _The connection between dataframe and array interchange protocols_. If we
-treat a dataframe as a set of 1-D arrays, it may be expected that there is a
+treat a dataframe as a set of columns which each are a set of 1-D arrays
+(there may be more than one in the case of using masks for missing data, or
+in the future for nested dtypes), it may be expected that there is a
 connection to be made with the array data interchange method. The array
 interchange is based on DLPack; its major limitation from the point of view
 of dataframes is the lack of support of all required data types (string,
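
The three notes this commit adds (dummy column names, columnar layout, and a column being a set of 1-D arrays) can be illustrated with a minimal Python sketch. Every name in it (`Column`, `column_names`, `to_columnar`) is hypothetical and chosen only for illustration; none of them is defined by the protocol document, and the mask convention (1 = valid, 0 = missing) is an assumption.

```python
from array import array
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Column:
    """Hypothetical column: a set of one or more 1-D buffers."""
    data: array                   # the values themselves
    mask: Optional[array] = None  # assumed convention: 1 = valid, 0 = missing

    def buffers(self) -> List[array]:
        # A column without missing data exposes one buffer, a masked one two.
        return [b for b in (self.data, self.mask) if b is not None]


def column_names(names: Optional[Sequence[str]], num_columns: int) -> List[str]:
    """Return the given names, or dummy names '0', '1', ... if there are none."""
    if names is None:
        return [str(i) for i in range(num_columns)]
    return list(names)


def to_columnar(rows: Sequence[Sequence[float]]) -> List[array]:
    """Copy row-major data into one 1-D array per column (not zero-copy)."""
    return [array("d", col) for col in zip(*rows)]


rows = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]  # row-major input
cols = to_columnar(rows)                         # columnar copy
names = column_names(None, len(cols))            # dataframe had no names
masked = Column(data=cols[0], mask=array("b", [1, 0, 1]))
plain = Column(data=cols[1])
```

The copy in `to_columnar` is the point of the second note: a row-major producer can still feed a columnar consumer, but only by rearranging the data first.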
