diff --git a/protocol/conceptual_model_df_memory.png b/protocol/conceptual_model_df_memory.png
new file mode 100644
index 00000000..b5bbd310
Binary files /dev/null and b/protocol/conceptual_model_df_memory.png differ
diff --git a/protocol/conceptual_model_df_memory.svg b/protocol/conceptual_model_df_memory.svg
new file mode 100644
index 00000000..9081e42f
--- /dev/null
+++ b/protocol/conceptual_model_df_memory.svg
@@ -0,0 +1,627 @@
+
+
diff --git a/protocol/dataframe_protocol_summary.md b/protocol/dataframe_protocol_summary.md
new file mode 100644
index 00000000..41f1421a
--- /dev/null
+++ b/protocol/dataframe_protocol_summary.md
@@ -0,0 +1,371 @@
+# The `__dataframe__` protocol
+
+This document aims to describe the scope of the dataframe interchange protocol,
+as well as its essential design requirements/principles and the functionality
+it needs to support.
+
+
+## Purpose of `__dataframe__`
+
+The purpose of `__dataframe__` is to be a _data interchange_ protocol. I.e.,
+a way to convert one type of dataframe into another type (for example,
+convert a Koalas dataframe into a Pandas dataframe, or a cuDF dataframe into
+a Vaex dataframe).
+
+Currently (June 2020) there is no way to do this in an
+implementation-independent way.
+
+The main use case this protocol intends to enable is to make it possible to
+write code that can accept any type of dataframe instead of being tied to a
+single type of dataframe. To illustrate that:
+
+```python
+import pandas as pd
+
+def somefunc(df, *args, **kwargs):
+ """`df` can be any dataframe supporting the protocol, rather than (say)
+ only a pandas.DataFrame"""
+ # could also be `cudf.from_dataframe(df)`, or `vaex.from_dataframe(df)`
+ # note: this should throw a TypeError if it cannot be done without a device
+ # transfer (e.g. move data from GPU to CPU) - add `force=True` in that case
+ new_pandas_df = pd.from_dataframe(df)
+ # From now on, use Pandas dataframe internally
+```
+
+### Non-goals
+
+Providing a _complete, standardized dataframe API_ is not a goal of the
+`__dataframe__` protocol. Instead, this is a goal of the full dataframe API
+standard, which the Consortium for Python Data API Standards aims to provide
+in the future. When that full API standard is implemented by dataframe
+libraries, the example above can change to:
+
+```python
+import pandas as pd
+
+
+def get_df_module(df):
+ """Utility function to support programming against a dataframe API"""
+ if hasattr(df, '__dataframe_namespace__'):
+ # Retrieve the namespace
+ pdx = df.__dataframe_namespace__()
+ else:
+ # Here we can raise an exception if we only want to support compliant dataframes,
+ # or convert to our default choice of dataframe if we want to accept (e.g.) dicts
+ pdx = pd
+ df = pd.DataFrame(df)
+
+ return pdx, df
+
+
+def somefunc(df, *args, **kwargs):
+ """`df` can be any dataframe conforming to the dataframe API standard"""
+ pdx, df = get_df_module(df)
+ # From now on, use `df` methods and `pdx` functions/objects
+```
+
+
+### Constraints
+
+An important constraint on the `__dataframe__` protocol is that it should not
+make the goal of a complete standardized dataframe API more difficult to
+achieve.
+
+There is a small concern here. Say that a library adopts `__dataframe__` first,
+and it goes from supporting only Pandas to officially supporting other
+dataframes like `modin.pandas.DataFrame`. At that point, moving on to support
+the full dataframe API standard as a next step _implies a backwards
+compatibility break_ for users that have started relying on Modin dataframe
+support. E.g., in that second transition `somefunc(df_modin)` would change
+from returning a Pandas dataframe to returning a Modin dataframe. It must be
+made very clear to libraries accepting `__dataframe__` that this is a
+consequence, and that it should be acceptable to them.
+
+
+### Progression / timeline
+
+- **Current status**: most dataframe-consuming libraries work _only_ with
+ Pandas, and rely on many Pandas-specific functions, methods and behavior.
+- **Status after `__dataframe__`**: with minor code changes (as in first
+ example above), libraries can start supporting all conforming dataframes,
+ convert them to Pandas dataframes, and still rely on the same
+ Pandas-specific functions, methods and behavior.
+- **Status after standard dataframe API adoption**: libraries can start
+ supporting all conforming dataframes _without converting to Pandas or
+ relying on its implementation details_. At this point, it's possible to
+ "program to an interface" rather than to a specific library like Pandas.
+
+
+## Conceptual model of a dataframe
+
+For a protocol to exchange dataframes between libraries, we need both a model
+of what we mean by "dataframe" conceptually for the purposes of the protocol,
+and a model of how the data is represented in memory:
+
+![Conceptual model of a dataframe](conceptual_model_df_memory.png)
+
+The smallest building blocks are **1-D arrays** (or "buffers"), which are
+contiguous in memory and contain data with the same dtype. A **column**
+consists of one or more 1-D arrays (if, e.g., missing data is represented with
+a boolean mask, that's a separate array). A **dataframe** contains one or more columns.
+A column or a dataframe can be "chunked"; a **chunk** is a subset of a column
+or dataframe that contains a set of (neighboring) rows.
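+
+As a rough illustration only (these class and field names are made up for this
+document and are not part of the protocol), the conceptual model could be
+sketched in Python as:
+
+```python
+from dataclasses import dataclass, field
+from typing import Optional
+
+
+@dataclass
+class Buffer:
+    """A contiguous block of memory holding values of a single dtype."""
+    dtype: str   # e.g. 'int64'
+    data: bytes  # the raw memory
+
+
+@dataclass
+class Column:
+    """One or more buffers; missing data may live in a separate mask buffer."""
+    name: str
+    values: Buffer
+    mask: Optional[Buffer] = None  # e.g. a boolean validity mask
+
+
+@dataclass
+class DataFrame:
+    """An ordered collection of columns; may be split into chunks of rows."""
+    columns: list = field(default_factory=list)  # list of Column objects
+```
+
+A chunk of a column or dataframe would simply be the same structure restricted
+to a contiguous range of rows.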
+
+
+## Protocol design requirements
+
+1. Must be a standard Python-level API that is unambiguously specified, and
+ not rely on implementation details of any particular dataframe library.
+2. Must treat dataframes as an ordered collection of columns (which are
+ conceptually 1-D arrays with a dtype and missing data support).
+ _Note: this relates to the API for `__dataframe__`, and does not imply
+ that the underlying implementation must use columnar storage!_
+3. Must allow the consumer to select a specific set of columns for conversion.
+4. Must allow the consumer to access the following "metadata" of the dataframe:
+ number of rows, number of columns, column names, column data types.
+ _Note: this implies that a data type specification needs to be created._
+ _Note: column names are required; they must be strings and unique. If a
+ dataframe doesn't have them, dummy ones like `'0', '1', ...` can be used._
+5. Must include device support.
+6. Must avoid device transfers by default (e.g. copy data from GPU to CPU),
+ and provide an explicit way to force such transfers (e.g. a `force=` or
+ `copy=` keyword that the caller can set to `True`).
+7. Must be zero-copy wherever possible.
+8. Must support missing values (`NA`) for all supported dtypes.
+9. Must support string, categorical and datetime dtypes.
+10. Must allow the consumer to inspect the representation for missing values
+ that the producer uses for each column or data type. Sentinel values,
+ bit masks, and boolean masks must be supported.
+ Must also be able to define whether the semantic meaning of `NaN` and
+ `NaT` is "not-a-number/datetime" or "missing".
+ _Rationale: this enables the consumer to control how conversion happens,
+ for example if the producer uses `-128` as a sentinel value in an `int8`
+ column while the consumer uses a separate bit mask, that information
+ allows the consumer to make this mapping._
+11. Must allow the producer to describe its memory layout in sufficient
+ detail. In particular, for missing data and data types that may have
+ multiple in-memory representations (e.g., categorical), those
+ representations must all be describable in order to let the consumer map
+ that to the representation it uses.
+ _Rationale: prescribing a single in-memory representation in this
+    protocol would lead to unnecessary copies being made if that representation
+ isn't the native one a library uses._
+ _Note: the memory layout is columnar. Row-major dataframes can use this
+ protocol, but not in a zero-copy fashion (see requirement 2 above)._
+12. Must support chunking, i.e. accessing the data in "batches" of rows
+    (see the sketch after this list). There must be metadata the consumer can
+    access to learn how many chunks the data is stored in. The consumer may
+    also request the data in more chunks than it is stored in, i.e. it can ask
+    the producer to slice its columns into shorter chunks. Such a request must
+    not force the producer to concatenate data that is already stored in
+    separate chunks.
+ _Rationale: support for chunking is more efficient for libraries that
+ natively store chunks, and it is needed for dataframes that do not fit in
+ memory (e.g. dataframes stored on disk or lazily evaluated)._
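+
+To make requirements 3, 4, 10 and 12 above more concrete, here is a rough
+consumer-side sketch. The protocol's actual API has not been designed yet, so
+every attribute and method name below (`num_rows`, `select_columns_by_name`,
+`get_chunks`, and so on) is purely hypothetical:
+
+```python
+def summarize(df):
+    """Sketch of a consumer inspecting a dataframe via a hypothetical protocol API."""
+    dfx = df.__dataframe__()                         # interchange object from the producer
+    print(dfx.num_rows(), dfx.num_columns())         # requirement 4: shape metadata
+    print(dfx.column_names())                        # requirement 4: column names
+    subset = dfx.select_columns_by_name(['a', 'b'])  # requirement 3: column selection
+    for chunk in subset.get_chunks():                # requirement 12: chunked access
+        for name in chunk.column_names():
+            col = chunk.get_column_by_name(name)
+            # requirement 10: how does this column represent missing values?
+            print(name, col.dtype, col.describe_null())
+```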
+
+We'll also list some things that were discussed but are not requirements:
+
+1. Object dtype does not need to be supported.
+2. Nested/structured dtypes within a single column do not need to be
+   supported.
+ _Rationale: not used a lot, additional design complexity not justified.
+ May be added in the future (does have support in the Arrow C Data
+ Interface). Also note that Arrow and NumPy structured dtypes have
+ different memory layouts, e.g. a `(float, int)` dtype would be stored as
+ two separate child arrays in Arrow and as a single `f0, i0, f1, i1, ...`
+ interleaved array in NumPy._
+3. Extension dtypes, i.e. a way to extend the set of dtypes that is
+   explicitly supported, are out of scope.
+ _Rationale: complex to support, not used enough to justify that complexity._
+4. "virtual columns", i.e. columns for which the data is not yet in memory
+ because it uses lazy evaluation, are not supported other than through
+ letting the producer materialize the data in memory when the consumer
+ calls `__dataframe__`.
+ _Rationale: the full dataframe API will support this use case by
+ "programming to an interface"; this data interchange protocol is
+ fundamentally built around describing data in memory_.
+
+
+### To be decided
+
+_The connection between dataframe and array interchange protocols_. If we
+treat a dataframe as a set of columns, each of which is a set of 1-D arrays
+(there may be more than one array per column when masks are used for missing
+data, or in the future for nested dtypes), it may be expected that there is a
+connection to be made with the array data interchange method. The array
+interchange is based on DLPack; its major limitation from the point of view
+of dataframes is the lack of support for all required data types (string,
+categorical, datetime) and for missing data. A requirement could be added that
+`__dlpack__` should be supported in case the data types in a column are
+supported by DLPack. Missing data via a boolean mask as a separate array
+could also be supported.
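+
+For example, a consumer could then hand numeric columns straight to NumPy via
+DLPack. In the sketch below, `col` and its `__dlpack__` support are
+hypothetical; `np.from_dlpack` is NumPy's DLPack import function (available in
+recent NumPy versions):
+
+```python
+import numpy as np
+
+
+def column_to_numpy(col):
+    """Sketch: convert a (hypothetical) protocol column to a NumPy array via DLPack."""
+    if hasattr(col, '__dlpack__'):
+        # int/uint/float/bool columns without missing data could take this
+        # path, potentially zero-copy.
+        return np.from_dlpack(col)
+    raise TypeError("dtype not supported by DLPack; "
+                    "use the protocol's own buffer access instead")
+```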
+
+_Should there be a standard `from_dataframe` constructor function?_ This
+isn't strictly necessary; however, it's expected that a full dataframe API
+standard will have such a function. The array API standard also has such a
+function, namely `from_dlpack`. Adding at least a recommendation on syntax
+for this function makes sense, e.g., simply `from_dataframe(df)`.
+Discussion at https://github.com/data-apis/dataframe-api/issues/29#issuecomment-685903651
+is relevant.
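+
+A minimal sketch of what such a constructor could look like in a consuming
+library (the methods on the interchange object are, again, hypothetical
+placeholders):
+
+```python
+def from_dataframe(df):
+    """Build this library's own dataframe from any object supporting `__dataframe__`."""
+    if not hasattr(df, '__dataframe__'):
+        raise TypeError("`df` does not support the __dataframe__ protocol")
+    dfx = df.__dataframe__()
+    # Walk the interchange object and construct native columns from its buffers;
+    # this sketch just returns a plain dict of column objects to keep it short.
+    return {name: dfx.get_column_by_name(name) for name in dfx.column_names()}
+```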
+
+
+## Frequently asked questions
+
+### Can the Arrow C Data Interface be used for this?
+
+What we are aiming for is quite similar to the Arrow C Data Interface (see
+the [rationale for the Arrow C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html#rationale)),
+except `__dataframe__` is a Python-level rather than C-level interface.
+The data types format specification of that interface is something that could
+be used unchanged.
+
+The main limitation seems to be that it does not have device support
+-- `@kkraus14` will bring this up on the Arrow dev mailing list. Another
+identified issue is that the "deleter" on the Arrow C struct is present at the
+column level, and there are use cases for having it at the buffer level
+(mixed-device dataframes, more granular control over memory).
+
+Note that categoricals are supported; Arrow uses the phrasing
+"dictionary-encoded types" for categoricals. Also, what it calls an "array"
+means a "column" in the terminology of this document (and of every Python
+dataframe library).
+
+The Arrow C Data Interface specifically says it was inspired by [Python's
+buffer protocol](https://docs.python.org/3/c-api/buffer.html), which is also
+a C-only and CPU-only interface. See `__array_interface__` below for a
+Python-level equivalent of the buffer protocol.
+
+Note that specifying the precise semantics for implementers (both producing
+and consuming libraries) will be important. The Arrow C Data interface relies
+on providing a deletion / finalization method similar to DLPack. The desired
+semantics here need to be ironed out. See Arrow docs on
+[release callback semantics](https://arrow.apache.org/docs/format/CDataInterface.html#release-callback-semantics-for-consumers).
+
+
+### Is `__dataframe__` analogous to `__array__` or `__array_interface__`?
+
+Yes, it is fairly analogous to `__array_interface__`. There will be some
+differences though, for example `__array_interface__` doesn't know about
+devices, and it's a `dict` with a pointer to memory so there's an assumption
+that the data lives in CPU memory (which may not be true, e.g. in the case of
+cuDF or Vaex).
+
+It is _not_ analogous to `__array__`, which is NumPy-specific. `__array__` is a
+method attached to array/tensor-like objects, and calling it requests that the
+object turn itself into a NumPy array. Hence, the library that implements
+`__array__` must depend (optionally at least) on NumPy, and call a NumPy
+`ndarray` constructor itself from within `__array__`.
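+
+For illustration, a minimal `__array__` implementation on a made-up column
+class looks like this; note that the producing library itself has to import
+NumPy and build the `ndarray`:
+
+```python
+import numpy as np  # dependency forced onto the *producing* library
+
+
+class ToyColumn:
+    """Made-up array-like object, not from any real library."""
+    def __init__(self, values):
+        self._values = list(values)
+
+    def __array__(self, dtype=None):
+        # Called by e.g. `np.asarray(toy_column)`; the producer constructs the
+        # NumPy array itself, which is exactly the coupling __dataframe__ avoids.
+        return np.array(self._values, dtype=dtype)
+
+
+np.asarray(ToyColumn([1, 2, 3]))  # -> array([1, 2, 3])
+```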
+
+
+### What is wrong with `.to_numpy()` and `.to_arrow()`?
+
+Such methods ask the object on which they are defined to turn itself into a
+NumPy or Arrow array, which means each library must have at least an optional
+dependency on NumPy and on Arrow if it implements those methods. This leads
+to unnecessary coupling between libraries, and hence is a suboptimal choice -
+we'd like to avoid this if we can.
+
+Instead, it should be dataframe consumers that rely on NumPy or Arrow, since
+they are the ones that need data in a particular format. Each consumer can
+then call the constructor it needs. For example, `x = np.asarray(df['colname'])`
+(where `df` supports `__dataframe__`).
+
+
+### Does an interface describing memory work for virtual columns?
+
+Vaex is an example of a library that can have "virtual columns" (see `@maartenbreddels`
+[comment here](https://github.com/data-apis/dataframe-api/issues/29#issuecomment-686373569)).
+If the protocol includes a description of data layout in memory, does that
+work for such a virtual column?
+
+Yes. Virtual columns need to be materialized in memory before they can be
+turned into a column for a different type of dataframe - that will be true
+for every discussed form of the protocol; whether there's a `to_arrow()` or
+something else does not matter. Vaex can choose _how_ to materialize (e.g.,
+to an Arrow array, a NumPy array, or a raw memory buffer) - as long as the
+returned description of memory layout is valid, all those options can later
+be turned into the desired column format without a data copy, so the
+implementation choice here really doesn't matter much.
+
+_Note: the above statement on materialization assumes that there are many
+forms in which a virtual column can be implemented, that those are all
+custom/different, and that at this point it makes little sense to standardize
+them. For example, one could implement virtual columns with a simple string
+DSL (`'col_C = col_A + col_B'`), with fancier C++-style lazy evaluation, with
+a computational graph approach like Dask uses, etc._
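+
+A toy illustration of that materialization step (everything here is made up;
+real implementations such as Vaex's expression system are far more
+sophisticated):
+
+```python
+import numpy as np
+
+
+class VirtualColumn:
+    """Made-up virtual column: stores an expression, materializes on demand."""
+    def __init__(self, columns, expression):
+        self._columns = columns        # dict: column name -> NumPy array
+        self._expression = expression  # e.g. 'col_A + col_B'
+
+    def materialize(self):
+        # Evaluate the expression against the in-memory columns. The result is
+        # an ordinary buffer that a memory-describing protocol can expose.
+        return eval(self._expression, {}, dict(self._columns))
+
+
+data = {'col_A': np.array([1.0, 2.0]), 'col_B': np.array([3.0, 4.0])}
+print(VirtualColumn(data, 'col_A + col_B').materialize())  # [4. 6.]
+```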
+
+
+## Possible direction for implementation
+
+### Rough initial prototypes (historical)
+
+The `cuDFDataFrame`, `cuDFColumn` and `cuDFBuffer` sketched out by @kkraus14
+[here](https://github.com/data-apis/dataframe-api/issues/29#issuecomment-685123386)
+looked like they were a step in the right direction.
+
+[This prototype](https://github.com/wesm/dataframe-protocol/pull/1) by Wes
+McKinney was the first attempt, and has some useful features.
+
+
+### Relevant existing protocols
+
+Here are the four most relevant existing protocols, and what requirements they support:
+
+| *supports* | buffer protocol | `__array_interface__` | DLPack | Arrow C Data Interface |
+|---------------------|:---------------:|:---------------------:|:------:|:----------------------:|
+| Python API | | Y | (1) | |
+| C API | Y | Y | Y | Y |
+| arrays | Y | Y | Y | Y |
+| dataframes | | | | |
+| chunking | | | | |
+| devices | | | Y | |
+| bool/int/uint/float | Y | Y | Y | Y |
+| missing data        | (2)             | (3)                   |        | Y                      |
+| string dtype | (4) | (4) | | Y |
+| datetime dtypes | | (5) | | Y |
+| categoricals | (6) | (6) | (7) | Y |
+
+1. The Python API is only an interface to call the C API under the hood; it
+   doesn't contain a description of how the data is laid out in memory.
+2. Can be done only via separate masks of boolean arrays.
+3. `__array_interface__` has a `mask` attribute, which is a separate boolean array also implementing the `__array_interface__` protocol.
+4. Only fixed-length strings as sequence of char or unicode.
+5. Only NumPy datetime and timedelta, not timezones. For the purpose of data
+ interchange, timezones could be represented separately in metadata if
+ desired.
+6. No explicit support; however, categoricals can be mapped to either integers
+   or strings. It is unclear how to communicate that information from producer
+   to consumer.
+7. No explicit support; categoricals can only be mapped to integers.
+
+| *implementation* | buffer protocol | `__array_interface__` | DLPack | Arrow C Data Interface |
+|---------------------|:---------------:|:---------------------:|:---------:|:----------------------:|
+| refcounting | Y | Y | | |
+| call deleter | | | Y | Y |
+| supporting C code | in CPython | in NumPy | spec-only | spec-only |
+
+It is worth noting that for all four protocols, both dataframes and chunking
+can be easily layered on top. However, only arrays (the only parts that have a
+specific memory layout) are explicitly specified in those protocols. Layering
+dataframes on top would be a little easier for Arrow than for the other
+protocols, given its inclusion of "children" and "dictionary-encoded types";
+indeed, PyArrow already provides such functionality.
+
+#### Data type descriptions
+
+There are multiple options for how to specify the dtype. The buffer protocol,
+NumPy and Arrow use format strings, while DLPack uses enums. Furthermore, dtype
+literals can be used for a Python API; this is what the array API standard
+does. Here are some examples:
+
+| *dtype* | buffer protocol | `__array_interface__` | DLPack | Arrow C Data Interface |
+|---------------------|:---------------:|:---------------------:|:---------:|:----------------------:|
+| int8 | `'=b'` | `'i1'` | `(0, 8)` | `'c'` |
+| int32               | `'=i'`          | `'i4'`                | `(0, 32)` | `'i'`                  |
+| float64             | `'=d'`          | `'f8'`                | `(2, 64)` | `'g'`                  |
+
+The `'='` prefixes are denoting endianness; Arrow only supports native
+endianness.
+
+
+## References
+
+- [Python buffer protocol](https://docs.python.org/3/c-api/buffer.html)
+- [`__array_interface__` protocol](https://numpy.org/devdocs/reference/arrays.interface.html)
+- [Arrow C Data Interface](https://arrow.apache.org/docs/format/CDataInterface.html)
+- [DLPack](https://github.com/dmlc/dlpack)
+- [Array data interchange in API standard](https://data-apis.github.io/array-api/latest/design_topics/data_interchange.html)
diff --git a/protocol/images/dataframe_conceptual_model.png b/protocol/images/dataframe_conceptual_model.png
new file mode 100644
index 00000000..3643b389
Binary files /dev/null and b/protocol/images/dataframe_conceptual_model.png differ