pandas-dev · jorisvandenbossche · Aug 8, 2019 · Mar 13, 2019
diff --git a/doc/source/development/developer.rst b/doc/source/development/developer.rst
@@ -37,10 +37,17 @@ So that a ``pandas.DataFrame`` can be faithfully reconstructed, we store a
 
 .. code-block:: text
 
-   {'index_columns': ['__index_level_0__', '__index_level_1__', ...],
+   {'index_columns': [<descr0>, <descr1>, ...],
     'column_indexes': [<ci0>, <ci1>, ..., <ciN>],
     'columns': [<c0>, <c1>, ...],
-    'pandas_version': $VERSION}
+    'pandas_version': $VERSION,
+    'creator': {
+      'library': $LIBRARY,
+      'version': $LIBRARY_VERSION
+    }}
+
+The descriptor values (``<descr0>``, ``<descr1>``, ...) in the
+``'index_columns'`` field are dictionaries whose format is described below.
 
 Here, ``<c0>``/``<ci0>`` and so forth are dictionaries containing the metadata
 for each column, *including the index columns*. This has JSON form:
@@ -53,26 +60,43 @@ for each column, *including the index columns*. This has JSON form:
     'numpy_type': numpy_type,
     'metadata': metadata}
 
-.. note::
+See below for the detailed specification of these fields.
+
+Index Metadata Descriptors
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+A ``RangeIndex`` can be stored as metadata alone, without serializing any
+data. The descriptor format for these is as follows:
+
+.. code-block:: python
+
+   index = pd.RangeIndex(0, 10, 2)
+   {'kind': 'range',
+    'name': index.name,
+    'start': index._start,
+    'stop': index._stop,
+    'step': index._step}
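+
+A minimal sketch of building this descriptor and rebuilding the index from
+it. The helper names are hypothetical, and the sketch uses the public
+``start``/``stop``/``step`` attributes, which assume pandas 0.25 or newer
+(older versions expose only the private ``_start``/``_stop``/``_step``):
+
+.. code-block:: python
+
+   import pandas as pd
+
+
+   def range_descriptor(index):
+       # Hypothetical helper: record a RangeIndex as metadata only.
+       return {'kind': 'range',
+               'name': index.name,
+               'start': index.start,
+               'stop': index.stop,
+               'step': index.step}
+
+
+   def index_from_range_descriptor(descr):
+       # Hypothetical helper: no data column is needed to rebuild the index.
+       return pd.RangeIndex(descr['start'], descr['stop'], descr['step'],
+                            name=descr['name'])
+
+
+   index = pd.RangeIndex(0, 10, 2, name='idx')
+   restored = index_from_range_descriptor(range_descriptor(index))
+   assert restored.equals(index) and restored.name == 'idx'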
 
-   Every index column is stored with a name matching the pattern
-   ``__index_level_\d+__`` and its corresponding column information is can be
-   found with the following code snippet.
+Other index types must be serialized as data columns alongside the other
+DataFrame columns. The descriptor for such an index is a dict whose
+``'kind'`` field is ``'serialized'`` and whose ``'field_name'`` field names
+the data column that contains the index data. For example:
 
-   Following this naming convention isn't strictly necessary, but strongly
-   suggested for compatibility with Arrow.
+.. code-block:: python
 
-   Here's an example of how the index metadata is structured in pyarrow:
+   {'kind': 'serialized',
+    'field_name': '__index_level_0__'}
 
-    .. code-block:: python
+Every index column is stored with a name matching the pattern
+``__index_level_\d+__``. Following this naming convention isn't strictly
+necessary, but it is strongly suggested for compatibility with Arrow and to
+distinguish index columns from regular data columns. The ``'field_name'`` is
+the actual name of the column in the serialized Parquet table. If the
+``Index`` has a ``name`` attribute that is not ``None``, that name can be
+found in the ``name`` field of the metadata for the serialized data column,
+as described below.
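+
+The lookup described above can be sketched as follows, assuming each entry
+in ``'columns'`` carries a ``'field_name'`` key. The helper
+``index_level_names`` and the sample ``metadata`` dict are hypothetical:
+
+.. code-block:: python
+
+   def index_level_names(metadata):
+       # Column metadata entries keyed by their serialized field name.
+       by_field = {col['field_name']: col for col in metadata['columns']}
+       names = []
+       for descr in metadata['index_columns']:
+           if descr['kind'] == 'range':
+               # RangeIndex descriptors carry the name themselves.
+               names.append(descr['name'])
+           elif descr['kind'] == 'serialized':
+               # The index name lives in the matching column's 'name' field.
+               names.append(by_field[descr['field_name']]['name'])
+           else:
+               raise ValueError('unknown index kind: %r' % descr['kind'])
+       return names
+
+
+   metadata = {
+       'index_columns': [{'kind': 'serialized',
+                          'field_name': '__index_level_0__'}],
+       'columns': [
+           {'name': 'a', 'field_name': 'a', 'pandas_type': 'int64',
+            'numpy_type': 'int64', 'metadata': None},
+           {'name': 'key', 'field_name': '__index_level_0__',
+            'pandas_type': 'unicode', 'numpy_type': 'object',
+            'metadata': None},
+       ],
+   }
+   assert index_level_names(metadata) == ['key']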
 
-       # assuming there's at least 3 levels in the index
-       index_columns = metadata['index_columns']  # noqa: F821
-       columns = metadata['columns']  # noqa: F821
-       ith_index = 2
-       assert index_columns[ith_index] == '__index_level_2__'
-       ith_index_info = columns[-len(index_columns):][ith_index]
-       ith_index_level_name = ith_index_info['name']
+Column Metadata
+~~~~~~~~~~~~~~~
 
 ``pandas_type`` is the logical type of the column, and is one of:
 
@@ -121,7 +145,8 @@ As an example of fully-formed metadata:
 
 .. code-block:: text
 
-   {'index_columns': ['__index_level_0__'],
+   {'index_columns': [{'kind': 'serialized',
+                       'field_name': '__index_level_0__'}],
     'column_indexes': [
         {'name': None,
          'field_name': 'None',
@@ -161,4 +186,8 @@ As an example of fully-formed metadata:
          'numpy_type': 'int64',
          'metadata': None}
     ],
-    'pandas_version': '0.20.0'}
+    'pandas_version': '0.20.0',
+    'creator': {
+      'library': 'pyarrow',
+      'version': '0.13.0'
+    }}
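+
+As a rough sketch, assuming a writer assembles this metadata itself (the
+helper ``build_pandas_metadata`` below is hypothetical, not part of pandas
+or pyarrow), the top-level dictionary, including the nested ``'creator'``
+mapping, is plain JSON-serializable data:
+
+.. code-block:: python
+
+   import json
+
+
+   def build_pandas_metadata(index_columns, column_indexes, columns,
+                             pandas_version, library, library_version):
+       # Hypothetical helper assembling the top-level fields described above.
+       return {'index_columns': index_columns,
+               'column_indexes': column_indexes,
+               'columns': columns,
+               'pandas_version': pandas_version,
+               'creator': {'library': library,
+                           'version': library_version}}
+
+
+   meta = build_pandas_metadata(
+       [{'kind': 'serialized', 'field_name': '__index_level_0__'}],
+       [], [], '0.20.0', 'pyarrow', '0.13.0')
+   # The structure survives a JSON round trip unchanged.
+   assert json.loads(json.dumps(meta)) == meta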