@@ -37,12 +37,19 @@ So that a ``pandas.DataFrame`` can be faithfully reconstructed, we store a

 .. code-block:: text

-   {'index_columns': ['__index_level_0__', '__index_level_1__', ...],
+   {'index_columns': [<descr0>, <descr1>, ...],
     'column_indexes': [<ci0>, <ci1>, ..., <ciN>],
     'columns': [<c0>, <c1>, ...],
-    'pandas_version': $VERSION}
+    'pandas_version': $VERSION,
+    'creator': {
+      'library': $LIBRARY,
+      'version': $LIBRARY_VERSION
+    }}

-Here, ``<c0>``/``<ci0>`` and so forth are dictionaries containing the metadata
+The "descriptor" values ``<descr0>`` in the ``'index_columns'`` field are
+strings (referring to a column) or dictionaries with values as described below.
+
+The ``<c0>``/``<ci0>`` and so forth are dictionaries containing the metadata
 for each column, *including the index columns*. This has JSON form:

 .. code-block:: text
@@ -53,26 +60,37 @@ for each column, *including the index columns*. This has JSON form:
     'numpy_type': numpy_type,
     'metadata': metadata}

-.. note::
+See below for the detailed specification for these.
+
+Index Metadata Descriptors
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``RangeIndex`` can be stored as metadata only, not requiring serialization. The
+descriptor format for these is as follows:

-   Every index column is stored with a name matching the pattern
-   ``__index_level_\d+__`` and its corresponding column information is can be
-   found with the following code snippet.
+.. code-block:: python

-   Following this naming convention isn't strictly necessary, but strongly
-   suggested for compatibility with Arrow.
+    index = pd.RangeIndex(0, 10, 2)
+    {'kind': 'range',
+     'name': index.name,
+     'start': index.start,
+     'stop': index.stop,
+     'step': index.step}

-   Here's an example of how the index metadata is structured in pyarrow:
+Other index types must be serialized as data columns along with the other
+DataFrame columns. The metadata for these is a string indicating the name of
+the field in the data columns, for example ``'__index_level_0__'``.

-   .. code-block:: python
+If an index has a non-None ``name`` attribute, and there is no other column
+with a name matching that value, then the ``index.name`` value can be used as
+the descriptor. Otherwise (for unnamed indexes and ones with names colliding
+with other column names) a disambiguating name matching the pattern
+``__index_level_\d+__`` should be used. In cases of named indexes as data
+columns, the ``name`` attribute is always stored in the column descriptors as
+above.

-      # assuming there's at least 3 levels in the index
-      index_columns = metadata['index_columns']  # noqa: F821
-      columns = metadata['columns']  # noqa: F821
-      ith_index = 2
-      assert index_columns[ith_index] == '__index_level_2__'
-      ith_index_info = columns[-len(index_columns):][ith_index]
-      ith_index_level_name = ith_index_info['name']
+Column Metadata
+~~~~~~~~~~~~~~~

 ``pandas_type`` is the logical type of the column, and is one of:

@@ -161,4 +179,8 @@ As an example of fully-formed metadata:
          'numpy_type': 'int64',
          'metadata': None}
     ],
-    'pandas_version': '0.20.0'}
+    'pandas_version': '0.20.0',
+    'creator': {
+      'library': 'pyarrow',
+      'version': '0.13.0'
+    }}
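
The following is not part of the commit. It is a minimal Python sketch of how a writer might pick an ``index_columns`` descriptor under the rules added in this diff, and how a reader might rebuild a ``RangeIndex`` from its descriptor. The helper names ``index_descriptor`` and ``range_index_from_descriptor``, and the ``column_names`` argument, are hypothetical and chosen for illustration only; they are not pandas or pyarrow APIs.

```python
import pandas as pd


def index_descriptor(index, level_number, column_names):
    """Return the 'index_columns' descriptor for one index (hypothetical helper)."""
    if isinstance(index, pd.RangeIndex):
        # Metadata only: no data column needs to be written for this index.
        return {
            "kind": "range",
            "name": index.name,
            "start": index.start,
            "stop": index.stop,
            "step": index.step,
        }
    if index.name is not None and index.name not in column_names:
        # Named index whose name does not collide with a data column.
        return index.name
    # Unnamed index, or a name that collides with a data column.
    return f"__index_level_{level_number}__"


def range_index_from_descriptor(descriptor):
    """Rebuild a RangeIndex from a 'range' descriptor (hypothetical helper)."""
    assert descriptor["kind"] == "range"
    return pd.RangeIndex(
        descriptor["start"],
        descriptor["stop"],
        descriptor["step"],
        name=descriptor["name"],
    )


# Usage: a RangeIndex round-trips through its descriptor.
original = pd.RangeIndex(0, 10, 2)
descriptor = index_descriptor(original, 0, {"c0"})
restored = range_index_from_descriptor(descriptor)
```

Other index types fall through to the string branch, so the reader would look up the named field among the data columns instead of rebuilding the index from metadata.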