@@ -41,15 +41,37 @@ So that a ``pandas.DataFrame`` can be faithfully reconstructed, we store a
     'pandas_version': $VERSION}

 Here, ``<c0>``/``<ci0>`` and so forth are dictionaries containing the metadata
-for each column. This has JSON form:
+for each column, *including the index columns*. This has JSON form:

 .. code-block:: text

    {'name': column_name,
+    'field_name': parquet_column_name,
     'pandas_type': pandas_type,
     'numpy_type': numpy_type,
     'metadata': metadata}

+.. note::
+
+   Every index column is stored with a name matching the pattern
+   ``__index_level_\d+__`` and its corresponding column information can be
+   found with the following code snippet.
+
+   Following this naming convention isn't strictly necessary, but strongly
+   suggested for compatibility with Arrow.
+
+   Here's an example of how the index metadata is structured in pyarrow:
+
+   .. code-block:: python
+
+      # assuming there's at least 3 levels in the index
+      index_columns = metadata['index_columns']
+      columns = metadata['columns']
+      ith_index = 2
+      assert index_columns[ith_index] == '__index_level_2__'
+      ith_index_info = columns[-len(index_columns):][ith_index]
+      ith_index_level_name = ith_index_info['name']
+
 ``pandas_type`` is the logical type of the column, and is one of:

 * Boolean: ``'bool'``
@@ -100,32 +122,39 @@ As an example of fully-formed metadata:
    {'index_columns': ['__index_level_0__'],
     'column_indexes': [
         {'name': None,
-         'pandas_type': 'string',
+         'field_name': 'None',
+         'pandas_type': 'unicode',
          'numpy_type': 'object',
-         'metadata': None}
+         'metadata': {'encoding': 'UTF-8'}}
     ],
     'columns': [
         {'name': 'c0',
+         'field_name': 'c0',
          'pandas_type': 'int8',
          'numpy_type': 'int8',
          'metadata': None},
         {'name': 'c1',
+         'field_name': 'c1',
          'pandas_type': 'bytes',
          'numpy_type': 'object',
          'metadata': None},
         {'name': 'c2',
+         'field_name': 'c2',
          'pandas_type': 'categorical',
          'numpy_type': 'int16',
          'metadata': {'num_categories': 1000, 'ordered': False}},
         {'name': 'c3',
+         'field_name': 'c3',
          'pandas_type': 'datetimetz',
          'numpy_type': 'datetime64[ns]',
          'metadata': {'timezone': 'America/Los_Angeles'}},
         {'name': 'c4',
+         'field_name': 'c4',
          'pandas_type': 'object',
          'numpy_type': 'object',
          'metadata': {'encoding': 'pickle'}},
-        {'name': '__index_level_0__',
+        {'name': None,
+         'field_name': '__index_level_0__',
          'pandas_type': 'int64',
          'numpy_type': 'int64',
          'metadata': None}
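To make the lookup described in the note concrete, here is a minimal sketch run against an abbreviated copy of the fully-formed metadata from the second hunk. The helper name `index_level_info` and the shortened `metadata` dict are hypothetical, introduced only for illustration. The sketch assumes, as the note's snippet does, that the index entries are the last `len(index_columns)` items of `columns`.

```python
import re

# Hypothetical, abbreviated copy of the fully-formed metadata in the diff:
# one data column followed by one unnamed index level.
metadata = {
    'index_columns': ['__index_level_0__'],
    'columns': [
        {'name': 'c0', 'field_name': 'c0', 'pandas_type': 'int8',
         'numpy_type': 'int8', 'metadata': None},
        {'name': None, 'field_name': '__index_level_0__',
         'pandas_type': 'int64', 'numpy_type': 'int64', 'metadata': None},
    ],
}

# Naming convention for index columns stated in the note.
INDEX_FIELD_PATTERN = re.compile(r'__index_level_\d+__')


def index_level_info(metadata, level):
    """Return the column metadata dict for the given index level.

    Assumes the index entries are stored last in ``metadata['columns']``,
    in the same order as ``metadata['index_columns']``.
    """
    index_columns = metadata['index_columns']
    columns = metadata['columns']
    info = columns[-len(index_columns):][level]
    # Cross-check the positional lookup against the stored field name.
    assert info['field_name'] == index_columns[level]
    return info


info = index_level_info(metadata, 0)
level_name = info['name']  # None here, because the index level is unnamed
```

With the new `field_name` key, the stored column name and the user-facing level name are kept apart, so an unnamed index level can report `name` as `None` while still being addressable through `field_name`.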