
Commit a9a89f8

DOC: updated releasenotes, v0.11.1 whatsnew, io.rst
CLN: changed formatting option: multi_index_columns_compat -> tupleize_cols
BUG: incorrectly writing sparse levels for the multi_index
DOC: slight docs changes
TST: added tests/fixes for disallowed options in to_csv (cols=not None, index=False)
TST: from_csv not accepting tupleize_cols
ENH: allow index=False in to_csv with a multi_index column
     allow reading of a multi_index column with index_col=None
DOC: updates to examples in io.rst and v0.11.1.rst
TST: disallow names, usecols, non-numeric in index_cols
BUG: raise on too many rows in the header if multi_index of columns
1 parent b0dadc5 commit a9a89f8

9 files changed

+265
-100
lines changed

RELEASE.rst

+15
Original file line number | Diff line number | Diff line change
@@ -34,6 +34,15 @@ pandas 0.11.1
3434
courtesy of @cpcloud. (GH3477_)
3535
- Support for reading Amazon S3 files. (GH3504_)
3636
- Added module for reading and writing Stata files: pandas.io.stata (GH1512_)
37+
- Added support for writing in ``to_csv`` and reading in ``read_csv``,
38+
multi-index columns. The ``header`` option in ``read_csv`` now accepts a
39+
list of the rows from which to read the index. Added the option,
40+
``tupleize_cols`` to provide compatibility for the pre 0.11.1 behavior of
41+
writing and reading multi-index columns via a list of tuples. The default in
42+
0.11.1 is to write lists of tuples and *not* interpret lists of tuples as a
43+
multi-index column.
44+
Note: The default value will change in 0.12 to make the default *to* write and
45+
read multi-index columns in the new format. (GH3571_, GH1651_, GH3141_)
3746

3847
**Improvements to existing features**
3948

@@ -180,13 +189,19 @@ pandas 0.11.1
180189
.. _GH3596: https://github.com/pydata/pandas/issues/3596
181190
.. _GH3617: https://github.com/pydata/pandas/issues/3617
182191
.. _GH3435: https://github.com/pydata/pandas/issues/3435
192+
<<<<<<< HEAD
183193
.. _GH3611: https://github.com/pydata/pandas/issues/3611
184194
.. _GH3062: https://github.com/pydata/pandas/issues/3062
185195
.. _GH3624: https://github.com/pydata/pandas/issues/3624
186196
.. _GH3626: https://github.com/pydata/pandas/issues/3626
187197
.. _GH3601: https://github.com/pydata/pandas/issues/3601
188198
.. _GH3631: https://github.com/pydata/pandas/issues/3631
189199
.. _GH1512: https://github.com/pydata/pandas/issues/1512
200+
=======
201+
.. _GH3571: https://github.com/pydata/pandas/issues/3571
202+
.. _GH1651: https://github.com/pydata/pandas/issues/1651
203+
.. _GH3141: https://github.com/pydata/pandas/issues/3141
204+
>>>>>>> DOC: updated releasenotes, v0.11.1 whatsnew, io.rst
190205

191206

192207
pandas 0.11.0
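
The release note above contrasts the older "list of tuples" layout with the new expanded layout. As a rough illustration of the older one, the sketch below uses only the standard library and is not the pandas implementation. It assumes (this diff does not show it) that the legacy layout stores each column tuple as one text cell in a single header row; the `ast.literal_eval` recovery step is purely hypothetical.

```python
import ast
import csv
import io

columns = [("a", "x"), ("a", "y"), ("b", "x")]

# Assumption: the legacy layout writes each column tuple as one text cell.
buf = io.StringIO()
csv.writer(buf, lineterminator="\n").writerow([repr(t) for t in columns])
legacy_line = buf.getvalue()

# Reading the row back yields plain strings, not tuples, so the
# multi-level structure is not restored automatically.
cells = next(csv.reader(io.StringIO(legacy_line)))

# Hypothetical recovery step: parse each cell back into a tuple.
recovered = [ast.literal_eval(cell) for cell in cells]

print(legacy_line)
print(recovered)
```

The point of the sketch is only that a single-row header has to encode every level of a column inside one cell, which is what the expanded format avoids.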

doc/source/io.rst

+38-21
@@ -115,10 +115,10 @@ They can take a number of arguments:
115115
- ``error_bad_lines``: if False then any lines causing an error will be skipped :ref:`bad lines <io.bad_lines>`
116116
- ``usecols``: a subset of columns to return, results in much faster parsing
117117
time and lower memory usage.
118-
- ``mangle_dup_columns``: boolean, default True, then duplicate columns will be specified
118+
- ``mangle_dupe_cols``: boolean, default True, then duplicate columns will be specified
119119
as 'X.0'...'X.N', rather than 'X'...'X'
120-
- ``multi_index_columns_compat``: boolean, default False, leave a list of tuples on columns
121-
as is (default is to convert to a Multi Index on the columns)
120+
- ``tupleize_cols``: boolean, default True, if False, convert a list of tuples
121+
to a multi-index of columns, otherwise, leave the column index as a list of tuples
122122

123123
.. ipython:: python
124124
:suppress:
@@ -260,24 +260,6 @@ If the header is in a row other than the first, pass the row number to
260260
data = 'skip this skip it\na,b,c\n1,2,3\n4,5,6\n7,8,9'
261261
pd.read_csv(StringIO(data), header=1)
262262
263-
.. _io.multi_index_columns:
264-
265-
Specifying a multi-index columns
266-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
267-
268-
By specifying list of row locations for the ``header`` argument, you
269-
can read in a multi-index for the columns. Specifying non-consecutive
270-
rows will skip the interveaing rows. The ``index_col`` must also be
271-
specified.
272-
273-
.. ipython:: python
274-
275-
data = 'C0,C_l0_g0,C_l0_g1\nC1,C_l1_g0,C_l1_g1\nR0,,\nR_l0_g0,R0C0,R0C1\nR_l0_g1,R1C0,R1C1\nR_l0_g2,R2C0,R2C1\n'
276-
pd.read_csv(StringIO(data), header=[0,1], index_col=[0])
277-
278-
You can pass ``multi_index_columns_compat=True`` to preserve the pre-0.12 behavior of
279-
not converting a list of tuples in the columns to a Multi Index.
280-
281263
.. _io.usecols:
282264

283265
Filtering columns (``usecols``)
@@ -787,6 +769,36 @@ column numbers to turn multiple columns into a ``MultiIndex``:
787769
df
788770
df.ix[1978]
789771
772+
.. _io.multi_index_columns:
773+
774+
Specifying multi-index columns
775+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
776+
777+
By specifying a list of row locations for the ``header`` argument, you
778+
can read in a multi-index for the columns. Specifying non-consecutive
779+
rows will skip the intervening rows.
780+
781+
.. ipython:: python
782+
783+
from pandas.util.testing import makeCustomDataframe as mkdf
784+
df = mkdf(5,3,r_idx_nlevels=2,c_idx_nlevels=4)
785+
df.to_csv('mi.csv',tupleize_cols=False)
786+
print(open('mi.csv').read())
787+
pd.read_csv('mi.csv',header=[0,1,2,3],index_col=[0,1],tupleize_cols=False)
788+
789+
Note: The default behavior in 0.11.1 remains unchanged (``tupleize_cols=True``),
790+
but starting with 0.12, the default *to* write and read multi-index columns will be in the new
791+
format (``tupleize_cols=False``)
792+
793+
Note: If an ``index_col`` is not specified (e.g. you don't have an index, or wrote it
794+
with ``df.to_csv(..., index=False)``), then any ``names`` on the columns index will be *lost*.
795+
796+
.. ipython:: python
797+
:suppress:
798+
799+
import os
800+
os.remove('mi.csv')
801+
790802
.. _io.sniff:
791803

792804
Automatically "sniffing" the delimiter
@@ -870,6 +882,8 @@ function takes a number of arguments. Only the first is required.
870882
- ``sep`` : Field delimiter for the output file (default ",")
871883
- ``encoding``: a string representing the encoding to use if the contents are
872884
non-ascii, for python versions prior to 3
885+
- ``tupleize_cols``: boolean, default True, if True, write as a list of tuples,
886+
otherwise write in an expanded line format suitable for ``read_csv``
873887

874888
Writing a formatted string
875889
~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -901,6 +915,9 @@ The Series object also has a ``to_string`` method, but with only the ``buf``,
901915
which, if set to ``True``, will additionally output the length of the Series.
902916

903917

918+
HTML
919+
----
920+
904921
Reading HTML format
905922
~~~~~~~~~~~~~~~~~~~~~~
906923
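
The multi-index columns section added to io.rst in this diff says that a list of header rows is combined into one column index and that rows between non-consecutive header rows are skipped. A minimal standard-library sketch of that idea follows. The helper `read_multi_header` and its parameters are hypothetical, it is not how `read_csv` is implemented, and it ignores details such as the row of index names that the expanded format may contain.

```python
import csv
import io

def read_multi_header(text, header_rows, n_index_cols):
    """Build column tuples from several header rows (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text)))
    # Keep only the requested header rows, dropping the index columns.
    picked = [rows[i][n_index_cols:] for i in header_rows]
    # Zip the rows so each column becomes one tuple with one entry per level.
    columns = list(zip(*picked))
    # Data starts after the last header row; rows in between are skipped.
    data = rows[max(header_rows) + 1:]
    return columns, data

text = "L0,a,a,b\nL1,x,y,x\nr0,1,2,3\nr1,4,5,6\n"
columns, data = read_multi_header(text, header_rows=[0, 1], n_index_cols=1)

# Non-consecutive header rows: row 1 is ignored entirely.
text_gap = "L0,a,b\nskip,skip,skip\nL1,x,y\nr0,1,2\n"
columns_gap, data_gap = read_multi_header(text_gap, header_rows=[0, 2], n_index_cols=1)

print(columns)
print(columns_gap)
```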

doc/source/v0.11.1.txt

+37
@@ -73,13 +73,47 @@ Enhancements
7373
an index with a different frequency than the existing, or attempting
7474
to append an index with a different name than the existing
7575
- support datelike columns with a timezone as data_columns (GH2852_)
76+
7677
- ``fillna`` methods now raise a ``TypeError`` if the ``value`` parameter is
7778
a list or tuple.
7879
- Added module for reading and writing Stata files: pandas.io.stata (GH1512_)
7980
- ``DataFrame.replace()`` now allows regular expressions on contained
8081
``Series`` with object dtype. See the examples section in the regular docs
8182
:ref:`Replacing via String Expression <missing_data.replace_expression>`
8283

84+
- Multi-index column support for reading and writing csvs
85+
86+
- The ``header`` option in ``read_csv`` now accepts a
87+
list of the rows from which to read the index.
88+
89+
- The option, ``tupleize_cols`` can now be specified in both ``to_csv`` and
90+
``read_csv``, to provide compatibility for the pre 0.11.1 behavior of
91+
writing and reading multi-index columns via a list of tuples. The default in
92+
0.11.1 is to write lists of tuples and *not* interpret lists of tuples as a
93+
multi-index column.
94+
95+
Note: The default behavior in 0.11.1 remains unchanged, but starting with 0.12,
96+
the default *to* write and read multi-index columns will be in the new
97+
format. (GH3571_, GH1651_, GH3141_)
98+
99+
- If an ``index_col`` is not specified (e.g. you don't have an index, or wrote it
100+
with ``df.to_csv(..., index=False)``), then any ``names`` on the columns index will
101+
be *lost*.
102+
103+
.. ipython:: python
104+
105+
from pandas.util.testing import makeCustomDataframe as mkdf
106+
df = mkdf(5,3,r_idx_nlevels=2,c_idx_nlevels=4)
107+
df.to_csv('mi.csv',tupleize_cols=False)
108+
print(open('mi.csv').read())
109+
pd.read_csv('mi.csv',header=[0,1,2,3],index_col=[0,1],tupleize_cols=False)
110+
111+
.. ipython:: python
112+
:suppress:
113+
114+
import os
115+
os.remove('mi.csv')
116+
83117
See the `full release notes
84118
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
85119
on GitHub for a complete list.
@@ -96,3 +130,6 @@ on GitHub for a complete list.
96130
.. _GH1512: https://github.com/pydata/pandas/issues/1512
97131
.. _GH2285: https://github.com/pydata/pandas/issues/2285
98132
.. _GH3631: https://github.com/pydata/pandas/issues/3631
133+
.. _GH3571: https://github.com/pydata/pandas/issues/3571
134+
.. _GH1651: https://github.com/pydata/pandas/issues/1651
135+
.. _GH3141: https://github.com/pydata/pandas/issues/3141

pandas/core/format.py

+20-11
@@ -775,7 +775,7 @@ def __init__(self, obj, path_or_buf, sep=",", na_rep='', float_format=None,
775775
cols=None, header=True, index=True, index_label=None,
776776
mode='w', nanRep=None, encoding=None, quoting=None,
777777
line_terminator='\n', chunksize=None, engine=None,
778-
multi_index_columns_compat=False):
778+
tupleize_cols=True):
779779

780780
self.engine = engine # remove for 0.12
781781

@@ -804,7 +804,15 @@ def __init__(self, obj, path_or_buf, sep=",", na_rep='', float_format=None,
804804
msg= "columns.is_unique == False not supported with engine='python'"
805805
raise NotImplementedError(msg)
806806

807-
self.multi_index_columns_compat=multi_index_columns_compat
807+
self.tupleize_cols = tupleize_cols
808+
self.has_mi_columns = isinstance(obj.columns, MultiIndex
809+
) and not self.tupleize_cols
810+
811+
# validate mi options
812+
if self.has_mi_columns:
813+
if cols is not None:
814+
raise Exception("cannot specify cols with a multi_index on the columns")
815+
808816
if cols is not None:
809817
if isinstance(cols,Index):
810818
cols = cols.to_native_types(na_rep=na_rep,float_format=float_format)
@@ -960,9 +968,8 @@ def _save_header(self):
960968
obj = self.obj
961969
index_label = self.index_label
962970
cols = self.cols
971+
has_mi_columns = self.has_mi_columns
963972
header = self.header
964-
has_mi_columns = isinstance(obj.columns, MultiIndex
965-
) and not self.multi_index_columns_compat
966973
encoded_labels = []
967974

968975
has_aliases = isinstance(header, (tuple, list, np.ndarray))
@@ -1017,15 +1024,17 @@ def _save_header(self):
10171024
# write out the names for each level, then ALL of the values for each level
10181025
for i in range(columns.nlevels):
10191026

1020-
# name is the first column
1021-
col_line = [ columns.names[i] ]
1027+
# we need at least 1 index column to write our col names
1028+
col_line = []
1029+
if self.index:
1030+
1031+
# name is the first column
1032+
col_line.append( columns.names[i] )
10221033

1023-
# skipp len labels-1
1024-
if self.index and isinstance(index_label,list) and len(index_label)>1:
1025-
col_line.extend([ '' ] * (len(index_label)-1))
1034+
if isinstance(index_label,list) and len(index_label)>1:
1035+
col_line.extend([ '' ] * (len(index_label)-1))
10261036

1027-
for j in range(len(columns)):
1028-
col_line.append(columns.levels[i][j])
1037+
col_line.extend(columns.get_level_values(i))
10291038

10301039
writer.writerow(col_line)
10311040
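
The `_save_header` hunk above writes one header row per column level and switches from `columns.levels[i][j]` to `columns.get_level_values(i)`. A level holds only the unique values, so it can be shorter than the number of columns, whereas the level values give one entry per column. The sketch below mirrors that logic with plain tuples and the standard library; `write_expanded_header` and its arguments are hypothetical names, not pandas API.

```python
import csv
import io

def write_expanded_header(writer, column_tuples, level_names, index_labels,
                          write_index=True):
    """Write one header row per column level (hypothetical helper)."""
    for i in range(len(level_names)):
        row = []
        if write_index:
            # The level name sits in the first index column.
            row.append(level_names[i])
            # Pad the remaining index columns so values line up with the data.
            row.extend([""] * (len(index_labels) - 1))
        # One value per column, the analogue of columns.get_level_values(i).
        row.extend(t[i] for t in column_tuples)
        writer.writerow(row)

columns = [("a", "x"), ("a", "y"), ("b", "x")]

buf = io.StringIO()
write_expanded_header(csv.writer(buf, lineterminator="\n"),
                      columns, ["L0", "L1"], ["r0", "r1"])
header_with_index = buf.getvalue()

buf = io.StringIO()
write_expanded_header(csv.writer(buf, lineterminator="\n"),
                      columns, ["L0", "L1"], ["r0", "r1"], write_index=False)
header_without_index = buf.getvalue()

# The unique values of level 0 are fewer than the columns, which is why
# indexing them by column position cannot work in general.
unique_level0 = sorted({t[0] for t in columns})

print(header_with_index)
print(header_without_index)
```

With `write_index=False` the level names have nowhere to go, which matches the documentation note in this commit that column `names` are lost when no index column is written.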

pandas/core/frame.py

+9-6
@@ -1250,7 +1250,7 @@ def _from_arrays(cls, arrays, columns, index, dtype=None):
12501250

12511251
@classmethod
12521252
def from_csv(cls, path, header=0, sep=',', index_col=0,
1253-
parse_dates=True, encoding=None):
1253+
parse_dates=True, encoding=None, tupleize_cols=False):
12541254
"""
12551255
Read delimited file into DataFrame
12561256
@@ -1266,6 +1266,9 @@ def from_csv(cls, path, header=0, sep=',', index_col=0,
12661266
is used. Different default from read_table
12671267
parse_dates : boolean, default True
12681268
Parse dates. Different default from read_table
1269+
tupleize_cols : boolean, default False
1270+
read multi_index columns as a list of tuples (if True)
1271+
or in the new (expanded) format (if False)
12691272
12701273
Notes
12711274
-----
@@ -1280,7 +1283,7 @@ def from_csv(cls, path, header=0, sep=',', index_col=0,
12801283
from pandas.io.parsers import read_table
12811284
return read_table(path, header=header, sep=sep,
12821285
parse_dates=parse_dates, index_col=index_col,
1283-
encoding=encoding)
1286+
encoding=encoding, tupleize_cols=tupleize_cols)
12841287

12851288
@classmethod
12861289
def from_dta(dta, path, parse_dates=True, convert_categoricals=True, encoding=None, index_col=None):
@@ -1392,7 +1395,7 @@ def to_csv(self, path_or_buf, sep=",", na_rep='', float_format=None,
13921395
cols=None, header=True, index=True, index_label=None,
13931396
mode='w', nanRep=None, encoding=None, quoting=None,
13941397
line_terminator='\n', chunksize=None,
1395-
multi_index_columns_compat=False, **kwds):
1398+
tupleize_cols=True, **kwds):
13961399
"""
13971400
Write DataFrame to a comma-separated values (csv) file
13981401
@@ -1430,9 +1433,9 @@ def to_csv(self, path_or_buf, sep=",", na_rep='', float_format=None,
14301433
quoting : optional constant from csv module
14311434
defaults to csv.QUOTE_MINIMAL
14321435
chunksize : rows to write at a time
1433-
multi_index_columns_compat : boolean, default False
1436+
tupleize_cols : boolean, default True
14341437
write multi_index columns as a list of tuples (if True)
1435-
or new (expanded format)m if False)
1438+
or in the new (expanded) format (if False)
14361439
"""
14371440
if nanRep is not None: # pragma: no cover
14381441
import warnings
@@ -1450,7 +1453,7 @@ def to_csv(self, path_or_buf, sep=",", na_rep='', float_format=None,
14501453
header=header, index=index,
14511454
index_label=index_label,mode=mode,
14521455
chunksize=chunksize,engine=kwds.get("engine"),
1453-
multi_index_columns_compat=multi_index_columns_compat)
1456+
tupleize_cols=tupleize_cols)
14541457
formatter.save()
14551458

14561459
def to_excel(self, excel_writer, sheet_name='sheet1', na_rep='',
