API: default for mangle_dup_cols is now False for read_csv. Fair warning in 0.12 (GH3612) #5010

Closed
2 changes: 1 addition & 1 deletion doc/source/io.rst
@@ -151,7 +151,7 @@ They can take a number of arguments:
- ``error_bad_lines``: if False then any lines causing an error will be skipped :ref:`bad lines <io.bad_lines>`
- ``usecols``: a subset of columns to return, results in much faster parsing
time and lower memory usage.
- ``mangle_dupe_cols``: boolean, default True; if True, duplicate columns will be specified
- ``mangle_dupe_cols``: boolean, default False; if True, duplicate columns will be specified
as 'X.0'...'X.N', rather than 'X'...'X'
- ``tupleize_cols``: boolean, default False, if False, convert a list of tuples
to a multi-index of columns, otherwise, leave the column index as a list of tuples
1 change: 1 addition & 0 deletions doc/source/release.rst
@@ -246,6 +246,7 @@ API Changes
- Begin removing methods that don't make sense on ``GroupBy`` objects
(:issue:`4887`).
- Remove deprecated ``read_clipboard/to_clipboard/ExcelFile/ExcelWriter`` from ``pandas.io.parsers`` (:issue:`3717`)
- default for ``mangle_dupe_cols`` is now ``False`` for ``read_csv``. Fair warning in 0.12 (:issue:`3612`)

Internal Refactoring
~~~~~~~~~~~~~~~~~~~~
8 changes: 8 additions & 0 deletions doc/source/v0.13.0.txt
@@ -68,9 +68,17 @@ API changes
df1 and df2
s1 and s2

Prior Version Deprecations/Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

These were announced changes in 0.12 or prior that are taking effect as of 0.13.0

- Remove deprecated ``Factor`` (:issue:`3650`)
- Remove deprecated ``set_printoptions/reset_printoptions`` (:issue:`3046`)
- Remove deprecated ``_verbose_info`` (:issue:`3215`)
- Remove deprecated ``read_clipboard/to_clipboard/ExcelFile/ExcelWriter`` from ``pandas.io.parsers`` (:issue:`3717`)
- default for ``tupleize_cols`` is now ``False`` for both ``to_csv`` and ``read_csv``. Fair warning in 0.12 (:issue:`3604`)
- default for ``mangle_dupe_cols`` is now ``False`` for ``read_csv``. Fair warning in 0.12 (:issue:`3612`)

Indexing API Changes
~~~~~~~~~~~~~~~~~~~~
8 changes: 4 additions & 4 deletions pandas/io/parsers.py
@@ -128,7 +128,7 @@
usecols : array-like
Return a subset of the columns.
Results in much faster parsing time and lower memory usage.
mangle_dupe_cols: boolean, default True
mangle_dupe_cols: boolean, default False
Duplicate columns will be specified as 'X.0'...'X.N', rather than 'X'...'X'
tupleize_cols: boolean, default False
Leave a list of tuples on columns as is (default is to convert to
@@ -245,7 +245,7 @@ def _read(filepath_or_buffer, kwds):
'encoding': None,
'squeeze': False,
'compression': None,
'mangle_dupe_cols': True,
'mangle_dupe_cols': False,
'tupleize_cols':False,
}

@@ -334,7 +334,7 @@ def parser_f(filepath_or_buffer,
verbose=False,
encoding=None,
squeeze=False,
mangle_dupe_cols=True,
mangle_dupe_cols=False,
tupleize_cols=False,
):

@@ -1260,7 +1260,7 @@ def __init__(self, f, **kwds):
self.skipinitialspace = kwds['skipinitialspace']
self.lineterminator = kwds['lineterminator']
self.quoting = kwds['quoting']
self.mangle_dupe_cols = kwds.get('mangle_dupe_cols',True)
self.mangle_dupe_cols = kwds.get('mangle_dupe_cols',False)

self.has_index_names = False
if 'has_index_names' in kwds:
4 changes: 0 additions & 4 deletions pandas/io/tests/test_parsers.py
@@ -804,10 +804,6 @@ def test_duplicate_columns(self):
6,7,8,9,10
11,12,13,14,15
"""
# check default behaviour
df = self.read_table(StringIO(data), sep=',',engine=engine)
self.assertEqual(list(df.columns), ['A', 'A.1', 'B', 'B.1', 'B.2'])

df = self.read_table(StringIO(data), sep=',',engine=engine,mangle_dupe_cols=False)
self.assertEqual(list(df.columns), ['A', 'A', 'B', 'B', 'B'])
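
For readers unfamiliar with the option, the two column lists asserted in this test can be reproduced with a small standalone sketch. The helper `dedupe_header` below is hypothetical and is not the pandas implementation; it only assumes that mangling appends `.1`, `.2`, ... to repeated names, as the removed assertion suggests. Collisions with names that already end in such a suffix are not handled here.

```python
from collections import Counter


def dedupe_header(names, mangle_dupe_cols=False):
    """Hypothetical helper mirroring the two outcomes asserted in the test.

    With mangle_dupe_cols=False the names are returned unchanged, so
    duplicates survive. With mangle_dupe_cols=True the second and later
    occurrences of a name get a numeric suffix: 'X', 'X.1', 'X.2', ...
    """
    if not mangle_dupe_cols:
        return list(names)

    seen = Counter()  # how many times each name has appeared so far
    result = []
    for name in names:
        count = seen[name]
        # First occurrence keeps its name; later ones are suffixed.
        result.append(name if count == 0 else "{}.{}".format(name, count))
        seen[name] += 1
    return result


header = ['A', 'A', 'B', 'B', 'B']
kept = dedupe_header(header)                            # proposed default
mangled = dedupe_header(header, mangle_dupe_cols=True)  # previous default
print(kept)
print(mangled)
```

Under the proposed default, a caller that depends on unique column names would have to request mangling explicitly.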

2 changes: 1 addition & 1 deletion pandas/parser.pyx
@@ -309,7 +309,7 @@ cdef class TextReader:
skiprows=None,
skip_footer=0,
verbose=False,
mangle_dupe_cols=True,
mangle_dupe_cols=False,
tupleize_cols=False):

self.parser = parser_new()
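
The diff flips the same default in several places: a defaults dictionary, the `parser_f` signature, a `kwds.get` fallback in an `__init__`, and the `TextReader` signature. The sketch below illustrates, under simplified assumptions, why those places need to agree. `DEFAULTS` and `resolve_options` are hypothetical names and not part of pandas; they only model a defaults dictionary overlaid with the caller's keywords.

```python
# Hypothetical stand-in for a parser defaults dictionary after this change.
DEFAULTS = {
    'mangle_dupe_cols': False,
    'tupleize_cols': False,
}


def resolve_options(**kwds):
    """Hypothetical helper: overlay caller keywords on the defaults."""
    unknown = set(kwds) - set(DEFAULTS)
    if unknown:
        raise TypeError("unexpected options: {}".format(sorted(unknown)))
    options = dict(DEFAULTS)  # copy so the shared defaults stay untouched
    options.update(kwds)
    return options


implicit = resolve_options()                       # relies on the new default
explicit = resolve_options(mangle_dupe_cols=True)  # pins the old behaviour

# A fallback such as kwds.get('mangle_dupe_cols', False) only matters when
# the key is absent, so it should agree with the defaults dictionary.
fallback = {}.get('mangle_dupe_cols', DEFAULTS['mangle_dupe_cols'])
```

Passing the keyword explicitly makes calling code independent of whichever default a given version ships with.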