
Commit a36109d

Deprecate cols in to_csv, to_excel, drop_duplicates, and duplicated. Use decorator. Update docs and unit tests. [fix pandas-dev#6645, fix #6680]
1 parent 7ffa655 commit a36109d


9 files changed (+201 -104 lines)


doc/source/comparison_with_r.rst (+4 -4)

@@ -171,7 +171,7 @@ In ``pandas`` we may use :meth:`~pandas.pivot_table` method to handle this:
       'player': random.sample(list(string.ascii_lowercase),25),
       'batting avg': np.random.uniform(.200, .400, 25)
       })
-   baseball.pivot_table(values='batting avg', cols='team', aggfunc=np.max)
+   baseball.pivot_table(values='batting avg', columns='team', aggfunc=np.max)
 
 For more details and examples see :ref:`the reshaping documentation
 <reshaping.pivot>`.
@@ -402,8 +402,8 @@ In Python the best way is to make use of :meth:`~pandas.pivot_table`:
       'week': [1,2]*6
       })
    mdf = pd.melt(df, id_vars=['month', 'week'])
-   pd.pivot_table(mdf, values='value', rows=['variable','week'],
-                  cols=['month'], aggfunc=np.mean)
+   pd.pivot_table(mdf, values='value', index=['variable','week'],
+                  columns=['month'], aggfunc=np.mean)
 
 Similarly for ``dcast`` which uses a data.frame called ``df`` in R to
 aggregate information based on ``Animal`` and ``FeedType``:
@@ -433,7 +433,7 @@ using :meth:`~pandas.pivot_table`:
       'Amount': [10, 7, 4, 2, 5, 6, 2],
       })
 
-   df.pivot_table(values='Amount', rows='Animal', cols='FeedType', aggfunc='sum')
+   df.pivot_table(values='Amount', index='Animal', columns='FeedType', aggfunc='sum')
 
 The second approach is to use the :meth:`~pandas.DataFrame.groupby` method:
 
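The rename does not change the meaning of the keywords. ``index`` holds the keys that become row labels and ``columns`` holds the keys that become column labels. Here is a minimal pure-Python sketch, with no pandas, of what ``pivot_table(values=..., index=..., columns=..., aggfunc=...)`` computes. It assumes the records are plain dicts. ``pivot_sketch`` is a hypothetical helper, not a pandas function, and the sample rows are made up for illustration.

```python
from collections import defaultdict


def pivot_sketch(records, values, index, columns, aggfunc=max):
    """Group records by (index key, columns key) and aggregate `values`.

    Hypothetical helper illustrating pivot_table semantics; not a pandas API.
    Returns a dict mapping row label -> {column label: aggregated value}.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec[index], rec[columns])].append(rec[values])
    table = defaultdict(dict)
    for (row_key, col_key), vals in groups.items():
        table[row_key][col_key] = aggfunc(vals)
    return dict(table)


# Illustrative data only.
rows = [
    {"Animal": "Animal1", "FeedType": "A", "Amount": 10},
    {"Animal": "Animal2", "FeedType": "B", "Amount": 7},
    {"Animal": "Animal1", "FeedType": "A", "Amount": 4},
    {"Animal": "Animal2", "FeedType": "A", "Amount": 2},
]
print(pivot_sketch(rows, values="Amount", index="Animal",
                   columns="FeedType", aggfunc=sum))
# {'Animal1': {'A': 14}, 'Animal2': {'B': 7, 'A': 2}}
```

Cells with no matching records are simply absent here. pandas fills them with NaN instead.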

doc/source/release.rst (+15 -5)

@@ -129,11 +129,6 @@ API Changes
   ``DataFrame.stack`` operations where the name of the column index is used as
   the name of the inserted column containing the pivoted data.
 
-- The :func:`pivot_table`/:meth:`DataFrame.pivot_table` and :func:`crosstab` functions
-  now take arguments ``index`` and ``columns`` instead of ``rows`` and ``cols``. A
-  ``FutureWarning`` is raised to alert that the old ``rows`` and ``cols`` arguments
-  will not be supported in a future release (:issue:`5505`)
-
 - Allow specification of a more complex groupby, via ``pd.Grouper`` (:issue:`3794`)
 
 - A tuple passed to ``DataFame.sort_index`` will be interpreted as the levels of
@@ -149,6 +144,21 @@ API Changes
 Deprecations
 ~~~~~~~~~~~~
 
+- The :func:`pivot_table`/:meth:`DataFrame.pivot_table` and :func:`crosstab` functions
+  now take arguments ``index`` and ``columns`` instead of ``rows`` and ``cols``. A
+  ``FutureWarning`` is raised to alert that the old ``rows`` and ``cols`` arguments
+  will not be supported in a future release (:issue:`5505`)
+
+- The :meth:`DataFrame.drop_duplicates` and :meth:`DataFrame.duplicated` methods
+  now take argument ``subset`` instead of ``cols`` to better align with
+  :meth:`DataFrame.dropna`. A ``FutureWarning`` is raised to alert that the old
+  ``cols`` arguments will not be supported in a future release (:issue:`6680`)
+
+- The :meth:`DataFrame.to_csv` and :meth:`DataFrame.to_excel` functions
+  now takes argument ``columns`` instead of ``cols``. A
+  ``FutureWarning`` is raised to alert that the old ``cols`` arguments
+  will not be supported in a future release (:issue:`6645`)
+
 Prior Version Deprecations/Changes
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
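The old keywords only raise a ``FutureWarning``, so call sites that still use them keep running quietly. One way to find those call sites is to escalate ``FutureWarning`` to an error, either with the interpreter flag ``python -W error::FutureWarning`` or locally with the standard ``warnings`` module. The sketch below takes the local approach. ``legacy_call`` is a hypothetical stand-in for any call that still passes ``cols=``; it is not pandas code.

```python
import warnings


def legacy_call(**kwargs):
    # Hypothetical stand-in for a method that still accepts a deprecated
    # keyword and emits FutureWarning, as described in the release notes.
    if "cols" in kwargs:
        warnings.warn("cols is deprecated, use columns instead", FutureWarning)
    return kwargs


def uses_deprecated_keyword(func, **kwargs):
    """Return True if calling func(**kwargs) emits a FutureWarning."""
    with warnings.catch_warnings():
        # Inside this block FutureWarning is raised as an exception.
        warnings.simplefilter("error", FutureWarning)
        try:
            func(**kwargs)
        except FutureWarning:
            return True
    return False


print(uses_deprecated_keyword(legacy_call, cols=["a"]))     # True
print(uses_deprecated_keyword(legacy_call, columns=["a"]))  # False
```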

doc/source/reshaping.rst (+8 -8)

@@ -283,9 +283,9 @@ We can produce pivot tables from this data very easily:
 
 .. ipython:: python
 
-   pivot_table(df, values='D', rows=['A', 'B'], cols=['C'])
-   pivot_table(df, values='D', rows=['B'], cols=['A', 'C'], aggfunc=np.sum)
-   pivot_table(df, values=['D','E'], rows=['B'], cols=['A', 'C'], aggfunc=np.sum)
+   pivot_table(df, values='D', index=['A', 'B'], columns=['C'])
+   pivot_table(df, values='D', index=['B'], columns=['A', 'C'], aggfunc=np.sum)
+   pivot_table(df, values=['D','E'], index=['B'], columns=['A', 'C'], aggfunc=np.sum)
 
 The result object is a DataFrame having potentially hierarchical indexes on the
 rows and columns. If the ``values`` column name is not given, the pivot table
@@ -294,14 +294,14 @@ hierarchy in the columns:
 
 .. ipython:: python
 
-   pivot_table(df, rows=['A', 'B'], cols=['C'])
+   pivot_table(df, index=['A', 'B'], columns=['C'])
 
 You can render a nice output of the table omitting the missing values by
 calling ``to_string`` if you wish:
 
 .. ipython:: python
 
-   table = pivot_table(df, rows=['A', 'B'], cols=['C'])
+   table = pivot_table(df, index=['A', 'B'], columns=['C'])
    print(table.to_string(na_rep=''))
 
 Note that ``pivot_table`` is also available as an instance method on DataFrame.
@@ -315,8 +315,8 @@ unless an array of values and an aggregation function are passed.
 
 It takes a number of arguments
 
-- ``rows``: array-like, values to group by in the rows
-- ``cols``: array-like, values to group by in the columns
+- ``index``: array-like, values to group by in the rows
+- ``columns``: array-like, values to group by in the columns
 - ``values``: array-like, optional, array of values to aggregate according to
   the factors
 - ``aggfunc``: function, optional, If no values array is passed, computes a
@@ -350,7 +350,7 @@ rows and columns:
 
 .. ipython:: python
 
-   df.pivot_table(rows=['A', 'B'], cols='C', margins=True, aggfunc=np.std)
+   df.pivot_table(index=['A', 'B'], columns='C', margins=True, aggfunc=np.std)
 
 .. _reshaping.tile:
 
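With ``margins=True``, the pivot table gains an ``All`` row and an ``All`` column. These hold the aggregate over each row, over each column, and over the whole table. Below is a minimal pure-Python sketch of that idea. It uses ``sum`` rather than ``np.std`` so the totals are easy to check by hand. ``pivot_with_margins`` is hypothetical, handles only a single row key and a single column key, and assumes no real label is named ``All``.

```python
from collections import defaultdict


def pivot_with_margins(records, values, index, columns, aggfunc=sum,
                       margins_name="All"):
    """Pivot with an extra margins row/column (hypothetical sketch).

    Each margin cell applies `aggfunc` to the raw values it covers,
    not to the already-aggregated cells.
    """
    cells = defaultdict(list)
    for rec in records:
        r, c, v = rec[index], rec[columns], rec[values]
        cells[(r, c)].append(v)
        cells[(r, margins_name)].append(v)             # row margin
        cells[(margins_name, c)].append(v)             # column margin
        cells[(margins_name, margins_name)].append(v)  # grand total
    table = defaultdict(dict)
    for (r, c), vals in cells.items():
        table[r][c] = aggfunc(vals)
    return dict(table)


data = [
    {"A": "foo", "C": "small", "D": 1},
    {"A": "foo", "C": "large", "D": 2},
    {"A": "bar", "C": "small", "D": 3},
    {"A": "foo", "C": "small", "D": 4},
]
result = pivot_with_margins(data, values="D", index="A", columns="C")
print(result["foo"]["All"])    # 7
print(result["All"]["small"])  # 8
print(result["All"]["All"])    # 10
```

The margins are computed from the raw values rather than by summing cells. That keeps the sketch correct for aggregations like a standard deviation, where combining per-cell results would give the wrong answer.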

doc/source/v0.14.0.txt (+13 -6)

@@ -173,11 +173,6 @@ These are out-of-bounds selections
    # New output, 4-level MultiIndex
    df_multi.set_index([df_multi.index, df_multi.index])
 
-- The :func:`pivot_table`/:meth:`DataFrame.pivot_table` and :func:`crosstab` functions
-  now take arguments ``index`` and ``columns`` instead of ``rows`` and ``cols``. A
-  ``FutureWarning`` is raised to alert that the old ``rows`` and ``cols`` arguments
-  will not be supported in a future release (:issue:`5505`)
-
 - Following keywords are now acceptable for :meth:`DataFrame.plot(kind='bar')` and :meth:`DataFrame.plot(kind='barh')`.
   - `width`: Specify the bar width. In previous versions, static value 0.5 was passed to matplotlib and it cannot be overwritten.
   - `position`: Specify relative alignments for bar plot layout. From 0 (left/bottom-end) to 1(right/top-end). Default is 0.5 (center). (:issue:`6604`)
@@ -313,8 +308,20 @@ Therse are prior version deprecations that are taking effect as of 0.14.0.
 
 Deprecations
 ~~~~~~~~~~~~
+- The :func:`pivot_table`/:meth:`DataFrame.pivot_table` and :func:`crosstab` functions
+  now take arguments ``index`` and ``columns`` instead of ``rows`` and ``cols``. A
+  ``FutureWarning`` is raised to alert that the old ``rows`` and ``cols`` arguments
+  will not be supported in a future release (:issue:`5505`)
+
+- The :meth:`DataFrame.drop_duplicates` and :meth:`DataFrame.duplicated` methods
+  now take argument ``subset`` instead of ``cols`` to better align with
+  :meth:`DataFrame.dropna`. A ``FutureWarning`` is raised to alert that the old
+  ``cols`` arguments will not be supported in a future release (:issue:`6680`)
 
-There are no deprecations of prior behavior in 0.14.0
+- The :meth:`DataFrame.to_csv` and :meth:`DataFrame.to_excel` functions
+  now takes argument ``columns`` instead of ``cols``. A
+  ``FutureWarning`` is raised to alert that the old ``cols`` arguments
+  will not be supported in a future release (:issue:`6645`)
 
 Enhancements
 ~~~~~~~~~~~~
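Only the keyword name changes. The parameter keeps its position, so a positional call such as ``df.drop_duplicates(['a'])`` binds to ``subset`` exactly as it used to bind to ``cols``. Only callers that spell out ``cols=`` are affected by the deprecation. The sketch below checks this with ``inspect.signature``. ``drop_duplicates_old`` and ``drop_duplicates_new`` are hypothetical stand-ins that copy the old and new parameter lists from this diff.

```python
import inspect


# Hypothetical stand-ins with the old and new parameter lists.
def drop_duplicates_old(self, cols=None, take_last=False, inplace=False):
    pass


def drop_duplicates_new(self, subset=None, take_last=False, inplace=False):
    pass


def bind_positional(func, *args):
    """Return the name -> value mapping that positional args bind to."""
    return dict(inspect.signature(func).bind(*args).arguments)


old = bind_positional(drop_duplicates_old, "df", ["a"])
new = bind_positional(drop_duplicates_new, "df", ["a"])
print(old)  # {'self': 'df', 'cols': ['a']}
print(new)  # {'self': 'df', 'subset': ['a']}
```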

pandas/core/frame.py (+29 -20)

@@ -41,7 +41,8 @@
 from pandas.compat import(range, zip, lrange, lmap, lzip, StringIO, u,
                           OrderedDict, raise_with_traceback)
 from pandas import compat
-from pandas.util.decorators import deprecate, Appender, Substitution
+from pandas.util.decorators import deprecate, Appender, Substitution, \
+    deprecate_kwarg
 
 from pandas.tseries.period import PeriodIndex
 from pandas.tseries.index import DatetimeIndex
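This commit imports ``deprecate_kwarg`` from ``pandas.util.decorators``, but the decorator's body is not part of this diff. Below is a minimal sketch of how such a decorator can work. It assumes the only job is to rename a keyword argument and emit a ``FutureWarning``. It is an illustrative reimplementation, not the pandas source. Raising ``TypeError`` when both the old and the new name are passed is also an assumption of this sketch.

```python
import functools
import warnings


def deprecate_kwarg(old_arg_name, new_arg_name):
    """Sketch: remap a deprecated keyword to its new name with a FutureWarning."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if old_arg_name in kwargs:
                warnings.warn(
                    "the '%s' keyword is deprecated, use '%s' instead"
                    % (old_arg_name, new_arg_name),
                    FutureWarning, stacklevel=2)
                if new_arg_name in kwargs:
                    # Assumption: passing both spellings is an error.
                    raise TypeError("can only specify '%s' or '%s', not both"
                                    % (old_arg_name, new_arg_name))
                kwargs[new_arg_name] = kwargs.pop(old_arg_name)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecate_kwarg(old_arg_name="cols", new_arg_name="columns")
def to_csv(path_or_buf=None, columns=None):
    # Hypothetical stand-in: just returns the columns it would write.
    return columns


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(to_csv(cols=["a", "b"]))   # ['a', 'b']
print(caught[0].category.__name__)   # FutureWarning
print(to_csv(columns=["a"]))         # ['a']
```

The wrapper only inspects ``kwargs``, which matches the "kwarg only alias" wording in the docstrings. Positional arguments pass through untouched.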
@@ -1067,8 +1068,9 @@ def to_panel(self):
 
     to_wide = deprecate('to_wide', to_panel)
 
+    @deprecate_kwarg(old_arg_name='cols', new_arg_name='columns')
     def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
-               cols=None, header=True, index=True, index_label=None,
+               columns=None, header=True, index=True, index_label=None,
                mode='w', nanRep=None, encoding=None, quoting=None,
                quotechar='"', line_terminator='\n', chunksize=None,
                tupleize_cols=False, date_format=None, doublequote=True,
@@ -1086,7 +1088,7 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
             Missing data representation
         float_format : string, default None
             Format string for floating point numbers
-        cols : sequence, optional
+        columns : sequence, optional
             Columns to write
         header : boolean or list of string, default True
             Write out column names. If a list of string is given it is assumed
@@ -1124,6 +1126,7 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
             or new (expanded format) if False)
         date_format : string, default None
             Format string for datetime objects
+        cols : kwarg only alias of columns [deprecated]
         """
         if nanRep is not None: # pragma: no cover
             warnings.warn("nanRep is deprecated, use na_rep",
@@ -1134,7 +1137,7 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
                                      line_terminator=line_terminator,
                                      sep=sep, encoding=encoding,
                                      quoting=quoting, na_rep=na_rep,
-                                     float_format=float_format, cols=cols,
+                                     float_format=float_format, cols=columns,
                                      header=header, index=index,
                                      index_label=index_label, mode=mode,
                                      chunksize=chunksize, quotechar=quotechar,
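In ``to_csv``, the renamed ``columns`` argument selects and orders the columns that get written. Internally it is still forwarded to the formatter as ``cols``. Below is a stdlib analogue of that selection, using ``csv.DictWriter`` on a list of dicts. ``write_csv`` is a hypothetical helper, not the pandas implementation, and it ignores the index column that ``to_csv`` writes by default.

```python
import csv
import io


def write_csv(rows, columns=None):
    """Write dict rows as CSV, keeping only `columns` (all keys if None).

    Hypothetical helper mirroring the idea of to_csv(columns=...).
    """
    if columns is None:
        columns = list(rows[0]) if rows else []
    buf = io.StringIO()
    # extrasaction="ignore" silently drops keys that are not in `columns`.
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = [{"a": 1, "b": 2, "c": 3}, {"a": 4, "b": 5, "c": 6}]
print(write_csv(rows, columns=["c", "a"]), end="")
# c,a
# 3,1
# 6,4
```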
@@ -1148,8 +1151,9 @@ def to_csv(self, path_or_buf=None, sep=",", na_rep='', float_format=None,
         if path_or_buf is None:
             return formatter.path_or_buf.getvalue()
 
+    @deprecate_kwarg(old_arg_name='cols', new_arg_name='columns')
     def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
-                 float_format=None, cols=None, header=True, index=True,
+                 float_format=None, columns=None, header=True, index=True,
                  index_label=None, startrow=0, startcol=0, engine=None,
                  merge_cells=True, encoding=None):
         """
@@ -1189,6 +1193,7 @@ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
         encoding: string, default None
             encoding of the resulting excel file. Only necessary for xlwt,
             other writers support unicode natively.
+        cols : kwarg only alias of columns [deprecated]
 
         Notes
         -----
@@ -1202,6 +1207,7 @@ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
         >>> writer.save()
         """
         from pandas.io.excel import ExcelWriter
+
         need_save = False
         if encoding == None:
             encoding = 'ascii'
@@ -1212,7 +1218,7 @@ def to_excel(self, excel_writer, sheet_name='Sheet1', na_rep='',
 
         formatter = fmt.ExcelFormatter(self,
                                        na_rep=na_rep,
-                                       cols=cols,
+                                       cols=columns,
                                        header=header,
                                        float_format=float_format,
                                        index=index,
@@ -2439,27 +2445,28 @@ def dropna(self, axis=0, how='any', thresh=None, subset=None,
         else:
             return result
 
-    def drop_duplicates(self, cols=None, take_last=False, inplace=False):
+    @deprecate_kwarg(old_arg_name='cols', new_arg_name='subset')
+    def drop_duplicates(self, subset=None, take_last=False, inplace=False):
         """
         Return DataFrame with duplicate rows removed, optionally only
         considering certain columns
 
         Parameters
         ----------
-        cols : column label or sequence of labels, optional
+        subset : column label or sequence of labels, optional
             Only consider certain columns for identifying duplicates, by
             default use all of the columns
         take_last : boolean, default False
             Take the last observed row in a row. Defaults to the first row
         inplace : boolean, default False
             Whether to drop duplicates in place or to return a copy
+        cols : kwargs only argument of subset [deprecated]
 
         Returns
         -------
         deduplicated : DataFrame
         """
-
-        duplicated = self.duplicated(cols, take_last=take_last)
+        duplicated = self.duplicated(subset, take_last=take_last)
 
         if inplace:
             inds, = (-duplicated).nonzero()
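``drop_duplicates`` is built on ``duplicated``. It computes a boolean mask and keeps the rows where the mask is ``False`` (``self[-duplicated]``). Below is a pure-Python sketch of that relationship for rows stored as dicts, including the ``take_last`` flag. ``duplicated_mask`` and ``drop_duplicates_rows`` are hypothetical helpers. This sketch assumes ``subset`` is a list of keys; a later sketch handles the other shapes ``duplicated`` accepts.

```python
def duplicated_mask(rows, subset=None, take_last=False):
    """Mark rows whose `subset` values were already seen (hypothetical).

    With take_last=True the scan runs backwards, so the last occurrence
    is the one left unmarked.
    """
    keys = [tuple(row[c] for c in (subset or sorted(row))) for row in rows]
    order = range(len(rows) - 1, -1, -1) if take_last else range(len(rows))
    seen, mask = set(), [False] * len(rows)
    for i in order:
        mask[i] = keys[i] in seen
        seen.add(keys[i])
    return mask


def drop_duplicates_rows(rows, subset=None, take_last=False):
    mask = duplicated_mask(rows, subset, take_last)
    # Keep rows where the mask is False, like self[-duplicated].
    return [row for row, dup in zip(rows, mask) if not dup]


rows = [{"k": 1, "v": "x"}, {"k": 1, "v": "y"}, {"k": 2, "v": "z"}]
print(drop_duplicates_rows(rows, subset=["k"]))
# [{'k': 1, 'v': 'x'}, {'k': 2, 'v': 'z'}]
print(drop_duplicates_rows(rows, subset=["k"], take_last=True))
# [{'k': 1, 'v': 'y'}, {'k': 2, 'v': 'z'}]
```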
@@ -2468,18 +2475,20 @@ def drop_duplicates(self, cols=None, take_last=False, inplace=False):
         else:
             return self[-duplicated]
 
-    def duplicated(self, cols=None, take_last=False):
+    @deprecate_kwarg(old_arg_name='cols', new_arg_name='subset')
+    def duplicated(self, subset=None, take_last=False):
         """
         Return boolean Series denoting duplicate rows, optionally only
         considering certain columns
 
         Parameters
         ----------
-        cols : column label or sequence of labels, optional
+        subset : column label or sequence of labels, optional
             Only consider certain columns for identifying duplicates, by
             default use all of the columns
         take_last : boolean, default False
             Take the last observed row in a row. Defaults to the first row
+        cols : kwargs only argument of subset [deprecated]
 
         Returns
         -------
@@ -2491,19 +2500,19 @@ def _m8_to_i8(x):
                 return x.view(np.int64)
             return x
 
-        if cols is None:
+        if subset is None:
             values = list(_m8_to_i8(self.values.T))
         else:
-            if np.iterable(cols) and not isinstance(cols, compat.string_types):
-                if isinstance(cols, tuple):
-                    if cols in self.columns:
-                        values = [self[cols].values]
+            if np.iterable(subset) and not isinstance(subset, compat.string_types):
+                if isinstance(subset, tuple):
+                    if subset in self.columns:
+                        values = [self[subset].values]
                     else:
-                        values = [_m8_to_i8(self[x].values) for x in cols]
+                        values = [_m8_to_i8(self[x].values) for x in subset]
                 else:
-                    values = [_m8_to_i8(self[x].values) for x in cols]
+                    values = [_m8_to_i8(self[x].values) for x in subset]
             else:
-                values = [self[cols].values]
+                values = [self[subset].values]
 
         keys = lib.fast_zip_fillna(values)
         duplicated = lib.duplicated(keys, take_last=take_last)
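The body of ``duplicated`` accepts several shapes for ``subset``:

- ``None`` means all columns.
- A string is a single label.
- A tuple that is itself a column label (as with ``MultiIndex`` columns) is a single column.
- Any other non-string iterable is a sequence of labels.
- Anything else is a single label.

The sketch below mirrors that branching with a hypothetical ``normalize_subset`` helper. It works over a plain list of column labels instead of a DataFrame. It returns label lists rather than the value arrays that the pandas code builds.

```python
def normalize_subset(subset, columns):
    """Return the list of column labels used to detect duplicates.

    Hypothetical helper mirroring the branches in the diff.
    """
    if subset is None:
        return list(columns)
    if isinstance(subset, str):
        return [subset]
    try:
        iter(subset)
    except TypeError:
        return [subset]          # non-iterable scalar label, e.g. an int
    if isinstance(subset, tuple) and subset in columns:
        return [subset]          # tuple label, e.g. from MultiIndex columns
    return list(subset)


cols = ["a", "b", ("x", "y")]
print(normalize_subset(None, cols))        # ['a', 'b', ('x', 'y')]
print(normalize_subset("a", cols))         # ['a']
print(normalize_subset(("x", "y"), cols))  # [('x', 'y')]
print(normalize_subset(["a", "b"], cols))  # ['a', 'b']
```

Checking ``subset in columns`` before unpacking a tuple is what lets a ``MultiIndex`` column label be used directly without being split into its levels.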
