Commit d4cab87

Author: Brendan Boerner
2 parents 5c9af20 + 0ecb4cb, commit d4cab87

18 files changed

+568
-153
lines changed

doc/source/api.rst

+4
@@ -374,6 +374,8 @@ Reindexing / Selection / Label manipulation

    Series.align
    Series.drop
+   Series.drop_duplicates
+   Series.duplicated
    Series.equals
    Series.first
    Series.head
@@ -1165,6 +1167,8 @@ Modifying and Computations
    Index.diff
    Index.sym_diff
    Index.drop
+   Index.drop_duplicates
+   Index.duplicated
    Index.equals
    Index.factorize
    Index.identical

doc/source/indexing.rst

+1
@@ -1476,6 +1476,7 @@ You can control the action of a chained assignment via the option ``mode.chained
 which can take the values ``['raise','warn',None]``, where showing a warning is the default.

 .. ipython:: python
+   :okwarning:

    dfb = DataFrame({'a' : ['one', 'one', 'two',
                            'three', 'two', 'one', 'six'],

doc/source/io.rst

+1-1
@@ -2722,7 +2722,7 @@ The default is 50,000 rows returned in a chunk.

 .. code-block:: python

-   for df in read_hdf('store.h5','df', chunsize=3):
+   for df in read_hdf('store.h5','df', chunksize=3):
        print(df)

 Note, that the chunksize keyword applies to the **source** rows. So if you

doc/source/reshaping.rst

+11-9
@@ -151,6 +151,15 @@ unstacks the **last level**:
    stacked.unstack(1)
    stacked.unstack(0)

+.. _reshaping.unstack_by_name:
+
+If the indexes have names, you can use the level names instead of specifying
+the level numbers:
+
+.. ipython:: python
+
+   stacked.unstack('second')
+
 Notice that the ``stack`` and ``unstack`` methods implicitly sort the index
 levels involved. Hence a call to ``stack`` and then ``unstack``, or viceversa,
 will result in a **sorted** copy of the original DataFrame or Series:
@@ -165,15 +174,6 @@ will result in a **sorted** copy of the original DataFrame or Series:
 while the above code will raise a ``TypeError`` if the call to ``sort`` is
 removed.

-.. _reshaping.unstack_by_name:
-
-If the indexes have names, you can use the level names instead of specifying
-the level numbers:
-
-.. ipython:: python
-
-   stacked.unstack('second')
-
 .. _reshaping.stack_multiple:

 Multiple Levels
@@ -218,6 +218,8 @@ calling ``sortlevel``, of course). Here is a more complex example:
    columns = MultiIndex.from_tuples([('A', 'cat'), ('B', 'dog'),
                                      ('B', 'cat'), ('A', 'dog')],
                                     names=['exp', 'animal'])
+   index = MultiIndex.from_product([('bar', 'baz', 'foo', 'qux'), ('one', 'two')],
+                                   names=['first', 'second'])
    df = DataFrame(randn(8, 4), index=index, columns=columns)
    df2 = df.ix[[0, 1, 2, 4, 5, 7]]
    df2

doc/source/v0.15.0.txt

+12-2
@@ -223,6 +223,15 @@ API changes
      s
      s.loc[['D']]

+- ``Index`` now supports ``duplicated`` and ``drop_duplicates``. (:issue:`4060`)
+
+  .. ipython:: python
+
+     idx = Index([1, 2, 3, 4, 1, 2])
+     idx
+     idx.duplicated()
+     idx.drop_duplicates()
+
 .. _whatsnew_0150.dt:

 .dt accessor
@@ -437,6 +446,7 @@ There are no experimental changes in 0.15.0

 Bug Fixes
 ~~~~~~~~~
+- Bug in multiindexes dtypes getting mixed up when DataFrame is saved to SQL table (:issue:`8021`)
 - Bug in Series 0-division with a float and integer operand dtypes (:issue:`7785`)
 - Bug in ``Series.astype("unicode")`` not calling ``unicode`` on the values correctly (:issue:`7758`)
 - Bug in ``DataFrame.as_matrix()`` with mixed ``datetime64[ns]`` and ``timedelta64[ns]`` dtypes (:issue:`7778`)
@@ -454,7 +464,7 @@ Bug Fixes
 - Bug in pickle deserialization that failed for pre-0.14.1 containers with dup items trying to avoid ambiguity
   when matching block and manager items, when there's only one block there's no ambiguity (:issue:`7794`)

-
+- Bug in HDFStore iteration when passing a where (:issue:`8014`)

 - Bug in repeated timeseries line and area plot may result in ``ValueError`` or incorrect kind (:issue:`7733`)

@@ -516,7 +526,7 @@ Bug Fixes
   times were returned when crossing DST boundaries (:issue:`7835`, :issue:`7901`).


-
+- Bug in area plot draws legend with incorrect ``alpha`` when ``stacked=True`` (:issue:`8027`)

 - ``Period`` and ``PeriodIndex`` addition/subtraction with ``np.timedelta64`` results in incorrect internal representations (:issue:`7740`)


pandas/core/base.py

+60-1
@@ -8,8 +8,14 @@
 from pandas.core import common as com
 import pandas.core.nanops as nanops
 import pandas.tslib as tslib
+import pandas.lib as lib
 from pandas.util.decorators import Appender, cache_readonly

+
+_shared_docs = dict()
+_indexops_doc_kwargs = dict(klass='IndexOpsMixin', inplace='')
+
+
 class StringMixin(object):

     """implements string methods so long as object defines a `__unicode__`
@@ -474,12 +480,66 @@ def searchsorted(self, key, side='left'):
         #### needs tests/doc-string
         return self.values.searchsorted(key, side=side)

+    _shared_docs['drop_duplicates'] = (
+        """Return %(klass)s with duplicate values removed
+
+        Parameters
+        ----------
+        take_last : boolean, default False
+            Take the last observed index in a group. Default first
+        %(inplace)s
+
+        Returns
+        -------
+        deduplicated : %(klass)s
+        """)
+
+    @Appender(_shared_docs['drop_duplicates'] % _indexops_doc_kwargs)
+    def drop_duplicates(self, take_last=False, inplace=False):
+        duplicated = self.duplicated(take_last=take_last)
+        result = self[~duplicated.values]
+        if inplace:
+            return self._update_inplace(result)
+        else:
+            return result
+
+    _shared_docs['duplicated'] = (
+        """Return boolean %(klass)s denoting duplicate values
+
+        Parameters
+        ----------
+        take_last : boolean, default False
+            Take the last observed index in a group. Default first
+
+        Returns
+        -------
+        duplicated : %(klass)s
+        """)
+
+    @Appender(_shared_docs['duplicated'] % _indexops_doc_kwargs)
+    def duplicated(self, take_last=False):
+        keys = com._ensure_object(self.values)
+        duplicated = lib.duplicated(keys, take_last=take_last)
+        try:
+            return self._constructor(duplicated,
+                                     index=self.index).__finalize__(self)
+        except AttributeError:
+            from pandas.core.index import Index
+            return Index(duplicated)
+
     #----------------------------------------------------------------------
     # unbox reductions

     all = _unbox(np.ndarray.all)
     any = _unbox(np.ndarray.any)

+    #----------------------------------------------------------------------
+    # abstracts
+
+    def _update_inplace(self, result):
+        raise NotImplementedError
+
+
 class DatetimeIndexOpsMixin(object):
     """ common ops mixin to support a unified inteface datetimelike Index """

@@ -497,7 +557,6 @@ def _box_values(self, values):
         """
         apply box func to passed values
         """
-        import pandas.lib as lib
         return lib.map_infer(values, self._box_func)

     @cache_readonly
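
Reviewer note: the new `IndexOpsMixin.duplicated` delegates to `lib.duplicated`, which is compiled code not shown in this diff. The following is a minimal pure-Python sketch of the semantics the docstrings describe, assuming that `take_last=False` flags every repeat after the first occurrence and `take_last=True` flags every occurrence before the last. The function names mirror the methods above but operate on plain lists; they are illustrative only and are not the pandas implementation.

```python
def duplicated(values, take_last=False):
    """Return one boolean per element; True marks a duplicate.

    Assumed semantics: with take_last=False the first occurrence is kept
    (flagged False); with take_last=True the last occurrence is kept.
    """
    n = len(values)
    flags = [False] * n
    seen = set()
    # Walk backwards when the last occurrence is the one to keep.
    order = range(n - 1, -1, -1) if take_last else range(n)
    for i in order:
        if values[i] in seen:
            flags[i] = True
        else:
            seen.add(values[i])
    return flags


def drop_duplicates(values, take_last=False):
    """Keep the elements whose flag is False, like self[~duplicated.values]."""
    flags = duplicated(values, take_last=take_last)
    return [v for v, is_dup in zip(values, flags) if not is_dup]


values = [1, 2, 3, 4, 1, 2]  # same data as the whatsnew example
print(duplicated(values))
print(duplicated(values, take_last=True))
print(drop_duplicates(values))
print(drop_duplicates(values, take_last=True))
```

Note that dropping with `take_last=True` preserves the original ordering of the surviving elements, so the result for the sample data starts with `3` rather than `1`.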

pandas/core/index.py

+16-1
@@ -12,7 +12,7 @@
 import pandas.algos as _algos
 import pandas.index as _index
 from pandas.lib import Timestamp, is_datetime_array
-from pandas.core.base import PandasObject, FrozenList, FrozenNDArray, IndexOpsMixin
+from pandas.core.base import PandasObject, FrozenList, FrozenNDArray, IndexOpsMixin, _shared_docs
 from pandas.util.decorators import Appender, cache_readonly, deprecate
 from pandas.core.common import isnull, array_equivalent
 import pandas.core.common as com
@@ -30,6 +30,8 @@

 _unsortable_types = frozenset(('mixed', 'mixed-integer'))

+_index_doc_kwargs = dict(klass='Index', inplace='')
+

 def _try_get_item(x):
     try:
@@ -209,6 +211,10 @@ def _simple_new(cls, values, name=None, **kwargs):
         result._reset_identity()
         return result

+    def _update_inplace(self, result):
+        # guard when called from IndexOpsMixin
+        raise TypeError("Index can't be updated inplace")
+
     def is_(self, other):
         """
         More flexible, faster check like ``is`` but that works through views
@@ -2022,6 +2028,15 @@ def drop(self, labels):
             raise ValueError('labels %s not contained in axis' % labels[mask])
         return self.delete(indexer)

+    @Appender(_shared_docs['drop_duplicates'] % _index_doc_kwargs)
+    def drop_duplicates(self, take_last=False):
+        result = super(Index, self).drop_duplicates(take_last=take_last)
+        return self._constructor(result)
+
+    @Appender(_shared_docs['duplicated'] % _index_doc_kwargs)
+    def duplicated(self, take_last=False):
+        return super(Index, self).duplicated(take_last=take_last)
+
     @classmethod
     def _add_numeric_methods_disabled(cls):
         """ add in numeric methods to disable """
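
Reviewer note: the commit shares one docstring template between `Index` and `Series` by filling `%(klass)s` and `%(inplace)s` before handing the text to `Appender`. The sketch below reproduces that idea in isolation. The `appender` decorator is a hypothetical stand-in for `pandas.util.decorators.Appender` (whose implementation is not part of this diff), and `IndexStub` / `SeriesStub` are placeholder classes, not pandas types.

```python
_shared_docs = {}

# One template, parameterised by class name and an optional "inplace" entry.
_shared_docs['drop_duplicates'] = (
    """Return %(klass)s with duplicate values removed

    Parameters
    ----------
    take_last : boolean, default False
        Take the last observed index in a group. Default first
    %(inplace)s
    """)


def appender(addendum):
    """Hypothetical stand-in for Appender: append text to a docstring."""
    def decorate(func):
        func.__doc__ = (func.__doc__ or '') + addendum
        return func
    return decorate


# Index has no inplace option, so its template slot is left empty.
index_doc_kwargs = dict(klass='Index', inplace='')
series_doc_kwargs = dict(
    klass='Series',
    inplace="inplace : boolean, default False\n"
            "        If True, performs operation inplace and returns None.")


class IndexStub(object):
    @appender(_shared_docs['drop_duplicates'] % index_doc_kwargs)
    def drop_duplicates(self, take_last=False):
        raise NotImplementedError  # body is irrelevant to the docstring demo


class SeriesStub(object):
    @appender(_shared_docs['drop_duplicates'] % series_doc_kwargs)
    def drop_duplicates(self, take_last=False, inplace=False):
        raise NotImplementedError


print(IndexStub.drop_duplicates.__doc__)
print(SeriesStub.drop_duplicates.__doc__)
```

The design choice here is that the `inplace` parameter is documented only where the method signature actually accepts it, while the rest of the text stays in a single place.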

pandas/core/series.py

+12-37
@@ -52,10 +52,13 @@

 __all__ = ['Series']

+
 _shared_doc_kwargs = dict(
     axes='index',
     klass='Series',
-    axes_single_arg="{0,'index'}"
+    axes_single_arg="{0,'index'}",
+    inplace="""inplace : boolean, default False
+        If True, performs operation inplace and returns None."""
 )


@@ -265,6 +268,9 @@ def _set_subtyp(self, is_all_dates):
         else:
             object.__setattr__(self, '_subtyp', 'series')

+    def _update_inplace(self, result):
+        return generic.NDFrame._update_inplace(self, result)
+
     # ndarray compatibility
     @property
     def dtype(self):
@@ -1114,45 +1120,14 @@ def mode(self):
         from pandas.core.algorithms import mode
         return mode(self)

+    @Appender(base._shared_docs['drop_duplicates'] % _shared_doc_kwargs)
     def drop_duplicates(self, take_last=False, inplace=False):
-        """
-        Return Series with duplicate values removed
-
-        Parameters
-        ----------
-        take_last : boolean, default False
-            Take the last observed index in a group. Default first
-        inplace : boolean, default False
-            If True, performs operation inplace and returns None.
-
-        Returns
-        -------
-        deduplicated : Series
-        """
-        duplicated = self.duplicated(take_last=take_last)
-        result = self[-duplicated]
-        if inplace:
-            return self._update_inplace(result)
-        else:
-            return result
+        return super(Series, self).drop_duplicates(take_last=take_last,
+                                                   inplace=inplace)

+    @Appender(base._shared_docs['duplicated'] % _shared_doc_kwargs)
     def duplicated(self, take_last=False):
-        """
-        Return boolean Series denoting duplicate values
-
-        Parameters
-        ----------
-        take_last : boolean, default False
-            Take the last observed index in a group. Default first
-
-        Returns
-        -------
-        duplicated : Series
-        """
-        keys = _ensure_object(self.values)
-        duplicated = lib.duplicated(keys, take_last=take_last)
-        return self._constructor(duplicated,
-                                 index=self.index).__finalize__(self)
+        return super(Series, self).duplicated(take_last=take_last)

     def idxmin(self, axis=None, out=None, skipna=True):
         """
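
Reviewer note: after this change the shared `drop_duplicates` logic lives in the mixin, and each subclass decides what `inplace=True` means through the `_update_inplace` hook: `Series` forwards to `generic.NDFrame._update_inplace`, while `Index` raises `TypeError`. A minimal sketch of that hook pattern follows. All class names are hypothetical stand-ins that wrap a plain list; nothing here is pandas code.

```python
class OpsMixin(object):
    """Hypothetical stand-in for IndexOpsMixin; stores data in .values."""

    def __init__(self, values):
        self.values = list(values)

    def drop_duplicates(self, inplace=False):
        # Shared logic: keep the first occurrence of each value.
        seen = set()
        kept = []
        for v in self.values:
            if v not in seen:
                seen.add(v)
                kept.append(v)
        result = type(self)(kept)
        if inplace:
            # Subclasses decide whether mutation is permitted.
            return self._update_inplace(result)
        return result

    def _update_inplace(self, result):
        raise NotImplementedError


class MutableStub(OpsMixin):
    """Plays the role of Series: in-place updates are allowed."""

    def _update_inplace(self, result):
        self.values = result.values  # implicitly returns None


class ImmutableStub(OpsMixin):
    """Plays the role of Index: no inplace argument is exposed."""

    def drop_duplicates(self):
        return super(ImmutableStub, self).drop_duplicates()

    def _update_inplace(self, result):
        # Guard in case the mixin method is reached with inplace=True.
        raise TypeError("ImmutableStub can't be updated inplace")


s = MutableStub([1, 2, 1])
print(s.drop_duplicates(inplace=True), s.values)

i = ImmutableStub([1, 2, 1])
print(i.drop_duplicates().values)
```

The immutable subclass narrows the public signature and keeps the raising hook as a second line of defence, which mirrors the "guard when called from IndexOpsMixin" comment in the diff.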
