Commit 4e563ea

Merge pull request #9258 from shoyer/get_nearest
API/ENH: add method='nearest' to Index.get_indexer/reindex and method to get_loc
2 parents c4f2be6 + f116421 commit 4e563ea

15 files changed (+508 -157 lines)

doc/source/basics.rst (+11 -13)

```diff
@@ -948,15 +948,9 @@ chosen from the following table:
 
     pad / ffill, Fill values forward
     bfill / backfill, Fill values backward
+    nearest, Fill from the nearest index value
 
-Other fill methods could be added, of course, but these are the two most
-commonly used for time series data. In a way they only make sense for time
-series or otherwise ordered data, but you may have an application on non-time
-series data where this sort of "interpolation" logic is the correct thing to
-do. More sophisticated interpolation of missing values would be an obvious
-extension.
-
-We illustrate these fill methods on a simple TimeSeries:
+We illustrate these fill methods on a simple Series:
 
 .. ipython:: python
 
@@ -969,18 +963,22 @@ We illustrate these fill methods on a simple TimeSeries:
    ts2.reindex(ts.index)
    ts2.reindex(ts.index, method='ffill')
    ts2.reindex(ts.index, method='bfill')
+   ts2.reindex(ts.index, method='nearest')
 
-Note these methods require that the indexes are **order increasing**.
+These methods require that the indexes are **ordered** increasing or
+decreasing.
 
-Note the same result could have been achieved using :ref:`fillna
-<missing_data.fillna>`:
+Note that the same result could have been achieved using
+:ref:`fillna <missing_data.fillna>` (except for ``method='nearest'``) or
+:ref:`interpolate <missing_data.interpolation>`:
 
 .. ipython:: python
 
    ts2.reindex(ts.index).fillna(method='ffill')
 
-Note that ``reindex`` will raise a ValueError if the index is not
-monotonic. ``fillna`` will not make any checks on the order of the index.
+``reindex`` will raise a ValueError if the index is not monotonic increasing or
+descreasing. ``fillna`` and ``interpolate`` will not make any checks on the
+order of the index.
 
 .. _basics.drop:
 
```
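The basics.rst hunk lists three fill rules for ``reindex``. To make concrete what each rule selects, here is a pure-Python sketch that computes, for every target label, which position of the existing index would supply the value (``-1`` meaning nothing qualifies, so the result stays missing). It is not the pandas implementation. The names ``fill_positions`` and ``_increasing_positions`` are hypothetical, the index is assumed to be unique and strictly monotonic, ties under ``nearest`` go to the larger label purely as a choice of this sketch, and the decreasing case assumes that "forward" means forward in index order.

```python
from bisect import bisect_left, bisect_right


def _increasing_positions(index, target, method):
    """Positions into a strictly increasing `index`; -1 means no match."""
    n = len(index)
    result = []
    for label in target:
        left = bisect_right(index, label) - 1  # last pos with index[pos] <= label
        right = bisect_left(index, label)      # first pos with index[pos] >= label
        if right == n:
            right = -1
        if method == 'pad':
            pos = left
        elif method == 'backfill':
            pos = right
        elif method == 'nearest':
            if left == -1:
                pos = right
            elif right == -1:
                pos = left
            elif label - index[left] < index[right] - label:
                pos = left
            else:
                # Equal distances fall through to the larger label (sketch choice).
                pos = right
        else:
            raise ValueError('unknown method: %r' % (method,))
        result.append(pos)
    return result


def fill_positions(index, target, method):
    """Hypothetical helper for a unique index that is monotonic increasing
    or decreasing; target labels may come in any order."""
    index = list(index)
    if all(a < b for a, b in zip(index, index[1:])):
        return _increasing_positions(index, target, method)
    if all(a > b for a, b in zip(index, index[1:])):
        # Search the reversed (increasing) view. Walking forward along a
        # decreasing index visits smaller labels, so the value "before" a
        # target is the next larger label: pad and backfill swap roles.
        swapped = {'pad': 'backfill', 'backfill': 'pad'}.get(method, method)
        n = len(index)
        positions = _increasing_positions(index[::-1], target, swapped)
        return [n - 1 - pos if pos != -1 else -1 for pos in positions]
    raise ValueError('index must be monotonic increasing or decreasing')
```

Because each target label is searched on its own, the targets can arrive in any order, e.g. ``fill_positions([0, 1, 2, 3, 4], [5, -1], 'pad')`` gives ``[4, -1]``.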

doc/source/whatsnew/v0.16.0.txt (+28)

```diff
@@ -20,6 +20,15 @@ users upgrade to this version.
 New features
 ~~~~~~~~~~~~
 
+- Reindex now supports ``method='nearest'`` for frames or series with a monotonic increasing or decreasing index (:issue:`9258`):
+
+  .. ipython:: python
+
+     df = pd.DataFrame({'x': range(5)})
+     df.reindex([0.2, 1.8, 3.5], method='nearest')
+
+  This method is also exposed by the lower level ``Index.get_indexer`` and ``Index.get_loc`` methods.
+
 .. _whatsnew_0160.api:
 
 .. _whatsnew_0160.api_breaking:
@@ -189,6 +198,9 @@ Enhancements
 
 - Added ``StringMethods.find()`` and ``rfind()`` which behave as the same as standard ``str`` (:issue:`9386`)
 
+- ``Index.get_indexer`` now supports ``method='pad'`` and ``method='backfill'`` even for any target array, not just monotonic targets. These methods also work for monotonic decreasing as well as monotonic increasing indexes (:issue:`9258`).
+- ``Index.asof`` now works on all index types (:issue:`9258`).
+
 - Added ``StringMethods.isnumeric`` and ``isdecimal`` which behave as the same as standard ``str`` (:issue:`9439`)
 - The ``read_excel()`` function's :ref:`sheetname <_io.specifying_sheets>` argument now accepts a list and ``None``, to get multiple or all sheets respectively. If more than one sheet is specified, a dictionary is returned. (:issue:`9450`)
 
@@ -252,6 +264,22 @@ Bug Fixes
 
 - Fixed character encoding bug in ``read_stata`` and ``StataReader`` when loading data from a URL (:issue:`9231`).
 
+- Looking up a partial string label with ``DatetimeIndex.asof`` now includes values that match the string, even if they are after the start of the partial string label (:issue:`9258`). Old behavior:
+
+  .. ipython:: python
+     :verbatim:
+
+     In [4]: pd.to_datetime(['2000-01-31', '2000-02-28']).asof('2000-02')
+     Out[4]: Timestamp('2000-01-31 00:00:00')
+
+  Fixed behavior:
+
+  .. ipython:: python
+
+     pd.to_datetime(['2000-01-31', '2000-02-28']).asof('2000-02')
+
+  To reproduce the old behavior, simply add more precision to the label (e.g., use ``2000-02-01`` instead of ``2000-02``).
+
 
 
 - Bug in adding ``offsets.Nano`` to other offets raises ``TypeError`` (:issue:`9284`)
```
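The ``DatetimeIndex.asof`` fix in the Bug Fixes hunk comes down to which cutoff a partial label such as ``'2000-02'`` stands for. The stdlib sketch below models only month-resolution ``YYYY-MM`` labels and is not how pandas resolves partial strings: the hypothetical ``asof_month`` returns the last date at or before either the first day of the month (the old behavior) or its last day (the fixed behavior, where dates inside the month count as matches). The input dates are assumed to be sorted.

```python
import calendar
from bisect import bisect_right
from datetime import date


def month_bounds(label):
    """Return (first_day, last_day) for a 'YYYY-MM' label."""
    year, month = (int(part) for part in label.split('-'))
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, 1), date(year, month, last_day)


def asof_month(dates, label, include_whole_month=True):
    """Hypothetical helper: last entry of sorted `dates` at or before `label`.

    include_whole_month=False anchors the label at the start of the month
    (old behavior); True lets dates inside the month match (fixed behavior).
    Returns None when no date qualifies.
    """
    start, end = month_bounds(label)
    cutoff = end if include_whole_month else start
    pos = bisect_right(dates, cutoff) - 1
    return dates[pos] if pos >= 0 else None
```

With ``dates = [date(2000, 1, 31), date(2000, 2, 28)]``, ``asof_month(dates, '2000-02')`` picks ``2000-02-28`` while ``include_whole_month=False`` picks ``2000-01-31``. Adding precision to the label, as the whatsnew entry suggests, shrinks the month to a single day, which is why it restores the old result.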

pandas/core/common.py (+14 -4)

```diff
@@ -2682,21 +2682,31 @@ def _astype_nansafe(arr, dtype, copy=True):
     return arr.view(dtype)
 
 
-def _clean_fill_method(method):
+def _clean_fill_method(method, allow_nearest=False):
     if method is None:
         return None
     method = method.lower()
     if method == 'ffill':
         method = 'pad'
     if method == 'bfill':
         method = 'backfill'
-    if method not in ['pad', 'backfill']:
-        msg = ('Invalid fill method. Expecting pad (ffill) or backfill '
-               '(bfill). Got %s' % method)
+
+    valid_methods = ['pad', 'backfill']
+    expecting = 'pad (ffill) or backfill (bfill)'
+    if allow_nearest:
+        valid_methods.append('nearest')
+        expecting = 'pad (ffill), backfill (bfill) or nearest'
+    if method not in valid_methods:
+        msg = ('Invalid fill method. Expecting %s. Got %s'
+               % (expecting, method))
         raise ValueError(msg)
     return method
 
 
+def _clean_reindex_fill_method(method):
+    return _clean_fill_method(method, allow_nearest=True)
+
+
 def _all_none(*args):
     for arg in args:
         if arg is not None:
```

pandas/core/generic.py (+19 -17)

```diff
@@ -1672,10 +1672,12 @@ def sort_index(self, axis=0, ascending=True):
             keywords)
             New labels / index to conform to. Preferably an Index object to
             avoid duplicating data
-        method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
-            Method to use for filling holes in reindexed DataFrame
-            pad / ffill: propagate last valid observation forward to next valid
-            backfill / bfill: use NEXT valid observation to fill gap
+        method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
+            Method to use for filling holes in reindexed DataFrame:
+            * default: don't fill gaps
+            * pad / ffill: propagate last valid observation forward to next valid
+            * backfill / bfill: use next valid observation to fill gap
+            * nearest: use nearest valid observations to fill gap
         copy : boolean, default True
             Return a new object, even if the passed indexes are the same
         level : int or name
@@ -1703,7 +1705,7 @@ def reindex(self, *args, **kwargs):
 
         # construct the args
         axes, kwargs = self._construct_axes_from_arguments(args, kwargs)
-        method = com._clean_fill_method(kwargs.get('method'))
+        method = com._clean_reindex_fill_method(kwargs.get('method'))
         level = kwargs.get('level')
         copy = kwargs.get('copy', True)
         limit = kwargs.get('limit')
@@ -1744,9 +1746,8 @@ def _reindex_axes(self, axes, level, limit, method, fill_value, copy):
 
             axis = self._get_axis_number(a)
             obj = obj._reindex_with_indexers(
-                {axis: [new_index, indexer]}, method=method,
-                fill_value=fill_value, limit=limit, copy=copy,
-                allow_dups=False)
+                {axis: [new_index, indexer]},
+                fill_value=fill_value, copy=copy, allow_dups=False)
 
         return obj
 
@@ -1770,10 +1771,12 @@ def _reindex_multi(self, axes, copy, fill_value):
             New labels / index to conform to. Preferably an Index object to
             avoid duplicating data
         axis : %(axes_single_arg)s
-        method : {'backfill', 'bfill', 'pad', 'ffill', None}, default None
-            Method to use for filling holes in reindexed object.
-            pad / ffill: propagate last valid observation forward to next valid
-            backfill / bfill: use NEXT valid observation to fill gap
+        method : {None, 'backfill'/'bfill', 'pad'/'ffill', 'nearest'}, optional
+            Method to use for filling holes in reindexed DataFrame:
+            * default: don't fill gaps
+            * pad / ffill: propagate last valid observation forward to next valid
+            * backfill / bfill: use next valid observation to fill gap
+            * nearest: use nearest valid observations to fill gap
         copy : boolean, default True
             Return a new object, even if the passed indexes are the same
         level : int or name
@@ -1802,15 +1805,14 @@ def reindex_axis(self, labels, axis=0, method=None, level=None, copy=True,
 
         axis_name = self._get_axis_name(axis)
         axis_values = self._get_axis(axis_name)
-        method = com._clean_fill_method(method)
+        method = com._clean_reindex_fill_method(method)
         new_index, indexer = axis_values.reindex(labels, method, level,
                                                  limit=limit)
         return self._reindex_with_indexers(
-            {axis: [new_index, indexer]}, method=method, fill_value=fill_value,
-            limit=limit, copy=copy)
+            {axis: [new_index, indexer]}, fill_value=fill_value, copy=copy)
 
-    def _reindex_with_indexers(self, reindexers, method=None,
-                               fill_value=np.nan, limit=None, copy=False,
+    def _reindex_with_indexers(self, reindexers,
+                               fill_value=np.nan, copy=False,
                                allow_dups=False):
         """ allow_dups indicates an internal call here """
 
```
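After this change ``_reindex_with_indexers`` no longer receives ``method`` or ``limit``. My reading of the diff is that the fill rule is already encoded in the indexer returned by ``Index.reindex`` (which, per the ``reindex_axis`` hunk, is still called with ``method`` and ``limit``), so what remains is a plain positional take in which ``-1`` becomes ``fill_value``. A minimal sketch of that take step, with a hypothetical ``take_with_fill`` standing in for the internal block-level machinery:

```python
import math


def take_with_fill(values, indexer, fill_value=math.nan):
    """Hypothetical take: values[i] for each i in indexer, fill_value where i == -1."""
    return [fill_value if i == -1 else values[i] for i in indexer]


# Old labels [0, 1, 2] hold these values. Reindexing to [0, 0.5, 2] with
# method='pad' would select positions [0, 0, 2]; with no method, label 0.5
# has no exact match, so its position is -1.
values = [10, 20, 30]
padded = take_with_fill(values, [0, 0, 2])   # fill already applied by the indexer
plain = take_with_fill(values, [0, -1, 2])   # the -1 slot becomes NaN
```

Under this reading, passing ``method`` again at the take stage would be redundant, which is consistent with the parameter being dropped from the signature.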
