Commit 9020827

API: warning to raise KeyError in the future if not all elements of a list are selected via .loc
closes pandas-dev#15747
1 parent 408ecd2 commit 9020827

16 files changed, +372 -64 lines changed

doc/source/indexing.rst (+108 -1)
Original file line number | Diff line number | Diff line change
@@ -333,8 +333,15 @@ Selection By Label
333333
334334
dfl.loc['20130102':'20130104']
335335
336+
.. warning::
337+
338+
Starting in 0.21.0, pandas will show a ``FutureWarning`` if indexing with a list-of-labels and not ALL labels are present. In the future
339+
this will raise a ``KeyError``. See :ref:`list-like Using loc with missing keys in a list is Deprecated <indexing.deprecate_loc_reindex_listlike>`
340+
336341
pandas provides a suite of methods in order to have **purely label based indexing**. This is a strict inclusion based protocol.
337-
**At least 1** of the labels for which you ask, must be in the index or a ``KeyError`` will be raised! When slicing, both the start bound **AND** the stop bound are *included*, if present in the index. Integers are valid labels, but they refer to the label **and not the position**.
342+
All of the labels for which you ask, must be in the index or a ``KeyError`` will be raised!
343+
When slicing, both the start bound **AND** the stop bound are *included*, if present in the index.
344+
Integers are valid labels, but they refer to the label **and not the position**.
338345

339346
The ``.loc`` attribute is the primary access method. The following are valid inputs:
340347

@@ -635,6 +642,106 @@ For getting *multiple* indexers, using ``.get_indexer``
635642
dfd.iloc[[0, 2], dfd.columns.get_indexer(['A', 'B'])]
636643
637644
645+
.. _indexing.deprecate_loc_reindex_listlike:
646+
647+
Indexing with missing list-of-labels is Deprecated
648+
--------------------------------------------------
649+
650+
.. warning::
651+
652+
Starting in 0.21.0, using ``.loc`` or ``[]`` with a list-like containing one or more missing labels is deprecated in favor of ``.reindex``.
653+
654+
In prior versions, using ``.loc[list-of-labels]`` would work as long as *at least 1* of the keys was found (otherwise it
655+
would raise a ``KeyError``). This behavior is deprecated and will show a warning message pointing to this section. The
656+
recommended alternative is to use ``.reindex()``.
657+
658+
For example.
659+
660+
.. ipython:: python
661+
662+
s = pd.Series([1, 2, 3])
663+
s
664+
665+
Selection with all keys found is unchanged.
666+
667+
.. ipython:: python
668+
669+
s.loc[[1, 2]]
670+
671+
Previous Behavior
672+
673+
.. code-block:: ipython
674+
675+
676+
In [4]: s.loc[[1, 2, 3]]
677+
Out[4]:
678+
1 2.0
679+
2 3.0
680+
3 NaN
681+
dtype: float64
682+
683+
684+
Current Behavior
685+
686+
In [4]: s.loc[[1, 2, 3]]
687+
Passing list-likes to .loc with any non-matching elements will raise
688+
KeyError in the future, you can use .reindex() as an alternative.
689+
690+
See the documentation here:
691+
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
692+
693+
Out[4]:
694+
1 2.0
695+
2 3.0
696+
3 NaN
697+
dtype: float64
698+
699+
700+
Reindexing
701+
~~~~~~~~~~
702+
703+
The idiomatic way to achieve selecting potentially not-found elements is via ``.reindex()``. See also the section on :ref:`reindexing <basics.reindexing>`.
704+
705+
.. ipython:: python
706+
707+
s.reindex([1, 2, 3])
708+
709+
Alternatively, if you want to select only *valid* keys, the following is idiomatic and efficient; it is guaranteed to preserve the dtype of the selection.
710+
711+
.. ipython:: python
712+
713+
labels = [1, 2, 3]
714+
s.loc[s.index.intersection(labels)]
715+
716+
Having a duplicated index will raise for a ``.reindex()``:
717+
718+
.. ipython:: python
719+
720+
s = pd.Series(np.arange(4), index=['a', 'a', 'b', 'c'])
721+
labels = ['c', 'd']
722+
723+
.. code-block:: python
724+
725+
In [17]: s.reindex(labels)
726+
ValueError: cannot reindex from a duplicate axis
727+
728+
Generally, you can intersect the desired labels with the current
729+
axis, and then reindex.
730+
731+
.. ipython:: python
732+
733+
s.loc[s.index.intersection(labels)].reindex(labels)
734+
735+
However, this would *still* raise if your resulting index is duplicated.
736+
737+
.. code-block:: python
738+
739+
In [41]: labels = ['a', 'd']
740+
741+
In [42]: s.loc[s.index.intersection(labels)].reindex(labels)
742+
ValueError: cannot reindex from a duplicate axis
743+
744+
638745
.. _indexing.basics.partial_setting:
639746

640747
Selecting Random Samples
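The intersection idiom documented in this hunk can also be used to report which requested labels are absent before selecting. The following is a minimal sketch, assuming a recent pandas release; the variable names `present` and `absent` and the sample values are illustrative and do not come from the commit.

```python
import pandas as pd

# A Series whose index contains a duplicated label ("a").
s = pd.Series([10, 20, 30, 40], index=["a", "a", "b", "c"])
labels = ["c", "d"]

# Labels that exist in the index and are therefore safe to pass to .loc.
present = s.index.intersection(labels)
# Labels that were requested but do not exist in the index.
absent = pd.Index(labels).difference(s.index)

# Selecting only the labels that exist keeps the original dtype.
selected = s.loc[present]
```

Checking `absent` explicitly lets calling code decide whether a missing label is an error, instead of relying on the deprecated fill-with-``NaN`` behavior.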

doc/source/whatsnew/v0.15.0.txt (+19 -5)
@@ -676,10 +676,19 @@ Other notable API changes:
676676

677677
Both will now return a frame reindexed by [1,3]. E.g.
678678

679-
.. ipython:: python
679+
.. code-block:: ipython
680680

681-
df.loc[[1,3]]
682-
df.loc[[1,3],:]
681+
In [3]: df.loc[[1,3]]
682+
Out[3]:
683+
0
684+
1 a
685+
3 NaN
686+
687+
In [4]: df.loc[[1,3],:]
688+
Out[4]:
689+
0
690+
1 a
691+
3 NaN
683692

684693
This can also be seen in multi-axis indexing with a ``Panel``.
685694

@@ -693,9 +702,14 @@ Other notable API changes:
693702

694703
The following would raise ``KeyError`` prior to 0.15.0:
695704

696-
.. ipython:: python
705+
.. code-block:: ipython
697706

698-
p.loc[['ItemA','ItemD'],:,'D']
707+
In [5]:
708+
Out[5]:
709+
ItemA ItemD
710+
1 3 NaN
711+
2 7 NaN
712+
3 11 NaN
699713

700714
Furthermore, ``.loc`` will raise if no values are found in a multi-index with a list-like indexer:
701715

doc/source/whatsnew/v0.21.0.txt (+57)
@@ -268,6 +268,63 @@ We have updated our minimum supported versions of dependencies (:issue:`15206`,
268268
| Bottleneck | 1.0.0 | |
269269
+--------------+-----------------+----------+
270270

271+
.. _whatsnew_0210.api_breaking.loc:
272+
273+
Indexing with missing list-of-labels is Deprecated
274+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
275+
276+
Previously, selecting at least 1 valid label with a list-like indexer would always succeed, returning ``NaN`` for missing labels.
277+
This will now show a ``FutureWarning``; in the future this will raise a ``KeyError`` (:issue:`15747`).
278+
This warning will trigger on a ``DataFrame`` or a ``Series`` for using ``.loc[]`` or ``[[]]`` when passing a list-of-labels with at least 1 missing label.
279+
See the :ref:`deprecation docs <indexing.deprecate_loc_reindex_listlike>`.
280+
281+
282+
.. ipython:: python
283+
284+
s = pd.Series([1, 2, 3])
285+
s
286+
287+
Previous Behavior
288+
289+
.. code-block:: ipython
290+
291+
292+
In [4]: s.loc[[1, 2, 3]]
293+
Out[4]:
294+
1 2.0
295+
2 3.0
296+
3 NaN
297+
dtype: float64
298+
299+
300+
Current Behavior
301+
302+
In [4]: s.loc[[1, 2, 3]]
303+
Passing list-likes to .loc or [] with any missing label will raise
304+
KeyError in the future, you can use .reindex() as an alternative.
305+
306+
See the documentation here:
307+
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
308+
309+
Out[4]:
310+
1 2.0
311+
2 3.0
312+
3 NaN
313+
dtype: float64
314+
315+
The idiomatic way to achieve selecting potentially not-found elements is via ``.reindex()``.
316+
317+
.. ipython:: python
318+
319+
s.reindex([1, 2, 3])
320+
321+
Selection with all keys found is unchanged.
322+
323+
.. ipython:: python
324+
325+
s.loc[[1, 2]]
326+
327+
271328
.. _whatsnew_0210.api_breaking.pandas_eval:
272329

273330
Improved error handling during item assignment in pd.eval

pandas/core/indexing.py (+26 -6)
@@ -1419,13 +1419,33 @@ def _has_valid_type(self, key, axis):
14191419
if isinstance(key, tuple) and isinstance(ax, MultiIndex):
14201420
return True
14211421

1422-
# TODO: don't check the entire key unless necessary
1423-
if (not is_iterator(key) and len(key) and
1424-
np.all(ax.get_indexer_for(key) < 0)):
1422+
if not is_iterator(key) and len(key):
14251423

1426-
raise KeyError(u"None of [{key}] are in the [{axis}]"
1427-
.format(key=key,
1428-
axis=self.obj._get_axis_name(axis)))
1424+
# True indicates missing values
1425+
missing = ax.get_indexer_for(key) < 0
1426+
1427+
if np.any(missing):
1428+
if len(key) == 1 or np.all(missing):
1429+
raise KeyError(
1430+
u"None of [{key}] are in the [{axis}]".format(
1431+
key=key, axis=self.obj._get_axis_name(axis)))
1432+
else:
1433+
1434+
# we skip the warning on Categorical/Interval
1435+
# as this check is actually done (check for
1436+
# non-missing values), but a bit later in the
1437+
# code, so we want to avoid warning & then
1438+
# just raising
1439+
_missing_key_warning = textwrap.dedent("""
1440+
Passing list-likes to .loc or [] with any missing label will raise
1441+
KeyError in the future, you can use .reindex() as an alternative.
1442+
1443+
See the documentation here:
1444+
http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike""") # noqa
1445+
1446+
if not (ax.is_categorical() or ax.is_interval()):
1447+
warnings.warn(_missing_key_warning,
1448+
FutureWarning, stacklevel=5)
14291449

14301450
return True
14311451
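The branching introduced in this hunk can be read as a three-way decision: all labels found, no usable label found, or a partial match. The following is a minimal standalone sketch of that decision, not the pandas implementation. `check_labels` and its plain-list `index` argument are hypothetical stand-ins for `_has_valid_type` and `ax.get_indexer_for`, and the Categorical/Interval exemption from the diff is omitted.

```python
import warnings


def check_labels(index, key):
    """Mimic the decision in the hunk above on plain Python lists.

    `index` and `key` are hypothetical stand-ins for a pandas axis and a
    list-like indexer; this is an illustration, not pandas code.
    """
    present = set(index)
    # True marks a requested label that is absent from the index.
    missing = [label not in present for label in key]

    if any(missing):
        if len(key) == 1 or all(missing):
            # Nothing usable was requested: fail immediately.
            raise KeyError("None of {!r} are in the index".format(key))
        # Some labels were found and some were not: warn but carry on.
        warnings.warn(
            "Passing list-likes with any missing label will raise "
            "KeyError in the future; use .reindex() as an alternative.",
            FutureWarning,
            stacklevel=2,
        )
    return True
```

Under these assumptions, `check_labels([0, 1, 2], [1, 2])` passes silently, `check_labels([0, 1, 2], [1, 2, 3])` warns, and `check_labels([0, 1, 2], [7, 8])` raises `KeyError`.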

pandas/core/series.py (+1 -1)
@@ -691,7 +691,7 @@ def _get_with(self, key):
691691

692692
if key_type == 'integer':
693693
if self.index.is_integer() or self.index.is_floating():
694-
return self.reindex(key)
694+
return self.loc[key]
695695
else:
696696
return self._get_values(key)
697697
elif key_type == 'boolean':

pandas/io/formats/excel.py (+9 -1)
@@ -356,7 +356,15 @@ def __init__(self, df, na_rep='', float_format=None, cols=None,
356356
self.styler = None
357357
self.df = df
358358
if cols is not None:
359-
self.df = df.loc[:, cols]
359+
360+
# all missing, raise
361+
if not len(Index(cols) & df.columns):
362+
raise KeyError
363+
364+
# 1 missing is ok
365+
# TODO(jreback) this should raise
366+
# on *any* missing columns
367+
self.df = df.reindex(columns=cols)
360368
self.columns = self.df.columns
361369
self.float_format = float_format
362370
self.index = index
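The column check in this hunk can be sketched outside the formatter as follows. This is an illustration under assumptions, not the `ExcelFormatter` code: `select_columns` is a hypothetical helper, and it uses `Index.intersection` in place of the `&` operator from the diff, since newer pandas releases no longer treat `&` on an `Index` as a set operation.

```python
import pandas as pd


def select_columns(df, cols):
    """Hypothetical helper mirroring the check in the hunk above."""
    # Raise only when none of the requested columns exist.
    if len(pd.Index(cols).intersection(df.columns)) == 0:
        raise KeyError("none of {!r} are in the columns".format(list(cols)))
    # Partially missing columns are kept and filled with NaN.
    return df.reindex(columns=cols)


df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
subset = select_columns(df, ["A", "C"])  # "C" is absent from df
```

As in the diff, a partially missing selection is tolerated here; the TODO in the hunk suggests the authors intended to tighten this later.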

pandas/tests/indexing/test_categorical.py (+2 -1)
@@ -111,7 +111,8 @@ def test_loc_listlike(self):
111111
assert_frame_equal(result, expected, check_index_type=True)
112112

113113
# not all labels in the categories
114-
pytest.raises(KeyError, lambda: self.df2.loc[['a', 'd']])
114+
with pytest.raises(KeyError):
115+
self.df2.loc[['a', 'd']]
115116

116117
def test_loc_listlike_dtypes(self):
117118
# GH 11586

pandas/tests/indexing/test_datetime.py (+6 -2)
@@ -223,7 +223,9 @@ def test_series_partial_set_datetime(self):
223223
Timestamp('2011-01-03')]
224224
exp = Series([np.nan, 0.2, np.nan],
225225
index=pd.DatetimeIndex(keys, name='idx'), name='s')
226-
tm.assert_series_equal(ser.loc[keys], exp, check_index_type=True)
226+
with tm.assert_produces_warning(FutureWarning,
227+
check_stacklevel=False):
228+
tm.assert_series_equal(ser.loc[keys], exp, check_index_type=True)
227229

228230
def test_series_partial_set_period(self):
229231
# GH 11497
@@ -248,5 +250,7 @@ def test_series_partial_set_period(self):
248250
pd.Period('2011-01-03', freq='D')]
249251
exp = Series([np.nan, 0.2, np.nan],
250252
index=pd.PeriodIndex(keys, name='idx'), name='s')
251-
result = ser.loc[keys]
253+
with tm.assert_produces_warning(FutureWarning,
254+
check_stacklevel=False):
255+
result = ser.loc[keys]
252256
tm.assert_series_equal(result, exp)
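The test changes in this file wrap the indexing call in `tm.assert_produces_warning`. For code outside the pandas test suite, a comparable check can be sketched with the standard library alone. Both functions below are hypothetical: `lookup_with_missing` merely emits a `FutureWarning` and stands in for a `.loc` call with a partially missing list of labels.

```python
import warnings


def lookup_with_missing(mapping, keys):
    """Hypothetical stand-in for a lookup that tolerates missing keys."""
    if any(k not in mapping for k in keys):
        warnings.warn("missing labels will raise KeyError in the future",
                      FutureWarning, stacklevel=2)
    return [mapping.get(k) for k in keys]


def emits_future_warning(func, *args):
    """Return (result, warned); warned is True if a FutureWarning fired."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning, even ones that were already shown once.
        warnings.simplefilter("always")
        result = func(*args)
    warned = any(issubclass(w.category, FutureWarning) for w in caught)
    return result, warned
```

A call such as `emits_future_warning(lookup_with_missing, {"a": 1}, ["a", "b"])` returns the looked-up values together with a flag that a test can assert on.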

pandas/tests/indexing/test_iloc.py (+2 -1)
@@ -617,7 +617,8 @@ def test_iloc_non_unique_indexing(self):
617617
expected = DataFrame(new_list)
618618
expected = pd.concat([expected, DataFrame(index=idx[idx > sidx.max()])
619619
])
620-
result = df2.loc[idx]
620+
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
621+
result = df2.loc[idx]
621622
tm.assert_frame_equal(result, expected, check_index_type=False)
622623

623624
def test_iloc_empty_list_indexer_is_ok(self):

pandas/tests/indexing/test_indexing.py (+12 -6)
@@ -176,7 +176,8 @@ def test_dups_fancy_indexing(self):
176176
'test1': [7., 6, np.nan],
177177
'other': ['d', 'c', np.nan]}, index=rows)
178178

179-
result = df.loc[rows]
179+
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
180+
result = df.loc[rows]
180181
tm.assert_frame_equal(result, expected)
181182

182183
# see GH5553, make sure we use the right indexer
@@ -186,7 +187,8 @@ def test_dups_fancy_indexing(self):
186187
'other': [np.nan, np.nan, np.nan,
187188
'd', 'c', np.nan]},
188189
index=rows)
189-
result = df.loc[rows]
190+
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
191+
result = df.loc[rows]
190192
tm.assert_frame_equal(result, expected)
191193

192194
# inconsistent returns for unique/duplicate indices when values are
@@ -203,20 +205,23 @@ def test_dups_fancy_indexing(self):
203205

204206
# GH 4619; duplicate indexer with missing label
205207
df = DataFrame({"A": [0, 1, 2]})
206-
result = df.loc[[0, 8, 0]]
208+
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
209+
result = df.loc[[0, 8, 0]]
207210
expected = DataFrame({"A": [0, np.nan, 0]}, index=[0, 8, 0])
208211
tm.assert_frame_equal(result, expected, check_index_type=False)
209212

210213
df = DataFrame({"A": list('abc')})
211-
result = df.loc[[0, 8, 0]]
214+
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
215+
result = df.loc[[0, 8, 0]]
212216
expected = DataFrame({"A": ['a', np.nan, 'a']}, index=[0, 8, 0])
213217
tm.assert_frame_equal(result, expected, check_index_type=False)
214218

215219
# non unique with non unique selector
216220
df = DataFrame({'test': [5, 7, 9, 11]}, index=['A', 'A', 'B', 'C'])
217221
expected = DataFrame(
218222
{'test': [5, 7, 5, 7, np.nan]}, index=['A', 'A', 'A', 'A', 'E'])
219-
result = df.loc[['A', 'A', 'E']]
223+
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
224+
result = df.loc[['A', 'A', 'E']]
220225
tm.assert_frame_equal(result, expected)
221226

222227
# GH 5835
@@ -227,7 +232,8 @@ def test_dups_fancy_indexing(self):
227232
expected = pd.concat(
228233
[df.loc[:, ['A', 'B']], DataFrame(np.nan, columns=['C'],
229234
index=df.index)], axis=1)
230-
result = df.loc[:, ['A', 'B', 'C']]
235+
with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
236+
result = df.loc[:, ['A', 'B', 'C']]
231237
tm.assert_frame_equal(result, expected)
232238

233239
# GH 6504, multi-axis indexing
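Several tests in this file now expect a `FutureWarning` from `.loc` calls that request absent rows or columns. A warning-free equivalent of those selections might look like the sketch below, which assumes a recent pandas release; the frame and the variable names `wide` and `rows` are illustrative and not taken from the test suite.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# Column "C" does not exist; reindex adds it filled with NaN.
wide = df.reindex(columns=["A", "B", "C"])

# Row label 8 does not exist. Repeating label 0 in the request is fine
# here because the frame's own index is unique.
rows = df.reindex([0, 8, 0])
```

Note that `reindex` only accepts this when the axis being reindexed has no duplicate labels, which is the limitation the indexing documentation in this commit points out.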
