Skip to content

API: consistency with .ix and .loc for getitem operations (GH8613) #9566

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Mar 4, 2015
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 22 additions & 3 deletions doc/source/indexing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -85,7 +85,7 @@ of multi-axis indexing.

- ``.iloc`` is primarily integer position based (from ``0`` to
``length-1`` of the axis), but may also be used with a boolean
array. ``.iloc`` will raise ``IndexError`` if a requested
array. ``.iloc`` will raise ``IndexError`` if a requested
indexer is out-of-bounds, except *slice* indexers which allow
out-of-bounds indexing. (this conforms with python/numpy *slice*
semantics). Allowed inputs are:
Expand Down Expand Up @@ -292,6 +292,27 @@ Selection By Label
This is sometimes called ``chained assignment`` and should be avoided.
See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`

.. warning::

``.loc`` is strict when you present slicers that are not compatible (or convertible) with the index type. For example
using integers in a ``DatetimeIndex``. These will raise a ``TypeError``.

.. ipython:: python
dfl = DataFrame(np.random.randn(5,4), columns=list('ABCD'), index=date_range('20130101',periods=5))
dfl
.. code-block:: python
In [4]: dfl.loc[2:3]
TypeError: cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with these indexers [2] of <type 'int'>
String likes in slicing *can* be convertible to the type of the index and lead to natural slicing.

.. ipython:: python
dfl.loc['20130102':'20130104']
pandas provides a suite of methods in order to have **purely label based indexing**. This is a strict inclusion based protocol.
**at least 1** of the labels for which you ask, must be in the index or a ``KeyError`` will be raised! When slicing, the start bound is *included*, **AND** the stop bound is *included*. Integers are valid labels, but they refer to the label **and not the position**.

Expand Down Expand Up @@ -1486,5 +1507,3 @@ This will **not** work at all, and so should be avoided
The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid
assignment. There may be false positives; situations where a chained assignment is inadvertantly
reported.


197 changes: 133 additions & 64 deletions doc/source/whatsnew/v0.16.0.txt
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@ users upgrade to this version.
New features
~~~~~~~~~~~~

.. _whatsnew_0160.enhancements:

- Reindex now supports ``method='nearest'`` for frames or series with a monotonic increasing or decreasing index (:issue:`9258`):

.. ipython:: python
Expand All @@ -29,7 +31,41 @@ New features

This method is also exposed by the lower level ``Index.get_indexer`` and ``Index.get_loc`` methods.

- DataFrame assign method
- Paths beginning with ~ will now be expanded to begin with the user's home directory (:issue:`9066`)
- Added time interval selection in ``get_data_yahoo`` (:issue:`9071`)
- Added ``Series.str.slice_replace()``, which previously raised ``NotImplementedError`` (:issue:`8888`)
- Added ``Timestamp.to_datetime64()`` to complement ``Timedelta.to_timedelta64()`` (:issue:`9255`)
- ``tseries.frequencies.to_offset()`` now accepts ``Timedelta`` as input (:issue:`9064`)
- Lag parameter was added to the autocorrelation method of ``Series``, defaults to lag-1 autocorrelation (:issue:`9192`)
- ``Timedelta`` will now accept ``nanoseconds`` keyword in constructor (:issue:`9273`)
- SQL code now safely escapes table and column names (:issue:`8986`)

- Added auto-complete for ``Series.str.<tab>``, ``Series.dt.<tab>`` and ``Series.cat.<tab>`` (:issue:`9322`)
- Added ``StringMethods.isalnum()``, ``isalpha()``, ``isdigit()``, ``isspace()``, ``islower()``,
``isupper()``, ``istitle()`` which behave as the same as standard ``str`` (:issue:`9282`)

- Added ``StringMethods.find()`` and ``rfind()`` which behave as the same as standard ``str`` (:issue:`9386`)

- ``Index.get_indexer`` now supports ``method='pad'`` and ``method='backfill'`` even for any target array, not just monotonic targets. These methods also work for monotonic decreasing as well as monotonic increasing indexes (:issue:`9258`).
- ``Index.asof`` now works on all index types (:issue:`9258`).

- Added ``StringMethods.isnumeric`` and ``isdecimal`` which behave as the same as standard ``str`` (:issue:`9439`)
- The ``read_excel()`` function's :ref:`sheetname <_io.specifying_sheets>` argument now accepts a list and ``None``, to get multiple or all sheets respectively. If more than one sheet is specified, a dictionary is returned. (:issue:`9450`)

.. code-block:: python

# Returns the 1st and 4th sheet, as a dictionary of DataFrames.
pd.read_excel('path_to_file.xls',sheetname=['Sheet1',3])

- A ``verbose`` argument has been augmented in ``io.read_excel()``, defaults to False. Set to True to print sheet names as they are parsed. (:issue:`9450`)
- Added ``StringMethods.ljust()`` and ``rjust()`` which behave as the same as standard ``str`` (:issue:`9352`)
- ``StringMethods.pad()`` and ``center()`` now accept ``fillchar`` option to specify filling character (:issue:`9352`)
- Added ``StringMethods.zfill()`` which behave as the same as standard ``str`` (:issue:`9387`)

DataFrame Assign
~~~~~~~~~~~~~~~~

.. _whatsnew_0160.enhancements.assign:

Inspired by `dplyr's
<http://cran.rstudio.com/web/packages/dplyr/vignettes/introduction.html#mutate>`__ ``mutate`` verb, DataFrame has a new
Expand Down Expand Up @@ -71,6 +107,55 @@ calculate the ratio, and plot

See the :ref:`documentation <dsintro.chained_assignment>` for more. (:issue:`9229`)


Interaction with scipy.sparse
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew_0160.enhancements.sparse:

Added :meth:`SparseSeries.to_coo` and :meth:`SparseSeries.from_coo` methods (:issue:`8048`) for converting to and from ``scipy.sparse.coo_matrix`` instances (see :ref:`here <sparse.scipysparse>`). For example, given a SparseSeries with MultiIndex we can convert to a `scipy.sparse.coo_matrix` by specifying the row and column labels as index levels:

.. ipython:: python

from numpy import nan
s = Series([3.0, nan, 1.0, 3.0, nan, nan])
s.index = MultiIndex.from_tuples([(1, 2, 'a', 0),
(1, 2, 'a', 1),
(1, 1, 'b', 0),
(1, 1, 'b', 1),
(2, 1, 'b', 0),
(2, 1, 'b', 1)],
names=['A', 'B', 'C', 'D'])

s

# SparseSeries
ss = s.to_sparse()
ss

A, rows, columns = ss.to_coo(row_levels=['A', 'B'],
column_levels=['C', 'D'],
sort_labels=False)

A
A.todense()
rows
columns

The from_coo method is a convenience method for creating a ``SparseSeries``
from a ``scipy.sparse.coo_matrix``:

.. ipython:: python

from scipy import sparse
A = sparse.coo_matrix(([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])),
shape=(3, 4))
A
A.todense()

ss = SparseSeries.from_coo(A)
ss

.. _whatsnew_0160.api:

.. _whatsnew_0160.api_breaking:
Expand Down Expand Up @@ -211,96 +296,80 @@ Backwards incompatible API changes
p // 0


Indexing Changes
~~~~~~~~~~~~~~~~

Deprecations
~~~~~~~~~~~~
.. _whatsnew_0160.api_breaking.indexing:

.. _whatsnew_0160.deprecations:
The behavior of a small sub-set of edge cases for using ``.loc`` have changed (:issue:`8613`). Furthermore we have improved the content of the error messages that are raised:

- slicing with ``.loc`` where the start and/or stop bound is not found in the index is now allowed; this previously would raise a ``KeyError``. This makes the behavior the same as ``.ix`` in this case. This change is only for slicing, not when indexing with a single label.

Enhancements
~~~~~~~~~~~~
.. ipython:: python

.. _whatsnew_0160.enhancements:
df = DataFrame(np.random.randn(5,4), columns=list('ABCD'), index=date_range('20130101',periods=5))
df
s = Series(range(5),[-2,-1,1,2,3])
s

- Paths beginning with ~ will now be expanded to begin with the user's home directory (:issue:`9066`)
- Added time interval selection in ``get_data_yahoo`` (:issue:`9071`)
- Added ``Series.str.slice_replace()``, which previously raised ``NotImplementedError`` (:issue:`8888`)
- Added ``Timestamp.to_datetime64()`` to complement ``Timedelta.to_timedelta64()`` (:issue:`9255`)
- ``tseries.frequencies.to_offset()`` now accepts ``Timedelta`` as input (:issue:`9064`)
- Lag parameter was added to the autocorrelation method of ``Series``, defaults to lag-1 autocorrelation (:issue:`9192`)
- ``Timedelta`` will now accept ``nanoseconds`` keyword in constructor (:issue:`9273`)
- SQL code now safely escapes table and column names (:issue:`8986`)
Previous Behavior

- Added auto-complete for ``Series.str.<tab>``, ``Series.dt.<tab>`` and ``Series.cat.<tab>`` (:issue:`9322`)
- Added ``StringMethods.isalnum()``, ``isalpha()``, ``isdigit()``, ``isspace()``, ``islower()``,
``isupper()``, ``istitle()`` which behave as the same as standard ``str`` (:issue:`9282`)
.. code-block:: python

- Added ``StringMethods.find()`` and ``rfind()`` which behave as the same as standard ``str`` (:issue:`9386`)
In [4]: df.loc['2013-01-02':'2013-01-10']
KeyError: 'stop bound [2013-01-10] is not in the [index]'

- ``Index.get_indexer`` now supports ``method='pad'`` and ``method='backfill'`` even for any target array, not just monotonic targets. These methods also work for monotonic decreasing as well as monotonic increasing indexes (:issue:`9258`).
- ``Index.asof`` now works on all index types (:issue:`9258`).
In [6]: s.loc[-10:3]
KeyError: 'start bound [-10] is not the [index]'

- Added ``StringMethods.isnumeric`` and ``isdecimal`` which behave as the same as standard ``str`` (:issue:`9439`)
- The ``read_excel()`` function's :ref:`sheetname <_io.specifying_sheets>` argument now accepts a list and ``None``, to get multiple or all sheets respectively. If more than one sheet is specified, a dictionary is returned. (:issue:`9450`)
New Behavior

.. ipython:: python

df.loc['2013-01-02':'2013-01-10']
s.loc[-10:3]

- allow slicing with float-like values on an integer index for ``.ix``. Previously this was only enabled for ``.loc``:

.. code-block:: python

# Returns the 1st and 4th sheet, as a dictionary of DataFrames.
pd.read_excel('path_to_file.xls',sheetname=['Sheet1',3])
Previous Behavior

- A ``verbose`` argument has been augmented in ``io.read_excel()``, defaults to False. Set to True to print sheet names as they are parsed. (:issue:`9450`)
- Added ``StringMethods.ljust()`` and ``rjust()`` which behave as the same as standard ``str`` (:issue:`9352`)
- ``StringMethods.pad()`` and ``center()`` now accept ``fillchar`` option to specify filling character (:issue:`9352`)
- Added ``StringMethods.zfill()`` which behave as the same as standard ``str`` (:issue:`9387`)
In [8]: s.ix[-1.0:2]
TypeError: the slice start value [-1.0] is not a proper indexer for this index type (Int64Index)

Interaction with scipy.sparse
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
New Behavior

.. _whatsnew_0160.enhancements.sparse:
.. ipython:: python

Added :meth:`SparseSeries.to_coo` and :meth:`SparseSeries.from_coo` methods (:issue:`8048`) for converting to and from ``scipy.sparse.coo_matrix`` instances (see :ref:`here <sparse.scipysparse>`). For example, given a SparseSeries with MultiIndex we can convert to a `scipy.sparse.coo_matrix` by specifying the row and column labels as index levels:
In [8]: s.ix[-1.0:2]
Out[2]:
-1 1
1 2
2 3
dtype: int64

.. ipython:: python
- provide a useful exception for indexing with an invalid type for that index when using ``.loc``. For example trying to use ``.loc`` on an index of type ``DatetimeIndex`` or ``PeriodIndex`` or ``TimedeltaIndex``, with an integer (or a float).

from numpy import nan
s = Series([3.0, nan, 1.0, 3.0, nan, nan])
s.index = MultiIndex.from_tuples([(1, 2, 'a', 0),
(1, 2, 'a', 1),
(1, 1, 'b', 0),
(1, 1, 'b', 1),
(2, 1, 'b', 0),
(2, 1, 'b', 1)],
names=['A', 'B', 'C', 'D'])
Previous Behavior

s
.. code-block:: python

# SparseSeries
ss = s.to_sparse()
ss
In [4]: df.loc[2:3]
KeyError: 'start bound [2] is not the [index]'

A, rows, columns = ss.to_coo(row_levels=['A', 'B'],
column_levels=['C', 'D'],
sort_labels=False)
New Behavior

A
A.todense()
rows
columns
.. code-block:: python

The from_coo method is a convenience method for creating a ``SparseSeries``
from a ``scipy.sparse.coo_matrix``:
In [4]: df.loc[2:3]
TypeError: Cannot do slice indexing on <class 'pandas.tseries.index.DatetimeIndex'> with <type 'int'> keys

.. ipython:: python
Deprecations
~~~~~~~~~~~~

from scipy import sparse
A = sparse.coo_matrix(([3.0, 1.0, 2.0], ([1, 0, 0], [0, 2, 3])),
shape=(3, 4))
A
A.todense()
.. _whatsnew_0160.deprecations:

ss = SparseSeries.from_coo(A)
ss

Performance
~~~~~~~~~~~
Expand Down
4 changes: 2 additions & 2 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -1159,11 +1159,11 @@ def _clear_item_cache(self, i=None):
else:
self._item_cache.clear()

def _slice(self, slobj, axis=0, typ=None):
def _slice(self, slobj, axis=0, kind=None):
"""
Construct a slice of this container.

typ parameter is maintained for compatibility with Series slicing.
kind parameter is maintained for compatibility with Series slicing.

"""
axis = self._get_block_manager_axis(axis)
Expand Down
Loading