DOC: Merge FAQ and gotcha #13768

Closed
wants to merge 1 commit into from
2 changes: 1 addition & 1 deletion doc/source/10min.rst
Original file line number Diff line number Diff line change
@@ -810,4 +810,4 @@ If you are trying an operation and you see an exception like:

See :ref:`Comparisons<basics.compare>` for an explanation and what to do.

-See :ref:`Gotchas<gotchas>` as well.
+See :ref:`FAQ<faq>` as well.
126 changes: 126 additions & 0 deletions doc/source/advanced.rst
Original file line number Diff line number Diff line change
@@ -853,3 +853,129 @@ Of course if you need integer based selection, then use ``iloc``
.. ipython:: python

   dfir.iloc[0:5]

Miscellaneous indexing FAQ
--------------------------

Integer indexing with ix
~~~~~~~~~~~~~~~~~~~~~~~~

Label-based indexing with integer axis labels is a thorny topic. It has been
discussed heavily on mailing lists and among various members of the scientific
Python community. In pandas, our general viewpoint is that labels matter more
than integer locations. Therefore, with an integer axis index *only*
label-based indexing is possible with the standard tools like ``.ix``. The
following code will generate exceptions:

.. code-block:: python

   s = pd.Series(range(5))
   s[-1]
   df = pd.DataFrame(np.random.randn(5, 4))
   df
   df.ix[-2:]

This deliberate decision was made to prevent ambiguities and subtle bugs (many
users reported finding bugs when the API change was made to stop "falling back"
on position-based indexing).
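
As a hedged illustration (not part of the original text), positional access on
an integer index has to be requested explicitly through ``.iloc``, while ``[]``
stays label-based. The sketch assumes a recent pandas release, where ``.ix`` is
no longer available; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

s = pd.Series(range(5))          # integer labels 0..4

# Label-based lookup: -1 is not a label in the index, so this raises KeyError.
try:
    s[-1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based lookup must be requested explicitly with .iloc.
last_value = s.iloc[-1]

df = pd.DataFrame(np.random.randn(5, 4))
last_two_rows = df.iloc[-2:]     # the last two rows by position
```

Using ``.iloc`` keeps the intent (position, not label) visible at the call site,
which is the ambiguity the decision described above was meant to remove.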

Non-monotonic indexes require exact matches
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the index of a ``Series`` or ``DataFrame`` is monotonically increasing or decreasing, then the bounds
of a label-based slice can be outside the range of the index, much like slice indexing a
normal Python ``list``. Monotonicity of an index can be tested with the ``is_monotonic_increasing`` and
``is_monotonic_decreasing`` attributes.

.. ipython:: python

   df = pd.DataFrame(index=[2,3,3,4,5], columns=['data'], data=range(5))
   df.index.is_monotonic_increasing

   # no rows 0 or 1, but still returns rows 2, 3 (both of them), and 4:
   df.loc[0:4, :]

   # slice bounds are outside the index, so an empty DataFrame is returned
   df.loc[13:15, :]

On the other hand, if the index is not monotonic, then both slice bounds must be
*unique* members of the index.

.. ipython:: python

   df = pd.DataFrame(index=[2,3,1,4,3,5], columns=['data'], data=range(6))
   df.index.is_monotonic_increasing

   # OK because 2 and 4 are in the index
   df.loc[2:4, :]

.. code-block:: python

   # 0 is not in the index
   In [9]: df.loc[0:4, :]
   KeyError: 0

   # 3 is not a unique label
   In [11]: df.loc[2:3, :]
   KeyError: 'Cannot get right slice bound for non-unique label: 3'
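
One possible workaround, offered here as a sketch rather than as advice from
the original text, is to sort the index before slicing so that it becomes
monotonic and out-of-range bounds are tolerated again. The name ``df_sorted``
is illustrative.

```python
import pandas as pd

df = pd.DataFrame({'data': range(6)}, index=[2, 3, 1, 4, 3, 5])

# Slicing with a bound that is missing from a non-monotonic index fails.
try:
    df.loc[0:4, :]
    slice_failed = False
except KeyError:
    slice_failed = True

# After sorting, the index is monotonic and the same slice is allowed.
df_sorted = df.sort_index()
is_sorted = df_sorted.index.is_monotonic_increasing
subset = df_sorted.loc[0:4, :]   # labels 1, 2, 3, 3 and 4
```

Sorting changes the row order, so this only fits cases where the original
ordering does not carry meaning.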


Endpoints are inclusive
~~~~~~~~~~~~~~~~~~~~~~~

Compared with standard Python sequence slicing in which the slice endpoint is
not inclusive, label-based slicing in pandas **is inclusive**. The primary
reason for this is that it is often not possible to easily determine the
"successor" or next element after a particular label in an index. For example,
consider the following Series:

.. ipython:: python

   s = pd.Series(np.random.randn(6), index=list('abcdef'))
   s

Suppose we wished to slice from ``c`` to ``e``. Using integers, this would be:

.. ipython:: python

   s[2:5]

However, if you only had ``c`` and ``e``, determining the next element in the
index can be somewhat complicated. For example, the following does not work:

::

   s.loc['c':'e'+1]

A very common use case is to limit a time series to start and end at two
specific dates. To enable this, we made the design decision to make label-based
slicing include both endpoints:

.. ipython:: python

   s.loc['c':'e']

This is most definitely a "practicality beats purity" sort of thing, but it is
something to watch out for if you expect label-based slicing to behave exactly
in the way that standard Python integer slicing works.
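
The difference can be seen side by side in the following sketch, which uses
illustrative data (including a made-up daily time series) rather than anything
from the original text.

```python
import pandas as pd

s = pd.Series(range(6), index=list('abcdef'))

by_position = s.iloc[2:5]      # positions 2, 3, 4: the stop is excluded
by_label = s.loc['c':'e']      # labels c, d, e: the stop is included

# Date slicing (illustrative data): both end dates are part of the result.
ts = pd.Series(range(10), index=pd.date_range('2000-01-01', periods=10))
window = ts.loc['2000-01-03':'2000-01-05']
```

Both ``by_position`` and ``by_label`` select the same three elements, but the
positional stop has to be one past the last element while the label stop names
the last element itself.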


Indexing potentially changes underlying Series dtype
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The use of ``reindex_like`` can potentially change the dtype of a ``Series``.
Review comment (Member):

I would state this more generally, as this is not only the case with ``reindex_like``, but also with plain indexing or ``reindex``, if those introduce NaNs.


.. ipython:: python

   series = pd.Series([1, 2, 3])
   x = pd.Series([True])
   x.dtype
   x = pd.Series([True]).reindex_like(series)
   x.dtype

This is because ``reindex_like`` silently inserts ``NaNs`` and the ``dtype``
changes accordingly. This can cause some issues when using ``numpy`` ``ufuncs``
such as ``numpy.logical_and``.

See `this old issue <https://github.com/pydata/pandas/issues/2388>`__ for a more
detailed discussion.
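
A short sketch of the same effect with ``reindex``, plus one way to avoid it,
may help. It assumes that a fill value of the original type is acceptable for
the missing labels; the names ``widened`` and ``filled`` are illustrative.

```python
import pandas as pd

target = pd.Series([1, 2, 3])

flags = pd.Series([True])
widened = flags.reindex_like(target)       # labels 1 and 2 become NaN

ints = pd.Series([1, 2, 3])
ints_widened = ints.reindex([0, 1, 5])     # label 5 is missing, NaN inserted

# Supplying a fill value of the original type avoids the NaN insertion.
filled = flags.reindex(target.index, fill_value=False)
```

Here ``widened`` ends up with ``object`` dtype and ``ints_widened`` with
``float64``, while ``filled`` keeps the boolean dtype because no ``NaN`` is
introduced.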
4 changes: 2 additions & 2 deletions doc/source/basics.rst
Original file line number Diff line number Diff line change
@@ -287,7 +287,7 @@ To evaluate single-element pandas objects in a boolean context, use the method

ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().

-See :ref:`gotchas<gotchas.truth>` for a more detailed discussion.
+See :ref:`FAQ<faq.truth>` for a more detailed discussion.

.. _basics.equals:

@@ -1849,7 +1849,7 @@ gotchas

Performing selection operations on ``integer`` type data can easily upcast the data to ``floating``.
The dtype of the input data will be preserved in cases where ``nans`` are not introduced (starting in 0.11.0)
-See also :ref:`integer na gotchas <gotchas.intna>`
+See also :ref:`Support for integer NA <faq.intna>`

.. ipython:: python

115 changes: 0 additions & 115 deletions doc/source/faq.rst

This file was deleted.
