Commit 8fcb152

committed
DOC: Merge FAQ and gotcha
1 parent 59f2557 commit 8fcb152

9 files changed

+221
-395
lines changed

doc/source/10min.rst

+1-1
Original file line number | Diff line number | Diff line change
@@ -810,4 +810,4 @@ If you are trying an operation and you see an exception like:
810810
811811
See :ref:`Comparisons<basics.compare>` for an explanation and what to do.
812812

813-
See :ref:`Gotchas<gotchas>` as well.
813+
See :ref:`FAQ<faq>` as well.

doc/source/advanced.rst

+126
Original file line number | Diff line number | Diff line change
@@ -853,3 +853,129 @@ Of course if you need integer based selection, then use ``iloc``
853853
.. ipython:: python
854854
855855
dfir.iloc[0:5]
856+
857+
Miscellaneous indexing FAQ
858+
--------------------------
859+
860+
Integer indexing with ix
861+
~~~~~~~~~~~~~~~~~~~~~~~~
862+
863+
Label-based indexing with integer axis labels is a thorny topic. It has been
864+
discussed heavily on mailing lists and among various members of the scientific
865+
Python community. In pandas, our general viewpoint is that labels matter more
866+
than integer locations. Therefore, with an integer axis index *only*
867+
label-based indexing is possible with the standard tools like ``.ix``. The
868+
following code will generate exceptions:
869+
870+
.. code-block:: python
871+
872+
s = pd.Series(range(5))
873+
s[-1]
874+
df = pd.DataFrame(np.random.randn(5, 4))
875+
df
876+
df.ix[-2:]
877+
878+
This deliberate decision was made to prevent ambiguities and subtle bugs (many
879+
users reported finding bugs when the API change was made to stop "falling back"
880+
on position-based indexing).
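When positional access is what you actually want on an integer-labelled axis, ``.iloc`` is the explicit tool for it. The following is a minimal sketch; the variable names ``last`` and ``tail`` are illustrative only and do not come from the original text.

```python
import numpy as np
import pandas as pd

# Integer-labelled Series: labels happen to be 0..4
s = pd.Series(range(5))

# .iloc is purely positional, so negative positions count from the end
last = s.iloc[-1]

# Same idea for a DataFrame: take the last two rows by position
df = pd.DataFrame(np.arange(20).reshape(5, 4))
tail = df.iloc[-2:]

print(last)
print(tail)
```

Because ``.iloc`` never interprets its arguments as labels, it sidesteps the label-versus-position ambiguity described above.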
881+
882+
Non-monotonic indexes require exact matches
883+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
884+
885+
If the index of a ``Series`` or ``DataFrame`` is monotonically increasing or decreasing, then the bounds
886+
of a label-based slice can be outside the range of the index, much like slice indexing a
887+
normal Python ``list``. Monotonicity of an index can be tested with the ``is_monotonic_increasing`` and
888+
``is_monotonic_decreasing`` attributes.
889+
890+
.. ipython:: python
891+
892+
df = pd.DataFrame(index=[2,3,3,4,5], columns=['data'], data=range(5))
893+
df.index.is_monotonic_increasing
894+
895+
# no rows 0 or 1, but still returns rows 2, 3 (both of them), and 4:
896+
df.loc[0:4, :]
897+
898+
# slice bounds are outside the index, so an empty DataFrame is returned
899+
df.loc[13:15, :]
900+
901+
On the other hand, if the index is not monotonic, then both slice bounds must be
902+
*unique* members of the index.
903+
904+
.. ipython:: python
905+
906+
df = pd.DataFrame(index=[2,3,1,4,3,5], columns=['data'], data=range(6))
907+
df.index.is_monotonic_increasing
908+
909+
# OK because 2 and 4 are in the index
910+
df.loc[2:4, :]
911+
912+
.. code-block:: python
913+
914+
# 0 is not in the index
915+
In [9]: df.loc[0:4, :]
916+
KeyError: 0
917+
918+
# 3 is not a unique label
919+
In [11]: df.loc[2:3, :]
920+
KeyError: 'Cannot get right slice bound for non-unique label: 3'
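One way to avoid these errors, sketched here under the assumption that row order does not matter for your use case, is to sort the index first so that it becomes monotonic. The names ``sorted_df`` and ``result`` are illustrative.

```python
import pandas as pd

# Same non-monotonic index with a duplicated label (3) as in the example above
df = pd.DataFrame({"data": range(6)}, index=[2, 3, 1, 4, 3, 5])
print(df.index.is_monotonic_increasing)

# Sorting makes the index monotonic, so slice bounds no longer need
# to be unique members of the index
sorted_df = df.sort_index()
print(sorted_df.index.is_monotonic_increasing)

# 0 is not in the index and 3 is duplicated, yet the slice now succeeds
result = sorted_df.loc[0:3, :]
print(result)
```

Sorting changes the row order, so this is only appropriate when the original ordering carries no meaning.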
921+
922+
923+
Endpoints are inclusive
924+
~~~~~~~~~~~~~~~~~~~~~~~
925+
926+
Compared with standard Python sequence slicing in which the slice endpoint is
927+
not inclusive, label-based slicing in pandas **is inclusive**. The primary
928+
reason for this is that it is often not possible to easily determine the
929+
"successor" or next element after a particular label in an index. For example,
930+
consider the following Series:
931+
932+
.. ipython:: python
933+
934+
s = pd.Series(np.random.randn(6), index=list('abcdef'))
935+
s
936+
937+
Suppose we wished to slice from ``c`` to ``e``. Using integer positions, this would be:
938+
939+
.. ipython:: python
940+
941+
s[2:5]
942+
943+
However, if you only had ``c`` and ``e``, determining the next element in the
944+
index can be somewhat complicated. For example, the following does not work:
945+
946+
::
947+
948+
s.loc['c':'e'+1]
949+
950+
A very common use case is to limit a time series to start and end at two
951+
specific dates. To enable this, we made the design decision to make label-based
952+
slicing include both endpoints:
953+
954+
.. ipython:: python
955+
956+
s.loc['c':'e']
957+
958+
This is most definitely a "practicality beats purity" sort of thing, but it is
959+
something to watch out for if you expect label-based slicing to behave exactly
960+
in the way that standard Python integer slicing works.
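The difference between the two conventions can be seen side by side in the short sketch below. The date range and the names ``by_position``, ``by_label``, ``ts`` and ``window`` are made up for illustration.

```python
import pandas as pd

s = pd.Series(range(6), index=list("abcdef"))

# Positional slicing excludes the stop position (5), as in plain Python
by_position = s.iloc[2:5]

# Label-based slicing includes the stop label ('e')
by_label = s.loc["c":"e"]

# The time series use case: both boundary dates are part of the result
ts = pd.Series(range(5), index=pd.date_range("2013-01-01", periods=5))
window = ts.loc["2013-01-02":"2013-01-04"]

print(by_position)
print(by_label)
print(window)
```

Both ``by_position`` and ``by_label`` select the labels ``c``, ``d`` and ``e``, but the positional form needs a stop of 5 where the label form simply names ``e``.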
961+
962+
963+
Indexing potentially changes underlying Series dtype
964+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
965+
966+
The use of ``reindex_like`` can potentially change the dtype of a ``Series``.
967+
968+
.. ipython:: python
969+
970+
series = pd.Series([1, 2, 3])
971+
x = pd.Series([True])
972+
x.dtype
973+
x = pd.Series([True]).reindex_like(series)
974+
x.dtype
975+
976+
This is because ``reindex_like`` silently inserts ``NaNs`` and the ``dtype``
977+
changes accordingly. This can cause some issues when using ``numpy`` ``ufuncs``
978+
such as ``numpy.logical_and``.
979+
980+
See `this old issue <https://github.com/pydata/pandas/issues/2388>`__ for a more
981+
detailed discussion.
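If the boolean dtype needs to survive the reindexing, one possible workaround is to call ``reindex`` with an explicit ``fill_value`` rather than ``reindex_like``. This is a sketch rather than a recommendation from the original text, and it assumes that ``False`` is a sensible default for the missing entries.

```python
import numpy as np
import pandas as pd

series = pd.Series([1, 2, 3])
x = pd.Series([True])

# reindex_like introduces missing values, so the result is no longer bool
y = x.reindex_like(series)
print(y.dtype)

# Supplying a boolean fill value avoids the missing values altogether
filled = x.reindex(series.index, fill_value=False)
print(filled.dtype)

# With a genuine bool dtype, ufuncs such as numpy.logical_and behave as expected
result = np.logical_and(filled, series > 1)
print(result)
```

Whether ``False`` is the right fill value depends on what a missing entry means in your data.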

doc/source/basics.rst

+2-2
Original file line number | Diff line number | Diff line change
@@ -287,7 +287,7 @@ To evaluate single-element pandas objects in a boolean context, use the method
287287
288288
ValueError: The truth value of an array is ambiguous. Use a.empty, a.any() or a.all().
289289
290-
See :ref:`gotchas<gotchas.truth>` for a more detailed discussion.
290+
See :ref:`FAQ<faq.truth>` for a more detailed discussion.
291291

292292
.. _basics.equals:
293293

@@ -1849,7 +1849,7 @@ gotchas
18491849

18501850
Performing selection operations on ``integer`` type data can easily upcast the data to ``floating``.
18511851
The dtype of the input data will be preserved in cases where ``nans`` are not introduced (starting in 0.11.0)
1852-
See also :ref:`integer na gotchas <gotchas.intna>`
1852+
See also :ref:`Support for integer NA <faq.intna>`
18531853

18541854
.. ipython:: python
18551855

doc/source/faq.rst

-115
This file was deleted.
