@@ -853,3 +853,129 @@ Of course if you need integer based selection, then use ``iloc``
.. ipython:: python

   dfir.iloc[0:5]
+
+ Miscellaneous indexing FAQ
+ --------------------------
+
+ Integer indexing with ix
+ ~~~~~~~~~~~~~~~~~~~~~~~~
+
+ Label-based indexing with integer axis labels is a thorny topic. It has been
+ discussed heavily on mailing lists and among various members of the scientific
+ Python community. In pandas, our general viewpoint is that labels matter more
+ than integer locations. Therefore, with an integer axis index *only*
+ label-based indexing is possible with the standard tools like ``.ix``. The
+ following code will generate exceptions:
+
+ .. code-block:: python
+
+    s = pd.Series(range(5))
+    s[-1]
+    df = pd.DataFrame(np.random.randn(5, 4))
+    df
+    df.ix[-2:]
+
+ This deliberate decision was made to prevent ambiguities and subtle bugs (many
+ users reported finding bugs when the API change was made to stop "falling back"
+ on position-based indexing).
+
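The following is a minimal sketch of the behavior described above, assuming a ``Series`` with the default integer index. It uses only plain ``[]`` lookup and ``iloc``; the variable names are illustrative and not part of the original text.

```python
import pandas as pd

s = pd.Series(range(5))  # default integer index with labels 0..4

# -1 is treated as a label, and no such label exists in the index.
try:
    s[-1]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Position-based access goes through iloc instead.
last_value = s.iloc[-1]
last_two = s.iloc[-2:]
```

Here ``lookup_failed`` ends up ``True``, while ``iloc`` selects from the end of the ``Series`` by position.
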
+ Non-monotonic indexes require exact matches
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ If the index of a ``Series`` or ``DataFrame`` is monotonically increasing or decreasing, then the bounds
+ of a label-based slice can be outside the range of the index, much like slice indexing a
+ normal Python ``list``. Monotonicity of an index can be tested with the ``is_monotonic_increasing`` and
+ ``is_monotonic_decreasing`` attributes.
+
+ .. ipython:: python
+
+    df = pd.DataFrame(index=[2, 3, 3, 4, 5], columns=['data'], data=list(range(5)))
+    df.index.is_monotonic_increasing
+
+    # no rows 0 or 1, but still returns rows 2, 3 (both of them), and 4:
+    df.loc[0:4, :]
+
+    # slice bounds are outside the index, so an empty DataFrame is returned
+    df.loc[13:15, :]
+
+ On the other hand, if the index is not monotonic, then both slice bounds must be
+ *unique* members of the index.
+
+ .. ipython:: python
+
+    df = pd.DataFrame(index=[2, 3, 1, 4, 3, 5], columns=['data'], data=list(range(6)))
+    df.index.is_monotonic_increasing
+
+    # OK because 2 and 4 are in the index
+    df.loc[2:4, :]
+
+ .. code-block:: python
+
+    # 0 is not in the index
+    In [9]: df.loc[0:4, :]
+    KeyError: 0
+
+    # 3 is not a unique label
+    In [11]: df.loc[2:3, :]
+    KeyError: 'Cannot get right slice bound for non-unique label: 3'
+
+
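One possible way to get the more forgiving slicing behavior on a non-monotonic index is to sort it first. The sketch below is an illustration only: ``slice_by_label`` is a hypothetical helper, not a pandas function, and sorting changes the row order, which may not be acceptable in every case.

```python
import pandas as pd

df = pd.DataFrame({"data": range(6)}, index=[2, 3, 1, 4, 3, 5])


def slice_by_label(frame, start, stop):
    """Hypothetical helper: sort the index if needed, then slice by label."""
    if not frame.index.is_monotonic_increasing:
        # After sorting, slice bounds no longer have to be index members.
        frame = frame.sort_index()
    return frame.loc[start:stop]


# On the unsorted frame, a bound that is not in the index raises KeyError.
try:
    df.loc[0:4, :]
    unsorted_slice_failed = False
except KeyError:
    unsorted_slice_failed = True

result = slice_by_label(df, 0, 4)
```

With the sorted index, the slice ``0:4`` keeps every row whose label falls in that range, including both rows labeled ``3``.
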
+ Endpoints are inclusive
+ ~~~~~~~~~~~~~~~~~~~~~~~
+
+ Compared with standard Python sequence slicing in which the slice endpoint is
+ not inclusive, label-based slicing in pandas **is inclusive**. The primary
+ reason for this is that it is often not possible to easily determine the
+ "successor" or next element after a particular label in an index. For example,
+ consider the following Series:
+
+ .. ipython:: python
+
+    s = pd.Series(np.random.randn(6), index=list('abcdef'))
+    s
+
+ Suppose we wished to slice from ``c`` to ``e``. Using integers, this would be:
+
+ .. ipython:: python
+
+    s[2:5]
+
+ However, if you only had ``c`` and ``e``, determining the next element in the
+ index can be somewhat complicated. For example, the following does not work:
+
+ ::
+
+    s.loc['c':'e' + 1]
+
+ A very common use case is to limit a time series to start and end at two
+ specific dates. To enable this, we made the design decision to make label-based
+ slicing include both endpoints:
+
+ .. ipython:: python
+
+    s.loc['c':'e']
+
+ This is most definitely a "practicality beats purity" sort of thing, but it is
+ something to watch out for if you expect label-based slicing to behave exactly
+ in the way that standard Python integer slicing works.
+
+
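To make the comparison concrete, here is a small sketch contrasting the two slicing styles, followed by the time series use case mentioned above. The data and dates are made up for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(np.arange(6), index=list("abcdef"))

# Label-based slice: both 'c' and 'e' are included.
by_label = s.loc["c":"e"]

# Position-based slice: the stop position 5 is excluded.
by_position = s.iloc[2:5]

# Limiting a daily time series to a start and end date (both included).
ts = pd.Series(
    np.arange(10),
    index=pd.date_range("2024-01-01", periods=10, freq="D"),
)
window = ts.loc["2024-01-03":"2024-01-05"]
```

Both ``by_label`` and ``by_position`` select the same three elements, and ``window`` contains the rows for January 3 through January 5.
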
+ Indexing potentially changes underlying Series dtype
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ The use of ``reindex_like`` can potentially change the dtype of a ``Series``.
+
+ .. ipython:: python
+
+    series = pd.Series([1, 2, 3])
+    x = pd.Series([True])
+    x.dtype
+    x = pd.Series([True]).reindex_like(series)
+    x.dtype
+
+ This is because ``reindex_like`` silently inserts ``NaNs`` and the ``dtype``
+ changes accordingly. This can cause some issues when using ``numpy`` ``ufuncs``
+ such as ``numpy.logical_and``.
+
+ See `this old issue <https://github.com/pydata/pandas/issues/2388>`__ for a more
+ detailed discussion.
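
As a hedged sketch of one way to keep a boolean dtype, the example below uses ``reindex`` with an explicit ``fill_value`` instead of ``reindex_like``. Whether ``False`` is the right fill value depends on the application; it is an assumption made here for illustration.

```python
import pandas as pd

series = pd.Series([1, 2, 3])
flags = pd.Series([True])

# reindex_like fills the new labels with NaN, so the bool dtype is lost.
aligned = flags.reindex_like(series)

# Supplying a boolean fill value avoids introducing NaN in the first place.
filled = flags.reindex(series.index, fill_value=False)
```

Here ``aligned`` has ``object`` dtype, while ``filled`` stays boolean and can be passed to functions such as ``numpy.logical_and``.
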