@@ -80,3 +80,58 @@ specific dates. To enable this, we made the design decision to make label-based sl
This is most definitely a "practicality beats purity" sort of thing, but it is
something to watch out for if you expect label-based slicing to behave exactly
in the way that standard Python integer slicing works.
+
+ Miscellaneous indexing gotchas
+ ------------------------------
+
+ Reindex versus ix gotchas
+ ~~~~~~~~~~~~~~~~~~~~~~~~~
+
+ Many users will find themselves using the ``ix`` indexing capabilities as a
+ concise means of selecting data from a pandas object:
+
+ .. ipython:: python
+
+    df = DataFrame(randn(6, 4), columns=['one', 'two', 'three', 'four'],
+                   index=list('abcdef'))
+    df
+    df.ix[['b', 'c', 'e']]
+
+ This is, of course, completely equivalent *in this case* to using the
+ ``reindex`` method:
+
+ .. ipython:: python
+
+    df.reindex(['b', 'c', 'e'])
+
+ Some might conclude from this that ``ix`` and ``reindex`` are 100% equivalent.
+ This is indeed true **except in the case of integer indexing**. For
+ example, the above operation could alternatively have been expressed as:
+
+ .. ipython:: python
+
+    df.ix[[1, 2, 4]]
+
+ If you pass ``[1, 2, 4]`` to ``reindex`` you will get something else entirely:
+
+ .. ipython:: python
+
+    df.reindex([1, 2, 4])
+
+ So it's important to remember that ``reindex`` is **strict label indexing
+ only**. This can lead to some potentially surprising results in pathological
+ cases where an index contains, say, both integers and strings:
+
+ .. ipython:: python
+
+    s = Series([1, 2, 3], index=['a', 0, 1])
+    s
+    s.ix[[0, 1]]
+    s.reindex([0, 1])
+
+ Because the index in this case does not contain solely integers, ``ix`` falls
+ back on integer (positional) indexing. By contrast, ``reindex`` only looks for
+ the passed values in the index, thus finding the integer labels ``0`` and
+ ``1``. While it would be possible to insert some logic to check whether every
+ element of a passed sequence is contained in the index, that logic would exact
+ a very high cost in large data sets.
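
The difference between the two lookups in the mixed-index example can be sketched in plain Python. This is only a toy model of the behavior the text describes, not the pandas implementation: the helper names ``reindex_lookup`` and ``ix_lookup`` are hypothetical, and ``None`` stands in for a missing value.

```python
def reindex_lookup(index, values, labels):
    """Strict label lookup: match each requested label against the index.

    Labels that are absent yield None (a stand-in for a missing value).
    """
    position_of = {label: pos for pos, label in enumerate(index)}
    return [values[position_of[label]] if label in position_of else None
            for label in labels]


def ix_lookup(index, values, keys):
    """Toy version of the fallback described above (an assumption, not pandas code)."""
    index_is_all_int = all(isinstance(label, int) for label in index)
    keys_are_all_int = all(isinstance(key, int) for key in keys)
    if keys_are_all_int and not index_is_all_int:
        # Integer keys on a non-integer index are treated as positions.
        return [values[key] for key in keys]
    # Otherwise the keys are treated as labels.
    return reindex_lookup(index, values, keys)


index = ['a', 0, 1]
values = [1, 2, 3]
by_position = ix_lookup(index, values, [0, 1])    # positions 0 and 1
by_label = reindex_lookup(index, values, [0, 1])  # labels 0 and 1
print(by_position, by_label)
```

In this sketch the positional fallback returns the first two values, while the strict label lookup returns the values stored under the labels ``0`` and ``1``, mirroring the ``s.ix[[0, 1]]`` versus ``s.reindex([0, 1])`` contrast in the diff.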