Commit c3708f2

DOC: 0.7.0 docs, add iget_value alias and DataFrame.iget_value, GH #627
1 parent 2ad3694 commit c3708f2

5 files changed, +236 -51 lines changed

RELEASE.rst

+2
@@ -215,6 +215,7 @@ pandas 0.7.0
215215
- Catch misreported console size when running IPython within Emacs
216216
- Fix minor bug in pivot table margins, loss of index names and length-1
217217
'All' tuple in row labels
218+
- Add support for legacy
218219

219220
Thanks
220221
------
@@ -233,6 +234,7 @@ Thanks
233234
- Solomon Negusse
234235
- Wouter Overmeire
235236
- Christian Prinoth
237+
- Jeff Reback
236238
- Sam Reckoner
237239
- Craig Reeson
238240
- Jan Schulz

doc/source/gotchas.rst

+82
@@ -0,0 +1,82 @@
1+
.. currentmodule:: pandas
2+
.. _gotchas:
3+
4+
.. ipython:: python
5+
:suppress:
6+
7+
import numpy as np
8+
from pandas import *
9+
randn = np.random.randn
10+
np.set_printoptions(precision=4, suppress=True)
11+
12+
*******************
13+
Caveats and Gotchas
14+
*******************
15+
16+
``NaN``, Integer ``NA`` values and ``NA`` type promotions
17+
---------------------------------------------------------
18+
19+
Choice of ``NA`` representation
20+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
21+
22+
For lack of ``NA`` (missing) support from the ground up in NumPy and Python in
23+
general, we were given the difficult choice between either
24+
25+
- A *masked array* solution: an array of data and an array of boolean values
26+
indicating whether a value is there or is missing
27+
- Using a special sentinel value, bit pattern, or set of sentinel values to
28+
denote ``NA`` across the dtypes
29+
30+
31+
Support for integer ``NA``
32+
~~~~~~~~~~~~~~~~~~~~~~~~~~
33+
34+
``NA`` type promotions
35+
~~~~~~~~~~~~~~~~~~~~~~
36+
37+
Integer indexing
38+
----------------
39+
40+
Label-based slicing conventions
41+
-------------------------------
42+
43+
Non-monotonic indexes require exact matches
44+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
45+
46+
Endpoints are inclusive
47+
~~~~~~~~~~~~~~~~~~~~~~~
48+
49+
Compared with standard Python sequence slicing in which the slice endpoint is
50+
not inclusive, label-based slicing in pandas **is inclusive**. The primary
51+
reason for this is that it is often not possible to easily determine the "successor" or
52+
next element after a particular label in an index. For example, consider the
53+
following Series:
54+
55+
.. ipython:: python
56+
57+
s = Series(randn(6), index=list('abcdef'))
58+
s
59+
60+
Suppose we wished to slice from ``c`` to ``e``. Using integers, this would be
61+
62+
.. ipython:: python
63+
64+
s[2:5]
65+
66+
However, if you only had ``c`` and ``e``, determining the next element in the
67+
index can be somewhat complicated. For example, the following does not work:
68+
69+
::
70+
71+
s.ix['c':'e'+1]
72+
73+
A very common use case is to limit a time series to start and end at two
74+
specific dates. To enable this, we made the design decision to make label-based slicing include both endpoints:
75+
76+
.. ipython:: python
77+
78+
s.ix['c':'e']
79+
80+
This is most definitely a "practicality beats purity" sort of thing, but it is
81+
something to watch out for if you expect label-based slicing to behave exactly
82+
in the way that standard Python integer slicing works.
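
The inclusive-endpoint convention described in gotchas.rst can be illustrated without pandas. The following is a minimal sketch in plain Python, not the pandas implementation; the helper name ``label_slice`` and the sample values are hypothetical. It finds the positions of both labels and extends the right bound by one, so the caller never has to compute the "successor" of a label.

```python
def label_slice(labels, values, start, stop):
    """Return the values between two labels, including both endpoints."""
    lo = labels.index(start)   # position of the first label
    hi = labels.index(stop)    # position of the last label
    return values[lo:hi + 1]   # +1 makes the right endpoint inclusive


labels = list("abcdef")
values = [10, 20, 30, 40, 50, 60]

by_label = label_slice(labels, values, "c", "e")   # labels 'c' through 'e'
by_position = values[2:5]                          # the equivalent integer slice
```

Both expressions select the same three elements. The label-based form needs only ``'c'`` and ``'e'``, whereas the integer form needs the position one past ``'e'``.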

doc/source/whatsnew/v0.7.0.txt

+131-50
@@ -3,56 +3,6 @@
33
v.0.7.0 (Not Yet Released)
44
--------------------------
55

6-
API Changes to integer indexing
7-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
8-
9-
One of the potentially riskiest API changes in 0.7.0, but also one of the most
10-
important, was a complete review of how **integer indexes** are handled with
11-
regard to label-based indexing. Here is an example:
12-
13-
.. ipython:: python
14-
15-
s = Series(randn(10), index=range(0, 20, 2))
16-
s
17-
s[0]
18-
s[2]
19-
s[4]
20-
21-
This is all exactly identical to the behavior before. However, if you ask for a
22-
key **not** contained in the Series, in versions 0.6.1 and prior, Series would
23-
*fall back* on a location-based lookup. This now raises a ``KeyError``:
24-
25-
.. code-block:: ipython
26-
27-
In [2]: s[1]
28-
KeyError: 1
29-
30-
This change also has the same impact on DataFrame:
31-
32-
.. code-block:: ipython
33-
34-
In [3]: df = DataFrame(randn(8, 4), index=range(0, 16, 2))
35-
36-
In [4]: df
37-
0 1 2 3
38-
0 0.88427 0.3363 -0.1787 0.03162
39-
2 0.14451 -0.1415 0.2504 0.58374
40-
4 -1.44779 -0.9186 -1.4996 0.27163
41-
6 -0.26598 -2.4184 -0.2658 0.11503
42-
8 -0.58776 0.3144 -0.8566 0.61941
43-
10 0.10940 -0.7175 -1.0108 0.47990
44-
12 -1.16919 -0.3087 -0.6049 -0.43544
45-
14 -0.07337 0.3410 0.0424 -0.16037
46-
47-
In [5]: df.ix[3]
48-
KeyError: 3
49-
50-
API refinements regarding label-based slicing
51-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
52-
53-
Other relevant API Changes
54-
~~~~~~~~~~~~~~~~~~~~~~~~~~
55-
566
New features
577
~~~~~~~~~~~~
588

@@ -138,6 +88,136 @@ New features
13888
aggregate with groupby on a DataFrame, yielding an aggregated result with
13989
hierarchical columns (GH166_)
14090

91+
API Changes to integer indexing
92+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
93+
94+
One of the potentially riskiest API changes in 0.7.0, but also one of the most
95+
important, was a complete review of how **integer indexes** are handled with
96+
regard to label-based indexing. Here is an example:
97+
98+
.. ipython:: python
99+
100+
s = Series(randn(10), index=range(0, 20, 2))
101+
s
102+
s[0]
103+
s[2]
104+
s[4]
105+
106+
This is all exactly identical to the behavior before. However, if you ask for a
107+
key **not** contained in the Series, in versions 0.6.1 and prior, Series would
108+
*fall back* on a location-based lookup. This now raises a ``KeyError``:
109+
110+
.. code-block:: ipython
111+
112+
In [2]: s[1]
113+
KeyError: 1
114+
115+
This change also has the same impact on DataFrame:
116+
117+
.. code-block:: ipython
118+
119+
In [3]: df = DataFrame(randn(8, 4), index=range(0, 16, 2))
120+
121+
In [4]: df
122+
0 1 2 3
123+
0 0.88427 0.3363 -0.1787 0.03162
124+
2 0.14451 -0.1415 0.2504 0.58374
125+
4 -1.44779 -0.9186 -1.4996 0.27163
126+
6 -0.26598 -2.4184 -0.2658 0.11503
127+
8 -0.58776 0.3144 -0.8566 0.61941
128+
10 0.10940 -0.7175 -1.0108 0.47990
129+
12 -1.16919 -0.3087 -0.6049 -0.43544
130+
14 -0.07337 0.3410 0.0424 -0.16037
131+
132+
In [5]: df.ix[3]
133+
KeyError: 3
134+
135+
In order to support purely integer-based indexing, the following methods have
136+
been added:
137+
138+
.. csv-table::
139+
:header: "Method","Description"
140+
:widths: 40,60
141+
142+
``Series.iget_value(i)``, Retrieve value stored at location ``i``
143+
``Series.iget(i)``, Alias for ``iget_value``
144+
``DataFrame.irow(i)``, Retrieve the ``i``-th row
145+
``DataFrame.icol(j)``, Retrieve the ``j``-th column
146+
"``DataFrame.iget_value(i, j)``", Retrieve the value at row ``i`` and column ``j``
147+
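
The distinction between label-based and location-based lookup on an integer index can be sketched in plain Python. This is an illustration under stated assumptions, not pandas code: the class ``LabeledValues`` is hypothetical, and only its ``iget_value`` method mirrors the two-step lookup (position to label, then label to value) shown in the ``series.py`` hunk of this commit.

```python
class LabeledValues:
    """Hypothetical stand-in for a Series with strictly label-based ``[]``."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)
        # Map each label to its position for label-based lookup
        self._pos = {label: i for i, label in enumerate(self.index)}

    def __getitem__(self, label):
        # Label lookup only: a missing label raises KeyError,
        # with no fallback to positional access
        return self.values[self._pos[label]]

    def iget_value(self, i):
        # Location-based lookup: translate position to label first
        label = self.index[i]
        return self[label]


s = LabeledValues([float(v) for v in range(10)], range(0, 20, 2))

by_label = s[2]                  # label 2 sits at position 1
by_location = s.iget_value(1)    # position 1 directly

try:
    s[1]                         # 1 is not a label in the index
    missing_label_raised = False
except KeyError:
    missing_label_raised = True
```

Here ``s[2]`` and ``s.iget_value(1)`` return the same element, while ``s[1]`` raises ``KeyError`` because ``1`` is a position, not a label.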
148+
API tweaks regarding label-based slicing
149+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
150+
151+
Label-based slicing using ``ix`` now requires that the index be sorted
152+
(monotonic) **unless** both the start and endpoint are contained in the index:
153+
154+
.. ipython:: python
155+
156+
s = Series(randn(6), index=list('gmkaec'))
157+
s
158+
159+
Then this is OK:
160+
161+
.. ipython:: python
162+
163+
s.ix['k':'e']
164+
165+
But this is not:
166+
167+
.. code-block:: ipython
168+
169+
In [12]: s.ix['b':'h']
170+
KeyError: 'b'
171+
172+
If the index had been sorted, the "range selection" would have been possible:
173+
174+
.. ipython:: python
175+
176+
s2 = s.sort_index()
177+
s2
178+
s2.ix['b':'h']
179+
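
One way to picture the rule above is a bounds lookup that uses exact positions when both labels are present and falls back to binary search only on a sorted index. The sketch below is plain Python built on the standard ``bisect`` module. The helper name ``slice_bounds`` is hypothetical, and it assumes unique labels. It is not how pandas implements ``ix``.

```python
from bisect import bisect_left, bisect_right


def slice_bounds(index, start, stop):
    """Return (lo, hi) so that index[lo:hi] covers start..stop inclusively."""
    if start in index and stop in index:
        # Both endpoints exist: exact positions work even when unsorted
        return index.index(start), index.index(stop) + 1
    if index != sorted(index):
        # Unsorted index and a missing endpoint: no meaningful range
        raise KeyError(start if start not in index else stop)
    # Sorted index: binary search locates where the labels would fall
    return bisect_left(index, start), bisect_right(index, stop)


unsorted = list("gmkaec")
lo, hi = slice_bounds(unsorted, "k", "e")
unsorted_hit = unsorted[lo:hi]           # both labels present

try:
    slice_bounds(unsorted, "b", "h")     # neither label present, unsorted
    unsorted_miss_raised = False
except KeyError:
    unsorted_miss_raised = True

ordered = sorted(unsorted)
lo, hi = slice_bounds(ordered, "b", "h")
sorted_hit = ordered[lo:hi]              # range selection on the sorted index
```

On the unsorted index, ``'k':'e'`` selects ``k, a, e`` and ``'b':'h'`` raises ``KeyError``. After sorting, ``'b':'h'`` selects ``c, e, g``.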
180+
Changes to Series ``[]`` operator
181+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
182+
183+
As a notational convenience, you can pass a sequence of labels or a label
184+
slice to a Series when getting and setting values via ``[]`` (i.e. the
185+
``__getitem__`` and ``__setitem__`` methods). The behavior will be the same as
186+
passing similar input to ``ix`` **except in the case of integer indexing**:
187+
188+
.. ipython:: python
189+
190+
s = Series(randn(6), index=list('acegkm'))
191+
s
192+
s[['m', 'a', 'c', 'e']]
193+
s['b':'l']
194+
s['c':'k']
195+
196+
In the case of integer indexes, the behavior will be exactly as before
197+
(shadowing ``ndarray``):
198+
199+
.. ipython:: python
200+
201+
s = Series(randn(6), index=range(0, 12, 2))
202+
s[[4, 0, 2]]
203+
s[1:5]
204+
205+
If you wish to do indexing with sequences and slicing on an integer index with
206+
label semantics, use ``ix``.
207+
208+
Other API Changes
209+
~~~~~~~~~~~~~~~~~
210+
211+
- The deprecated ``LongPanel`` class has been completely removed
212+
213+
- If ``Series.sort`` is called on a column of a DataFrame, an exception will
214+
now be raised. Before it was possible to accidentally mutate a DataFrame's
215+
column by doing ``df[col].sort()`` instead of the side-effect free method
216+
``df[col].order()`` (GH316_)
217+
218+
- Miscellaneous renames and deprecations which will (harmlessly) raise
219+
``FutureWarning``
220+
141221
Performance improvements
142222
~~~~~~~~~~~~~~~~~~~~~~~~
143223

@@ -189,6 +269,7 @@ similar operation to the above but using a Python function:
189269
.. _GH249: https://github.com/wesm/pandas/issues/249
190270
.. _GH267: https://github.com/wesm/pandas/issues/267
191271
.. _GH273: https://github.com/wesm/pandas/issues/273
272+
.. _GH316: https://github.com/wesm/pandas/issues/316
192273
.. _GH338: https://github.com/wesm/pandas/issues/338
193274
.. _GH342: https://github.com/wesm/pandas/issues/342
194275
.. _GH374: https://github.com/wesm/pandas/issues/374

pandas/core/frame.py

+18
@@ -1257,6 +1257,24 @@ def icol(self, i):
12571257
else:
12581258
return self[label]
12591259

1260+
def iget_value(self, i, j):
1261+
"""
1262+
Return scalar value stored at row i and column j, where i and j are
1263+
integers
1264+
1265+
Parameters
1266+
----------
1267+
i : int
1268+
j : int
1269+
1270+
Returns
1271+
-------
1272+
value : scalar value
1273+
"""
1274+
row = self.index[i]
1275+
col = self.columns[j]
1276+
return self.get_value(row, col)
1277+
12601278
def __getitem__(self, key):
12611279
# slice rows
12621280
if isinstance(key, slice):
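
The ``DataFrame.iget_value`` method added in this hunk translates integer positions to labels and then delegates to ``get_value``. A minimal sketch of that pattern follows. The class ``TinyFrame``, its storage layout, and the sample data are hypothetical stand-ins; only the body of ``iget_value`` follows the three lines in the diff.

```python
class TinyFrame:
    """Hypothetical stand-in for a DataFrame: column label -> {row label: value}."""

    def __init__(self, data, index, columns):
        self._data = data
        self.index = list(index)
        self.columns = list(columns)

    def get_value(self, row, col):
        # Label-based scalar access
        return self._data[col][row]

    def iget_value(self, i, j):
        # Same steps as the diff: positions -> labels -> label-based access
        row = self.index[i]
        col = self.columns[j]
        return self.get_value(row, col)


frame = TinyFrame(
    data={"x": {0: 1.5, 2: 2.5}, "y": {0: -1.0, 2: -2.0}},
    index=[0, 2],
    columns=["x", "y"],
)

value = frame.iget_value(1, 0)         # second row (label 2), first column ("x")
last_value = frame.iget_value(-1, -1)  # negative positions index from the end
```

Because the positions are resolved through ``self.index`` and ``self.columns``, negative positions behave as they do for ordinary Python lists in this sketch.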

pandas/core/series.py

+3-1
@@ -477,7 +477,7 @@ def get(self, label, default=None):
477477
except KeyError:
478478
return default
479479

480-
def iget(self, i):
480+
def iget_value(self, i):
481481
"""
482482
Return the i-th value in the Series by location
483483
@@ -495,6 +495,8 @@ def iget(self, i):
495495
label = self.index[i]
496496
return self[label]
497497

498+
iget = iget_value
499+
498500
def get_value(self, label):
499501
"""
500502
Quickly retrieve single value at passed index label
