DOC: 0.7.0 docs, add iget_value alias and DataFrame.iget_value, GH #627

wesm · wesm · commit c3708f2aacec · 2012-01-19T17:07:27.000-05:00
diff --git a/RELEASE.rst b/RELEASE.rst
@@ -215,6 +215,7 @@ pandas 0.7.0
   - Catch misreported console size when running IPython within Emacs
   - Fix minor bug in pivot table margins, loss of index names and length-1
     'All' tuple in row labels
+  - Add support for legacy
 
 Thanks
 ------
@@ -233,6 +234,7 @@ Thanks
 - Solomon Negusse
 - Wouter Overmeire
 - Christian Prinoth
+- Jeff Reback
 - Sam Reckoner
 - Craig Reeson
 - Jan Schulz
diff --git a/doc/source/gotchas.rst b/doc/source/gotchas.rst
@@ -0,0 +1,82 @@
+.. currentmodule:: pandas
+.. _gotchas:
+
+.. ipython:: python
+   :suppress:
+
+   import numpy as np
+   from pandas import *
+   randn = np.random.randn
+   np.set_printoptions(precision=4, suppress=True)
+
+*******************
+Caveats and Gotchas
+*******************
+
+``NaN``, Integer ``NA`` values and ``NA`` type promotions
+---------------------------------------------------------
+
+Choice of ``NA`` representation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For lack of ``NA`` (missing) support from the ground up in NumPy and Python in
+general, we were given the difficult choice between either
+
+- A *masked array* solution: an array of data and an array of boolean values
+  indicating whether a value is there or is missing
+- Using a special sentinel value, bit pattern, or set of sentinel values to
+  denote ``NA`` across the dtypes
+
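To make the trade-off between the two options concrete, here is a minimal pure-Python sketch. It is an illustration only, not how pandas or NumPy store data; the helper names ``masked_sum`` and ``sentinel_sum`` are hypothetical.

```python
import math

# Option 1: masked array. The data sits next to a parallel boolean mask,
# where True means "this slot is missing". The data can stay integer.
data = [1, 2, 0, 4]
mask = [False, False, True, False]


def masked_sum(values, missing):
    """Sum only the entries whose mask flag is False."""
    return sum(v for v, m in zip(values, missing) if not m)


# Option 2: sentinel value. The marker lives inside the data itself.
# NaN is a float, so the integers are stored as floats here.
sentinel_data = [1.0, 2.0, float("nan"), 4.0]


def sentinel_sum(values):
    """Sum only the entries that are not the NaN sentinel."""
    return sum(v for v in values if not math.isnan(v))


masked_total = masked_sum(data, mask)
sentinel_total = sentinel_sum(sentinel_data)
```

The masked version keeps an integer result at the cost of carrying a second array, while the sentinel version needs no extra storage but yields a float.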
+
+Support for integer ``NA``
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``NA`` type promotions
+~~~~~~~~~~~~~~~~~~~~~~
+
+Integer indexing
+----------------
+
+Label-based slicing conventions
+-------------------------------
+
+Non-monotonic indexes require exact matches
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Endpoints are inclusive
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Compared with standard Python sequence slicing in which the slice endpoint is
+not inclusive, label-based slicing in pandas **is inclusive**. The primary
+reason for this is that it is often not possible to easily determine the "successor" or
+next element after a particular label in an index. For example, consider the
+following Series:
+
+.. ipython:: python
+
+   s = Series(randn(6), index=list('abcdef'))
+   s
+
+Suppose we wished to slice from ``c`` to ``e``. Using integers, this would be
+
+.. ipython:: python
+
+   s[2:5]
+
+However, if you only had ``c`` and ``e``, determining the next element in the
+index can be somewhat complicated. For example, the following does not work:
+
+::
+
+    s.ix['c':'e'+1]
+
+A very common use case is to limit a time series to start and end at two
+specific dates. To enable this, we made the design decision to make label-based
+slicing include both endpoints:
+
+.. ipython:: python
+
+    s.ix['c':'e']
+
+This is most definitely a "practicality beats purity" sort of thing, but it is
+something to watch out for if you expect label-based slicing to behave exactly
+in the way that standard Python integer slicing works.
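
The inclusive convention can be sketched in plain Python without pandas. The function ``label_slice`` below is a hypothetical helper written for this illustration; it assumes both labels are present in the index and is not the library's implementation.

```python
def label_slice(labels, values, start, stop):
    """Return (label, value) pairs from start through stop, inclusive."""
    lo = labels.index(start)  # raises ValueError if the label is absent
    hi = labels.index(stop)
    # The +1 is what makes the stop label part of the result.
    return list(zip(labels[lo:hi + 1], values[lo:hi + 1]))


labels = list("abcdef")
values = [10, 20, 30, 40, 50, 60]

# Positional slicing excludes the stop position, so 5 is needed to reach "e".
by_position = values[2:5]

# Label slicing names the last wanted label directly.
by_label = label_slice(labels, values, "c", "e")
```

Both calls select the same three elements; only the label version avoids having to know what comes after ``e``.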
diff --git a/doc/source/whatsnew/v0.7.0.txt b/doc/source/whatsnew/v0.7.0.txt
@@ -3,56 +3,6 @@
 v.0.7.0 (Not Yet Released)
 --------------------------
 
-API Changes to integer indexing
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-One of the potentially riskiest API changes in 0.7.0, but also one of the most
-important, was a complete review of how **integer indexes** are handled with
-regard to label-based indexing. Here is an example:
-
-.. ipython:: python
-
-    s = Series(randn(10), index=range(0, 20, 2))
-    s
-    s[0]
-    s[2]
-    s[4]
-
-This is all exactly identical to the behavior before. However, if you ask for a
-key **not** contained in the Series, in versions 0.6.1 and prior, Series would
-*fall back* on a location-based lookup. This now raises a ``KeyError``:
-
-.. code-block:: ipython
-
-   In [2]: s[1]
-   KeyError: 1
-
-This change also has the same impact on DataFrame:
-
-.. code-block:: ipython
-
-   In [3]: df = DataFrame(randn(8, 4), index=range(0, 16, 2))
-
-   In [4]: df
-       0        1       2       3
-   0   0.88427  0.3363 -0.1787  0.03162
-   2   0.14451 -0.1415  0.2504  0.58374
-   4  -1.44779 -0.9186 -1.4996  0.27163
-   6  -0.26598 -2.4184 -0.2658  0.11503
-   8  -0.58776  0.3144 -0.8566  0.61941
-   10  0.10940 -0.7175 -1.0108  0.47990
-   12 -1.16919 -0.3087 -0.6049 -0.43544
-   14 -0.07337  0.3410  0.0424 -0.16037
-
-   In [5]: df.ix[3]
-   KeyError: 3
-
-API refinements regarding label-based slicing
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-Other relevant API Changes
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-
 New features
 ~~~~~~~~~~~~
 
@@ -138,6 +88,136 @@ New features
   aggregate with groupby on a DataFrame, yielding an aggregated result with
   hierarchical columns (GH166_)
 
+API Changes to integer indexing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+One of the potentially riskiest API changes in 0.7.0, but also one of the most
+important, was a complete review of how **integer indexes** are handled with
+regard to label-based indexing. Here is an example:
+
+.. ipython:: python
+
+    s = Series(randn(10), index=range(0, 20, 2))
+    s
+    s[0]
+    s[2]
+    s[4]
+
+This is all exactly identical to the behavior before. However, if you ask for a
+key **not** contained in the Series, in versions 0.6.1 and prior, Series would
+*fall back* on a location-based lookup. This now raises a ``KeyError``:
+
+.. code-block:: ipython
+
+   In [2]: s[1]
+   KeyError: 1
+
+This change also has the same impact on DataFrame:
+
+.. code-block:: ipython
+
+   In [3]: df = DataFrame(randn(8, 4), index=range(0, 16, 2))
+
+   In [4]: df
+       0        1       2       3
+   0   0.88427  0.3363 -0.1787  0.03162
+   2   0.14451 -0.1415  0.2504  0.58374
+   4  -1.44779 -0.9186 -1.4996  0.27163
+   6  -0.26598 -2.4184 -0.2658  0.11503
+   8  -0.58776  0.3144 -0.8566  0.61941
+   10  0.10940 -0.7175 -1.0108  0.47990
+   12 -1.16919 -0.3087 -0.6049 -0.43544
+   14 -0.07337  0.3410  0.0424 -0.16037
+
+   In [5]: df.ix[3]
+   KeyError: 3
+
+In order to support purely integer-based indexing, the following methods have
+been added:
+
+.. csv-table::
+    :header: "Method","Description"
+    :widths: 40,60
+
+	``Series.iget_value(i)``, Retrieve value stored at location ``i``
+	``Series.iget(i)``, Alias for ``iget_value``
+	``DataFrame.irow(i)``, Retrieve the ``i``-th row
+	``DataFrame.icol(j)``, Retrieve the ``j``-th column
+	"``DataFrame.iget_value(i, j)``", Retrieve the value at row ``i`` and column ``j``
+
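The split between label-based and position-based access described above can be sketched with a small stand-in class. ``TinySeries`` is hypothetical and exists only for this illustration; it mimics the documented behavior (no positional fallback in ``[]``, a separate ``iget_value`` for locations) rather than reproducing pandas internals.

```python
class TinySeries:
    """Hypothetical stand-in for a Series with an integer index."""

    def __init__(self, values, index):
        self.values = list(values)
        self.index = list(index)
        # Map each label to its position for fast label lookup.
        self._loc = {label: pos for pos, label in enumerate(self.index)}

    def __getitem__(self, label):
        # Label-based only: a missing label raises KeyError,
        # with no fallback to a positional lookup.
        return self.values[self._loc[label]]

    def iget_value(self, i):
        # Position-based only: i counts from the start of the data.
        return self.values[i]


s = TinySeries([0.5, 1.5, 2.5], index=[0, 2, 4])

by_label = s[2]                  # label 2 is the second element
by_position = s.iget_value(1)    # position 1 is the same element
```

Here ``s[1]`` raises ``KeyError`` because ``1`` is not a label, even though position 1 exists.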
+API tweaks regarding label-based slicing
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Label-based slicing using ``ix`` now requires that the index be sorted
+(monotonic) **unless** both the start and endpoint are contained in the index:
+
+.. ipython:: python
+
+   s = Series(randn(6), index=list('gmkaec'))
+   s
+
+Then this is OK:
+
+.. ipython:: python
+
+   s.ix['k':'e']
+
+But this is not:
+
+.. code-block:: ipython
+
+   In [12]: s.ix['b':'h']
+   KeyError: 'b'
+
+If the index had been sorted, the "range selection" would have been possible:
+
+.. ipython:: python
+
+   s2 = s.sort_index()
+   s2
+   s2.ix['b':'h']
+
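One way to see why a sorted index permits range selection with labels that are not in the index is binary search. The sketch below uses the standard-library ``bisect`` module; ``slice_bounds`` is a hypothetical helper, and the approach is an assumption made for illustration, not a description of how pandas implements ``ix``.

```python
from bisect import bisect_left, bisect_right


def slice_bounds(index, start, stop):
    """Return (lo, hi) so that index[lo:hi] spans start..stop inclusive."""
    is_sorted = all(a <= b for a, b in zip(index, index[1:]))
    if is_sorted:
        # Binary search finds where the labels would fall, present or not.
        return bisect_left(index, start), bisect_right(index, stop)
    # Unsorted: positions are only meaningful for labels actually present.
    for label in (start, stop):
        if label not in index:
            raise KeyError(label)
    return index.index(start), index.index(stop) + 1


unsorted_index = list("gmkaec")
sorted_index = sorted(unsorted_index)

lo, hi = slice_bounds(unsorted_index, "k", "e")
exact_match = unsorted_index[lo:hi]

lo, hi = slice_bounds(sorted_index, "b", "h")
range_selection = sorted_index[lo:hi]
```

Calling ``slice_bounds(unsorted_index, "b", "h")`` raises ``KeyError`` in this sketch, mirroring the failing example above.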
+Changes to Series ``[]`` operator
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+As a notational convenience, you can pass a sequence of labels or a label
+slice to a Series when getting and setting values via ``[]`` (i.e. the
+``__getitem__`` and ``__setitem__`` methods). The behavior will be the same as
+passing similar input to ``ix`` **except in the case of integer indexing**:
+
+.. ipython:: python
+
+   s = Series(randn(6), index=list('acegkm'))
+   s
+   s[['m', 'a', 'c', 'e']]
+   s['b':'l']
+   s['c':'k']
+
+In the case of integer indexes, the behavior will be exactly as before
+(shadowing ``ndarray``):
+
+.. ipython:: python
+
+   s = Series(randn(6), index=range(0, 12, 2))
+   s[[4, 0, 2]]
+   s[1:5]
+
+If you wish to do indexing with sequences and slicing on an integer index with
+label semantics, use ``ix``.
+
+Other API Changes
+~~~~~~~~~~~~~~~~~
+
+- The deprecated ``LongPanel`` class has been completely removed
+
+- If ``Series.sort`` is called on a column of a DataFrame, an exception will
+  now be raised. Before it was possible to accidentally mutate a DataFrame's
+  column by doing ``df[col].sort()`` instead of the side-effect free method
+  ``df[col].order()`` (GH316_)
+
+- Miscellaneous renames and deprecations which will (harmlessly) raise
+  ``FutureWarning``
+
 Performance improvements
 ~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -189,6 +269,7 @@ similar operation to the above but using a Python function:
 .. _GH249: https://github.com/wesm/pandas/issues/249
 .. _GH267: https://github.com/wesm/pandas/issues/267
 .. _GH273: https://github.com/wesm/pandas/issues/273
+.. _GH316: https://github.com/wesm/pandas/issues/316
 .. _GH338: https://github.com/wesm/pandas/issues/338
 .. _GH342: https://github.com/wesm/pandas/issues/342
 .. _GH374: https://github.com/wesm/pandas/issues/374
diff --git a/pandas/core/frame.py b/pandas/core/frame.py
@@ -1257,6 +1257,24 @@ def icol(self, i):
         else:
             return self[label]
 
+    def iget_value(self, i, j):
+        """
+        Return scalar value stored at row i and column j, where i and j are
+        integers
+
+        Parameters
+        ----------
+        i : int
+        j : int
+
+        Returns
+        -------
+        value : scalar value
+        """
+        row = self.index[i]
+        col = self.columns[j]
+        return self.get_value(row, col)
+
     def __getitem__(self, key):
         # slice rows
         if isinstance(key, slice):
diff --git a/pandas/core/series.py b/pandas/core/series.py
@@ -477,7 +477,7 @@ def get(self, label, default=None):
         except KeyError:
             return default
 
-    def iget(self, i):
+    def iget_value(self, i):
         """
         Return the i-th value in the Series by location
 
@@ -495,6 +495,8 @@ def iget(self, i):
             label = self.index[i]
             return self[label]
 
+    iget = iget_value
+
     def get_value(self, label):
         """
         Quickly retrieve single value at passed index label