DOC: more docs loose ends

wesm · wesm · commit 32d65ddb78cf · 2011-10-21T21:57:38.000-04:00
diff --git a/TODO.rst b/TODO.rst
@@ -15,25 +15,25 @@ TODO docs
   - auto-sniff delimiter
   - MultiIndex
   - generally more documentation
-
-- pivot_table
-
+- DONE pivot_table
 - DONE Set mixed-type values with .ix
-- get_dtype_counts / dtypes
-- save / load functions
-- combine_first
-- describe for Series
-- DataFrame.to_string
-- Index / MultiIndex names
-- Unstack / stack by level name
-- ignore_index in DataFrame.append
+- DONE get_dtype_counts / dtypes
+- DONE save / load functions
+- DONE isnull/notnull as instance methods
+- DONE DataFrame.to_string
+- DONE IPython tab complete hook
+- DONE ignore_index in DataFrame.append
+- DONE describe for Series with dtype=object
+- DONE as_index=False in groupby
+- DONOTWANT is_monotonic
+- DONE DataFrame.to_csv: different delimiters
 - Inner join on key
 - Multi-key joining
-- as_index=False in groupby
-- is_monotonic
-- isnull/notnull as instance methods
+- Index / MultiIndex names
+
+- combine_first
+- Unstack / stack by level name
 - name attribute on Series
-- DataFrame.to_csv: different delimiters?
 - groupby with level name
 - MultiIndex
   - get_level_values
@@ -43,7 +43,6 @@ TODO docs
 - df[col_list]
 - Panel.rename_axis
 - & and | for intersection / union
-- IPython tab complete hook
 
 Performance blog
 ----------------
diff --git a/doc/source/basics.rst b/doc/source/basics.rst
@@ -242,9 +242,9 @@ will exclude NAs on Series input by default:
 Summarizing data: describe
 ~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-For floating point data, there is a convenient ``describe`` function which
-computes a variety of summary statistics about a Series or the columns of a
-DataFrame (excluding NAs of course):
+There is a convenient ``describe`` function which computes a variety of summary
+statistics about a Series or the columns of a DataFrame (excluding NAs of
+course):
 
 .. ipython:: python
 
@@ -255,6 +255,16 @@ DataFrame (excluding NAs of course):
     frame.ix[::2] = np.nan
     frame.describe()
 
+For a non-numerical Series object, ``describe`` will give a simple summary of the
+number of unique values and most frequently occurring values:
+
+
+.. ipython:: python
+
+   s = Series(['a', 'a', 'b', 'b', 'a', 'a', np.nan, 'c', 'd', 'a'])
+   s.describe()
+
+
 Correlations between objects
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
@@ -657,15 +667,28 @@ alternately passing the ``dtype`` keyword argument to the object constructor.
 Pickling and serialization
 --------------------------
 
-All pandas objects are equipped with ``save`` and ``load`` methods which use
-Python's ``cPickle`` module to save and load data structures to disk using the
-pickle format.
+All pandas objects are equipped with ``save`` methods which use Python's
+``cPickle`` module to save data structures to disk using the pickle format.
 
 .. ipython:: python
 
    df
    df.save('foo.pickle')
-   DataFrame.load('foo.pickle')
+
+The ``load`` function in the ``pandas`` namespace can be used to load any
+pickled pandas object (or any other pickled object) from a file:
+
+
+.. ipython:: python
+
+   load('foo.pickle')
+
+There is also a ``save`` function which takes any object as its first argument:
+
+.. ipython:: python
+
+   save(df, 'foo.pickle')
+   load('foo.pickle')
 
 .. ipython:: python
    :suppress:
diff --git a/doc/source/dsintro.rst b/doc/source/dsintro.rst
@@ -439,12 +439,51 @@ R package):
    baseball = read_csv('data/baseball.csv')
    baseball
 
-However, using ``to_string`` will display any DataFrame in tabular form, though
-it won't always fit the console width:
+However, using ``to_string`` will return a string representation of the
+DataFrame in tabular form, though it won't always fit the console width:
 
 .. ipython:: python
 
-   baseball.ix[-20:, :12].to_string()
+   print baseball.ix[-20:, :12].to_string()
+
+DataFrame column types
+~~~~~~~~~~~~~~~~~~~~~~
+
+The four main types stored in pandas objects are float, int, boolean, and
+object. A convenient ``dtypes`` attribute returns a Series with the data type of
+each column:
+
+.. ipython:: python
+
+   baseball.dtypes
+
+The related method ``get_dtype_counts`` will return the number of columns of
+each type:
+
+.. ipython:: python
+
+   baseball.get_dtype_counts()
+
+DataFrame column attribute access and IPython completion
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If a DataFrame column label is a valid Python variable name, the column can be
+accessed like an attribute:
+
+.. ipython:: python
+
+   df = DataFrame({'foo1' : np.random.randn(5),
+                   'foo2' : np.random.randn(5)})
+   df
+   df.foo1
+
+The columns are also connected to the `IPython <http://ipython.org>`__
+completion mechanism so they can be tab-completed:
+
+.. code-block:: ipython
+
+    In [5]: df.fo<TAB>
+    df.foo1  df.foo2
 
 .. _basics.panel:
 
diff --git a/doc/source/groupby.rst b/doc/source/groupby.rst
@@ -250,6 +250,8 @@ changed by using the ``as_index`` option:
    grouped = df.groupby(['A', 'B'], as_index=False)
    grouped.aggregate(np.sum)
 
+   df.groupby('A', as_index=False).sum()
+
 Note that you could use the ``delevel`` DataFrame function to achieve the same
 result as the column names are stored in the resulting ``MultiIndex``:
 
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -175,7 +175,7 @@ rather than reading the entire file into memory, such as the following:
 .. ipython:: python
    :suppress:
 
-   df[:7].to_csv('tmp.sv', delimiter='|')
+   df[:7].to_csv('tmp.sv', sep='|')
 
 .. ipython:: python
 
diff --git a/doc/source/merging.rst b/doc/source/merging.rst
@@ -14,8 +14,8 @@
 Merging / Joining data sets
 ***************************
 
-Appending disjoint objects
---------------------------
+Appending DataFrame objects
+---------------------------
 
 Series and DataFrame have an ``append`` method which will glue together objects
 each of whose ``index`` (Series labels or DataFrame rows) is mutually
@@ -40,6 +40,27 @@ In the case of DataFrame, the indexes must be disjoint but the columns do not ne
    df2
    df1.append(df2)
 
+Appending record-array-like DataFrames
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+For DataFrames which don't have a meaningful index, you may wish to append them
+and ignore the fact that they may have overlapping indexes:
+
+.. ipython:: python
+
+   df1 = DataFrame(randn(6, 4), columns=['A', 'B', 'C', 'D'])
+   df2 = DataFrame(randn(3, 4), columns=['A', 'B', 'C', 'D'])
+
+   df1
+   df2
+
+To do this, use the ``ignore_index`` argument:
+
+.. ipython:: python
+
+   df1.append(df2, ignore_index=True)
+
+
 Joining / merging DataFrames
 ----------------------------
 
diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst
@@ -195,7 +195,7 @@ some very expressive and fast data manipulations.
 Pivot tables and cross-tabulations
 **********************************
 
-The function `pandas.pivot_table` can be used to create spreadsheet-style pivot
+The function ``pandas.pivot_table`` can be used to create spreadsheet-style pivot
 tables. It takes a number of arguments
 
 - ``data``: A DataFrame object