Merge pull request #10101 from jorisvandenbossche/v0.16.1-docs

jorisvandenbossche · jorisvandenbossche · commit 3298528a956e · 2015-05-14T16:52:38.000+02:00
v0.16.1 docs
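
For reviewers, two of the behaviours documented in this diff (``expand`` on ``str.split`` and ``errors='ignore'`` on ``drop``) can be checked with a short script. This is only a sketch: it assumes pandas 0.16.1 or later is installed, and the sample data below is made up for illustration rather than taken from the docs.

```python
import pandas as pd

# Hypothetical sample data, not taken from the documentation files.
s2 = pd.Series(["a_b_c", "c_d_e", "f_g_h"])

# expand=True returns a DataFrame with one column per split piece,
# replacing the older return_type='frame' spelling.
frame = s2.str.split("_", expand=True)

# errors='ignore' drops the labels that exist ('A') and silently
# skips the ones that do not ('X') instead of raising.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["A", "B", "C"])
trimmed = df.drop(["A", "X"], axis=1, errors="ignore")

print(frame.shape)
print(list(trimmed.columns))
```

The split result should have three rows and three columns, and only columns ``B`` and ``C`` should remain after the drop.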
diff --git a/doc/source/basics.rst b/doc/source/basics.rst
@@ -236,7 +236,7 @@ see :ref:`here<indexing.boolean>`
 Boolean Reductions
 ~~~~~~~~~~~~~~~~~~
 
-    You can apply the reductions: :attr:`~DataFrame.empty`, :meth:`~DataFrame.any`,
+You can apply the reductions: :attr:`~DataFrame.empty`, :meth:`~DataFrame.any`,
 :meth:`~DataFrame.all`, and :meth:`~DataFrame.bool` to provide a
 way to summarize a boolean result.
 
diff --git a/doc/source/categorical.rst b/doc/source/categorical.rst
@@ -813,12 +813,16 @@ basic type) and applying along columns will also convert to object.
     df.apply(lambda row: type(row["cats"]), axis=1)
     df.apply(lambda col: col.dtype, axis=0)
 
-No Categorical Index
-~~~~~~~~~~~~~~~~~~~~
+Categorical Index
+~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.16.1
+
+A new ``CategoricalIndex`` index type is introduced in version 0.16.1. See the
+:ref:`advanced indexing docs <indexing.categoricalindex>` for a more detailed
+explanation.
 
-There is currently no index of type ``category``, so setting the index to categorical column will
-convert the categorical data to a "normal" dtype first and therefore remove any custom
-ordering of the categories:
+Setting the index will create a ``CategoricalIndex``:
 
 .. ipython:: python
 
@@ -827,13 +831,12 @@ ordering of the categories:
     values = [4,2,3,1]
     df = DataFrame({"strings":strings, "values":values}, index=cats)
     df.index
-    # This should sort by categories but does not as there is no CategoricalIndex!
+    # This now sorts by the order of the categories
     df.sort_index()
 
-.. note::
-    This could change if a `CategoricalIndex` is implemented (see
-    https://github.com/pydata/pandas/issues/7629)
-
+In previous versions (<0.16.1) there was no index of type ``category``, so
+setting the index to a categorical column converted the categorical data to a
+"normal" dtype first and therefore removed any custom ordering of the categories.
 
 Side Effects
 ~~~~~~~~~~~~
diff --git a/doc/source/contributing.rst b/doc/source/contributing.rst
@@ -113,10 +113,10 @@ This creates the directory `pandas-yourname` and connects your repository to
 the upstream (main project) *pandas* repository.
 
 The testing suite will run automatically on Travis-CI once your Pull Request is
-submitted.  However, if you wish to run the test suite on a branch prior to 
+submitted.  However, if you wish to run the test suite on a branch prior to
 submitting the Pull Request, then Travis-CI needs to be hooked up to your
 GitHub repository.  Instructions are for doing so are `here
-<http://about.travis-ci.org/docs/user/getting-started/>`_.
+<http://about.travis-ci.org/docs/user/getting-started/>`__.
 
 Creating a Branch
 -----------------
@@ -219,7 +219,7 @@ To return to you home root environment:
       deactivate
 
 See the full ``conda`` docs `here
-<http://conda.pydata.org/docs>`_.
+<http://conda.pydata.org/docs>`__.
 
 At this point you can easily do an *in-place* install, as detailed in the next section.
 
@@ -372,7 +372,7 @@ If you want to do a full clean build, do::
 Starting with 0.13.1 you can tell ``make.py`` to compile only a single section
 of the docs, greatly reducing the turn-around time for checking your changes.
 You will be prompted to delete `.rst` files that aren't required.  This is okay
-since the prior version can be checked out from git, but make sure to 
+since the prior version can be checked out from git, but make sure to
 not commit the file deletions.
 
 ::
@@ -401,7 +401,7 @@ Built Master Branch Documentation
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 When pull-requests are merged into the pandas *master* branch, the main parts of the documentation are
-also built by Travis-CI. These docs are then hosted `here <http://pandas-docs.github.io/pandas-docs-travis>`_.
+also built by Travis-CI. These docs are then hosted `here <http://pandas-docs.github.io/pandas-docs-travis>`__.
 
 Contributing to the code base
 =============================
diff --git a/doc/source/install.rst b/doc/source/install.rst
@@ -35,7 +35,7 @@ pandas at all.
 Simply create an account, and have access to pandas from within your brower via
 an `IPython Notebook <http://ipython.org/notebook.html>`__ in a few minutes.
 
-.. _install.anaconda
+.. _install.anaconda:
 
 Installing pandas with Anaconda
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -68,7 +68,7 @@ admin rights to install it, it will install in the user's home directory, and
 this also makes it trivial to delete Anaconda at a later date (just delete
 that folder).
 
-.. _install.miniconda
+.. _install.miniconda:
 
 Installing pandas with Miniconda
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
diff --git a/doc/source/text.rst b/doc/source/text.rst
@@ -82,11 +82,11 @@ Elements in the split lists can be accessed using ``get`` or ``[]`` notation:
    s2.str.split('_').str.get(1)
    s2.str.split('_').str[1]
 
-Easy to expand this to return a DataFrame using ``return_type``.
+It is easy to expand this to return a DataFrame using ``expand``:
 
 .. ipython:: python
 
-   s2.str.split('_', return_type='frame')
+   s2.str.split('_', expand=True)
 
 Methods like ``replace`` and ``findall`` take `regular expressions
 <https://docs.python.org/2/library/re.html>`__, too:
diff --git a/doc/source/visualization.rst b/doc/source/visualization.rst
@@ -220,8 +220,8 @@ Histogram can be drawn specifying ``kind='hist'``.
 
 .. ipython:: python
 
-   df4 = pd.DataFrame({'a': randn(1000) + 1, 'b': randn(1000),
-                       'c': randn(1000) - 1}, columns=['a', 'b', 'c'])
+   df4 = pd.DataFrame({'a': np.random.randn(1000) + 1, 'b': np.random.randn(1000),
+                       'c': np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])
 
    plt.figure();
 
diff --git a/doc/source/whatsnew/v0.16.1.txt b/doc/source/whatsnew/v0.16.1.txt
@@ -31,44 +31,6 @@ Highlights include:
 Enhancements
 ~~~~~~~~~~~~
 
-- ``BusinessHour`` offset is now supported, which represents business hours starting from 09:00 - 17:00 on ``BusinessDay`` by default. See :ref:`Here <timeseries.businesshour>` for details. (:issue:`7905`)
-
-  .. ipython:: python
-
-     Timestamp('2014-08-01 09:00') + BusinessHour()
-     Timestamp('2014-08-01 07:00') + BusinessHour()
-     Timestamp('2014-08-01 16:30') + BusinessHour()
-
-- ``DataFrame.diff`` now takes an ``axis`` parameter that determines the direction of differencing (:issue:`9727`)
-
-- Allow ``clip``, ``clip_lower``, and ``clip_upper`` to accept array-like arguments as thresholds (This is a regression from 0.11.0). These methods now have an ``axis`` parameter which determines how the Series or DataFrame will be aligned with the threshold(s). (:issue:`6966`)
-
-- ``DataFrame.mask()`` and ``Series.mask()`` now support same keywords as ``where`` (:issue:`8801`)
-
-- ``drop`` function can now accept ``errors`` keyword to suppress ``ValueError`` raised when any of label does not exist in the target data. (:issue:`6736`)
-
-  .. ipython:: python
-
-    df = DataFrame(np.random.randn(3, 3), columns=['A', 'B', 'C'])
-    df.drop(['A', 'X'], axis=1, errors='ignore')
-
-- Allow conversion of values with dtype ``datetime64`` or ``timedelta64`` to strings using ``astype(str)`` (:issue:`9757`)
-- ``get_dummies`` function now accepts ``sparse`` keyword.  If set to ``True``, the return ``DataFrame`` is sparse, e.g. ``SparseDataFrame``. (:issue:`8823`)
-- ``Period`` now accepts ``datetime64`` as value input. (:issue:`9054`)
-
-- Allow timedelta string conversion when leading zero is missing from time definition, ie `0:00:00` vs `00:00:00`. (:issue:`9570`)
-- Allow ``Panel.shift`` with ``axis='items'`` (:issue:`9890`)
-
-- Trying to write an excel file now raises ``NotImplementedError`` if the ``DataFrame`` has a ``MultiIndex`` instead of writing a broken Excel file. (:issue:`9794`)
-- Allow ``Categorical.add_categories`` to accept ``Series`` or ``np.array``. (:issue:`9927`)
-
-- Add/delete ``str/dt/cat`` accessors dynamically from ``__dir__``. (:issue:`9910`)
-- Add ``normalize`` as a ``dt`` accessor method. (:issue:`10047`)
-
-- ``DataFrame`` and ``Series`` now have ``_constructor_expanddim`` property as overridable constructor for one higher dimensionality data. This should be used only when it is really needed, see :ref:`here <ref-subclassing-pandas>`
-
-- ``pd.lib.infer_dtype`` now returns ``'bytes'`` in Python 3 where appropriate. (:issue:`10032`)
-
 .. _whatsnew_0161.enhancements.categoricalindex:
 
 CategoricalIndex
@@ -188,16 +150,6 @@ String Methods Enhancements
 :ref:`Continuing from v0.16.0 <whatsnew_0160.enhancements.string>`, the following
 enhancements make string operations easier and more consistent with standard python string operations.
 
-- The following new methods are accesible via ``.str`` accessor to apply the function to each values. (:issue:`9766`, :issue:`9773`, :issue:`10031`, :issue:`10045`, :issue:`10052`)
-
-  ================  ===============  ===============  ===============  ================
-  ..                ..               Methods          ..               ..
-  ================  ===============  ===============  ===============  ================
-  ``capitalize()``  ``swapcase()``   ``normalize()``  ``partition()``  ``rpartition()``
-  ``index()``       ``rindex()``     ``translate()``
-  ================  ===============  ===============  ===============  ================
-
-
 
 - Added ``StringMethods`` (``.str`` accessor) to ``Index`` (:issue:`9068`)
 
@@ -220,6 +172,14 @@ enhancements make string operations easier and more consistent with standard pyt
      idx.str.startswith('a')
      s[s.index.str.startswith('a')]
 
+- The following new methods are accessible via the ``.str`` accessor to apply the function to each value. (:issue:`9766`, :issue:`9773`, :issue:`10031`, :issue:`10045`, :issue:`10052`)
+
+  ================  ===============  ===============  ===============  ================
+  ..                ..               Methods          ..               ..
+  ================  ===============  ===============  ===============  ================
+  ``capitalize()``  ``swapcase()``   ``normalize()``  ``partition()``  ``rpartition()``
+  ``index()``       ``rindex()``     ``translate()``
+  ================  ===============  ===============  ===============  ================
 
 - ``split`` now takes ``expand`` keyword to specify whether to expand dimensionality. ``return_type`` is deprecated. (:issue:`9847`)
 
@@ -244,14 +204,59 @@ enhancements make string operations easier and more consistent with standard pyt
 
 - Improved ``extract`` and ``get_dummies`` methods for ``Index.str`` (:issue:`9980`)
 
-.. _whatsnew_0161.api:
 
-API changes
-~~~~~~~~~~~
+.. _whatsnew_0161.enhancements.other:
+
+Other Enhancements
+^^^^^^^^^^^^^^^^^^
+
+- The ``BusinessHour`` offset is now supported, which represents business hours from 09:00 to 17:00 on ``BusinessDay`` by default. See :ref:`here <timeseries.businesshour>` for details. (:issue:`7905`)
+
+  .. ipython:: python
 
+     from pandas.tseries.offsets import BusinessHour
+     Timestamp('2014-08-01 09:00') + BusinessHour()
+     Timestamp('2014-08-01 07:00') + BusinessHour()
+     Timestamp('2014-08-01 16:30') + BusinessHour()
 
+- ``DataFrame.diff`` now takes an ``axis`` parameter that determines the direction of differencing (:issue:`9727`)
 
+- Allow ``clip``, ``clip_lower``, and ``clip_upper`` to accept array-like arguments as thresholds (This is a regression from 0.11.0). These methods now have an ``axis`` parameter which determines how the Series or DataFrame will be aligned with the threshold(s). (:issue:`6966`)
+
+- ``DataFrame.mask()`` and ``Series.mask()`` now support the same keywords as ``where`` (:issue:`8801`)
 
+- The ``drop`` function now accepts an ``errors`` keyword to suppress the ``ValueError`` raised when any of the labels does not exist in the target data. (:issue:`6736`)
+
+  .. ipython:: python
+
+    df = DataFrame(np.random.randn(3, 3), columns=['A', 'B', 'C'])
+    df.drop(['A', 'X'], axis=1, errors='ignore')
+
+- Add support for separating years and quarters using dashes, for
+  example 2014-Q1.  (:issue:`9688`)
+
+- Allow conversion of values with dtype ``datetime64`` or ``timedelta64`` to strings using ``astype(str)`` (:issue:`9757`)
+- The ``get_dummies`` function now accepts a ``sparse`` keyword.  If set to ``True``, the returned ``DataFrame`` is sparse, e.g. ``SparseDataFrame``. (:issue:`8823`)
+- ``Period`` now accepts ``datetime64`` as value input. (:issue:`9054`)
+
+- Allow timedelta string conversion when the leading zero is missing from the time definition, i.e. ``0:00:00`` vs ``00:00:00``. (:issue:`9570`)
+- Allow ``Panel.shift`` with ``axis='items'`` (:issue:`9890`)
+
+- Trying to write an Excel file now raises ``NotImplementedError`` if the ``DataFrame`` has a ``MultiIndex`` instead of writing a broken Excel file. (:issue:`9794`)
+- Allow ``Categorical.add_categories`` to accept ``Series`` or ``np.array``. (:issue:`9927`)
+
+- Add/delete ``str/dt/cat`` accessors dynamically from ``__dir__``. (:issue:`9910`)
+- Add ``normalize`` as a ``dt`` accessor method. (:issue:`10047`)
+
+- ``DataFrame`` and ``Series`` now have a ``_constructor_expanddim`` property as an overridable constructor for data of one higher dimensionality. This should be used only when it is really needed, see :ref:`here <ref-subclassing-pandas>`
+
+- ``pd.lib.infer_dtype`` now returns ``'bytes'`` in Python 3 where appropriate. (:issue:`10032`)
+
+
+.. _whatsnew_0161.api:
+
+API changes
+~~~~~~~~~~~
 
 - When passing in an ax to ``df.plot( ..., ax=ax)``, the `sharex` kwarg will now default to `False`.
   The result is that the visibility of xlabels and xticklabels will not anymore be changed. You
@@ -260,16 +265,19 @@ API changes
   If pandas creates the subplots itself (e.g. no passed in `ax` kwarg), then the
   default is still ``sharex=True`` and the visibility changes are applied.
 
-
-
-- Add support for separating years and quarters using dashes, for
-  example 2014-Q1.  (:issue:`9688`)
-
 - :meth:`~pandas.DataFrame.assign` now inserts new columns in alphabetical order. Previously
   the order was arbitrary. (:issue:`9777`)
 
 - By default, ``read_csv`` and ``read_table`` will now try to infer the compression type based on the file extension. Set ``compression=None`` to restore the previous behavior (no decompression). (:issue:`9770`)
 
+.. _whatsnew_0161.deprecations:
+
+Deprecations
+^^^^^^^^^^^^
+
+- ``Series.str.split``'s ``return_type`` keyword was deprecated in favor of ``expand`` (:issue:`9847`)
+
+
 .. _whatsnew_0161.index_repr:
 
 Index Representation
@@ -303,25 +311,17 @@ New Behavior
 
 .. ipython:: python
 
-   pd.set_option('display.width',100)
-   pd.Index(range(4),name='foo')
-   pd.Index(range(25),name='foo')
-   pd.Index(range(104),name='foo')
-   pd.Index(['datetime', 'sA', 'sB', 'sC', 'flow', 'error', 'temp', 'ref', 'a_bit_a_longer_one']*2)
-   pd.CategoricalIndex(['a','bb','ccc','dddd'],ordered=True,name='foobar')
-   pd.CategoricalIndex(['a','bb','ccc','dddd']*10,ordered=True,name='foobar')
-   pd.CategoricalIndex(['a','bb','ccc','dddd']*100,ordered=True,name='foobar')
-   pd.CategoricalIndex(np.arange(1000),ordered=True,name='foobar')
-   pd.date_range('20130101',periods=4,name='foo',tz='US/Eastern')
-   pd.date_range('20130101',periods=25,name='foo',tz='US/Eastern')
-   pd.date_range('20130101',periods=104,name='foo',tz='US/Eastern')
-
-.. _whatsnew_0161.deprecations:
+   pd.set_option('display.width', 80)
+   pd.Index(range(4), name='foo')
+   pd.Index(range(30), name='foo')
+   pd.Index(range(104), name='foo')
+   pd.CategoricalIndex(['a','bb','ccc','dddd'], ordered=True, name='foobar')
+   pd.CategoricalIndex(['a','bb','ccc','dddd']*10, ordered=True, name='foobar')
+   pd.CategoricalIndex(['a','bb','ccc','dddd']*100, ordered=True, name='foobar')
+   pd.date_range('20130101', periods=4, name='foo', tz='US/Eastern')
+   pd.date_range('20130101', periods=25, freq='D')
+   pd.date_range('20130101', periods=104, name='foo', tz='US/Eastern')
 
-Deprecations
-^^^^^^^^^^^^
-
-- ``Series.str.split``'s ``return_type`` keyword was removed in favor of ``expand`` (:issue:`9847`)
 
 .. _whatsnew_0161.performance:
 
@@ -333,7 +333,6 @@ Performance Improvements
 - Improved the performance of ``pd.lib.max_len_string_array`` by 5-7x (:issue:`10024`)
 
 
-
 .. _whatsnew_0161.bug_fixes:
 
 Bug Fixes
@@ -361,7 +360,6 @@ Bug Fixes
 - Bug where repeated plotting of ``DataFrame`` with a ``DatetimeIndex`` may raise ``TypeError`` (:issue:`9852`)
 - Bug in ``setup.py`` that would allow an incompat cython version to build (:issue:`9827`)
 - Bug in plotting ``secondary_y`` incorrectly attaches ``right_ax`` property to secondary axes specifying itself recursively. (:issue:`9861`)
-
 - Bug in ``Series.quantile`` on empty Series of type ``Datetime`` or ``Timedelta`` (:issue:`9675`)
 - Bug in ``where`` causing incorrect results when upcasting was required (:issue:`9731`)
 - Bug in ``FloatArrayFormatter`` where decision boundary for displaying "small" floats in decimal format is off by one order of magnitude for a given display.precision (:issue:`9764`)
@@ -372,20 +370,13 @@ Bug Fixes
 - Bug in index equality comparisons using ``==`` failing on Index/MultiIndex type incompatibility (:issue:`9785`)
 - Bug in which ``SparseDataFrame`` could not take `nan` as a column name (:issue:`8822`)
 - Bug in ``to_msgpack`` and ``read_msgpack`` zlib and blosc compression support (:issue:`9783`)
-
 - Bug ``GroupBy.size`` doesn't attach index name properly if grouped by ``TimeGrouper`` (:issue:`9925`)
 - Bug causing an exception in slice assignments because ``length_of_indexer`` returns wrong results (:issue:`9995`)
 - Bug in csv parser causing lines with initial whitespace plus one non-space character to be skipped. (:issue:`9710`)
 - Bug in C csv parser causing spurious NaNs when data started with newline followed by whitespace. (:issue:`10022`)
-
 - Bug causing elements with a null group to spill into the final group when grouping by a ``Categorical`` (:issue:`9603`)
 - Bug where .iloc and .loc behavior is not consistent on empty dataframes (:issue:`9964`)
-
 - Bug in invalid attribute access on a ``TimedeltaIndex`` incorrectly raised ``ValueError`` instead of ``AttributeError`` (:issue:`9680`)
-
-
-
-
 - Bug in unequal comparisons between categorical data and a scalar, which was not in the categories (e.g. ``Series(Categorical(list("abc"), ordered=True)) > "d"``. This returned ``False`` for all elements, but now raises a ``TypeError``. Equality comparisons also now return ``False`` for ``==`` and ``True`` for ``!=``. (:issue:`9848`)
 - Bug in DataFrame ``__setitem__`` when right hand side is a dictionary (:issue:`9874`)
 - Bug in ``where`` when dtype is ``datetime64/timedelta64``, but dtype of other is not (:issue:`9804`)
@@ -394,25 +385,13 @@ Bug Fixes
 - Bug in ``DataFrame`` constructor when ``columns`` parameter is set, and ``data`` is an empty list (:issue:`9939`)
 - Bug in bar plot with ``log=True`` raises ``TypeError`` if all values are less than 1 (:issue:`9905`)
 - Bug in horizontal bar plot ignores ``log=True`` (:issue:`9905`)
-
-
-
 - Bug in PyTables queries that did not return proper results using the index (:issue:`8265`, :issue:`9676`)
-
-
-
-
 - Bug where dividing a dataframe containing values of type ``Decimal`` by another ``Decimal`` would raise. (:issue:`9787`)
 - Bug where using DataFrames asfreq would remove the name of the index. (:issue:`9885`)
 - Bug causing extra index point when resample BM/BQ (:issue:`9756`)
 - Changed caching in ``AbstractHolidayCalendar`` to be at the instance level rather than at the class level as the latter can result in unexpected behaviour. (:issue:`9552`)
-
 - Fixed latex output for multi-indexed dataframes (:issue:`9778`)
 - Bug causing an exception when setting an empty range using ``DataFrame.loc`` (:issue:`9596`)
-
-
-
-
 - Bug in hiding ticklabels with subplots and shared axes when adding a new plot to an existing grid of axes (:issue:`9158`)
 - Bug in ``transform`` and ``filter`` when grouping on a categorical variable (:issue:`9921`)
 - Bug in ``transform`` when groups are equal in number and dtype to the input index (:issue:`9700`)
diff --git a/pandas/core/strings.py b/pandas/core/strings.py