doc/source/v0.14.0.txt

.. _whatsnew_0140:

v0.14.0 (May ? , 2014)
----------------------

This is a major release from 0.13.1 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

Highlights include:

- MultIndexing Using Slicers
- Joining a singly-indexed DataFrame with a multi-indexed DataFrame

API changes
~~~~~~~~~~~

- ``iloc`` will now accept out-of-bounds indexers, e.g. a value that exceeds the length of the object being
  indexed. These will be excluded. This will make pandas conform more with pandas/numpy indexing of out-of-bounds
  values. A single indexer that is out-of-bounds and drops the dimensions of the object will still raise
  ``IndexError`` (:issue:`6296`). This could result in an empty axis (e.g. an empty DataFrame being returned)

  .. ipython:: python

      df = DataFrame(np.random.randn(5,2),columns=list('AB'))
      df
      df.iloc[[4,5,6]]
      df.iloc[4:6]
      df.iloc[:,2:3]
      df.iloc[:,1:3]

- The ``DataFrame.interpolate()`` ``downcast`` keyword default has been changed from ``infer`` to
  ``None``. This is to preseve the original dtype unless explicitly requested otherwise (:issue:`6290`).
- allow a Series to utilize index methods depending on its index type, e.g. ``Series.year`` is now defined
  for a Series with a ``DatetimeIndex`` or a ``PeriodIndex``; trying this on a non-supported Index type will
  now raise a ``TypeError``. (:issue:`4551`, :issue:`4056`, :issue:`5519`)

  The following affected:

  - ``date,time,year,month,day``
  - ``hour,minute,second,weekofyear``
  - ``week,dayofweek,dayofyear,quarter``
  - ``microsecond,nanosecond,qyear``
  - ``min(),max()``
  - ``pd.infer_freq()``

  .. ipython:: python

     s = Series(np.random.randn(5),index=tm.makeDateIndex(5))
     s
     s.year
     s.index.year

- Local variable usage has changed in
  :func:`pandas.eval`/:meth:`DataFrame.eval`/:meth:`DataFrame.query`
  (:issue:`5987`). For the :class:`~pandas.DataFrame` methods, two things have
  changed

   - Column names are now given precedence over locals
   - Local variables must be referred to explicitly. This means that even if
     you have a local variable that is *not* a column you must still refer to
     it with the ``'@'`` prefix.
   - You can have an expression like ``df.query('@a < a')`` with no complaints
     from ``pandas`` about ambiguity of the name ``a``.

- The top-level :func:`pandas.eval` function does not allow you use the
  ``'@'`` prefix and provides you with an error message telling you so.
- ``NameResolutionError`` was removed because it isn't necessary anymore.

MultiIndexing Using Slicers
~~~~~~~~~~~~~~~~~~~~~~~~~~~

In 0.14.0 we added a new way to slice multi-indexed objects.
You can slice a multi-index by providing multiple indexers.

You can provide any of the selectors as if you are indexing by label, see :ref:`Selection by Label <indexing.label>`,
including slices, lists of labels, labels, and boolean indexers.

You can use ``slice(None)`` to select all the contents of *that* level. You do not need to specify all the
*deeper* levels, they will be implied as ``slice(None)``.

As usual, **both sides** of the slicers are included as this is label indexing.

See :ref:`the docs<indexing.mi_slicers>`
See also issues (:issue:`6134`, :issue:`4036`, :issue:`3057`, :issue:`2598`, :issue:`5641`)

.. warning::

   You should specify all axes in the ``.loc`` specifier, meaning the indexer for the **index** and
   for the **columns**. Their are some ambiguous cases where the passed indexer could be mis-interpreted
   as indexing *both* axes, rather than into say the MuliIndex for the rows.

   You should do this:

   .. code-block:: python

      df.loc[(slice('A1','A3'),.....),:]

   rather than this:

   .. code-block:: python

      df.loc[(slice('A1','A3'),.....)]

.. warning::

   You will need to make sure that the selection axes are fully lexsorted!

.. ipython:: python

   def mklbl(prefix,n):
       return ["%s%s" % (prefix,i)  for i in range(n)]

   index = MultiIndex.from_product([mklbl('A',4),
                                    mklbl('B',2),
                                    mklbl('C',4),
                                    mklbl('D',2)])
   columns = MultiIndex.from_tuples([('a','foo'),('a','bar'),
                                     ('b','foo'),('b','bah')],
                                      names=['lvl0', 'lvl1'])
   df = DataFrame(np.arange(len(index)*len(columns)).reshape((len(index),len(columns))),
                  index=index,
                  columns=columns).sortlevel().sortlevel(axis=1)
   df

Basic multi-index slicing using slices, lists, and labels.

.. ipython:: python

   df.loc[(slice('A1','A3'),slice(None), ['C1','C3']),:]

You can use a ``pd.IndexSlice`` to shortcut the creation of these slices

.. ipython:: python

   idx = pd.IndexSlice
   df.loc[idx[:,:,['C1','C3']],idx[:,'foo']]

It is possible to perform quite complicated selections using this method on multiple
axes at the same time.

.. ipython:: python

   df.loc['A1',(slice(None),'foo')]
   df.loc[idx[:,:,['C1','C3']],idx[:,'foo']]

Using a boolean indexer you can provide selection related to the *values*.

.. ipython:: python

   mask = df[('a','foo')]>200
   df.loc[idx[mask,:,['C1','C3']],idx[:,'foo']]

You can also specify the ``axis`` argument to ``.loc`` to interpret the passed
slicers on a single axis.

.. ipython:: python

   df.loc(axis=0)[:,:,['C1','C3']]

Furthermore you can *set* the values using these methods

.. ipython:: python

   df2 = df.copy()
   df2.loc(axis=0)[:,:,['C1','C3']] = -10
   df2

You can use a right-hand-side of an alignable object as well.

.. ipython:: python

   df2 = df.copy()
   df2.loc[idx[:,:,['C1','C3']],:] = df2*1000
   df2

Prior Version Deprecations/Changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

There are no announced changes in 0.13.1 or prior that are taking effect as of 0.14.0

Deprecations
~~~~~~~~~~~~

There are no deprecations of prior behavior in 0.14.0

Enhancements
~~~~~~~~~~~~

- pd.read_clipboard will, if 'sep' is unspecified, try to detect data copied from a spreadsheet
  and parse accordingly. (:issue:`6223`)
- ``plot(legend='reverse')`` will now reverse the order of legend labels for
  most plot kinds. (:issue:`6014`)
- improve performance of slice indexing on Series with string keys (:issue:`6341`, :issue:`6372`)
- Hexagonal bin plots from ``DataFrame.plot`` with ``kind='hexbin'`` (:issue:`5478`)
- Joining a singly-indexed DataFrame with a multi-indexed DataFrame (:issue:`3662`)

  See :ref:`the docs<merging.join_on_mi>`. Joining multi-index DataFrames on both the left and right is not yet supported ATM.

  .. ipython:: python

     household = DataFrame(dict(household_id = [1,2,3],
                                male = [0,1,0],
                                wealth = [196087.3,316478.7,294750]),
                           columns = ['household_id','male','wealth']
                          ).set_index('household_id')
     household
     portfolio = DataFrame(dict(household_id = [1,2,2,3,3,3,4],
                                asset_id = ["nl0000301109","nl0000289783","gb00b03mlx29",
                                            "gb00b03mlx29","lu0197800237","nl0000289965",np.nan],
                                name = ["ABN Amro","Robeco","Royal Dutch Shell","Royal Dutch Shell",
                                        "AAB Eastern Europe Equity Fund","Postbank BioTech Fonds",np.nan],
                                share = [1.0,0.4,0.6,0.15,0.6,0.25,1.0]),
                           columns = ['household_id','asset_id','name','share']
                          ).set_index(['household_id','asset_id'])
     portfolio

     household.join(portfolio, how='inner')

- ``quotechar``, ``doublequote``, and ``escapechar`` can now be specified when
  using ``DataFrame.to_csv`` (:issue:`5414`, :issue:`4528`)

Performance
~~~~~~~~~~~

Experimental
~~~~~~~~~~~~

There are no experimental changes in 0.14.0

Bug Fixes
~~~~~~~~~

See :ref:`V0.14.0 Bug Fixes<release.bug_fixes-0.14.0>` for an extensive list of bugs that have been fixed in 0.14.0.

See the :ref:`full release notes
<release>` or issue tracker
on GitHub for a complete list of all API changes, Enhancements and Bug Fixes.