doc/source/whatsnew/v0.15.1.txt

.. _whatsnew_0151:

v0.15.1 (November ??, 2014)
-----------------------

This is a minor release from 0.15.0 and includes a small number of API changes, several new features,
enhancements, and performance improvements along with a large number of bug fixes. We recommend that all
users upgrade to this version.

- Highlights include:

- :ref:`Enhancements <whatsnew_0151.enhancements>`
- :ref:`API Changes <whatsnew_0151.api>`
- :ref:`Performance Improvements <whatsnew_0151.performance>`
- :ref:`Experimental Changes <whatsnew_0151.experimental>`
- :ref:`Bug Fixes <whatsnew_0151.bug_fixes>`

.. _whatsnew_0151.api:

API changes
~~~~~~~~~~~

- Represent ``MultiIndex`` labels with a dtype that utilizes memory based on the level size. In prior versions, the memory usage was a constant 8 bytes per element in each level. In addition, in prior versions, the *reported* memory usage was incorrect as it didn't show the usage for the memory occupied by the underling data array. (:issue:`8456`)

  .. ipython:: python

     dfi = DataFrame(1,index=pd.MultiIndex.from_product([['a'],range(1000)]),columns=['A'])

  previous behavior:

  .. code-block:: python

     # this was underreported and actually took (in < 0.15.1) about 24008 bytes
     In [1]: dfi.memory_usage(index=True)
     Out[1]:
     Index    8000
     A        8000
     dtype: int64


  current behavior:

  .. ipython:: python

     dfi.memory_usage(index=True)

- ``groupby`` with ``as_index=False`` will not add erroneous extra columns to
  result (:issue:`8582`):

  .. ipython:: python

    np.random.seed(2718281)
    df = pd.DataFrame(np.random.randint(0, 100, (10, 2)),
                      columns=['jim', 'joe'])
    df.head()

    ts = pd.Series(5 * np.random.randint(0, 3, 10))

  previous behavior:

  .. code-block:: python

     In [4]: df.groupby(ts, as_index=False).max()
     Out[4]:
        NaN  jim  joe
     0    0   72   83
     1    5   77   84
     2   10   96   65

  current behavior:

  .. ipython:: python

    df.groupby(ts, as_index=False).max()

- ``groupby`` will not erroneously exclude columns if the column name conflics
  with the grouper name (:issue:`8112`):

  .. ipython:: python

     df = pd.DataFrame({'jim': range(5), 'joe': range(5, 10)})
     df
     gr = df.groupby(df['jim'] < 2)

  previous behavior (excludes 1st column from output):

  .. code-block:: python

     In [4]: gr.apply(sum)
     Out[4]:
            joe
     jim
     False   24
     True    11

  current behavior:

  .. ipython:: python

     gr.apply(sum)

- ``concat`` permits a wider variety of iterables of pandas objects to be
  passed as the first parameter (:issue:`8645`):

  .. ipython:: python

     from collections import deque
     df1 = pd.DataFrame([1, 2, 3])
     df2 = pd.DataFrame([4, 5, 6])

  previous behavior:

  .. code-block:: python

     In [7]: pd.concat(deque((df1, df2)))
     TypeError: first argument must be a list-like of pandas objects, you passed an object of type "deque"

  current behavior:

  .. ipython:: python

     pd.concat(deque((df1, df2)))

.. _whatsnew_0151.enhancements:

Enhancements
~~~~~~~~~~~~

- Added option to select columns when importing Stata files (:issue:`7935`)
- Qualify memory usage in ``DataFrame.info()`` by adding ``+`` if it is a lower bound (:issue:`8578`)
- Raise errors in certain aggregation cases where an argument such as ``numeric_only`` is not handled (:issue:`8592`).


- Added support for 3-character ISO and non-standard country codes in :func:``io.wb.download()`` (:issue:`8482`)
- :ref:`World Bank data requests <remote_data.wb>` now will warn/raise based on an ``errors`` argument, as well as a list of hard-coded country codes and the World Bank's JSON response.  In prior versions, the error messages didn't look at the World Bank's JSON response.  Problem-inducing input were simply dropped prior to the request.  The issue was that many good countries were cropped in the hard-coded approach.  All countries will work now, but some bad countries will raise exceptions because some edge cases break the entire response. (:issue:`8482`)

.. _whatsnew_0151.performance:

Performance
~~~~~~~~~~~


.. _whatsnew_0151.experimental:

Experimental
~~~~~~~~~~~~

.. _whatsnew_0151.bug_fixes:

Bug Fixes
~~~~~~~~~


- Bug in coercing ``Categorical`` to a records array, e.g. ``df.to_records()`` (:issue:`8626`)
- Bug in ``Categorical`` not created properly with ``Series.to_frame()`` (:issue:`8626`)
- Bug in coercing in astype of a ``Categorical`` of a passed ``pd.Categorical`` (this now raises ``TypeError`` correctly), (:issue:`8626`)
- Bug in ``cut``/``qcut`` when using ``Series`` and ``retbins=True`` (:issue:`8589`)


- Bug in numeric index operations of add/sub with Float/Index Index with numpy arrays (:issue:`8608`
- Bug in setitem with empty indexer and unwanted coercion of dtypes (:issue:`8669`)


- Bug in ix/loc block splitting on setitem (manifests with integer-like dtypes, e.g. datetime64) (:issue:`8607`)


- Fix ``shape`` attribute for ``MultiIndex`` (:issue:`8609`)
- Bug in ``GroupBy`` where a name conflict between the grouper and columns
  would break ``groupby`` operations (:issue:`7115`, :issue:`8112`)


- Fixed a bug where plotting a column ``y`` and specifying a label would mutate the index name of the original DataFrame (:issue:`8494`)

- Bug in ``date_range`` where partially-specified dates would incorporate current date (:issue:`6961`)