What's New in 0.25.0 (April XX, 2019)

Warning

Starting with the 0.25.x series of releases, pandas only supports Python 3.5 and higher. See :ref:`install.dropping-27` for more details.

Warning

Panel has been fully removed. For N-D labeled data structures, please use xarray.

{{ header }}

These are the changes in pandas 0.25.0. See :ref:`release` for a full changelog including other versions of pandas.

Other Enhancements

Backwards incompatible API changes

Indexing with date strings with UTC offsets

Indexing a :class:`DataFrame` or :class:`Series` that has a :class:`DatetimeIndex` using a date string containing a UTC offset previously ignored the offset. The UTC offset is now respected when indexing. (:issue:`24076`, :issue:`16785`)

Previous Behavior:

.. code-block:: ipython

    In [1]: df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))

    In [2]: df
    Out[2]:
                               0
    2019-01-01 00:00:00-08:00  0

    In [3]: df['2019-01-01 00:00:00+04:00':'2019-01-01 01:00:00+04:00']
    Out[3]:
                               0
    2019-01-01 00:00:00-08:00  0

New Behavior:

.. ipython:: python

    df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
    df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00']
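
The new result can be understood as converting both endpoints into the index's timezone before comparing. The following is a minimal sketch of that equivalence, not a description of the internal implementation; the explicit :class:`Timestamp` objects, the use of ``.loc``, and the variable names are illustrative assumptions.

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame([0], index=pd.DatetimeIndex(["2019-01-01"], tz="US/Pacific"))

    # Endpoints written with a +04:00 offset, as in the example above.
    start = pd.Timestamp("2019-01-01 12:00:00+04:00")
    stop = pd.Timestamp("2019-01-01 13:00:00+04:00")

    # Express the same instants in the timezone of the index.
    start_local = start.tz_convert("US/Pacific")
    stop_local = stop.tz_convert("US/Pacific")

    # Slice with the converted endpoints (illustrative; uses .loc explicitly).
    selected = df.loc[start_local:stop_local]
    print(start_local, stop_local, len(selected))

12:00 at UTC+04:00 is the same instant as 00:00 at UTC-08:00, so the single row at midnight Pacific time falls inside the slice.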

GroupBy.apply on DataFrame evaluates first group only once

The implementation of :meth:`DataFrameGroupBy.apply() <pandas.core.groupby.DataFrameGroupBy.apply>` previously evaluated the supplied function twice on the first group to infer whether it was safe to use a fast code path. For functions with side effects in particular, this was undesirable and could lead to surprises.

(:issue:`2936`, :issue:`2656`, :issue:`7739`, :issue:`10519`, :issue:`12155`, :issue:`20084`, :issue:`21417`)

Now every group is evaluated only a single time.

.. ipython:: python

    df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
    df

    def func(group):
        print(group.name)
        return group

Previous Behavior:

.. code-block:: ipython

    In [3]: df.groupby('a').apply(func)
    x
    x
    y
    Out[3]:
       a  b
    0  x  1
    1  y  2

New Behavior:

.. ipython:: python

    df.groupby("a").apply(func)
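
One way to check the evaluation count without relying on printed output is to record each call in a list. This is a small sketch under the assumption of pandas 0.25.0 or later; ``calls`` and ``record`` are hypothetical names introduced only for this illustration.

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})

    calls = []  # side effect: names of the groups the function was called on


    def record(group):
        # Remember which group we were called with, then return it unchanged.
        calls.append(group.name)
        return group


    df.groupby("a").apply(record)
    print(calls)

With the new behavior, each group name is expected to appear in ``calls`` exactly once, so code that counts invocations or writes to external state no longer sees the first group twice.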


Concatenating Sparse Values

When passed DataFrames whose values are sparse, :func:`concat` will now return a Series or DataFrame with sparse values, rather than a SparseDataFrame (:issue:`25702`).

.. ipython:: python

   df = pd.DataFrame({"A": pd.SparseArray([0, 1])})

Previous Behavior:

.. code-block:: ipython

    In [2]: type(pd.concat([df, df]))
    pandas.core.sparse.frame.SparseDataFrame

New Behavior:

.. ipython:: python

   type(pd.concat([df, df]))


This now matches the existing behavior of :func:`concat` on Series with sparse values. :func:`concat` will continue to return a SparseDataFrame when all the values are instances of SparseDataFrame.

This change also affects routines using :func:`concat` internally, like :func:`get_dummies`, which now returns a :class:`DataFrame` in all cases (previously a SparseDataFrame was returned if all the columns were dummy encoded, and a :class:`DataFrame` otherwise).

Providing any SparseSeries or SparseDataFrame to :func:`concat` will cause a SparseSeries or SparseDataFrame to be returned, as before.
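
The sketch below illustrates how the result types described above could be inspected. It assumes ``pd.arrays.SparseArray`` and ``pd.SparseDtype`` are available (both exist since 0.24); the variable names are illustrative only.

.. code-block:: python

    import pandas as pd

    df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})

    # Concatenating frames with sparse columns yields a regular DataFrame
    # whose column keeps a sparse dtype.
    result = pd.concat([df, df])
    print(type(result).__name__, result["A"].dtype)

    # get_dummies uses concat internally; with sparse=True it returns a
    # DataFrame whose dummy columns are sparse.
    dummies = pd.get_dummies(pd.Series(["a", "b", "a"]), sparse=True)
    print(type(dummies).__name__, list(dummies.dtypes))

Checking the column dtype rather than the container type is the more reliable way to detect sparse data once the container is always a :class:`DataFrame`.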

Increased minimum versions for dependencies

Due to dropping support for Python 2.7, a number of optional dependencies have updated minimum versions (:issue:`25725`, :issue:`24942`, :issue:`25752`). Independently, some minimum supported versions of dependencies were updated (:issue:`23519`, :issue:`25554`). If installed, we now require:

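+-----------------+-----------------+----------+
| Package         | Minimum Version | Required |
+=================+=================+==========+
| numpy           | 1.13.3          |    X     |
+-----------------+-----------------+----------+
| pytz            | 2015.4          |    X     |
+-----------------+-----------------+----------+
| bottleneck      | 1.2.1           |          |
+-----------------+-----------------+----------+
| numexpr         | 2.6.2           |          |
+-----------------+-----------------+----------+
| pytest (dev)    | 4.0.2           |          |
+-----------------+-----------------+----------+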

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

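+-----------------+-----------------+
| Package         | Minimum Version |
+=================+=================+
| fastparquet     | 0.2.1           |
+-----------------+-----------------+
| matplotlib      | 2.2.2           |
+-----------------+-----------------+
| openpyxl        | 2.4.0           |
+-----------------+-----------------+
| pyarrow         | 0.9.0           |
+-----------------+-----------------+
| pytables        | 3.4.2           |
+-----------------+-----------------+
| scipy           | 0.19.0          |
+-----------------+-----------------+
| sqlalchemy      | 1.1.4           |
+-----------------+-----------------+
| xarray          | 0.8.2           |
+-----------------+-----------------+
| xlrd            | 1.0.0           |
+-----------------+-----------------+
| xlsxwriter      | 0.7.7           |
+-----------------+-----------------+
| xlwt            | 1.0.0           |
+-----------------+-----------------+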
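
To compare an environment against minimums like those listed above, something along the following lines could be used. This is only a rough sketch and not part of pandas: ``MINIMUMS``, ``version_tuple`` and ``outdated`` are hypothetical names, the version parsing is deliberately naive (leading digits of each dot-separated piece), and it assumes each package exposes a ``__version__`` attribute under the import name given.

.. code-block:: python

    import importlib
    from itertools import takewhile

    # A subset of the minimum versions from the first table (import names).
    MINIMUMS = {
        "numpy": "1.13.3",
        "pytz": "2015.4",
        "bottleneck": "1.2.1",
        "numexpr": "2.6.2",
    }


    def version_tuple(version):
        """Turn '1.13.3' into (1, 13, 3), ignoring suffixes such as 'rc1'."""
        parts = []
        for piece in version.split("."):
            digits = "".join(takewhile(str.isdigit, piece))
            if not digits:
                break
            parts.append(int(digits))
        return tuple(parts)


    def outdated(minimums):
        """Return {name: (installed, minimum)} for packages below the minimum."""
        problems = {}
        for name, minimum in minimums.items():
            try:
                module = importlib.import_module(name)
            except ImportError:
                continue  # not installed, so the minimum does not apply
            installed = getattr(module, "__version__", None)
            if installed and version_tuple(installed) < version_tuple(minimum):
                problems[name] = (installed, minimum)
        return problems


    print(outdated(MINIMUMS))

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating ``0.10.0`` as older than ``0.9.0``.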

Other API Changes

Deprecations

Removal of prior version deprecations/changes

Performance Improvements

Bug Fixes

Categorical

Datetimelike

- Bug in :func:`to_datetime` which would raise an (incorrect) ``ValueError`` when called with a date far into the future and the ``format`` argument specified instead of raising ``OutOfBoundsDatetime`` (:issue:`23830`)

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

I/O

Plotting

Groupby/Resample/Rolling

Reshaping

Sparse

- Significant speedup in ``SparseArray`` initialization that benefits most operations, fixing a performance regression introduced in v0.20.0 (:issue:`24985`)
- Bug in :class:`SparseDataFrame` constructor where passing ``None`` as the data would cause ``default_fill_value`` to be ignored (:issue:`16807`)

Other

Contributors

.. contributors:: v0.24.x..HEAD