What's New in 0.25.0 (April XX, 2019)

.. warning::

   Starting with the 0.25.x series of releases, pandas only supports Python 3.5 and higher.
   See :ref:`install.dropping-27` for more details.

.. warning::

   ``Panel`` has been fully removed. For N-D labeled data structures, please use xarray.

{{ header }}

These are the changes in pandas 0.25.0. See :ref:`release` for a full changelog including other versions of pandas.

Other Enhancements

Backwards incompatible API changes

Indexing with date strings with UTC offsets

Indexing a :class:`DataFrame` or :class:`Series` that has a :class:`DatetimeIndex` using a date string with a UTC offset previously ignored the offset. Now, the UTC offset is respected in indexing. (:issue:`24076`, :issue:`16785`)

.. ipython:: python

    df = pd.DataFrame([0], index=pd.DatetimeIndex(['2019-01-01'], tz='US/Pacific'))
    df

Previous Behavior:

.. code-block:: ipython

    In [3]: df['2019-01-01 00:00:00+04:00':'2019-01-01 01:00:00+04:00']
    Out[3]:
                               0
    2019-01-01 00:00:00-08:00  0

New Behavior:

.. ipython:: python

    df['2019-01-01 12:00:00+04:00':'2019-01-01 13:00:00+04:00']
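
The arithmetic behind the new slice bounds can be checked with the standard library alone. The sketch below is only an illustration, not pandas code: it uses fixed offsets in place of named time zones, assuming US/Pacific is at UTC-08:00 on this date (as the index repr above shows). The variable names are made up for the example.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets stand in for the time zones used in the example
# (assumption: US/Pacific is UTC-08:00 on 2019-01-01).
pacific = timezone(timedelta(hours=-8))
plus_four = timezone(timedelta(hours=4))

row_label = datetime(2019, 1, 1, 0, 0, tzinfo=pacific)
slice_start = datetime(2019, 1, 1, 12, 0, tzinfo=plus_four)
slice_end = datetime(2019, 1, 1, 13, 0, tzinfo=plus_four)

# Aware datetimes compare by instant, so the slice start is the row label itself.
starts_at_row = slice_start == row_label
row_in_slice = slice_start <= row_label <= slice_end

# The bounds used in the "previous behavior" example name a different instant:
# 00:00+04:00 is twelve hours before the row once the offset is respected.
old_start = datetime(2019, 1, 1, 0, 0, tzinfo=plus_four)
gap = row_label - old_start

print(starts_at_row, row_in_slice, gap)
```

Once the offset is respected, ``12:00:00+04:00`` and ``00:00:00-08:00`` denote the same instant, which is why the new-behavior slice uses the later wall-clock bounds to select the same row.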

GroupBy.apply on DataFrame evaluates first group only once

The implementation of :meth:`DataFrameGroupBy.apply() <pandas.core.groupby.DataFrameGroupBy.apply>` previously evaluated the supplied function twice on the first group to infer whether it was safe to use a fast code path. For functions with side effects in particular, this was undesirable and could lead to surprises. (:issue:`2936`, :issue:`2656`, :issue:`7739`, :issue:`10519`, :issue:`12155`, :issue:`20084`, :issue:`21417`)

Now every group is evaluated only a single time.

.. ipython:: python

    df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
    df

    def func(group):
        print(group.name)
        return group

Previous Behavior:

.. code-block:: ipython

    In [3]: df.groupby('a').apply(func)
    x
    x
    y
    Out[3]:
       a  b
    0  x  1
    1  y  2

New Behavior:

.. ipython:: python

    df.groupby("a").apply(func)
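
One way to observe the change without reading printed output is to record each call in a list. This is a minimal sketch: the ``calls`` list and the ``record`` function are hypothetical names introduced here, and the warning filter is only a precaution, since later pandas releases may warn about how grouping columns are passed to ``apply``.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})

calls = []  # hypothetical side effect: remember every group the function sees


def record(group):
    calls.append(group.name)
    return group


with warnings.catch_warnings():
    # Precaution only: newer pandas versions may emit warnings for this call.
    warnings.simplefilter("ignore")
    df.groupby("a").apply(record)

print(calls)
```

With the new behavior the list holds one entry per group; under the previous behavior the first group's name would have appeared twice.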


Concatenating Sparse Values

When passed DataFrames whose values are sparse, :func:`concat` will now return a :class:`Series` or :class:`DataFrame` with sparse values, rather than a :class:`SparseDataFrame` (:issue:`25702`).

.. ipython:: python

   df = pd.DataFrame({"A": pd.SparseArray([0, 1])})

Previous Behavior:

.. code-block:: ipython

    In [2]: type(pd.concat([df, df]))
    pandas.core.sparse.frame.SparseDataFrame

New Behavior:

.. ipython:: python

   type(pd.concat([df, df]))


This now matches the existing behavior of :func:`concat` on Series with sparse values. :func:`concat` will continue to return a SparseDataFrame when all the values are instances of SparseDataFrame.

This change also affects routines using :func:`concat` internally, like :func:`get_dummies`, which now returns a :class:`DataFrame` in all cases (previously a SparseDataFrame was returned if all the columns were dummy encoded, and a :class:`DataFrame` otherwise).

Providing any SparseSeries or SparseDataFrame to :func:`concat` will cause a SparseSeries or SparseDataFrame to be returned, as before.
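
A quick check of the result type and column dtype might look like the sketch below. It assumes the ``pd.arrays.SparseArray`` spelling, which is the same class that the example above reaches as ``pd.SparseArray``; the variable names are illustrative.

```python
import pandas as pd

# Assumption: pd.arrays.SparseArray is available; the text above spells it pd.SparseArray.
df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 1])})
result = pd.concat([df, df])

result_type = type(result)
column_dtype = result.dtypes["A"]
dense_values = result["A"].sparse.to_dense().tolist()

print(result_type.__name__, column_dtype, dense_values)
```

The container is a regular :class:`DataFrame`; the sparseness lives in the column dtype rather than in a dedicated subclass.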

Incompatible Index Type Unions

When performing :func:`Index.union` operations between objects of incompatible dtypes, the result will be a base :class:`Index` of dtype ``object``. This behavior holds true for unions between :class:`Index` objects that previously would have been prohibited. The dtype of empty :class:`Index` objects will now be evaluated before performing union operations rather than simply returning the other :class:`Index` object. :func:`Index.union` can now be considered commutative, such that ``A.union(B) == B.union(A)`` (:issue:`23525`).

Previous Behavior:

.. code-block:: ipython

    In [1]: pd.period_range('19910905', periods=2).union(pd.Int64Index([1, 2, 3]))
    ...
    ValueError: can only call with other PeriodIndex-ed objects

    In [2]: pd.Index([], dtype=object).union(pd.Index([1, 2, 3]))
    Out[2]: Int64Index([1, 2, 3], dtype='int64')

New Behavior:

.. ipython:: python

    pd.period_range('19910905', periods=2).union(pd.Int64Index([1, 2, 3]))
    pd.Index([], dtype=object).union(pd.Index([1, 2, 3]))
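
The symmetry of the result can be sketched by taking the union in both directions and comparing dtype and membership. The example uses a plain ``pd.Index`` of integers instead of ``pd.Int64Index`` (an assumption made so the snippet does not depend on that class), and it compares members as sets because mixed objects cannot be sorted into a common order.

```python
import warnings

import pandas as pd

periods = pd.period_range("1991-09-05", periods=2)
ints = pd.Index([1, 2, 3])  # plain Index; the text above uses pd.Int64Index

with warnings.catch_warnings():
    # Precaution: pandas may warn that the mixed values could not be sorted.
    warnings.simplefilter("ignore")
    left = periods.union(ints)
    right = ints.union(periods)

same_dtype = left.dtype == object and right.dtype == object
same_members = set(left) == set(right)

print(same_dtype, same_members, len(left))
```

Both directions yield an object-dtype :class:`Index` holding the same five values, where the first direction previously raised a ``ValueError``.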

DataFrame groupby ffill/bfill no longer return group labels

The methods ffill, bfill, pad and backfill of :class:`DataFrameGroupBy <pandas.core.groupby.DataFrameGroupBy>` previously included the group labels in the return value, which was inconsistent with other groupby transforms. Now only the filled values are returned. (:issue:`21521`)

.. ipython:: python

    df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
    df

Previous Behavior:

.. code-block:: ipython

    In [3]: df.groupby("a").ffill()
    Out[3]:
       a  b
    0  x  1
    1  y  2

New Behavior:

.. ipython:: python

    df.groupby("a").ffill()
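
The frame above has no missing values, so the fill itself is not visible there. A slightly larger sketch, with made-up data that includes gaps, shows both effects at once: values are filled within each group, and the grouping column is absent from the result.

```python
import numpy as np
import pandas as pd

# Illustrative data: each group has one value followed by a gap.
df = pd.DataFrame(
    {"a": ["x", "x", "y", "y"], "b": [1.0, np.nan, 2.0, np.nan]}
)
filled = df.groupby("a").ffill()

columns = list(filled.columns)
values = filled["b"].tolist()

print(columns, values)
```

If the group labels are still needed next to the filled values, they can be taken from the original frame, since the result keeps the original row index.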


``__str__`` methods now call ``__repr__`` rather than vice versa

pandas has until now mostly defined string representations in a pandas object's ``__str__``/``__unicode__``/``__bytes__`` methods, and called ``__str__`` from the ``__repr__`` method if no specific ``__repr__`` method was found. This is not needed for Python 3. In pandas 0.25, the string representations of pandas objects are now generally defined in ``__repr__``, and calls to ``__str__`` generally pass the call on to ``__repr__`` if no specific ``__str__`` method exists, as is standard for Python. This change is backward compatible for direct usage of pandas, but if you subclass pandas objects and give your subclasses specific ``__str__``/``__repr__`` methods, you may have to adjust them (:issue:`26495`).
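
The delegation described here is ordinary Python behavior and can be sketched without pandas. The classes below are hypothetical stand-ins, not pandas classes: ``Base`` defines only ``__repr__``, as pandas objects now generally do, and the two subclasses override one method each.

```python
class Base:
    """Hypothetical stand-in for a pandas object; defines only __repr__."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"Base({self.value!r})"


class OnlyRepr(Base):
    """Subclass that overrides __repr__; str() follows automatically."""

    def __repr__(self):
        return f"OnlyRepr({self.value!r})"


class OnlyStr(Base):
    """Subclass that overrides __str__; repr() still comes from Base."""

    def __str__(self):
        return f"value={self.value}"


print(str(Base(1)), str(OnlyRepr(1)), str(OnlyStr(1)), repr(OnlyStr(1)))
```

A subclass that previously overrode only ``__str__`` and expected ``repr()`` to follow it would, under this arrangement, need to override ``__repr__`` instead (or as well).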

Increased minimum versions for dependencies

Due to dropping support for Python 2.7, a number of optional dependencies have updated minimum versions (:issue:`25725`, :issue:`24942`, :issue:`25752`). Independently, some minimum supported versions of dependencies were updated (:issue:`23519`, :issue:`25554`). If installed, we now require:

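+-----------------+-----------------+----------+
| Package         | Minimum Version | Required |
+=================+=================+==========+
| numpy           | 1.13.3          |    X     |
+-----------------+-----------------+----------+
| pytz            | 2015.4          |    X     |
+-----------------+-----------------+----------+
| bottleneck      | 1.2.1           |          |
+-----------------+-----------------+----------+
| numexpr         | 2.6.2           |          |
+-----------------+-----------------+----------+
| pytest (dev)    | 4.0.2           |          |
+-----------------+-----------------+----------+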

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

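+-----------------+-----------------+
| Package         | Minimum Version |
+=================+=================+
| fastparquet     | 0.2.1           |
+-----------------+-----------------+
| matplotlib      | 2.2.2           |
+-----------------+-----------------+
| openpyxl        | 2.4.0           |
+-----------------+-----------------+
| pyarrow         | 0.9.0           |
+-----------------+-----------------+
| pytables        | 3.4.2           |
+-----------------+-----------------+
| scipy           | 0.19.0          |
+-----------------+-----------------+
| sqlalchemy      | 1.1.4           |
+-----------------+-----------------+
| xarray          | 0.8.2           |
+-----------------+-----------------+
| xlrd            | 1.0.0           |
+-----------------+-----------------+
| xlsxwriter      | 0.7.7           |
+-----------------+-----------------+
| xlwt            | 1.0.0           |
+-----------------+-----------------+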
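
An environment could be compared against these minimums with a small helper like the sketch below. It is not how pandas itself performs the check: ``parse_version`` and ``meets_minimum`` are hypothetical helpers that handle purely numeric versions only, and the ``installed`` mapping holds made-up values that would in practice come from each package's ``__version__`` attribute or from ``importlib.metadata.version()``.

```python
def parse_version(text):
    """Turn '1.13.3' into (1, 13, 3). Simplification: numeric components only."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed_version, minimum_version):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed_version) >= parse_version(minimum_version)


# A subset of the minimums listed above, for illustration.
MINIMUMS = {"numpy": "1.13.3", "pytz": "2015.4", "matplotlib": "2.2.2"}

# Hypothetical installed versions, invented for this example.
installed = {"numpy": "1.16.2", "pytz": "2015.2", "matplotlib": "2.2.2"}

too_old = sorted(
    name
    for name, minimum in MINIMUMS.items()
    if not meets_minimum(installed[name], minimum)
)

print(too_old)
```

Comparing tuples of integers avoids the pitfall of comparing version strings lexically, where ``"1.9"`` would sort after ``"1.13"``.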

Other API Changes

Deprecations

Sparse Subclasses

The SparseSeries and SparseDataFrame subclasses are deprecated. Their functionality is better provided by a Series or DataFrame with sparse values.

Previous Way

.. ipython:: python
   :okwarning:

   df = pd.SparseDataFrame({"A": [0, 0, 1, 2]})
   df.dtypes

New Way

.. ipython:: python

   df = pd.DataFrame({"A": pd.SparseArray([0, 0, 1, 2])})
   df.dtypes

The memory usage of the two approaches is identical. See :ref:`sparse.migration` for more (:issue:`19239`).
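
Sparse-specific information remains reachable on a regular column through the ``.sparse`` accessor. The sketch below assumes the ``pd.arrays.SparseArray`` spelling of the class used in the "New Way" example; the variable names are illustrative.

```python
import pandas as pd

# Assumption: pd.arrays.SparseArray is available; the text above spells it pd.SparseArray.
df = pd.DataFrame({"A": pd.arrays.SparseArray([0, 0, 1, 2])})

dtype_name = str(df.dtypes["A"])
density = df["A"].sparse.density  # fraction of values that are not the fill value
dense_values = df["A"].sparse.to_dense().tolist()

print(dtype_name, density, dense_values)
```

Here two of the four values differ from the fill value ``0``, so the density is one half.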

Other Deprecations

Removal of prior version deprecations/changes

Performance Improvements

Bug Fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

I/O

Plotting

Groupby/Resample/Rolling

Reshaping

Sparse

Other

Contributors

.. contributors:: v0.24.x..HEAD