Skip to content

Latest commit

 

History

History
576 lines (417 loc) · 28.8 KB

v1.1.0.rst

File metadata and controls

576 lines (417 loc) · 28.8 KB

What's new in 1.1.0 (??)

These are the changes in pandas 1.1.0. See :ref:`release` for a full changelog including other versions of pandas.

{{ header }}

Enhancements

Nonmonotonic PeriodIndex Partial String Slicing

:class:`PeriodIndex` now supports partial string slicing for non-monotonic indexes, mirroring :class:`DatetimeIndex` behavior (:issue:`31096`)

For example:

.. ipython:: python

   dti = pd.date_range("2014-01-01", periods=30, freq="30D")
   pi = dti.to_period("D")
   ser_monotonic = pd.Series(np.arange(30), index=pi)
   shuffler = list(range(0, 30, 2)) + list(range(1, 31, 2))
   ser = ser_monotonic[shuffler]
   ser

.. ipython:: python

   ser["2014"]
   ser.loc["May 2015"]

Fold argument support in Timestamp constructor

:class:`Timestamp:` now supports the keyword-only fold argument according to PEP 495 similar to parent datetime.datetime class. It supports both accepting fold as an initialization argument and inferring fold from other constructor arguments (:issue:`25057`, :issue:`31338`). Support is limited to dateutil timezones as pytz doesn't support fold.

For example:

.. ipython:: python

    ts = pd.Timestamp("2019-10-27 01:30:00+00:00")
    ts.fold

.. ipython:: python

    ts = pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
                      tz="dateutil/Europe/London", fold=1)
    ts

For more on working with fold, see :ref:`Fold subsection <timeseries.fold>` in the user guide.

Parsing timezone-aware format with different timezones in to_datetime

:func:`to_datetime` now supports parsing formats containing timezone names (%Z) and UTC offsets (%z) from different timezones then converting them to UTC by setting utc=True. This would return a :class:`DatetimeIndex` with timezone at UTC as opposed to an :class:`Index` with object dtype if utc=True is not set (:issue:`32792`).

For example:

.. ipython:: python

    tz_strs = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100",
               "2010-01-01 12:00:00 +0300", "2010-01-01 12:00:00 +0400"]
    pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z', utc=True)
    pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z')

Other enhancements

Development Changes

  • The minimum version of Cython is now the most recent bug-fix version (0.29.16) (:issue:`33334`).

Other API changes

Backwards incompatible API changes

MultiIndex.get_indexer interprets method argument differently

This restores the behavior of :meth:`MultiIndex.get_indexer` with method='backfill' or method='pad' to the behavior before pandas 0.23.0. In particular, MultiIndexes are treated as a list of tuples and padding or backfilling is done with respect to the ordering of these lists of tuples (:issue:`29896`).

As an example of this, given:

.. ipython:: python

        df = pd.DataFrame({
            'a': [0, 0, 0, 0],
            'b': [0, 2, 3, 4],
            'c': ['A', 'B', 'C', 'D'],
        }).set_index(['a', 'b'])
        mi_2 = pd.MultiIndex.from_product([[0], [-1, 0, 1, 3, 4, 5]])

The differences in reindexing df with mi_2 and using method='backfill' can be seen here:

pandas >= 0.23, < 1.1.0:

In [1]: df.reindex(mi_2, method='backfill')
Out[1]:
      c
0 -1  A
   0  A
   1  D
   3  A
   4  A
   5  C

pandas <0.23, >= 1.1.0

.. ipython:: python

        df.reindex(mi_2, method='backfill')

And the differences in reindexing df with mi_2 and using method='pad' can be seen here:

pandas >= 0.23, < 1.1.0

In [1]: df.reindex(mi_2, method='pad')
Out[1]:
        c
0 -1  NaN
   0  NaN
   1    D
   3  NaN
   4    A
   5    C

pandas < 0.23, >= 1.1.0

.. ipython:: python

        df.reindex(mi_2, method='pad')

Failed Label-Based Lookups Always Raise KeyError

Label lookups series[key], series.loc[key] and frame.loc[key] used to raises either KeyError or TypeError depending on the type of key and type of :class:`Index`. These now consistently raise KeyError (:issue:`31867`)

.. ipython:: python

    ser1 = pd.Series(range(3), index=[0, 1, 2])
    ser2 = pd.Series(range(3), index=pd.date_range("2020-02-01", periods=3))

Previous behavior:

In [3]: ser1[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [4] ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
TypeError: cannot do label indexing on DatetimeIndex with these indexers [1] of type int

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')

New behavior:

In [3]: ser1[1.5]
...
KeyError: 1.5

In [4] ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
KeyError: 1.5

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
KeyError: 1

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')

:meth:`DataFrame.merge` preserves right frame's row order

:meth:`DataFrame.merge` now preserves right frame's row order when executing a right merge (:issue:`27453`)

.. ipython:: python

    left_df = pd.DataFrame({'animal': ['dog', 'pig'], 'max_speed': [40, 11]})
    right_df = pd.DataFrame({'animal': ['quetzal', 'pig'], 'max_speed': [80, 11]})
    left_df
    right_df

Previous behavior:

>>> left_df.merge(right_df, on=['animal', 'max_speed'], how="right")
    animal  max_speed
0      pig         11
1  quetzal         80

New behavior:

.. ipython:: python

    left_df.merge(right_df, on=['animal', 'max_speed'], how="right")

Assignment to multiple columns of a DataFrame when some columns do not exist

Assignment to multiple columns of a :class:`DataFrame` when some of the columns do not exist would previously assign the values to the last column. Now, new columns would be constructed with the right values. (:issue:`13658`)

.. ipython:: python

   df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})
   df

Previous behavior:

In [3]: df[['a', 'c']] = 1
In [4]: df
Out[4]:
   a  b
0  1  1
1  1  1
2  1  1

New behavior:

.. ipython:: python

   df[['a', 'c']] = 1
   df

Deprecations

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

.. ipython:: python

        df = pd.DataFrame(np.arange(4),
                          index=[["a", "a", "b", "b"], [1, 2, 1, 2]])
        # Rows are now ordered as the requested keys
        df.loc[(['b', 'a'], [2, 1]), :]

.. ipython:: python

        left = pd.MultiIndex.from_arrays([["b", "a"], [2, 1]])
        right = pd.MultiIndex.from_arrays([["a", "b", "c"], [1, 2, 3]])
        # Common elements are now guaranteed to be ordered by the left side
        left.intersection(right, sort=False)

I/O

Plotting

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

Other

Contributors