Skip to content

Latest commit

 

History

History
788 lines (595 loc) · 43.3 KB

v1.1.0.rst

File metadata and controls

788 lines (595 loc) · 43.3 KB

What's new in 1.1.0 (??)

These are the changes in pandas 1.1.0. See :ref:`release` for a full changelog including other versions of pandas.

{{ header }}

Enhancements

Nonmonotonic PeriodIndex Partial String Slicing

:class:`PeriodIndex` now supports partial string slicing for non-monotonic indexes, mirroring :class:`DatetimeIndex` behavior (:issue:`31096`)

For example:

.. ipython:: python

   dti = pd.date_range("2014-01-01", periods=30, freq="30D")
   pi = dti.to_period("D")
   ser_monotonic = pd.Series(np.arange(30), index=pi)
   shuffler = list(range(0, 30, 2)) + list(range(1, 31, 2))
   ser = ser_monotonic[shuffler]
   ser

.. ipython:: python

   ser["2014"]
   ser.loc["May 2015"]

Sorting with keys

We've added a key argument to the DataFrame and Series sorting methods, including :meth:`DataFrame.sort_values`, :meth:`DataFrame.sort_index`, :meth:`Series.sort_values`, and :meth:`Series.sort_index`. The key can be any callable function which is applied column-by-column to each column used for sorting, before sorting is performed (:issue:`27237`). See :ref:`sort_values with keys <basics.sort_value_key>` and :ref:`sort_index with keys <basics.sort_index_key>` for more information.

.. ipython:: python

   s = pd.Series(['C', 'a', 'B'])
   s

.. ipython:: python

   s.sort_values()


Note how this is sorted with capital letters first. If we apply the :meth:`Series.str.lower` method, we get

.. ipython:: python

   s.sort_values(key=lambda x: x.str.lower())


When applied to a DataFrame, they key is applied per-column to all columns or a subset if by is specified, e.g.

.. ipython:: python

   df = pd.DataFrame({'a': ['C', 'C', 'a', 'a', 'B', 'B'],
                      'b': [1, 2, 3, 4, 5, 6]})
   df

.. ipython:: python

   df.sort_values(by=['a'], key=lambda col: col.str.lower())


For more details, see examples and documentation in :meth:`DataFrame.sort_values`, :meth:`Series.sort_values`, and :meth:`~DataFrame.sort_index`.

Fold argument support in Timestamp constructor

:class:`Timestamp:` now supports the keyword-only fold argument according to PEP 495 similar to parent datetime.datetime class. It supports both accepting fold as an initialization argument and inferring fold from other constructor arguments (:issue:`25057`, :issue:`31338`). Support is limited to dateutil timezones as pytz doesn't support fold.

For example:

.. ipython:: python

    ts = pd.Timestamp("2019-10-27 01:30:00+00:00")
    ts.fold

.. ipython:: python

    ts = pd.Timestamp(year=2019, month=10, day=27, hour=1, minute=30,
                      tz="dateutil/Europe/London", fold=1)
    ts

For more on working with fold, see :ref:`Fold subsection <timeseries.fold>` in the user guide.

Parsing timezone-aware format with different timezones in to_datetime

:func:`to_datetime` now supports parsing formats containing timezone names (%Z) and UTC offsets (%z) from different timezones then converting them to UTC by setting utc=True. This would return a :class:`DatetimeIndex` with timezone at UTC as opposed to an :class:`Index` with object dtype if utc=True is not set (:issue:`32792`).

For example:

.. ipython:: python

    tz_strs = ["2010-01-01 12:00:00 +0100", "2010-01-01 12:00:00 -0100",
               "2010-01-01 12:00:00 +0300", "2010-01-01 12:00:00 +0400"]
    pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z', utc=True)
    pd.to_datetime(tz_strs, format='%Y-%m-%d %H:%M:%S %z')

Other enhancements

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated (:issue:`33718`, :issue:`29766`, :issue:`29723`, pytables >= 3.4.3). If installed, we now require:

Package Minimum Version Required Changed
numpy 1.15.4 X X
pytz 2015.4 X  
python-dateutil 2.7.3 X X
bottleneck 1.2.1    
numexpr 2.6.2    
pytest (dev) 4.0.2    

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package Minimum Version Changed
beautifulsoup4 4.6.0  
fastparquet 0.3.2  
gcsfs 0.2.2  
lxml 3.8.0  
matplotlib 2.2.2  
numba 0.46.0  
openpyxl 2.5.7  
pyarrow 0.13.0  
pymysql 0.7.1  
pytables 3.4.3 X
s3fs 0.3.0  
scipy 1.2.0 X
sqlalchemy 1.1.4  
xarray 0.8.2  
xlrd 1.1.0  
xlsxwriter 0.9.8  
xlwt 1.2.0  

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

Development Changes

  • The minimum version of Cython is now the most recent bug-fix version (0.29.16) (:issue:`33334`).

Other API changes

Backwards incompatible API changes

MultiIndex.get_indexer interprets method argument differently

This restores the behavior of :meth:`MultiIndex.get_indexer` with method='backfill' or method='pad' to the behavior before pandas 0.23.0. In particular, MultiIndexes are treated as a list of tuples and padding or backfilling is done with respect to the ordering of these lists of tuples (:issue:`29896`).

As an example of this, given:

.. ipython:: python

        df = pd.DataFrame({
            'a': [0, 0, 0, 0],
            'b': [0, 2, 3, 4],
            'c': ['A', 'B', 'C', 'D'],
        }).set_index(['a', 'b'])
        mi_2 = pd.MultiIndex.from_product([[0], [-1, 0, 1, 3, 4, 5]])

The differences in reindexing df with mi_2 and using method='backfill' can be seen here:

pandas >= 0.23, < 1.1.0:

In [1]: df.reindex(mi_2, method='backfill')
Out[1]:
      c
0 -1  A
   0  A
   1  D
   3  A
   4  A
   5  C

pandas <0.23, >= 1.1.0

.. ipython:: python

        df.reindex(mi_2, method='backfill')

And the differences in reindexing df with mi_2 and using method='pad' can be seen here:

pandas >= 0.23, < 1.1.0

In [1]: df.reindex(mi_2, method='pad')
Out[1]:
        c
0 -1  NaN
   0  NaN
   1    D
   3  NaN
   4    A
   5    C

pandas < 0.23, >= 1.1.0

.. ipython:: python

        df.reindex(mi_2, method='pad')

Failed Label-Based Lookups Always Raise KeyError

Label lookups series[key], series.loc[key] and frame.loc[key] used to raises either KeyError or TypeError depending on the type of key and type of :class:`Index`. These now consistently raise KeyError (:issue:`31867`)

.. ipython:: python

    ser1 = pd.Series(range(3), index=[0, 1, 2])
    ser2 = pd.Series(range(3), index=pd.date_range("2020-02-01", periods=3))

Previous behavior:

In [3]: ser1[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [4] ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
TypeError: cannot do label indexing on Int64Index with these indexers [1.5] of type float

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
TypeError: cannot do label indexing on DatetimeIndex with these indexers [1] of type int

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')

New behavior:

In [3]: ser1[1.5]
...
KeyError: 1.5

In [4] ser1["foo"]
...
KeyError: 'foo'

In [5]: ser1.loc[1.5]
...
KeyError: 1.5

In [6]: ser1.loc["foo"]
...
KeyError: 'foo'

In [7]: ser2.loc[1]
...
KeyError: 1

In [8]: ser2.loc[pd.Timestamp(0)]
...
KeyError: Timestamp('1970-01-01 00:00:00')

Failed Integer Lookups on MultiIndex Raise KeyError

Indexing with integers with a :class:`MultiIndex` that has a integer-dtype first level incorrectly failed to raise KeyError when one or more of those integer keys is not present in the first level of the index (:issue:`33539`)

.. ipython:: python

    idx = pd.Index(range(4))
    dti = pd.date_range("2000-01-03", periods=3)
    mi = pd.MultiIndex.from_product([idx, dti])
    ser = pd.Series(range(len(mi)), index=mi)

Previous behavior:

In [5]: ser[[5]]
Out[5]: Series([], dtype: int64)

New behavior:

In [5]: ser[[5]]
...
KeyError: '[5] not in index'

:meth:`DataFrame.merge` preserves right frame's row order

:meth:`DataFrame.merge` now preserves right frame's row order when executing a right merge (:issue:`27453`)

.. ipython:: python

    left_df = pd.DataFrame({'animal': ['dog', 'pig'], 'max_speed': [40, 11]})
    right_df = pd.DataFrame({'animal': ['quetzal', 'pig'], 'max_speed': [80, 11]})
    left_df
    right_df

Previous behavior:

>>> left_df.merge(right_df, on=['animal', 'max_speed'], how="right")
    animal  max_speed
0      pig         11
1  quetzal         80

New behavior:

.. ipython:: python

    left_df.merge(right_df, on=['animal', 'max_speed'], how="right")

Assignment to multiple columns of a DataFrame when some columns do not exist

Assignment to multiple columns of a :class:`DataFrame` when some of the columns do not exist would previously assign the values to the last column. Now, new columns would be constructed with the right values. (:issue:`13658`)

.. ipython:: python

   df = pd.DataFrame({'a': [0, 1, 2], 'b': [3, 4, 5]})
   df

Previous behavior:

In [3]: df[['a', 'c']] = 1
In [4]: df
Out[4]:
   a  b
0  1  1
1  1  1
2  1  1

New behavior:

.. ipython:: python

   df[['a', 'c']] = 1
   df

Deprecations

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

.. ipython:: python

        df = pd.DataFrame(np.arange(4),
                          index=[["a", "a", "b", "b"], [1, 2, 1, 2]])
        # Rows are now ordered as the requested keys
        df.loc[(['b', 'a'], [2, 1]), :]

.. ipython:: python

        left = pd.MultiIndex.from_arrays([["b", "a"], [2, 1]])
        right = pd.MultiIndex.from_arrays([["a", "b", "c"], [1, 2, 3]])
        # Common elements are now guaranteed to be ordered by the left side
        left.intersection(right, sort=False)

I/O

Plotting

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

Other

Contributors