What's new in 2.0.0 (??)

These are the changes in pandas 2.0.0. See :ref:`release` for a full changelog including other versions of pandas.

{{ header }}

Enhancements

Optional dependencies version management

Optional pandas dependencies can be managed as extras in a requirements/setup file, for example:

pandas[performance, aws]>=2.0.0

Available optional dependencies (listed in the order they appear in the install guide) are ``[all, performance, computation, timezone, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql, sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (:issue:`39164`).
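The extras names above can be combined into a single requirement specifier. The sketch below uses a hypothetical helper, ``build_pandas_requirement`` (not part of pandas or pip). It checks the requested extras against the list above before building the string, which catches typos such as ``fsspec`` where ``fss`` was meant.

```python
# Extras copied from the release note above.
# build_pandas_requirement is a hypothetical helper, not a pandas or pip API.
PANDAS_EXTRAS = {
    "all", "performance", "computation", "timezone", "fss", "aws", "gcp",
    "excel", "parquet", "feather", "hdf5", "spss", "postgresql", "mysql",
    "sql-other", "html", "xml", "plot", "output_formatting", "clipboard",
    "compression", "test",
}


def build_pandas_requirement(extras, min_version="2.0.0"):
    """Return a requirement string such as 'pandas[aws,performance]>=2.0.0'."""
    unknown = set(extras) - PANDAS_EXTRAS
    if unknown:
        raise ValueError(f"unknown pandas extras: {sorted(unknown)}")
    # Sort so that the generated specifier is stable across runs.
    joined = ",".join(sorted(set(extras)))
    return f"pandas[{joined}]>={min_version}"


print(build_pandas_requirement(["performance", "aws"]))
```

The resulting string can go on its own line in a requirements file. Validating the names up front matters because an unknown extra may otherwise be skipped with only a warning at install time.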

enhancement2

Other enhancements

Notable bug fixes

These are bug fixes that might have notable behavior changes.

:meth:`.GroupBy.cumsum` and :meth:`.GroupBy.cumprod` overflow instead of lossy casting to float

In previous versions we cast to float when applying ``cumsum`` and ``cumprod``, which led to incorrect results even when the result could be held by the ``int64`` dtype. The aggregation now overflows when the limit of ``int64`` is reached, consistent with NumPy and the regular :meth:`DataFrame.cumprod` and :meth:`DataFrame.cumsum` methods (:issue:`37493`).

Old Behavior

In [1]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
In [2]: df.groupby("key")["value"].cumprod()[5]
Out[2]: 5.960464477539062e+16

We return an incorrect 6th value, because ``625 ** 6`` fits in ``int64`` but cannot be represented exactly as a float.

New Behavior

.. ipython:: python

    df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
    df.groupby("key")["value"].cumprod()

We overflow with the 7th value, but the 6th value is still correct.
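Both behaviors can be reproduced with NumPy alone. This sketch illustrates the arithmetic; it is not the pandas groupby implementation. It computes the running product of seven 625s once in ``float64`` (roughly the old cast) and once in ``int64`` (the new behavior). The float path cannot represent ``625 ** 6`` exactly. The integer path is exact up to the 6th value and wraps around at the 7th, because ``625 ** 7`` exceeds the ``int64`` maximum.

```python
import numpy as np

values = np.full(7, 625, dtype=np.int64)

# Roughly the old behavior: cast to float64 before accumulating.
float_result = np.cumprod(values.astype(np.float64))

# New behavior: accumulate in int64, like np.cumprod and DataFrame.cumprod.
int_result = np.cumprod(values)

exact_6th = 625 ** 6  # 59604644775390625, which fits in int64

print(int(float_result[5]) == exact_6th)  # False: float64 lost precision
print(int(int_result[5]) == exact_6th)    # True: int64 is exact
print(625 ** 7 > np.iinfo(np.int64).max)  # True: the 7th value cannot fit
```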

notable_bug_fix2

Backwards incompatible API changes

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

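+-----------------+-----------------+----------+---------+
| Package         | Minimum Version | Required | Changed |
+=================+=================+==========+=========+
| mypy (dev)      | 0.981           |          |    X    |
+-----------------+-----------------+----------+---------+
| python-dateutil | 2.8.2           |    X     |    X    |
+-----------------+-----------------+----------+---------+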

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

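+-----------------+-----------------+---------+
| Package         | Minimum Version | Changed |
+=================+=================+=========+
| pyarrow         | 6.0.0           |    X    |
+-----------------+-----------------+---------+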

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
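To check an environment against the minimums in the tables above, read the installed version with ``importlib.metadata`` and compare the leading numeric release components as tuples. The ``MINIMUMS`` dict and the ``release_tuple`` and ``meets_minimum`` helpers below are hypothetical. They ignore pre-release and post-release tags, so a full version parser such as ``packaging.version`` is the more robust choice.

```python
import re
from importlib import metadata

# Minimums copied from the tables above; hypothetical, not a pandas API.
MINIMUMS = {"python-dateutil": "2.8.2", "pyarrow": "6.0.0"}


def release_tuple(version):
    """'2.9.0.post0' -> (2, 9, 0); only the leading numeric part is kept."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"cannot parse version {version!r}")
    parts = [int(p) for p in match.group(0).split(".")]
    # Pad so that '6' compares equal to '6.0.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(installed, minimum):
    return release_tuple(installed) >= release_tuple(minimum)


for package, minimum in MINIMUMS.items():
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    print(f"{package} {installed} (minimum {minimum}): {status}")
```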

Datetimes are now parsed with a consistent format

In the past, :func:`to_datetime` guessed the format for each element independently. This was appropriate for some cases where a column had a mixed date format; however, it regularly caused problems for columns where users expected a consistent format but the function switched formats row-wise. As of version 2.0.0, parsing is consistent column-wise: the format is determined by the first non-NA value in the column, unless the user specifies a format, in which case that format is used.

Old behavior:

In [1]: ser = pd.Series(['13-01-2000', '12-01-2000'])
In [2]: pd.to_datetime(ser)
Out[2]:
0   2000-01-13
1   2000-12-01
dtype: datetime64[ns]

New behavior:

.. ipython:: python
   :okwarning:

   ser = pd.Series(['13-01-2000', '12-01-2000'])
   pd.to_datetime(ser)

Note that this affects :func:`read_csv` as well.

If you still need to parse dates with inconsistent formats, you'll need to apply :func:`to_datetime` to each element individually, e.g.

ser = pd.Series(['13-01-2000', '12 January 2000'])
ser.apply(pd.to_datetime)
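The switch from per-element to column-wise format inference can be sketched with the standard library. This sketch assumes only two candidate formats exist, with ``%m-%d-%Y`` tried before ``%d-%m-%Y`` to mimic a month-first default. ``guess_format`` and the two ``parse_*`` functions are hypothetical stand-ins; pandas' own format guessing handles far more layouts.

```python
from datetime import datetime

# Hypothetical candidate formats, tried in order (month-first before day-first).
CANDIDATE_FORMATS = ["%m-%d-%Y", "%d-%m-%Y"]


def guess_format(value):
    """Return the first candidate format that parses ``value``."""
    for fmt in CANDIDATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return fmt
        except ValueError:
            continue
    raise ValueError(f"no candidate format matches {value!r}")


def parse_per_element(values):
    # Old behavior: each element gets its own guessed format.
    return [datetime.strptime(v, guess_format(v)) for v in values]


def parse_column_wise(values):
    # New behavior: guess once from the first non-missing value, reuse it for all.
    first = next(v for v in values if v is not None)
    fmt = guess_format(first)
    return [None if v is None else datetime.strptime(v, fmt) for v in values]


values = ["13-01-2000", "12-01-2000"]
print(parse_per_element(values))  # second value read as 1 December 2000
print(parse_column_wise(values))  # second value read as 12 January 2000
```

Because ``13-01-2000`` can only be day-first, the column-wise version applies ``%d-%m-%Y`` to every row. The per-element version falls back to month-first for ``12-01-2000``, which reproduces the old output shown above.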

Other API changes

Deprecations

Removal of prior version deprecations/changes

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

I/O

Period

Plotting

  • Bug where ``ax.set_xlim`` sometimes raised a ``UserWarning`` that users could not address, because ``set_xlim`` does not accept parsing arguments; the converter now uses :class:`Timestamp` instead (:issue:`49148`)

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

  • Bug in :meth:`Series.mean` overflowing unnecessarily with nullable integers (:issue:`48378`)
  • Bug where concatenating an empty DataFrame with an ExtensionDtype to another DataFrame with the same ExtensionDtype resulted in ``object`` dtype (:issue:`48510`)
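The unnecessary overflow in :meth:`Series.mean` is the kind that occurs when integers are summed in their own dtype before dividing. The NumPy sketch below only illustrates that failure mode and one way around it, accumulating in ``float64``. It is an assumption-based illustration, not the code pandas uses for nullable integers.

```python
import numpy as np

values = np.array([2**62, 2**62], dtype=np.int64)

# Summing in int64 wraps around, because 2**63 does not fit in int64.
naive_mean = values.sum() / len(values)

# Accumulating in float64 avoids the overflow for this input.
float_mean = values.astype(np.float64).sum() / len(values)

print(naive_mean, float_mean)
```

The true mean, ``2**62``, fits comfortably in ``int64``, so only the intermediate sum overflows. That is why the overflow counts as unnecessary.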

Styler

Metadata

Other

Contributors