What's new in 2.1.0 (Month XX, 2023)

These are the changes in pandas 2.1.0. See :ref:`release` for a full changelog including other versions of pandas.

{{ header }}

Enhancements

pd.to_datetime now tries to infer the datetime format of the input strings by considering a random sample (instead of the first non-null element), and tries to find the format which works for most strings. If several formats work equally well, the one which matches the dayfirst parameter is returned. If format="mixed", pandas does the same thing, then tries the second-best format on the strings which failed to parse with the best format, and so on (:issue:`52508`).
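
The selection rule described above can be illustrated with a small standard-library sketch. This is not the pandas implementation: the names ``CANDIDATE_FORMATS``, ``matches``, ``rank_formats`` and ``parse_mixed`` are hypothetical, only two candidate formats are considered, and every value is inspected instead of a random sample so that the result is deterministic.

```python
from datetime import datetime

# Hypothetical candidate list; a real parser would consider many more formats.
CANDIDATE_FORMATS = ["%m-%d-%Y", "%d-%m-%Y"]


def matches(value, fmt):
    """Return True if `value` parses under `fmt`."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def rank_formats(values, dayfirst=False):
    """Order the candidates by how many strings each one parses.

    Ties are broken in favour of the format that agrees with `dayfirst`.
    """
    counts = {
        fmt: sum(matches(v, fmt) for v in values) for fmt in CANDIDATE_FORMATS
    }

    def sort_key(fmt):
        agrees_with_dayfirst = fmt.startswith("%d") == dayfirst
        # More matches sort first; on a tie, the agreeing format sorts first.
        return (-counts[fmt], not agrees_with_dayfirst)

    return sorted(CANDIDATE_FORMATS, key=sort_key)


def parse_mixed(values, dayfirst=False):
    """Parse each string with the best-ranked format that accepts it."""
    ranked = rank_formats(values, dayfirst)
    parsed = []
    for value in values:
        for fmt in ranked:
            if matches(value, fmt):
                parsed.append(datetime.strptime(value, fmt))
                break
        else:
            parsed.append(None)  # stand-in for NaT: no candidate matched
    return parsed


values = ["01-02-2012", "01-03-2012", "30-01-2012"]
best_format = rank_formats(values)[0]
dates = parse_mixed(values)
```

Here ``%d-%m-%Y`` parses all three strings while ``%m-%d-%Y`` parses only two, so the day-first format ranks first even though ``dayfirst`` is left at ``False``.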

Previous behavior:

In [1]: pd.to_datetime(["01-02-2012", "01-03-2012", "30-01-2012"])
Out[1]:
ValueError: time data "30-01-2012" doesn't match format "%m-%d-%Y", at position 2. You might want to try:
- passing `format` if your strings have a consistent format;
- passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format;
- passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.

In [2]: pd.to_datetime(["01-02-2012", "01-03-2012", "30-01-2012"], errors="coerce")
Out[2]:
DatetimeIndex(['2012-01-02', '2012-01-03', 'NaT'], dtype='datetime64[ns]', freq=None)

In [3]: pd.to_datetime(["01-02-2012", "01-03-2012", "30-01-2012"], format="mixed")
Out[3]:
DatetimeIndex(['2012-01-02', '2012-01-03', '2012-01-30'], dtype='datetime64[ns]', freq=None)

New behavior:

In [1]: pd.to_datetime(["01-02-2012", "01-03-2012", "30-01-2012"])
Out[1]:
UserWarning: Parsing dates in %d-%m-%Y format when dayfirst=False was specified.
Pass `dayfirst=True` or specify a format to silence this warning.
DatetimeIndex(['2012-02-01', '2012-03-01', '2012-01-30'], dtype='datetime64[ns]',
freq=None)

In [2]: pd.to_datetime(["01-02-2012", "01-03-2012", "30-01-2012"], errors="coerce")
Out[2]:
UserWarning: Parsing dates in %d-%m-%Y format when dayfirst=False was specified. Pass `dayfirst=True` or specify a format to silence this warning.
DatetimeIndex(['2012-02-01', '2012-03-01', '2012-01-30'], dtype='datetime64[ns]', freq=None)

In [3]: pd.to_datetime(["01-02-2012", "01-03-2012", "30-01-2012"], format="mixed")
Out[3]:
DatetimeIndex(['2012-02-01', '2012-03-01', '2012-01-30'], dtype='datetime64[ns]', freq=None)

map(func, na_action="ignore") now works for all array types

When given a callable, :meth:`Series.map` applies the callable to all elements of the :class:`Series`. Similarly, :meth:`DataFrame.map` applies the callable to all elements of the :class:`DataFrame`, while :meth:`Index.map` applies the callable to all elements of the :class:`Index`.

Often it is not desirable to apply the callable to NaN-like values in the array. To avoid this, the map method can be called with na_action="ignore", i.e. ser.map(func, na_action="ignore"). Previously, however, na_action="ignore" was not implemented for many ExtensionArray and Index types, and it did not work correctly for any ExtensionArray subclass except the nullable numeric ones (i.e. with dtype :class:`Int64` etc.).
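
As a rough illustration of what na_action="ignore" means, the following pure-Python sketch applies a callable to every element except the missing ones, which are passed through unchanged. The helpers ``is_missing`` and ``map_values`` are hypothetical and only approximate the semantics; pandas recognises more missing-value markers than ``None`` and float NaN.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing (a simplification)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def map_values(values, func, na_action=None):
    """Apply `func` elementwise.

    With na_action="ignore", missing values are returned as-is
    instead of being passed to `func`.
    """
    if na_action not in (None, "ignore"):
        raise ValueError("na_action must be None or 'ignore'")
    result = []
    for value in values:
        if na_action == "ignore" and is_missing(value):
            result.append(value)
        else:
            result.append(func(value))
    return result


data = ["a", "b", float("nan")]
upper = map_values(data, str.upper, na_action="ignore")
```

Without na_action="ignore", ``str.upper`` would be called on the NaN float and raise a ``TypeError``, which is the situation the option is meant to avoid.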

na_action="ignore" now works for all array types (:issue:`52219`, :issue:`51645`, :issue:`51809`, :issue:`51936`, :issue:`52033`, :issue:`52096`).

Previous behavior:

In [1]: ser = pd.Series(["a", "b", np.nan], dtype="category")
In [2]: ser.map(str.upper, na_action="ignore")
NotImplementedError
In [3]: df = pd.DataFrame(ser)
In [4]: df.applymap(str.upper, na_action="ignore")  # worked for DataFrame
     0
0    A
1    B
2  NaN
In [5]: idx = pd.Index(ser)
In [6]: idx.map(str.upper, na_action="ignore")
TypeError: CategoricalIndex.map() got an unexpected keyword argument 'na_action'

New behavior:

.. ipython:: python

    ser = pd.Series(["a", "b", np.nan], dtype="category")
    ser.map(str.upper, na_action="ignore")
    df = pd.DataFrame(ser)
    df.map(str.upper, na_action="ignore")
    idx = pd.Index(ser)
    idx.map(str.upper, na_action="ignore")

Note also that in this version, :meth:`DataFrame.map` has been added and :meth:`DataFrame.applymap` has been deprecated. :meth:`DataFrame.map` has the same functionality as :meth:`DataFrame.applymap`, but the new name better communicates that this is the :class:`DataFrame` version of :meth:`Series.map` (:issue:`52353`).

Also, note that :meth:`Categorical.map` has implicitly had its na_action set to "ignore" by default. This has been deprecated, and in the future :meth:`Categorical.map` will change its default to na_action=None, like all the other array types.

Other enhancements

Notable bug fixes

These are bug fixes that might have notable behavior changes.

notable_bug_fix1

notable_bug_fix2

Backwards incompatible API changes

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

=============== =============== ======== =======
Package         Minimum Version Required Changed
=============== =============== ======== =======
numpy           1.21.6          X        X
mypy (dev)      1.2                      X
beautifulsoup4  4.11.1                   X
bottleneck      1.3.4                    X
fastparquet     0.8.1                    X
fsspec          2022.05.0                X
hypothesis      6.46.1                   X
gcsfs           2022.05.0                X
jinja2          3.1.2                    X
lxml            4.8.0                    X
numba           0.55.2                   X
numexpr         2.8.0                    X
openpyxl        3.0.10                   X
pandas-gbq      0.17.5                   X
psycopg2        2.9.3                    X
pyreadstat      1.1.5                    X
pyqt5           5.15.6                   X
pytables        3.7.0                    X
python-snappy   0.6.1                    X
pyxlsb          1.0.9                    X
s3fs            2022.05.0                X
scipy           1.8.1                    X
sqlalchemy      1.4.36                   X
tabulate        0.8.10                   X
xarray          2022.03.0                X
xlsxwriter      3.0.3                    X
zstandard       0.17.0                   X
=============== =============== ======== =======

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package Minimum Version Changed
    X

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

Other API changes

Deprecations

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

I/O

Period

Plotting

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

Styler

Other

Contributors