What's new in 3.0.0 (Month XX, 2024)

These are the changes in pandas 3.0.0. See :ref:`release` for a full changelog including other versions of pandas.

Enhancements

enhancement1

enhancement2

Other enhancements

:func:`DataFrame.to_excel` now raises a UserWarning when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)
:func:`read_stata` now returns datetime64 resolutions better matching those natively stored in the stata format (:issue:`55642`)
:meth:`Styler.set_tooltips` provides an alternative method of storing tooltips by using the title attribute of td elements (:issue:`56981`)
Allow dictionaries to be passed to :meth:`pandas.Series.str.replace` via pat parameter (:issue:`51748`)
Support passing a :class:`Series` input to :func:`json_normalize` that retains the :class:`Series` :class:`Index` (:issue:`51452`)
Users can globally disable any PerformanceWarning by setting the option mode.performance_warnings to False (:issue:`56920`)

Notable bug fixes

These are bug fixes that might have notable behavior changes.

Improved behavior in groupby for `observed=False`

A number of bugs have been fixed due to improved handling of unobserved groups (:issue:`55738`). All remarks in this section equally impact :class:`.SeriesGroupBy`.

In previous versions of pandas, a single grouping with :meth:`.DataFrameGroupBy.apply` or :meth:`.DataFrameGroupBy.agg` would pass the unobserved groups to the provided function, resulting in 0 below.

.. ipython:: python

    df = pd.DataFrame(
        {
            "key1": pd.Categorical(list("aabb"), categories=list("abc")),
            "key2": [1, 1, 1, 2],
            "values": [1, 2, 3, 4],
        }
    )
    df
    gb = df.groupby("key1", observed=False)
    gb[["values"]].apply(lambda x: x.sum())

However this was not the case when using multiple groupings, resulting in NaN below.

.. code-block:: ipython

    In [1]: gb = df.groupby(["key1", "key2"], observed=False)
    In [2]: gb[["values"]].apply(lambda x: x.sum())
    Out[2]:
               values
    key1 key2
    a    1        3.0
         2        NaN
    b    1        3.0
         2        4.0
    c    1        NaN
         2        NaN

Now using multiple groupings will also pass the unobserved groups to the provided function.

.. ipython:: python

    gb = df.groupby(["key1", "key2"], observed=False)
    gb[["values"]].apply(lambda x: x.sum())

Similarly:

In previous versions of pandas the method :meth:`.DataFrameGroupBy.sum` would result in 0 for unobserved groups, but :meth:`.DataFrameGroupBy.prod`, :meth:`.DataFrameGroupBy.all`, and :meth:`.DataFrameGroupBy.any` would all result in NA values. Now these methods result in 1, True, and False respectively.

:meth:`.DataFrameGroupBy.groups` did not include unobserved groups and now does.
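
As a minimal sketch of the reductions described above, the snippet below uses a single categorical grouping with one unobserved category. The frame and its column names are illustrative only, and ``observed`` is passed explicitly so that the result does not depend on the default value.

```python
import pandas as pd

# Illustrative data: category "c" is declared but never observed.
df = pd.DataFrame(
    {
        "key1": pd.Categorical(list("aabb"), categories=list("abc")),
        "values": [1, 2, 3, 4],
    }
)

# observed=False keeps the unobserved category "c" in the result.
with_unobserved = df.groupby("key1", observed=False)["values"].sum()

# observed=True drops categories that do not appear in the data.
only_observed = df.groupby("key1", observed=True)["values"].sum()

print(with_unobserved)
print(only_observed)
```

The unobserved group contributes the identity of the reduction, which is 0 for a sum.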

These improvements also fixed certain bugs in groupby:

:meth:`.DataFrameGroupBy.nunique` would fail when there are multiple groupings, unobserved groups, and as_index=False (:issue:`52848`)

:meth:`.DataFrameGroupBy.agg` would fail when there are multiple groupings, unobserved groups, and as_index=False (:issue:`36698`)

:meth:`.DataFrameGroupBy.sum` would have incorrect values when there are multiple groupings, unobserved groups, and non-numeric data (:issue:`43891`)

:meth:`.DataFrameGroupBy.groups` with sort=False would sort groups; they now occur in the order they are observed (:issue:`56966`)

:meth:`.DataFrameGroupBy.value_counts` would produce incorrect results when used with some categorical and some non-categorical groupings and observed=False (:issue:`56016`)

notable_bug_fix2

Backwards incompatible API changes

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

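+-----------------+-----------------+----------+---------+
| Package         | Minimum Version | Required | Changed |
+=================+=================+==========+=========+
| numpy           | 1.23.5          |    X     |    X    |
+-----------------+-----------------+----------+---------+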

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

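+-----------------+---------------------+
| Package         | New Minimum Version |
+=================+=====================+
| fastparquet     | 2023.04.0           |
+-----------------+---------------------+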

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
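
One way to check an environment against these minimums is sketched below using only the standard library. The helpers ``parse_release`` and ``meets_minimum`` are hypothetical names introduced for this illustration and are not part of pandas; they compare only the leading numeric components of each version and ignore pre-release suffixes.

```python
import re
from importlib import metadata


def parse_release(version: str) -> tuple:
    """Return the leading numeric components of a version string."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two version strings by their numeric release components."""
    return parse_release(installed) >= parse_release(minimum)


# Minimum versions taken from the tables above.
MINIMUMS = {"numpy": "1.23.5", "fastparquet": "2023.04.0"}

for package, minimum in MINIMUMS.items():
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        print(f"{package}: not installed")
        continue
    status = "ok" if meets_minimum(installed, minimum) else "too old"
    print(f"{package}: {installed} (minimum {minimum}) -> {status}")
```

A dedicated version parser is likely a better choice when pre-release or local version identifiers matter.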

Other API changes

3rd party py.path objects are no longer explicitly supported in IO methods. Use :py:class:`pathlib.Path` objects instead (:issue:`57091`)
:attr:`MultiIndex.codes`, :attr:`MultiIndex.levels`, and :attr:`MultiIndex.names` now return a tuple instead of a FrozenList (:issue:`53531`)
:func:`read_table`'s parse_dates argument defaults to None to improve consistency with :func:`read_csv` (:issue:`57476`)
Made dtype a required argument in :meth:`ExtensionArray._from_sequence_of_strings` (:issue:`56519`)
Updated :meth:`DataFrame.to_excel` so that the output spreadsheet has no styling. Custom styling can still be done using :meth:`Styler.to_excel` (:issue:`54154`)
pickle and HDF (.h5) files created with Python 2 are no longer explicitly supported (:issue:`57387`)
pickled objects from pandas version less than 1.0.0 are no longer supported (:issue:`57155`)
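
To illustrate the recommendation to use :py:class:`pathlib.Path` objects in IO methods, here is a small sketch that writes a CSV file to a temporary directory and reads it back. The frame contents and the file name are made up for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

with tempfile.TemporaryDirectory() as tmp:
    # A pathlib.Path can be passed wherever an IO method expects a path.
    csv_path = Path(tmp) / "example.csv"
    df.to_csv(csv_path, index=False)
    roundtrip = pd.read_csv(csv_path)

print(roundtrip)
```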

Deprecations

Copy keyword

The copy keyword argument in the following methods is deprecated and will be removed in a future version:

:meth:`DataFrame.truncate` / :meth:`Series.truncate`
:meth:`DataFrame.tz_convert` / :meth:`Series.tz_convert`
:meth:`DataFrame.tz_localize` / :meth:`Series.tz_localize`
:meth:`DataFrame.infer_objects` / :meth:`Series.infer_objects`
:meth:`DataFrame.align` / :meth:`Series.align`
:meth:`DataFrame.astype` / :meth:`Series.astype`
:meth:`DataFrame.reindex` / :meth:`Series.reindex`
:meth:`DataFrame.reindex_like` / :meth:`Series.reindex_like`

Copy-on-Write utilizes a lazy copy mechanism that defers copying the data until necessary. Use .copy to trigger an eager copy. The copy keyword has no effect starting with 3.0, so it can be safely removed from your code.
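
A minimal sketch of the migration, assuming a small illustrative frame: the copy keyword is dropped from the call, and an independent copy is requested explicitly with ``.copy()`` only where one is needed.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# The copy keyword is simply omitted from methods such as astype.
converted = df.astype("float64")

# Request an eager, independent copy explicitly when one is needed.
snapshot = df.copy()
snapshot.loc[0, "a"] = 100

print(df)  # the original frame is unchanged
```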

Other Deprecations

Deprecated :meth:`Timestamp.utcfromtimestamp`, use Timestamp.fromtimestamp(ts, "UTC") instead (:issue:`56680`)
Deprecated :meth:`Timestamp.utcnow`, use Timestamp.now("UTC") instead (:issue:`56680`)
Deprecated allowing non-keyword arguments in :meth:`Series.to_markdown` except buf. (:issue:`57280`)
Deprecated allowing non-keyword arguments in :meth:`Series.to_string` except buf. (:issue:`57280`)
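
The replacements named above can be sketched as follows. The timestamp value ``0`` and the example Series are illustrative.

```python
import pandas as pd

# Instead of Timestamp.utcnow():
now_utc = pd.Timestamp.now("UTC")

# Instead of Timestamp.utcfromtimestamp(ts):
epoch = pd.Timestamp.fromtimestamp(0, "UTC")

# Pass everything except buf by keyword in to_string.
ser = pd.Series([1.5, None])
text = ser.to_string(na_rep="-")

print(now_utc.tz, epoch)
print(text)
```

Both timestamps are timezone-aware, unlike the naive values returned by the standard library's ``datetime.utcnow``.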

Removal of prior version deprecations/changes

:func:`read_excel`, :func:`read_json`, :func:`read_html`, and :func:`read_xml` no longer accept raw string or byte representation of the data. That type of data must be wrapped in a :py:class:`StringIO` or :py:class:`BytesIO` (:issue:`53767`)
:meth:`Series.dt.to_pydatetime` now returns a :class:`Series` of :py:class:`datetime.datetime` objects (:issue:`52459`)
:meth:`SeriesGroupBy.agg` no longer pins the name of the group to the input passed to the provided func (:issue:`51703`)
All arguments except name in :meth:`Index.rename` are now keyword only (:issue:`56493`)
All arguments except the first path-like argument in IO writers are now keyword only (:issue:`54229`)
All arguments in :meth:`Index.sort_values` are now keyword only (:issue:`56493`)
All arguments in :meth:`Series.to_dict` are now keyword only (:issue:`56493`)
Changed the default value of observed in :meth:`DataFrame.groupby` and :meth:`Series.groupby` to True (:issue:`51811`)
Enforced deprecation disallowing parsing datetimes with mixed time zones unless user passes utc=True to :func:`to_datetime` (:issue:`57275`)
Enforced deprecation of axis=None acting the same as axis=0 in the DataFrame reductions sum, prod, std, var, and sem. Passing axis=None now reduces over both axes; this applies in particular when calling e.g. numpy.sum(df) (:issue:`21597`)
Enforced silent-downcasting deprecation for :ref:`all relevant methods <whatsnew_220.silent_downcasting>` (:issue:`54710`)
In :meth:`DataFrame.stack`, the default value of future_stack is now True; specifying False will raise a FutureWarning (:issue:`55448`)
Methods apply, agg, and transform will no longer replace NumPy functions (e.g. np.sum) and built-in functions (e.g. min) with the equivalent pandas implementation; use string aliases (e.g. "sum" and "min") if you desire to use the pandas implementation (:issue:`53974`)
Passing both freq and fill_value in :meth:`DataFrame.shift` and :meth:`Series.shift` and :meth:`.DataFrameGroupBy.shift` now raises a ValueError (:issue:`54818`)
Removed :meth:`DateOffset.is_anchored` and :meth:`offsets.Tick.is_anchored` (:issue:`56594`)
Removed DataFrame.applymap, Styler.applymap and Styler.applymap_index (:issue:`52364`)
Removed DataFrame.bool and Series.bool (:issue:`51756`)
Removed DataFrame.first and DataFrame.last (:issue:`53710`)
Removed DataFrame.swapaxes and Series.swapaxes (:issue:`51946`)
Removed DataFrameGroupBy.grouper and SeriesGroupBy.grouper (:issue:`56521`)
Removed DataFrameGroupBy.fillna and SeriesGroupBy.fillna (:issue:`55719`)
Removed Index.format, use :meth:`Index.astype` with str or :meth:`Index.map` with a formatter function instead (:issue:`55439`)
Removed Resampler.fillna (:issue:`55719`)
Removed Series.__int__ and Series.__float__. Call int(Series.iloc[0]) or float(Series.iloc[0]) instead. (:issue:`51131`)
Removed Series.ravel (:issue:`56053`)
Removed Series.view (:issue:`56054`)
Removed StataReader.close (:issue:`49228`)
Removed _data from :class:`DataFrame`, :class:`Series`, :class:`.arrays.ArrowExtensionArray` (:issue:`52003`)
Removed axis argument from :meth:`DataFrame.groupby`, :meth:`Series.groupby`, :meth:`DataFrame.rolling`, :meth:`Series.rolling`, :meth:`DataFrame.resample`, and :meth:`Series.resample` (:issue:`51203`)
Removed axis argument from all groupby operations (:issue:`50405`)
Removed convert_dtype from :meth:`Series.apply` (:issue:`52257`)
Removed method, limit, fill_axis, and broadcast_axis keywords from :meth:`DataFrame.align` (:issue:`51968`)
Removed pandas.api.types.is_interval and pandas.api.types.is_period, use isinstance(obj, pd.Interval) and isinstance(obj, pd.Period) instead (:issue:`55264`)
Removed pandas.io.sql.execute (:issue:`50185`)
Removed pandas.value_counts, use :meth:`Series.value_counts` instead (:issue:`53493`)
Removed read_gbq and DataFrame.to_gbq. Use pandas_gbq.read_gbq and pandas_gbq.to_gbq instead https://pandas-gbq.readthedocs.io/en/latest/api.html (:issue:`55525`)
Removed use_nullable_dtypes from :func:`read_parquet` (:issue:`51853`)
Removed year, month, quarter, day, hour, minute, and second keywords in the :class:`PeriodIndex` constructor, use :meth:`PeriodIndex.from_fields` instead (:issue:`55960`)
Removed deprecated argument obj in :meth:`.DataFrameGroupBy.get_group` and :meth:`.SeriesGroupBy.get_group` (:issue:`53545`)
Removed deprecated behavior of :meth:`Series.agg` using :meth:`Series.apply` (:issue:`53325`)
Removed deprecated keyword method on :meth:`Series.fillna`, :meth:`DataFrame.fillna` (:issue:`57760`)
Removed option mode.use_inf_as_na, convert inf entries to NaN before instead (:issue:`51684`)
Removed support for :class:`DataFrame` in :meth:`DataFrame.from_records` (:issue:`51697`)
Removed support for errors="ignore" in :func:`to_datetime`, :func:`to_timedelta` and :func:`to_numeric` (:issue:`55734`)
Removed support for slice in :meth:`DataFrame.take` (:issue:`51539`)
Removed the ArrayManager (:issue:`55043`)
Removed the fastpath argument from the :class:`Series` constructor (:issue:`55466`)
Removed the is_boolean, is_integer, is_floating, holds_integer, is_numeric, is_categorical, is_object, and is_interval attributes of :class:`Index` (:issue:`50042`)
Removed the ordinal keyword in :class:`PeriodIndex`, use :meth:`PeriodIndex.from_ordinals` instead (:issue:`55960`)
Removed unused arguments *args and **kwargs in :class:`Resampler` methods (:issue:`50977`)
Unrecognized timezones when parsing strings to datetimes now raises a ValueError (:issue:`51477`)
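
A few of the removals above have direct replacements, sketched here on small made-up inputs. This is an illustration of the suggested alternatives and not an exhaustive migration guide.

```python
from io import StringIO

import numpy as np
import pandas as pd

# Raw strings are no longer accepted by read_json and friends; wrap them.
frame = pd.read_json(StringIO('[{"a": 1, "b": 2}, {"a": 3, "b": 4}]'))

# Series.__int__ was removed; select the scalar before converting.
single = pd.Series([7])
as_int = int(single.iloc[0])

# String aliases select the pandas implementation in agg/apply/transform.
totals = frame.agg("sum")

# mode.use_inf_as_na was removed; convert infinities to NaN up front.
cleaned = pd.Series([1.0, np.inf, -np.inf]).replace([np.inf, -np.inf], np.nan)

# errors="ignore" was removed from to_numeric; catch the exception instead.
raw = pd.Series(["1", "2", "not a number"])
try:
    numeric = pd.to_numeric(raw)
except ValueError:
    numeric = raw  # fall back to the unconverted input

print(frame, as_int, totals, cleaned, numeric, sep="\n")
```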

Performance improvements

:meth:`Series.str.extract` returns :class:`RangeIndex` columns instead of :class:`Index` columns when possible (:issue:`57542`)
Performance improvement in :class:`DataFrame` when data is a dict and columns is specified (:issue:`24368`)
Performance improvement in :meth:`DataFrame.join` for sorted but non-unique indexes (:issue:`56941`)
Performance improvement in :meth:`DataFrame.join` when left and/or right are non-unique and how is "left", "right", or "inner" (:issue:`56817`)
Performance improvement in :meth:`DataFrame.join` with how="left" or how="right" and sort=True (:issue:`56919`)
Performance improvement in :meth:`DataFrameGroupBy.ffill`, :meth:`DataFrameGroupBy.bfill`, :meth:`SeriesGroupBy.ffill`, and :meth:`SeriesGroupBy.bfill` (:issue:`56902`)
Performance improvement in :meth:`Index.join` by propagating cached attributes in cases where the result matches one of the inputs (:issue:`57023`)
Performance improvement in :meth:`Index.take` when indices is a full range indexer from zero to length of index (:issue:`56806`)
Performance improvement in :meth:`MultiIndex.equals` for equal length indexes (:issue:`56990`)
Performance improvement in :meth:`RangeIndex.__getitem__` with a boolean mask returning a :class:`RangeIndex` instead of an :class:`Index` when possible (:issue:`57588`)
Performance improvement in :meth:`RangeIndex.append` when appending the same index (:issue:`57252`)
Performance improvement in :meth:`RangeIndex.join` returning a :class:`RangeIndex` instead of an :class:`Index` when possible (:issue:`57651`)
Performance improvement in :meth:`RangeIndex.reindex` returning a :class:`RangeIndex` instead of an :class:`Index` when possible (:issue:`57647`)
Performance improvement in :meth:`RangeIndex.take` returning a :class:`RangeIndex` instead of an :class:`Index` when possible (:issue:`57445`)
Performance improvement in DataFrameGroupBy.__len__ and SeriesGroupBy.__len__ (:issue:`57595`)
Performance improvement in indexing operations for string dtypes (:issue:`56997`)

Bug fixes

Fixed bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
Fixed bug in :meth:`DataFrame.to_string` that raised StopIteration with nested DataFrames. (:issue:`16098`)
Fixed bug in :meth:`DataFrame.update` bool dtype being converted to object (:issue:`55509`)
Fixed bug in :meth:`Series.diff` allowing non-integer values for the periods argument. (:issue:`56607`)

Categorical

Datetimelike

Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)

Timedelta

Timezones

Numeric

Bug in np.matmul with :class:`Index` inputs raising a TypeError (:issue:`57079`)

Conversion

Bug in :meth:`Series.astype` might modify read-only array inplace when casting to a string dtype (:issue:`57212`)
Bug in :meth:`Series.reindex` not maintaining float32 type when a reindex introduces a missing value (:issue:`45857`)

Strings

Bug in :meth:`Series.value_counts` not respecting sort=False for series with string dtype (:issue:`55224`)

Interval

Bug in :func:`interval_range` where start and end numeric types were always cast to 64 bit (:issue:`57268`)

Indexing

Missing

MultiIndex

I/O

Bug in :meth:`DataFrame.to_excel` when writing empty :class:`DataFrame` with :class:`MultiIndex` on both axes (:issue:`57696`)

Period

Plotting

Groupby/resample/rolling

Bug in :meth:`.DataFrameGroupBy.groups` and :meth:`.SeriesGroupBy.groups` that would not respect groupby argument dropna (:issue:`55919`)
Bug in :meth:`.DataFrameGroupBy.quantile` when interpolation="nearest" is inconsistent with :meth:`DataFrame.quantile` (:issue:`47942`)
Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed times and aggregation functions other than mean (:issue:`51695`)

Reshaping

Sparse

ExtensionArray

Fixed bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return False for array-likes (:issue:`57055`)

Styler

Other

Bug in :class:`DataFrame` when passing a dict with a NA scalar and columns that would always return np.nan (:issue:`57205`)
Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)
Bug in :meth:`DataFrame.sort_index` when passing axis="columns" and ignore_index=True and ascending=False not returning a :class:`RangeIndex` columns (:issue:`57293`)
Bug in :meth:`DataFrame.where` where using a non-bool type array in the function would return a ValueError instead of a TypeError (:issue:`56330`)
Bug in the DataFrame Interchange Protocol implementation returning incorrect results for the dtype associated with data buffers of string and datetime columns (:issue:`54781`)
