What's new in 2.0.0 (??)

These are the changes in pandas 2.0.0. See :ref:`release` for a full changelog including other versions of pandas.

Enhancements

enhancement1

enhancement2

Other enhancements

:func:`read_sas` now supports using encoding='infer' to correctly read and use the encoding specified by the sas file. (:issue:`48048`)
:meth:`.DataFrameGroupBy.quantile` and :meth:`.SeriesGroupBy.quantile` now preserve nullable dtypes instead of casting to numpy dtypes (:issue:`37493`)
:meth:`Series.add_suffix`, :meth:`DataFrame.add_suffix`, :meth:`Series.add_prefix` and :meth:`DataFrame.add_prefix` support an axis argument. If axis is set, the default choice of which axis to relabel is overridden (:issue:`47819`)
:func:`assert_frame_equal` now shows the first element where the DataFrames differ, analogously to pytest's output (:issue:`47910`)
Added new argument use_nullable_dtypes to :func:`read_csv` to enable automatic conversion to nullable dtypes (:issue:`36712`)
Added index parameter to :meth:`DataFrame.to_dict` (:issue:`46398`)
Added metadata propagation for binary operators on :class:`DataFrame` (:issue:`28283`)
:class:`.CategoricalConversionWarning`, :class:`.InvalidComparison`, :class:`.InvalidVersion`, :class:`.LossySetitemError`, and :class:`.NoBufferPresent` are now exposed in pandas.errors (:issue:`27656`)
Improved the exception message raised by :meth:`DataFrame.astype` to include the column name when type conversion is not possible (:issue:`47571`)
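As a rough illustration of two of the entries above, the sketch below exercises the axis argument of add_suffix and the index parameter of to_dict. The frame and its labels are made up for the example, and it assumes pandas 2.0 or later is installed.

```python
import pandas as pd

# Hypothetical example frame; the labels are illustrative only.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# Default for a DataFrame: the suffix is applied to the column labels.
by_cols = df.add_suffix("_col")

# With axis=0 the suffix is applied to the row labels instead.
by_rows = df.add_suffix("_row", axis=0)

# index=False drops the index from the output (used here with orient="split").
split = df.to_dict(orient="split", index=False)

print(list(by_cols.columns), list(by_rows.index))
print(split)
```

Leaving axis unset keeps the previous behaviour, so existing calls should not need to change.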

Notable bug fixes

These are bug fixes that might have notable behavior changes.

:meth:`.GroupBy.cumsum` and :meth:`.GroupBy.cumprod` overflow instead of lossy casting to float

In previous versions we cast to float when applying cumsum and cumprod, which led to incorrect results even if the result could be held by int64 dtype. Additionally, the aggregation now overflows consistently with numpy and the regular :meth:`DataFrame.cumprod` and :meth:`DataFrame.cumsum` methods when the limit of int64 is reached (:issue:`37493`).

Old Behavior

.. code-block:: ipython

    In [1]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
    In [2]: df.groupby("key")["value"].cumprod()[5]
    Out[2]: 5.960464477539062e+16

We return incorrect results with the 6th value.

New Behavior

.. ipython:: python

    df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
    df.groupby("key")["value"].cumprod()

We overflow with the 7th value, but the 6th value is still correct.
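The arithmetic behind this example can be checked without pandas. The sketch below uses plain Python integers (which have arbitrary precision) to show why the float cast lost the 6th value and why the 7th value cannot fit in int64. The wraparound step emulates two's-complement int64 arithmetic by hand and is an illustration, not the code pandas runs.

```python
# Pure-Python arithmetic; names below are illustrative.
INT64_MAX = 2**63 - 1

sixth = 625**6    # exact 6th cumulative product
seventh = 625**7  # exact 7th cumulative product

# float64 has a 53-bit significand, so a float cast cannot hold 625**6 exactly.
sixth_as_float = int(float(sixth))
print(sixth, sixth_as_float, sixth == sixth_as_float)

# The 6th value still fits in int64, but the 7th does not.
print(sixth <= INT64_MAX, seventh <= INT64_MAX)

# Emulate two's-complement int64 wraparound for the 7th value.
wrapped = (seventh + 2**63) % 2**64 - 2**63
print(wrapped)
```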

notable_bug_fix2

Backwards incompatible API changes

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

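+-------------------+-----------------+----------+---------+
| Package           | Minimum Version | Required | Changed |
+===================+=================+==========+=========+
| mypy (dev)        | 0.981           |          |    X    |
+-------------------+-----------------+----------+---------+
| python-dateutil   | 2.8.2           |    X     |    X    |
+-------------------+-----------------+----------+---------+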

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

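+-----------------+-----------------+---------+
| Package         | Minimum Version | Changed |
+=================+=================+=========+
| pyarrow         | 6.0.0           |    X    |
+-----------------+-----------------+---------+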

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

Other API changes

Passing nanoseconds greater than 999 or less than 0 in :class:`Timestamp` now raises a ValueError (:issue:`48538`, :issue:`48255`)
:func:`read_csv`: specifying an incorrect number of columns with index_col now raises ParserError instead of IndexError when using the c parser.
Default value of dtype in :func:`get_dummies` is changed to bool from uint8 (:issue:`45848`)
:meth:`DataFrame.astype`, :meth:`Series.astype`, and :meth:`DatetimeIndex.astype` casting datetime64 data to any of "datetime64[s]", "datetime64[ms]", "datetime64[us]" will return an object with the given resolution instead of coercing back to "datetime64[ns]" (:issue:`48928`)
:meth:`DataFrame.astype`, :meth:`Series.astype`, and :meth:`DatetimeIndex.astype` casting timedelta64 data to any of "timedelta64[s]", "timedelta64[ms]", "timedelta64[us]" will return an object with the given resolution instead of coercing to "float64" dtype (:issue:`48963`)
Passing data with dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; timedelta64 data with lower resolution will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`)
Passing dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; passing a dtype with lower resolution for :class:`Series` or :class:`DataFrame` will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`)
Passing a np.datetime64 object with non-nanosecond resolution to :class:`Timestamp` will retain the input resolution if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49008`)
The other argument in :meth:`DataFrame.mask` and :meth:`Series.mask` now defaults to no_default instead of np.nan consistent with :meth:`DataFrame.where` and :meth:`Series.where`. Entries will be filled with the corresponding NULL value (np.nan for numpy dtypes, pd.NA for extension dtypes). (:issue:`49111`)
When creating a :class:`Series` with an object-dtype :class:`Index` of datetime objects, pandas no longer silently converts the index to a :class:`DatetimeIndex` (:issue:`39307`, :issue:`23598`)
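A few of the changes listed above can be observed directly. The following is a minimal sketch, assuming pandas 2.0 or later. The sample data is invented for the example, and only the get_dummies default, the astype resolution and the :class:`Timestamp` nanosecond check are shown.

```python
import pandas as pd

# get_dummies now defaults to bool columns rather than uint8.
dummies = pd.get_dummies(pd.Series(["a", "b", "a"]))
print(dummies.dtypes)

# Casting datetime64 data to a supported resolution keeps that resolution.
ser = pd.Series(pd.to_datetime(["2022-01-01", "2022-01-02"]))
as_seconds = ser.astype("datetime64[s]")
print(as_seconds.dtype)

# Nanoseconds outside 0..999 are rejected by Timestamp.
raised = False
try:
    pd.Timestamp(year=2022, month=1, day=1, nanosecond=1000)
except ValueError:
    raised = True
print(raised)
```

Code that relied on the old uint8 dummies can pass dtype="uint8" explicitly to :func:`get_dummies`.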

Deprecations

Performance improvements

Performance improvement in :meth:`.DataFrameGroupBy.median` and :meth:`.SeriesGroupBy.median` and :meth:`.GroupBy.cumprod` for nullable dtypes (:issue:`37493`)
Performance improvement in :meth:`MultiIndex.argsort` and :meth:`MultiIndex.sort_values` (:issue:`48406`)
Performance improvement in :meth:`MultiIndex.size` (:issue:`48723`)
Performance improvement in :meth:`MultiIndex.union` without missing values and without duplicates (:issue:`48505`, :issue:`48752`)
Performance improvement in :meth:`MultiIndex.difference` (:issue:`48606`)
Performance improvement in :class:`MultiIndex` set operations with sort=None (:issue:`49010`)
Performance improvement in :meth:`.DataFrameGroupBy.mean`, :meth:`.SeriesGroupBy.mean`, :meth:`.DataFrameGroupBy.var`, and :meth:`.SeriesGroupBy.var` for extension array dtypes (:issue:`37493`)
Performance improvement in :meth:`MultiIndex.isin` when level=None (:issue:`48622`)
Performance improvement in :meth:`Index.union` and :meth:`MultiIndex.union` when index contains duplicates (:issue:`48900`)
Performance improvement for :meth:`Series.value_counts` with nullable dtype (:issue:`48338`)
Performance improvement for :class:`Series` constructor passing integer numpy array with nullable dtype (:issue:`48338`)
Performance improvement for :class:`DatetimeIndex` constructor passing a list (:issue:`48609`)
Performance improvement in :func:`merge` and :meth:`DataFrame.join` when joining on a sorted :class:`MultiIndex` (:issue:`48504`)
Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` for tuple-based indexing of a :class:`MultiIndex` (:issue:`48384`)
Performance improvement for :meth:`MultiIndex.unique` (:issue:`48335`)
Performance improvement for :class:`~arrays.StringArray` constructor passing a numpy array with type np.str_ (:issue:`49109`)
Performance improvement for :func:`concat` with extension array backed indexes (:issue:`49128`)
Performance improvement in :meth:`DataFrame.join` when joining on a subset of a :class:`MultiIndex` (:issue:`48611`)
Performance improvement for :meth:`MultiIndex.intersection` (:issue:`48604`)
Performance improvement in var for nullable dtypes (:issue:`48379`).
Performance improvements to :func:`read_sas` (:issue:`47403`, :issue:`47405`, :issue:`47656`, :issue:`48502`)
Memory improvement in :meth:`RangeIndex.sort_values` (:issue:`48801`)
Performance improvement in :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` when by is a categorical type and sort=False (:issue:`48976`)

Bug fixes

Categorical

Bug in :meth:`Categorical.set_categories` losing dtype information (:issue:`48812`)

Datetimelike

Bug in :func:`pandas.infer_freq`, raising TypeError when inferred on :class:`RangeIndex` (:issue:`47084`)
Bug in :func:`to_datetime` raising on invalid offsets with errors='coerce' and infer_datetime_format=True (:issue:`48633`)
Bug in :class:`DatetimeIndex` constructor failing to raise when tz=None is explicitly specified in conjunction with timezone-aware dtype or data (:issue:`48659`)
Bug in subtracting a datetime scalar from :class:`DatetimeIndex` failing to retain the original freq attribute (:issue:`48818`)

Timedelta

Bug in :func:`to_timedelta` raising error when input has nullable dtype Float64 (:issue:`48796`)
Bug in :class:`Timedelta` constructor incorrectly raising instead of returning NaT when given a np.timedelta64("nat") (:issue:`48898`)
Bug in :class:`Timedelta` constructor failing to raise when passed both a :class:`Timedelta` object and keywords (e.g. days, seconds) (:issue:`48898`)

Timezones

Numeric

Bug in :meth:`DataFrame.add` failing to apply ufunc when inputs contain a mix of DataFrame and Series types (:issue:`39853`)

Conversion

Bug in constructing :class:`Series` with int64 dtype from a string list raising instead of casting (:issue:`44923`)
Bug in :meth:`DataFrame.eval` incorrectly raising an AttributeError when there are negative values in function call (:issue:`46471`)
Bug in :meth:`Series.convert_dtypes` not converting dtype to nullable dtype when :class:`Series` contains NA and has dtype object (:issue:`48791`)
Bug where any :class:`ExtensionDtype` subclass with kind="M" would be interpreted as a timezone type (:issue:`34986`)

Strings

Interval

Indexing

Bug in :meth:`DataFrame.reindex` filling with wrong values when indexing columns and index for uint dtypes (:issue:`48184`)
Bug in :meth:`DataFrame.__setitem__` raising ValueError when right hand side is :class:`DataFrame` with :class:`MultiIndex` columns (:issue:`49121`)
Bug in :meth:`DataFrame.reindex` casting dtype to object when :class:`DataFrame` has single extension array column when re-indexing columns and index (:issue:`48190`)
Bug in :func:`~DataFrame.describe` when formatting percentiles in the resulting index showed more decimals than needed (:issue:`46362`)
Bug in :meth:`DataFrame.compare` not recognizing differences when comparing NA with a value in nullable dtypes (:issue:`48939`)

Missing

Bug in :meth:`Index.equals` raising TypeError when :class:`Index` consists of tuples that contain NA (:issue:`48446`)
Bug in :meth:`Series.map` caused incorrect result when data has NaNs and defaultdict mapping was used (:issue:`48813`)
Bug in :class:`NA` raising a TypeError instead of returning :class:`NA` when performing a binary operation with a bytes object (:issue:`49108`)

MultiIndex

Bug in :meth:`MultiIndex.argsort` raising TypeError when index contains :attr:`NA` (:issue:`48495`)
Bug in :meth:`MultiIndex.difference` losing extension array dtype (:issue:`48606`)
Bug in :meth:`MultiIndex.set_levels` raising IndexError when setting empty level (:issue:`48636`)
Bug in :meth:`MultiIndex.unique` losing extension array dtype (:issue:`48335`)
Bug in :meth:`MultiIndex.intersection` losing extension array (:issue:`48604`)
Bug in :meth:`MultiIndex.union` losing extension array (:issue:`48498`, :issue:`48505`, :issue:`48900`)
Bug in :meth:`MultiIndex.union` not sorting when sort=None and index contains missing values (:issue:`49010`)
Bug in :meth:`MultiIndex.append` not checking names for equality (:issue:`48288`)
Bug in :meth:`MultiIndex.symmetric_difference` losing extension array (:issue:`48607`)

I/O

Bug in :func:`read_sas` caused fragmentation of :class:`DataFrame` and raised :class:`.errors.PerformanceWarning` (:issue:`48595`)
Bug in :func:`read_csv` for a single-line csv with fewer columns than names raised :class:`.errors.ParserError` with engine="c" (:issue:`47566`)

Period

Bug in :meth:`Period.strftime` and :meth:`PeriodIndex.strftime`, raising UnicodeDecodeError when a locale-specific directive was passed (:issue:`46319`)

Plotting

Bug where ax.set_xlim sometimes raised a UserWarning that users could not address because set_xlim does not accept parsing arguments; the converter now uses :func:`Timestamp` instead (:issue:`49148`)

Groupby/resample/rolling

Bug in :class:`.ExponentialMovingWindow` with online not raising a NotImplementedError for unsupported operations (:issue:`48834`)
Bug in :meth:`DataFrameGroupBy.sample` raising ValueError when the object is empty (:issue:`48459`)
Bug in :meth:`Series.groupby` raising ValueError when an entry of the index is equal to the name of the index (:issue:`48567`)
Bug in :meth:`DataFrameGroupBy.resample` producing inconsistent results when passing an empty DataFrame (:issue:`47705`)

Reshaping

Bug in :meth:`DataFrame.pivot_table` raising TypeError for nullable dtype and margins=True (:issue:`48681`)
Bug in :meth:`DataFrame.unstack` and :meth:`Series.unstack` unstacking wrong level of :class:`MultiIndex` when :class:`MultiIndex` has mixed names (:issue:`48763`)
Bug in :meth:`DataFrame.pivot` not respecting None as column name (:issue:`48293`)
Bug in :func:`join` when left_on or right_on is or includes a :class:`CategoricalIndex` incorrectly raising AttributeError (:issue:`48464`)

Sparse

ExtensionArray

Bug in :meth:`Series.mean` overflowing unnecessarily with nullable integers (:issue:`48378`)
Bug where concatenating an empty DataFrame with an ExtensionDtype to another DataFrame with the same ExtensionDtype resulted in object dtype (:issue:`48510`)

Styler

Metadata

Fixed metadata propagation in :meth:`DataFrame.corr` and :meth:`DataFrame.cov` (:issue:`28283`)
