These are the changes in pandas 2.0.0. See :ref:`release` for a full changelog including other versions of pandas.
{{ header }}
- :func:`read_sas` now supports using
encoding='infer'
to correctly read and use the encoding specified by the sas file. (:issue:`48048`) - :meth:`.DataFrameGroupBy.quantile` and :meth:`.SeriesGroupBy.quantile` now preserve nullable dtypes instead of casting to numpy dtypes (:issue:`37493`)
- :meth:`Series.add_suffix`, :meth:`DataFrame.add_suffix`, :meth:`Series.add_prefix` and :meth:`DataFrame.add_prefix` support an
axis
argument. Ifaxis
is set, the default behaviour of which axis to consider can be overwritten (:issue:`47819`) - :func:`assert_frame_equal` now shows the first element where the DataFrames differ, analogously to
pytest
's output (:issue:`47910`) - Added new argument
use_nullable_dtypes
to :func:`read_csv` to enable automatic conversion to nullable dtypes (:issue:`36712`) - Added
index
parameter to :meth:`DataFrame.to_dict` (:issue:`46398`) - Added metadata propagation for binary operators on :class:`DataFrame` (:issue:`28283`)
- :class:`.CategoricalConversionWarning`, :class:`.InvalidComparison`, :class:`.InvalidVersion`, :class:`.LossySetitemError`, and :class:`.NoBufferPresent` are now exposed in
pandas.errors
(:issue:`27656`) - :func:`DataFrame.astype` exception message thrown improved to include column name when type conversion is not possible. (:issue:`47571`)
These are bug fixes that might have notable behavior changes.
:meth:`.GroupBy.cumsum` and :meth:`.GroupBy.cumprod` overflow instead of lossy casting to float
In previous versions we cast to float when applying cumsum
and cumprod
which
lead to incorrect results even if the result could be hold by int64
dtype.
Additionally, the aggregation overflows consistent with numpy and the regular
:meth:`DataFrame.cumprod` and :meth:`DataFrame.cumsum` methods when the limit of
int64
is reached (:issue:`37493`).
Old Behavior
In [1]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
In [2]: df.groupby("key")["value"].cumprod()[5]
Out[2]: 5.960464477539062e+16
We return incorrect results with the 6th value.
New Behavior
.. ipython:: python df = pd.DataFrame({"key": ["b"] * 7, "value": 625}) df.groupby("key")["value"].cumprod()
We overflow with the 7th value, but the 6th value is still correct.
Some minimum supported versions of dependencies were updated. If installed, we now require:
Package | Minimum Version | Required | Changed |
---|---|---|---|
mypy (dev) | 0.981 | X | |
python-dateutil | 2.8.2 | X | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
Package | Minimum Version | Changed |
---|---|---|
pyarrow | 6.0.0 | X |
See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
- Passing
nanoseconds
greater than 999 or less than 0 in :class:`Timestamp` now raises aValueError
(:issue:`48538`, :issue:`48255`) - :func:`read_csv`: specifying an incorrect number of columns with
index_col
of now raisesParserError
instead ofIndexError
when using the c parser. - Default value of
dtype
in :func:`get_dummies` is changed tobool
fromuint8
(:issue:`45848`) - :meth:`DataFrame.astype`, :meth:`Series.astype`, and :meth:`DatetimeIndex.astype` casting datetime64 data to any of "datetime64[s]", "datetime64[ms]", "datetime64[us]" will return an object with the given resolution instead of coercing back to "datetime64[ns]" (:issue:`48928`)
- :meth:`DataFrame.astype`, :meth:`Series.astype`, and :meth:`DatetimeIndex.astype` casting timedelta64 data to any of "timedelta64[s]", "timedelta64[ms]", "timedelta64[us]" will return an object with the given resolution instead of coercing to "float64" dtype (:issue:`48963`)
- Passing data with dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; timedelta64 data with lower resolution will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`)
- Passing
dtype
of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; passing a dtype with lower resolution for :class:`Series` or :class:`DataFrame` will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`) - Passing a
np.datetime64
object with non-nanosecond resolution to :class:`Timestamp` will retain the input resolution if it is "s", "ms", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49008`) - The
other
argument in :meth:`DataFrame.mask` and :meth:`Series.mask` now defaults tono_default
instead ofnp.nan
consistent with :meth:`DataFrame.where` and :meth:`Series.where`. Entries will be filled with the corresponding NULL value (np.nan
for numpy dtypes,pd.NA
for extension dtypes). (:issue:`49111`) - When creating a :class:`Series` with a object-dtype :class:`Index` of datetime objects, pandas no longer silently converts the index to a :class:`DatetimeIndex` (:issue:`39307`, :issue:`23598`)
- Performance improvement in :meth:`.DataFrameGroupBy.median` and :meth:`.SeriesGroupBy.median` and :meth:`.GroupBy.cumprod` for nullable dtypes (:issue:`37493`)
- Performance improvement in :meth:`MultiIndex.argsort` and :meth:`MultiIndex.sort_values` (:issue:`48406`)
- Performance improvement in :meth:`MultiIndex.size` (:issue:`48723`)
- Performance improvement in :meth:`MultiIndex.union` without missing values and without duplicates (:issue:`48505`, :issue:`48752`)
- Performance improvement in :meth:`MultiIndex.difference` (:issue:`48606`)
- Performance improvement in :class:`MultiIndex` set operations with sort=None (:issue:`49010`)
- Performance improvement in :meth:`.DataFrameGroupBy.mean`, :meth:`.SeriesGroupBy.mean`, :meth:`.DataFrameGroupBy.var`, and :meth:`.SeriesGroupBy.var` for extension array dtypes (:issue:`37493`)
- Performance improvement in :meth:`MultiIndex.isin` when
level=None
(:issue:`48622`) - Performance improvement in :meth:`Index.union` and :meth:`MultiIndex.union` when index contains duplicates (:issue:`48900`)
- Performance improvement for :meth:`Series.value_counts` with nullable dtype (:issue:`48338`)
- Performance improvement for :class:`Series` constructor passing integer numpy array with nullable dtype (:issue:`48338`)
- Performance improvement for :class:`DatetimeIndex` constructor passing a list (:issue:`48609`)
- Performance improvement in :func:`merge` and :meth:`DataFrame.join` when joining on a sorted :class:`MultiIndex` (:issue:`48504`)
- Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` for tuple-based indexing of a :class:`MultiIndex` (:issue:`48384`)
- Performance improvement for :meth:`MultiIndex.unique` (:issue:`48335`)
- Performance improvement for :class:`~arrays.StringArray` constructor passing a numpy array with type
np.str_
(:issue:`49109`) - Performance improvement for :func:`concat` with extension array backed indexes (:issue:`49128`)
- Performance improvement in :meth:`DataFrame.join` when joining on a subset of a :class:`MultiIndex` (:issue:`48611`)
- Performance improvement for :meth:`MultiIndex.intersection` (:issue:`48604`)
- Performance improvement in
var
for nullable dtypes (:issue:`48379`). - Performance improvements to :func:`read_sas` (:issue:`47403`, :issue:`47405`, :issue:`47656`, :issue:`48502`)
- Memory improvement in :meth:`RangeIndex.sort_values` (:issue:`48801`)
- Performance improvement in :class:`DataFrameGroupBy` and :class:`SeriesGroupBy` when
by
is a categorical type andsort=False
(:issue:`48976`)
- Bug in :meth:`Categorical.set_categories` losing dtype information (:issue:`48812`)
- Bug in :func:`pandas.infer_freq`, raising
TypeError
when inferred on :class:`RangeIndex` (:issue:`47084`) - Bug in :func:`to_datetime` was raising on invalid offsets with
errors='coerce'
andinfer_datetime_format=True
(:issue:`48633`) - Bug in :class:`DatetimeIndex` constructor failing to raise when
tz=None
is explicitly specified in conjunction with timezone-awaredtype
or data (:issue:`48659`) - Bug in subtracting a
datetime
scalar from :class:`DatetimeIndex` failing to retain the originalfreq
attribute (:issue:`48818`)
- Bug in :func:`to_timedelta` raising error when input has nullable dtype
Float64
(:issue:`48796`) - Bug in :class:`Timedelta` constructor incorrectly raising instead of returning
NaT
when given anp.timedelta64("nat")
(:issue:`48898`) - Bug in :class:`Timedelta` constructor failing to raise when passed both a :class:`Timedelta` object and keywords (e.g. days, seconds) (:issue:`48898`)
- Bug in :meth:`DataFrame.add` cannot apply ufunc when inputs contain mixed DataFrame type and Series type (:issue:`39853`)
- Bug in constructing :class:`Series` with
int64
dtype from a string list raising instead of casting (:issue:`44923`) - Bug in :meth:`DataFrame.eval` incorrectly raising an
AttributeError
when there are negative values in function call (:issue:`46471`) - Bug in :meth:`Series.convert_dtypes` not converting dtype to nullable dtype when :class:`Series` contains
NA
and has dtypeobject
(:issue:`48791`) - Bug where any :class:`ExtensionDtype` subclass with
kind="M"
would be interpreted as a timezone type (:issue:`34986`)
- Bug in :meth:`DataFrame.reindex` filling with wrong values when indexing columns and index for
uint
dtypes (:issue:`48184`) - Bug in :meth:`DataFrame.__setitem__` raising
ValueError
when right hand side is :class:`DataFrame` with :class:`MultiIndex` columns (:issue:`49121`) - Bug in :meth:`DataFrame.reindex` casting dtype to
object
when :class:`DataFrame` has single extension array column when re-indexingcolumns
andindex
(:issue:`48190`) - Bug in :func:`~DataFrame.describe` when formatting percentiles in the resulting index showed more decimals than needed (:issue:`46362`)
- Bug in :meth:`DataFrame.compare` does not recognize differences when comparing
NA
with value in nullable dtypes (:issue:`48939`)
- Bug in :meth:`Index.equals` raising
TypeError
when :class:`Index` consists of tuples that containNA
(:issue:`48446`) - Bug in :meth:`Series.map` caused incorrect result when data has NaNs and defaultdict mapping was used (:issue:`48813`)
- Bug in :class:`NA` raising a
TypeError
instead of return :class:`NA` when performing a binary operation with abytes
object (:issue:`49108`)
- Bug in :meth:`MultiIndex.argsort` raising
TypeError
when index contains :attr:`NA` (:issue:`48495`) - Bug in :meth:`MultiIndex.difference` losing extension array dtype (:issue:`48606`)
- Bug in :class:`MultiIndex.set_levels` raising
IndexError
when setting empty level (:issue:`48636`) - Bug in :meth:`MultiIndex.unique` losing extension array dtype (:issue:`48335`)
- Bug in :meth:`MultiIndex.intersection` losing extension array (:issue:`48604`)
- Bug in :meth:`MultiIndex.union` losing extension array (:issue:`48498`, :issue:`48505`, :issue:`48900`)
- Bug in :meth:`MultiIndex.union` not sorting when sort=None and index contains missing values (:issue:`49010`)
- Bug in :meth:`MultiIndex.append` not checking names for equality (:issue:`48288`)
- Bug in :meth:`MultiIndex.symmetric_difference` losing extension array (:issue:`48607`)
- Bug in :func:`read_sas` caused fragmentation of :class:`DataFrame` and raised :class:`.errors.PerformanceWarning` (:issue:`48595`)
- Bug in :func:`read_csv` for a single-line csv with fewer columns than
names
raised :class:`.errors.ParserError` withengine="c"
(:issue:`47566`)
- Bug in :meth:`Period.strftime` and :meth:`PeriodIndex.strftime`, raising
UnicodeDecodeError
when a locale-specific directive was passed (:issue:`46319`)
ax.set_xlim
was sometimes raisingUserWarning
which users couldn't address due toset_xlim
not accepting parsing arguments - the converter now uses :func:`Timestamp` instead (:issue:`49148`)
- Bug in :class:`.ExponentialMovingWindow` with
online
not raising aNotImplementedError
for unsupported operations (:issue:`48834`) - Bug in :meth:`DataFrameGroupBy.sample` raises
ValueError
when the object is empty (:issue:`48459`) - Bug in :meth:`Series.groupby` raises
ValueError
when an entry of the index is equal to the name of the index (:issue:`48567`) - Bug in :meth:`DataFrameGroupBy.resample` produces inconsistent results when passing empty DataFrame (:issue:`47705`)
- Bug in :meth:`DataFrame.pivot_table` raising
TypeError
for nullable dtype andmargins=True
(:issue:`48681`) - Bug in :meth:`DataFrame.unstack` and :meth:`Series.unstack` unstacking wrong level of :class:`MultiIndex` when :class:`MultiIndex` has mixed names (:issue:`48763`)
- Bug in :meth:`DataFrame.pivot` not respecting
None
as column name (:issue:`48293`) - Bug in :func:`join` when
left_on
orright_on
is or includes a :class:`CategoricalIndex` incorrectly raisingAttributeError
(:issue:`48464`)
- Bug in :meth:`Series.mean` overflowing unnecessarily with nullable integers (:issue:`48378`)
- Bug when concatenating an empty DataFrame with an ExtensionDtype to another DataFrame with the same ExtensionDtype, the resulting dtype turned into object (:issue:`48510`)
- Fixed metadata propagation in :meth:`DataFrame.corr` and :meth:`DataFrame.cov` (:issue:`28283`)