These are the changes in pandas 1.6.0. See :ref:`release` for a full changelog including other versions of pandas.
{{ header }}
- :func:`read_sas` now supports using ``encoding='infer'`` to correctly read and use the encoding specified by the sas file. (:issue:`48048`)
- :meth:`.DataFrameGroupBy.quantile` and :meth:`.SeriesGroupBy.quantile` now preserve nullable dtypes instead of casting to numpy dtypes (:issue:`37493`)
- :meth:`Series.add_suffix`, :meth:`DataFrame.add_suffix`, :meth:`Series.add_prefix` and :meth:`DataFrame.add_prefix` support an ``axis`` argument. If ``axis`` is set, the default behaviour of which axis to consider can be overridden (:issue:`47819`)
- :func:`assert_frame_equal` now shows the first element where the DataFrames differ, analogously to ``pytest``'s output (:issue:`47910`)
- Added ``index`` parameter to :meth:`DataFrame.to_dict` (:issue:`46398`)
- Added metadata propagation for binary operators on :class:`DataFrame` (:issue:`28283`)
- :class:`.CategoricalConversionWarning`, :class:`.InvalidComparison`, :class:`.InvalidVersion`, :class:`.LossySetitemError`, and :class:`.NoBufferPresent` are now exposed in ``pandas.errors`` (:issue:`27656`)
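
As a rough illustration of two of the enhancements listed above, the sketch below calls :meth:`DataFrame.add_prefix` with the new ``axis`` argument and :meth:`DataFrame.to_dict` with the new ``index`` parameter. The frame and its labels are made up for the example, and the snippet assumes a pandas version that already includes these changes.

```python
import pandas as pd

# Hypothetical example data; not taken from the release notes.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# Without axis, DataFrame.add_prefix relabels the columns.
prefixed_cols = df.add_prefix("col_")

# With axis=0, the row labels are prefixed instead and the columns stay as-is.
prefixed_rows = df.add_prefix("row_", axis=0)

# index=False leaves the index out of the result; it can only be False
# when orient is "split" or "tight".
as_split = df.to_dict(orient="split", index=False)

print(list(prefixed_cols.columns))
print(list(prefixed_rows.index))
print(as_split)
```

Passing ``axis`` explicitly is mostly useful for a :class:`DataFrame`, where the default targets the columns.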
These are bug fixes that might have notable behavior changes.
:meth:`.GroupBy.cumsum` and :meth:`.GroupBy.cumprod` overflow instead of lossy casting to float
In previous versions we cast to float when applying ``cumsum`` and ``cumprod``, which
led to incorrect results even if the result could be held by the ``int64`` dtype.
Additionally, the aggregation now overflows, consistent with numpy and the regular
:meth:`DataFrame.cumprod` and :meth:`DataFrame.cumsum` methods, when the limit of
``int64`` is reached (:issue:`37493`).
Old Behavior

.. code-block:: ipython

    In [1]: df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
    In [2]: df.groupby("key")["value"].cumprod()[5]
    Out[2]: 5.960464477539062e+16

We return incorrect results with the 6th value.
New Behavior

.. ipython:: python

    df = pd.DataFrame({"key": ["b"] * 7, "value": 625})
    df.groupby("key")["value"].cumprod()

We overflow with the 7th value, but the 6th value is still correct.
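
The arithmetic behind both statements can be checked without pandas. The following is a minimal pure-Python sketch, not the pandas implementation: the helper ``wrap_int64`` is a hypothetical name that mimics two's-complement wraparound of a 64-bit integer. It shows that the 6th cumulative product of 625 still fits in ``int64`` but is not exactly representable as a float, and that the 7th exceeds the ``int64`` range.

```python
INT64_MAX = 2**63 - 1


def wrap_int64(n: int) -> int:
    """Hypothetical helper: reduce a Python int to its two's-complement int64 value."""
    n &= 2**64 - 1
    return n - 2**64 if n > INT64_MAX else n


# Exact cumulative products 625**1 ... 625**7 using Python's unbounded ints.
exact = [625**k for k in range(1, 8)]
sixth, seventh = exact[5], exact[6]

# The 6th value fits in int64, but float64 only has 53 bits of mantissa,
# so a round trip through float changes the last digit.
fits_int64 = sixth <= INT64_MAX
float_roundtrip = int(float(sixth))

# The 7th value is larger than INT64_MAX, so fixed-width arithmetic wraps around.
wrapped_seventh = wrap_int64(seventh)

print(sixth)            # 59604644775390625
print(float_roundtrip)  # 59604644775390624
print(wrapped_seventh)  # 359414837200037393
```

The off-by-one in the float round trip is the lossy cast in the old behavior. The wrapped value is the kind of result an overflowing ``int64`` computation produces.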
Some minimum supported versions of dependencies were updated. If installed, we now require:
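
+-----------------+-----------------+----------+---------+
| Package         | Minimum Version | Required | Changed |
+=================+=================+==========+=========+
|                 |                 |    X     |    X    |
+-----------------+-----------------+----------+---------+
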
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
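
+-----------------+-----------------+---------+
| Package         | Minimum Version | Changed |
+=================+=================+=========+
|                 |                 |    X    |
+-----------------+-----------------+---------+
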
See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
- :func:`read_csv`: specifying an incorrect number of columns with ``index_col`` now raises ``ParserError`` instead of ``IndexError`` when using the c parser.
- Performance improvement in :meth:`.DataFrameGroupBy.median`, :meth:`.SeriesGroupBy.median`, and :meth:`.GroupBy.cumprod` for nullable dtypes (:issue:`37493`)
- Performance improvement in :meth:`MultiIndex.argsort` and :meth:`MultiIndex.sort_values` (:issue:`48406`)
- Performance improvement in :meth:`MultiIndex.size` (:issue:`48723`)
- Performance improvement in :meth:`MultiIndex.union` without missing values and without duplicates (:issue:`48505`)
- Performance improvement in :meth:`MultiIndex.difference` (:issue:`48606`)
- Performance improvement in :meth:`.DataFrameGroupBy.mean`, :meth:`.SeriesGroupBy.mean`, :meth:`.DataFrameGroupBy.var`, and :meth:`.SeriesGroupBy.var` for extension array dtypes (:issue:`37493`)
- Performance improvement in :meth:`MultiIndex.isin` when ``level=None`` (:issue:`48622`)
- Performance improvement for :meth:`Series.value_counts` with nullable dtype (:issue:`48338`)
- Performance improvement for :class:`Series` constructor passing integer numpy array with nullable dtype (:issue:`48338`)
- Performance improvement for :class:`DatetimeIndex` constructor passing a list (:issue:`48609`)
- Performance improvement in :func:`merge` and :meth:`DataFrame.join` when joining on a sorted :class:`MultiIndex` (:issue:`48504`)
- Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` for tuple-based indexing of a :class:`MultiIndex` (:issue:`48384`)
- Performance improvement for :meth:`MultiIndex.unique` (:issue:`48335`)
- Performance improvement in :meth:`DataFrame.join` when joining on a subset of a :class:`MultiIndex` (:issue:`48611`)
- Performance improvement for :meth:`MultiIndex.intersection` (:issue:`48604`)
- Performance improvement in ``var`` for nullable dtypes (:issue:`48379`)
- Performance improvement to :func:`read_sas` with ``blank_missing=True`` (:issue:`48502`)
- Memory improvement in :meth:`RangeIndex.sort_values` (:issue:`48801`)
- Bug in :func:`pandas.infer_freq`, raising ``TypeError`` when inferred on :class:`RangeIndex` (:issue:`47084`)
- Bug in :class:`DatetimeIndex` constructor failing to raise when ``tz=None`` is explicitly specified in conjunction with timezone-aware ``dtype`` or data (:issue:`48659`)
- Bug in subtracting a ``datetime`` scalar from :class:`DatetimeIndex` failing to retain the original ``freq`` attribute (:issue:`48818`)
- Bug in constructing :class:`Series` with ``int64`` dtype from a string list raising instead of casting (:issue:`44923`)
- Bug in :meth:`DataFrame.eval` incorrectly raising an ``AttributeError`` when there are negative values in function call (:issue:`46471`)
- Bug in :meth:`Series.convert_dtypes` not converting dtype to nullable dtype when :class:`Series` contains ``NA`` and has dtype ``object`` (:issue:`48791`)
- Bug where any :class:`ExtensionDtype` subclass with ``kind="M"`` would be interpreted as a timezone type (:issue:`34986`)
- Bug in :meth:`DataFrame.reindex` filling with wrong values when indexing columns and index for ``uint`` dtypes (:issue:`48184`)
- Bug in :meth:`DataFrame.reindex` casting dtype to ``object`` when :class:`DataFrame` has single extension array column when re-indexing ``columns`` and ``index`` (:issue:`48190`)
- Bug in :func:`~DataFrame.describe` when formatting percentiles in the resulting index showed more decimals than needed (:issue:`46362`)
- Bug in :meth:`Index.equals` raising ``TypeError`` when :class:`Index` consists of tuples that contain ``NA`` (:issue:`48446`)
- Bug in :meth:`MultiIndex.difference` losing extension array dtype (:issue:`48606`)
- Bug in :meth:`MultiIndex.set_levels` raising ``IndexError`` when setting empty level (:issue:`48636`)
- Bug in :meth:`MultiIndex.unique` losing extension array dtype (:issue:`48335`)
- Bug in :meth:`MultiIndex.intersection` losing extension array (:issue:`48604`)
- Bug in :meth:`MultiIndex.union` losing extension array (:issue:`48498`, :issue:`48505`)
- Bug in :meth:`MultiIndex.append` not checking names for equality (:issue:`48288`)
- Bug in :meth:`MultiIndex.symmetric_difference` losing extension array (:issue:`48607`)
- Bug in :func:`read_sas` caused fragmentation of :class:`DataFrame` and raised :class:`.errors.PerformanceWarning` (:issue:`48595`)
- Bug in :meth:`Period.strftime` and :meth:`PeriodIndex.strftime`, raising ``UnicodeDecodeError`` when a locale-specific directive was passed (:issue:`46319`)
- Bug in :meth:`DataFrameGroupBy.sample` raising ``ValueError`` when the object is empty (:issue:`48459`)
- Bug in :meth:`DataFrame.pivot_table` raising ``TypeError`` for nullable dtype and ``margins=True`` (:issue:`48681`)
- Bug in :meth:`DataFrame.pivot` not respecting ``None`` as column name (:issue:`48293`)
- Bug in :func:`join` when ``left_on`` or ``right_on`` is or includes a :class:`CategoricalIndex` incorrectly raising ``AttributeError`` (:issue:`48464`)
- Bug in :meth:`Series.mean` overflowing unnecessarily with nullable integers (:issue:`48378`)
- Bug when concatenating an empty DataFrame with an ExtensionDtype to another DataFrame with the same ExtensionDtype, where the resulting dtype turned into object (:issue:`48510`)
- Fixed metadata propagation in :meth:`DataFrame.corr` and :meth:`DataFrame.cov` (:issue:`28283`)