These are the changes in pandas 1.5.0. See :ref:`release` for a full changelog including other versions of pandas.
{{ header }}
- New method :meth:`.Styler.to_string` for alternative customisable output methods (:issue:`44502`)
- New keyword argument ``rename`` added to :meth:`.Styler.format_index` to allow simple label string replacement (:issue:`45288`)
- Various bug fixes, see below.
- Added the ability to render ``border`` and ``border-{side}`` CSS properties in Excel (:issue:`42276`)
- Added a new method :meth:`.Styler.concat` which allows adding customised footer rows to visualise additional calculations on the data, e.g. totals and counts etc. (:issue:`43875`, :issue:`46186`)
- :meth:`.Styler.highlight_null` now accepts ``color`` consistently with other builtin methods and deprecates ``null_color`` although this remains backwards compatible (:issue:`45907`)
- :meth:`MultiIndex.to_frame` now supports the argument ``allow_duplicates`` and raises on duplicate labels if it is missing or False (:issue:`45245`)
- :class:`StringArray` now accepts array-likes containing nan-likes (``None``, ``np.nan``) for the ``values`` parameter in its constructor in addition to strings and :attr:`pandas.NA`. (:issue:`40839`)
- Improved the rendering of ``categories`` in :class:`CategoricalIndex` (:issue:`45218`)
- :meth:`to_numeric` now preserves float64 arrays when downcasting would generate values not representable in float32 (:issue:`43693`)
- :meth:`Series.reset_index` and :meth:`DataFrame.reset_index` now support the argument ``allow_duplicates`` (:issue:`44410`)
- :meth:`.GroupBy.min` and :meth:`.GroupBy.max` now support Numba execution with the ``engine`` keyword (:issue:`45428`)
- :meth:`DataFrame.rolling` and :meth:`Series.rolling` now support a ``step`` parameter with fixed-length windows (:issue:`15354`)
- Implemented a ``bool``-dtype :class:`Index`; passing a bool-dtype array-like to ``pd.Index`` will now retain ``bool`` dtype instead of casting to ``object`` (:issue:`45061`)
- Implemented a complex-dtype :class:`Index`; passing a complex-dtype array-like to ``pd.Index`` will now retain complex dtype instead of casting to ``object`` (:issue:`45845`)
- Improved error message in :class:`~pandas.core.window.Rolling` when ``window`` is a frequency and ``NaT`` is in the rolling axis (:issue:`46087`)
- :class:`Series` and :class:`DataFrame` with ``IntegerDtype`` now support bitwise operations (:issue:`34463`)
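
As a rough illustration of three of the enhancements listed above (the ``step`` parameter for rolling windows, the ``bool``-dtype :class:`Index`, and bitwise operations on ``IntegerDtype`` data), here is a minimal sketch. The variable names are chosen for this example only, and it assumes pandas 1.5.0 or later is installed.

```python
import pandas as pd

# Fixed-length rolling window, evaluated only at every second row.
ser = pd.Series(range(10))
stepped = ser.rolling(window=3, step=2).sum()

# A bool-dtype array-like passed to pd.Index keeps bool dtype.
flags = pd.Index([True, False, True])

# Bitwise AND on a nullable-integer (Int64) Series.
masked = pd.Series([1, 2, 3], dtype="Int64") & 1

print(stepped)
print(flags.dtype)
print(masked.tolist())
```

With ``step=2`` the result is only computed for rows 0, 2, 4, 6 and 8, which can be noticeably cheaper than computing every window and slicing afterwards.
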
These are bug fixes that might have notable behavior changes.
- Fixed bug in :class:`CSSToExcelConverter` leading to ``TypeError`` when border color provided without border style for ``xlsxwriter`` engine (:issue:`42276`)
A transform is an operation whose result has the same size as its input. When the
result is a :class:`DataFrame` or :class:`Series`, it is also required that the
index of the result matches that of the input. In pandas 1.4, using
:meth:`.DataFrameGroupBy.transform` or :meth:`.SeriesGroupBy.transform` with null
values in the groups and ``dropna=True`` gave incorrect results. Demonstrated by the
examples below, the incorrect results either contained incorrect values, or the result
did not have the same index as the input.

.. ipython:: python

    df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [2, 3, 4]})

Old behavior:

.. code-block:: ipython

    In [3]: df.groupby('a', dropna=True).transform(lambda x: x)
    Out[3]:
       b
    0  2
    1  3

New behavior:

.. ipython:: python

    df.groupby('a', dropna=True).transform(lambda x: x)

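
The property described above (the transform result has the same index as its input) can be checked directly. The following is a small sketch, assuming pandas 1.5.0 or later; ``result`` is just a name used for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, np.nan], "b": [2, 3, 4]})
result = df.groupby("a", dropna=True).transform(lambda x: x)

# The row whose group key is null is kept in the result and filled with NaN,
# so the result's index matches the input's index.
print(result)
print(result.index.equals(df.index))
```
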
Similar to other IO methods, :func:`pandas.read_xml` now supports assigning specific dtypes to columns, applying converter methods, and parsing dates (:issue:`43567`).

.. ipython:: python

    xml_dates = """<?xml version='1.0' encoding='utf-8'?>
    <data>
      <row>
        <shape>square</shape>
        <degrees>00360</degrees>
        <sides>4.0</sides>
        <date>2020-01-01</date>
      </row>
      <row>
        <shape>circle</shape>
        <degrees>00360</degrees>
        <sides/>
        <date>2021-01-01</date>
      </row>
      <row>
        <shape>triangle</shape>
        <degrees>00180</degrees>
        <sides>3.0</sides>
        <date>2022-01-01</date>
      </row>
    </data>"""

    df = pd.read_xml(
        xml_dates,
        dtype={'sides': 'Int64'},
        converters={'degrees': str},
        parse_dates=['date']
    )
    df
    df.dtypes

Some minimum supported versions of dependencies were updated. If installed, we now require:
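
+-----------------+-----------------+----------+---------+
| Package         | Minimum Version | Required | Changed |
+=================+=================+==========+=========+
| mypy (dev)      | 0.931           |          |    X    |
+-----------------+-----------------+----------+---------+
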
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
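
+-----------------+-----------------+---------+
| Package         | Minimum Version | Changed |
+=================+=================+=========+
|                 |                 |    X    |
+-----------------+-----------------+---------+
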
See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
In a future version, integer slicing on a :class:`Series` with a :class:`Int64Index` or :class:`RangeIndex` will be treated as label-based, not positional. This will make the behavior consistent with other :meth:`Series.__getitem__` and :meth:`Series.__setitem__` behaviors (:issue:`45162`).
For example:

.. ipython:: python

    ser = pd.Series([1, 2, 3, 4, 5], index=[2, 3, 5, 7, 11])

In the old behavior, ``ser[2:4]`` treats the slice as positional:

Old behavior:

.. code-block:: ipython

    In [3]: ser[2:4]
    Out[3]:
    5    3
    7    4
    dtype: int64

In a future version, this will be treated as label-based:

Future behavior:

.. code-block:: ipython

    In [4]: ser.loc[2:4]
    Out[4]:
    2    1
    3    2
    dtype: int64

To retain the old behavior, use ``series.iloc[i:j]``. To get the future behavior,
use ``series.loc[i:j]``.

Slicing on a :class:`DataFrame` will not be affected.
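
To make the difference concrete, the sketch below applies both explicit accessors to the ``ser`` defined above. It only restates the two outputs already shown; ``positional`` and ``by_label`` are names picked for this example.

```python
import pandas as pd

ser = pd.Series([1, 2, 3, 4, 5], index=[2, 3, 5, 7, 11])

# Positional: rows at positions 2 and 3 (the old meaning of ser[2:4]).
positional = ser.iloc[2:4]

# Label-based: labels from 2 through 4 inclusive (the future meaning).
by_label = ser.loc[2:4]

print(positional)
print(by_label)
```

Writing ``.iloc`` or ``.loc`` explicitly gives the same result before and after the change, so code that uses them is unaffected by the deprecation.
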
:class:`ExcelWriter` attributes

All attributes of :class:`ExcelWriter` were previously documented as not
public. However some third party Excel engines documented accessing
``ExcelWriter.book`` or ``ExcelWriter.sheets``, and users were utilizing these
and possibly other attributes. Previously these attributes were not safe to use;
e.g. modifications to ``ExcelWriter.book`` would not update ``ExcelWriter.sheets``
and conversely. In order to support this, pandas has made some attributes public
and improved their implementations so that they may now be safely used. (:issue:`45572`)

The following attributes are now public and considered safe to access.

- ``book``
- ``check_extension``
- ``close``
- ``date_format``
- ``datetime_format``
- ``engine``
- ``if_sheet_exists``
- ``sheets``
- ``supported_extensions``

The following attributes have been deprecated. They now raise a FutureWarning
when accessed and will be removed in a future version. Users should be aware
that their usage is considered unsafe, and can lead to unexpected results.

- ``cur_sheet``
- ``handles``
- ``path``
- ``save``
- ``write_cells``

See the documentation of :class:`ExcelWriter` for further details.
- Deprecated the keyword ``line_terminator`` in :meth:`DataFrame.to_csv` and :meth:`Series.to_csv`, use ``lineterminator`` instead; this is for consistency with :func:`read_csv` and the standard library 'csv' module (:issue:`9568`)
- Deprecated behavior of :meth:`SparseArray.astype`, :meth:`Series.astype`, and :meth:`DataFrame.astype` with :class:`SparseDtype` when passing a non-sparse ``dtype``. In a future version, this will cast to that non-sparse dtype instead of wrapping it in a :class:`SparseDtype` (:issue:`34457`)
- Deprecated behavior of :meth:`DatetimeIndex.intersection` and :meth:`DatetimeIndex.symmetric_difference` (``union`` behavior was already deprecated in version 1.3.0) with mixed time zones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`, :issue:`45357`)
- Deprecated :meth:`DataFrame.iteritems`, :meth:`Series.iteritems`, :meth:`HDFStore.iteritems` in favor of :meth:`DataFrame.items`, :meth:`Series.items`, :meth:`HDFStore.items` (:issue:`45321`)
- Deprecated :meth:`Series.is_monotonic` and :meth:`Index.is_monotonic` in favor of :meth:`Series.is_monotonic_increasing` and :meth:`Index.is_monotonic_increasing` (:issue:`45422`, :issue:`21335`)
- Deprecated behavior of :meth:`DatetimeIndex.astype`, :meth:`TimedeltaIndex.astype`, :meth:`PeriodIndex.astype` when converting to an integer dtype other than ``int64``. In a future version, these will convert to exactly the specified dtype (instead of always ``int64``) and will raise if the conversion overflows (:issue:`45034`)
- Deprecated the ``__array_wrap__`` method of DataFrame and Series, rely on standard numpy ufuncs instead (:issue:`45451`)
- Deprecated treating float-dtype data as wall-times when passed with a timezone to :class:`Series` or :class:`DatetimeIndex` (:issue:`45573`)
- Deprecated the behavior of :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``timedelta64[ns]`` dtype and incompatible fill value; in a future version this will cast to a common dtype (usually object) instead of raising, matching the behavior of other dtypes (:issue:`45746`)
- Deprecated the ``warn`` parameter in :func:`infer_freq` (:issue:`45947`)
- Deprecated allowing non-keyword arguments in :meth:`ExtensionArray.argsort` (:issue:`46134`)
- Deprecated treating all-bool ``object``-dtype columns as bool-like in :meth:`DataFrame.any` and :meth:`DataFrame.all` with ``bool_only=True``, explicitly cast to bool instead (:issue:`46188`)
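
For three of the deprecations above, the replacement spelling is a direct substitution. The following sketch shows the non-deprecated forms; the data and variable names are made up for illustration, and it assumes pandas 1.5.0 or later.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})
ser = pd.Series([10, 20, 30])

# lineterminator replaces the deprecated line_terminator keyword.
csv_text = df.to_csv(index=False, lineterminator="\r\n")

# items() replaces the deprecated iteritems().
pairs = list(ser.items())

# is_monotonic_increasing replaces the deprecated is_monotonic.
increasing = ser.is_monotonic_increasing

print(repr(csv_text))
print(pairs)
print(increasing)
```
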
- Performance improvement in :meth:`DataFrame.corrwith` for column-wise (axis=0) Pearson and Spearman correlation when other is a :class:`Series` (:issue:`46174`)
- Performance improvement in :meth:`.GroupBy.transform` for some user-defined DataFrame -> Series functions (:issue:`45387`)
- Performance improvement in :meth:`DataFrame.duplicated` when subset consists of only one column (:issue:`45236`)
- Performance improvement in :meth:`.GroupBy.diff` (:issue:`16706`)
- Performance improvement in :meth:`.GroupBy.transform` when broadcasting values for user-defined functions (:issue:`45708`)
- Performance improvement in :meth:`.GroupBy.transform` for user-defined functions when only a single group exists (:issue:`44977`)
- Performance improvement in :meth:`MultiIndex.get_locs` (:issue:`45681`, :issue:`46040`)
- Performance improvement in :func:`merge` when left and/or right are empty (:issue:`45838`)
- Performance improvement in :meth:`DataFrame.join` when left and/or right are empty (:issue:`46015`)
- Performance improvement in :meth:`DataFrame.reindex` and :meth:`Series.reindex` when target is a :class:`MultiIndex` (:issue:`46235`)
- Performance improvement in :func:`factorize` (:issue:`46109`)
- Performance improvement in :class:`DataFrame` and :class:`Series` constructors for extension dtype scalars (:issue:`45854`)
- Bug in :meth:`Categorical.view` not accepting integer dtypes (:issue:`25464`)
- Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains ``NaN`` values incorrectly raising instead of casting to ``float64`` (:issue:`45362`)
- Bug in :meth:`DataFrame.quantile` with datetime-like dtypes and no rows incorrectly returning ``float64`` dtype instead of retaining datetime-like dtype (:issue:`41544`)
- Bug in :func:`to_datetime` with sequences of ``np.str_`` objects incorrectly raising (:issue:`32264`)
- Bug in :class:`Timestamp` construction when passing datetime components as positional arguments and ``tzinfo`` as a keyword argument incorrectly raising (:issue:`31929`)
- Bug in :meth:`Index.astype` when casting from object dtype to ``timedelta64[ns]`` dtype incorrectly casting ``np.datetime64("NaT")`` values to ``np.timedelta64("NaT")`` instead of raising (:issue:`45722`)
- Bug in :meth:`SeriesGroupBy.value_counts` index when passing categorical column (:issue:`44324`)
- Bug in :func:`astype_nansafe` where ``astype("timedelta64[ns]")`` failed when ``np.nan`` was included (:issue:`45798`)
- Bug in operations with array-likes with ``dtype="boolean"`` and :attr:`NA` incorrectly altering the array in-place (:issue:`45421`)
- Bug in division, ``pow`` and ``mod`` operations on array-likes with ``dtype="boolean"`` not being like their ``np.bool_`` counterparts (:issue:`46063`)
- Bug in multiplying a :class:`Series` with ``IntegerDtype`` or ``FloatingDtype`` by an array-like with ``timedelta64[ns]`` dtype incorrectly raising (:issue:`45622`)
- Bug in :meth:`DataFrame.astype` not preserving subclasses (:issue:`40810`)
- Bug in constructing a :class:`Series` from a float-containing list or a floating-dtype ndarray-like (e.g. ``dask.Array``) and an integer dtype raising instead of casting like we would with an ``np.ndarray`` (:issue:`40110`)
- Bug in :meth:`Float64Index.astype` to unsigned integer dtype incorrectly casting to ``np.int64`` dtype (:issue:`45309`)
- Bug in :meth:`Series.astype` and :meth:`DataFrame.astype` from floating dtype to unsigned integer dtype failing to raise in the presence of negative values (:issue:`45151`)
- Bug in :func:`array` with ``FloatingDtype`` and values containing float-castable strings incorrectly raising (:issue:`45424`)
- Bug when comparing string and datetime64ns objects causing ``OverflowError`` exception. (:issue:`45506`)
- Bug in :meth:`str.startswith` and :meth:`str.endswith` when using other series as parameter ``pat``. Now raises ``TypeError`` (:issue:`3485`)
- Bug in :meth:`IntervalArray.__setitem__` when setting ``np.nan`` into an integer-backed array raising ``ValueError`` instead of ``TypeError`` (:issue:`45484`)
- Bug in :meth:`loc.__getitem__` with a list of keys causing an internal inconsistency that could lead to a disconnect between ``frame.at[x, y]`` vs ``frame[y].loc[x]`` (:issue:`22372`)
- Bug in :meth:`DataFrame.iloc` where indexing a single row on a :class:`DataFrame` with a single ExtensionDtype column gave a copy instead of a view on the underlying data (:issue:`45241`)
- Bug in :meth:`Series.align` not creating a :class:`MultiIndex` with the union of levels when the intersections of both MultiIndexes are identical (:issue:`45224`)
- Bug in setting a NA value (``None`` or ``np.nan``) into a :class:`Series` with int-based :class:`IntervalDtype` incorrectly casting to object dtype instead of a float-based :class:`IntervalDtype` (:issue:`45568`)
- Bug in indexing setting values into an ``ExtensionDtype`` column with ``df.iloc[:, i] = values`` with ``values`` having the same dtype as ``df.iloc[:, i]`` incorrectly inserting a new array instead of setting in-place (:issue:`33457`)
- Bug in :meth:`Series.__setitem__` with a non-integer :class:`Index` when using an integer key to set a value that cannot be set inplace where a ``ValueError`` was raised instead of casting to a common dtype (:issue:`45070`)
- Bug in :meth:`Series.__setitem__` when setting incompatible values into a ``PeriodDtype`` or ``IntervalDtype`` :class:`Series` raising when indexing with a boolean mask but coercing when indexing with otherwise-equivalent indexers; these now consistently coerce, along with :meth:`Series.mask` and :meth:`Series.where` (:issue:`45768`)
- Bug in :meth:`DataFrame.where` with multiple columns with datetime-like dtypes failing to downcast results consistent with other dtypes (:issue:`45837`)
- Bug in :meth:`Series.loc.__setitem__` and :meth:`Series.loc.__getitem__` not raising when using multiple keys without using a :class:`MultiIndex` (:issue:`13831`)
- Bug when setting a value too large for a :class:`Series` dtype failing to coerce to a common type (:issue:`26049`, :issue:`32878`)
- Bug in :meth:`loc.__setitem__` treating ``range`` keys as positional instead of label-based (:issue:`45479`)
- Bug in :meth:`Series.__setitem__` when setting ``boolean`` dtype values containing ``NA`` incorrectly raising instead of casting to ``boolean`` dtype (:issue:`45462`)
- Bug in :meth:`Series.__setitem__` where setting :attr:`NA` into a numeric-dtype :class:`Series` would incorrectly upcast to object-dtype rather than treating the value as ``np.nan`` (:issue:`44199`)
- Bug in :meth:`Series.__setitem__` with ``datetime64[ns]`` dtype, an all-``False`` boolean mask, and an incompatible value incorrectly casting to ``object`` instead of retaining ``datetime64[ns]`` dtype (:issue:`45967`)
- Bug in :meth:`Index.__getitem__` raising ``ValueError`` when indexer is from boolean dtype with ``NA`` (:issue:`45806`)
- Bug in :meth:`Series.mask` with ``inplace=True`` or setting values with a boolean mask with small integer dtypes incorrectly raising (:issue:`45750`)
- Bug in :meth:`DataFrame.mask` with ``inplace=True`` and ``ExtensionDtype`` columns incorrectly raising (:issue:`45577`)
- Bug in getting a column from a DataFrame with an object-dtype row index with datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (:issue:`42950`)
- Bug in indexing on a :class:`DatetimeIndex` with a ``np.str_`` key incorrectly raising (:issue:`45580`)
- Bug in :meth:`CategoricalIndex.get_indexer` when index contains ``NaN`` values, resulting in elements that are in target but not present in the index to be mapped to the index of the NaN element, instead of -1 (:issue:`45361`)
- Bug in setting large integer values into :class:`Series` with ``float32`` or ``float16`` dtype incorrectly altering these values instead of coercing to ``float64`` dtype (:issue:`45844`)
- Bug in :meth:`Series.asof` and :meth:`DataFrame.asof` incorrectly casting bool-dtype results to ``float64`` dtype (:issue:`16063`)
- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``downcast`` keyword not being respected in some cases where there are no NA values present (:issue:`45423`)
- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with :class:`IntervalDtype` and incompatible value raising instead of casting to a common (usually object) dtype (:issue:`45796`)
- Bug in :meth:`DataFrame.interpolate` with object-dtype column not returning a copy with ``inplace=False`` (:issue:`45791`)
- Bug in :meth:`Series.to_numpy` where multiindexed Series could not be converted to numpy arrays when an ``na_value`` was supplied (:issue:`45774`)
- Bug in :class:`MultiIndex.equals` not commutative when only one side has extension array dtype (:issue:`46026`)
- Bug in :meth:`DataFrame.to_stata` where no error is raised if the :class:`DataFrame` contains ``-np.inf`` (:issue:`45350`)
- Bug in :func:`read_excel` results in an infinite loop with certain ``skiprows`` callables (:issue:`45585`)
- Bug in :meth:`DataFrame.info` where a new line at the end of the output is omitted when called on an empty :class:`DataFrame` (:issue:`45494`)
- Bug in :func:`read_csv` not recognizing line break for ``on_bad_lines="warn"`` for ``engine="c"`` (:issue:`41710`)
- Bug in :meth:`DataFrame.to_csv` not respecting ``float_format`` for ``Float64`` dtype (:issue:`45991`)
- Bug in :func:`read_csv` not respecting a specified converter to index columns in all cases (:issue:`40589`)
- Bug in :func:`read_parquet` when ``engine="pyarrow"`` which caused partial write to disk when column of unsupported datatype was passed (:issue:`44914`)
- Bug in :func:`DataFrame.to_excel` and :class:`ExcelWriter` would raise when writing an empty DataFrame to a ``.ods`` file (:issue:`45793`)
- Bug in Parquet roundtrip for Interval dtype with ``datetime64[ns]`` subtype (:issue:`45881`)
- Bug in :func:`read_excel` when reading a ``.ods`` file with newlines between xml elements (:issue:`45598`)
- Bug in subtraction of :class:`Period` from :class:`PeriodArray` returning wrong results (:issue:`45999`)
- Bug in :meth:`DataFrame.plot.barh` that prevented labeling the x-axis and ``xlabel`` updating the y-axis label (:issue:`45144`)
- Bug in :meth:`DataFrame.plot.box` that prevented labeling the x-axis (:issue:`45463`)
- Bug in :meth:`DataFrame.boxplot` that prevented passing in ``xlabel`` and ``ylabel`` (:issue:`45463`)
- Bug in :meth:`DataFrame.boxplot` that prevented specifying ``vert=False`` (:issue:`36918`)
- Bug in :meth:`DataFrame.scatter` that prevented specifying ``norm`` (:issue:`45809`)
- Bug in :meth:`DataFrame.resample` ignoring ``closed="right"`` on :class:`TimedeltaIndex` (:issue:`45414`)
- Bug in :meth:`.DataFrameGroupBy.transform` failing when ``func="size"`` and the input DataFrame has multiple columns (:issue:`27469`)
- Bug in :meth:`.DataFrameGroupBy.size` and :meth:`.DataFrameGroupBy.transform` with ``func="size"`` producing incorrect results when ``axis=1`` (:issue:`45715`)
- Bug in :meth:`.ExponentialMovingWindow.mean` with ``axis=1`` and ``engine='numba'`` when the :class:`DataFrame` has more columns than rows (:issue:`46086`)
- Bug when using ``engine="numba"`` would return the same jitted function when modifying ``engine_kwargs`` (:issue:`46086`)
- Bug in :meth:`.DataFrameGroupBy.transform` failing when ``axis=1`` and ``func`` is ``"first"`` or ``"last"`` (:issue:`45986`)
- Bug in :meth:`.DataFrameGroupBy.cumsum` with ``skipna=False`` giving incorrect results (:issue:`46216`)
- Bug in :meth:`.GroupBy.cumsum` with ``timedelta64[ns]`` dtype failing to recognize ``NaT`` as a null value (:issue:`46216`)
- Bug in :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` with nullable dtypes incorrectly altering the original data in place (:issue:`46220`)
- Bug in :func:`concat` between a :class:`Series` with integer dtype and another with :class:`CategoricalDtype` with integer categories and containing ``NaN`` values casting to object dtype instead of ``float64`` (:issue:`45359`)
- Bug in :func:`get_dummies` that selected object and categorical dtypes but not string (:issue:`44965`)
- Bug in :meth:`Series.where` and :meth:`DataFrame.where` with ``SparseDtype`` failing to retain the array's ``fill_value`` (:issue:`45691`)
- Bug in :meth:`IntegerArray.searchsorted` and :meth:`FloatingArray.searchsorted` returning inconsistent results when acting on ``np.nan`` (:issue:`45255`)
- Bug when attempting to apply styling functions to an empty DataFrame subset (:issue:`45313`)
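
Two of the fixes listed above are easy to check by hand: :meth:`DataFrame.astype` preserving subclasses (:issue:`40810`) and :meth:`str.startswith` raising ``TypeError`` for a Series pattern (:issue:`3485`). The sketch below is an illustration only, assuming pandas 1.5.0 or later; ``TaggedFrame`` is a hypothetical subclass defined for this example and is not part of pandas.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    """Hypothetical DataFrame subclass, used only for this illustration."""

    @property
    def _constructor(self):
        return TaggedFrame


# astype should hand back the subclass rather than a plain DataFrame.
converted = TaggedFrame({"a": [1, 2]}).astype("float64")

# Passing another Series as the pattern should raise TypeError.
try:
    pd.Series(["apple", "pear"]).str.startswith(pd.Series(["a", "p"]))
    raised_type_error = False
except TypeError:
    raised_type_error = True

print(type(converted).__name__)
print(raised_type_error)
```
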