These are the changes in pandas 2.2.0. See :ref:`release` for a full changelog including other versions of pandas.
{{ header }}
Calamine engine for :func:`read_excel`
The ``calamine`` engine was added to :func:`read_excel`.
It uses ``python-calamine``, which provides Python bindings for the Rust library calamine.
This engine supports Excel files (``.xlsx``, ``.xlsm``, ``.xls``, ``.xlsb``) and OpenDocument spreadsheets (``.ods``) (:issue:`50395`).
There are two advantages of this engine:

- Calamine is often faster than other engines. Some benchmarks show results up to 5x faster than 'openpyxl', 20x faster than 'odf', 4x faster than 'pyxlsb', and 1.5x faster than 'xlrd'. However, 'openpyxl' and 'pyxlsb' are faster at reading a few rows from large files because they iterate over rows lazily.
- Calamine supports the recognition of datetime in ``.xlsb`` files, unlike 'pyxlsb', which is the only other engine in pandas that can read ``.xlsb`` files.

.. code-block:: python

    pd.read_excel("path_to_file.xlsb", engine="calamine")

For more, see :ref:`io.calamine` in the user guide on IO tools.
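
Because ``python-calamine`` is an optional dependency, code that wants to use the engine may need a fallback. The following is only a minimal sketch: ``pick_excel_engine`` is a hypothetical helper, not part of pandas, and it assumes the package is importable under the module name ``python_calamine``. It returns ``None`` when the package is missing, which lets :func:`read_excel` choose its default engine.

```python
import importlib.util


def pick_excel_engine():
    """Return "calamine" if python-calamine is importable, else None.

    Hypothetical helper for illustration; None tells pandas to pick
    its default engine for the file type.
    """
    # Assumption: the python-calamine package installs the module "python_calamine".
    if importlib.util.find_spec("python_calamine") is not None:
        return "calamine"
    return None


engine = pick_excel_engine()
print(engine)

# Possible usage, with the placeholder path from the example above:
# df = pd.read_excel("path_to_file.xlsb", engine=engine)
```

Note that the default engines cannot recover datetimes from ``.xlsb`` files, so a fallback of this kind trades correctness of datetime columns for availability.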
The ``Series.struct`` accessor provides attributes and methods for processing
data with ``struct[pyarrow]`` dtype Series. For example,
:meth:`Series.struct.explode` converts PyArrow structured data to a pandas
DataFrame. (:issue:`54938`)

.. ipython:: python

    import pyarrow as pa
    series = pd.Series(
        [
            {"project": "pandas", "version": "2.2.0"},
            {"project": "numpy", "version": "1.25.2"},
            {"project": "pyarrow", "version": "13.0.0"},
        ],
        dtype=pd.ArrowDtype(
            pa.struct([
                ("project", pa.string()),
                ("version", pa.string()),
            ])
        ),
    )
    series.struct.explode()

- :func:`read_csv` now supports the ``on_bad_lines`` parameter with ``engine="pyarrow"``. (:issue:`54480`)
- :meth:`ExtensionArray._explode` interface method added to allow extension type implementations of the ``explode`` method (:issue:`54833`)
- :meth:`DataFrame.apply` now allows the usage of numba (via ``engine="numba"``) to JIT compile the passed function, allowing for potential speedups (:issue:`54666`)
These are bug fixes that might have notable behavior changes.
:func:`merge` and :meth:`DataFrame.join` now consistently follow documented sort behavior
In previous versions of pandas, :func:`merge` and :meth:`DataFrame.join` did not always return a result that followed the documented sort behavior. pandas now follows the documented sort behavior in merge and join operations (:issue:`54611`).
As documented, ``sort=True`` sorts the join keys lexicographically in the resulting
:class:`DataFrame`. With ``sort=False``, the order of the join keys depends on the
join type (``how`` keyword):

- ``how="left"``: preserve the order of the left keys
- ``how="right"``: preserve the order of the right keys
- ``how="inner"``: preserve the order of the left keys
- ``how="outer"``: sort keys lexicographically

One example with changing behavior is inner joins with non-unique left join keys
and ``sort=False``:

.. ipython:: python

    left = pd.DataFrame({"a": [1, 2, 1]})
    right = pd.DataFrame({"a": [1, 2]})
    result = pd.merge(left, right, how="inner", on="a", sort=False)

Old Behavior

.. code-block:: ipython

    In [5]: result
    Out[5]:
       a
    0  1
    1  1
    2  2

New Behavior

.. ipython:: python

    result

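
The ordering rules above can also be checked on a small case with unique keys. This is a minimal sketch with made-up frames (the column names ``x`` and ``y`` are illustrative only). It covers ``how="left"`` with ``sort=False`` and ``sort=True``.

```python
import pandas as pd

# Illustrative data: the left keys are deliberately out of order.
left = pd.DataFrame({"a": [3, 1, 2], "x": ["c", "a", "b"]})
right = pd.DataFrame({"a": [1, 2, 3], "y": [10, 20, 30]})

# sort=False with how="left": the result keeps the order of the left keys.
left_order = pd.merge(left, right, how="left", on="a", sort=False)

# sort=True: the join keys are sorted in the result.
sorted_keys = pd.merge(left, right, how="left", on="a", sort=True)

print(left_order["a"].tolist())   # [3, 1, 2]
print(sorted_keys["a"].tolist())  # [1, 2, 3]
```

Code that depends on a particular row order after a merge is usually more robust if it sorts explicitly, for example with :meth:`DataFrame.sort_values`, rather than relying on the join type.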
Some minimum supported versions of dependencies were updated. If installed, we now require:
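
+-----------------+-----------------+----------+---------+
| Package         | Minimum Version | Required | Changed |
+=================+=================+==========+=========+
|                 |                 |    X     |    X    |
+-----------------+-----------------+----------+---------+
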
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
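
+-----------------+-----------------+---------+
| Package         | Minimum Version | Changed |
+=================+=================+=========+
|                 |                 |    X    |
+-----------------+-----------------+---------+
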
See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
The alias ``M`` is deprecated in favour of ``ME`` for offsets. Use ``ME``
for "month end" instead of ``M`` (:issue:`9586`).
For example:
Previous behavior:

.. code-block:: ipython

    In [7]: pd.date_range('2020-01-01', periods=3, freq='M')
    Out[7]:
    DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31'],
                  dtype='datetime64[ns]', freq='M')

Future behavior:

.. ipython:: python

    pd.date_range('2020-01-01', periods=3, freq='ME')

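
Code that must run on pandas versions both before and after this change can avoid the string alias altogether by passing the offset object. The sketch below is one possible approach, not an official migration recipe. The version check is a simple assumption-laden parse of ``pd.__version__`` and is only there so that the ``"ME"`` line is skipped on older releases.

```python
import pandas as pd

# Passing the offset object sidesteps the "M" / "ME" alias question.
idx = pd.date_range("2020-01-01", periods=3, freq=pd.offsets.MonthEnd())
print(idx)

# The "ME" string alias exists from pandas 2.2 on; guard it so the
# sketch still runs on older releases. Assumes a "major.minor..." version string.
major, minor = (int(part) for part in pd.__version__.split(".")[:2])
if (major, minor) >= (2, 2):
    idx_alias = pd.date_range("2020-01-01", periods=3, freq="ME")
    print(idx_alias.equals(idx))
```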
- Changed :meth:`Timedelta.resolution_string` to return ``min``, ``s``, ``ms``, ``us``, and ``ns`` instead of ``T``, ``S``, ``L``, ``U``, and ``N``, for compatibility with respective deprecations in frequency aliases (:issue:`52536`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_clipboard`. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_csv` except ``path_or_buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_dict`. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_excel` except ``excel_writer``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_gbq` except ``destination_table``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_hdf` except ``path_or_buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_html` except ``buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_json` except ``path_or_buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_latex` except ``buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_markdown` except ``buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_parquet` except ``path``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_pickle` except ``path``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_string` except ``buf``. (:issue:`54229`)
- Deprecated automatic downcasting of object-dtype results in :meth:`Series.replace` and :meth:`DataFrame.replace`; explicitly call ``result = result.infer_objects(copy=False)`` instead. To opt in to the future version, use ``pd.set_option("future.no_silent_downcasting", True)`` (:issue:`54710`)
- Deprecated downcasting behavior in :meth:`Series.where`, :meth:`DataFrame.where`, :meth:`Series.mask`, :meth:`DataFrame.mask`, :meth:`Series.clip`, :meth:`DataFrame.clip`; in a future version these will not infer object-dtype columns to non-object dtype, or all-round floats to integer dtype. Call ``result.infer_objects(copy=False)`` on the result for object inference, or explicitly cast floats to ints. To opt in to the future version, use ``pd.set_option("future.no_silent_downcasting", True)`` (:issue:`53656`)
- Deprecated including the groups in computations when using :meth:`DataFrameGroupBy.apply` and :meth:`DataFrameGroupBy.resample`; pass ``include_groups=False`` to exclude the groups (:issue:`7155`)
- Deprecated not passing a tuple to :meth:`DataFrameGroupBy.get_group` or :meth:`SeriesGroupBy.get_group` when grouping by a length-1 list-like (:issue:`25971`)
- Deprecated strings ``S``, ``U``, and ``N`` denoting units in :func:`to_timedelta` (:issue:`52536`)
- Deprecated strings ``T``, ``S``, ``L``, ``U``, and ``N`` denoting frequencies in :class:`Minute`, :class:`Second`, :class:`Milli`, :class:`Micro`, :class:`Nano` (:issue:`52536`)
- Deprecated strings ``T``, ``S``, ``L``, ``U``, and ``N`` denoting units in :class:`Timedelta` (:issue:`52536`)
- Deprecated the extension test classes ``BaseNoReduceTests``, ``BaseBooleanReduceTests``, and ``BaseNumericReduceTests``, use ``BaseReduceTests`` instead (:issue:`54663`)
- Deprecated the option ``mode.data_manager`` and the ``ArrayManager``; only the ``BlockManager`` will be available in future versions (:issue:`55043`)
- Deprecated downcasting the results of :meth:`DataFrame.fillna`, :meth:`Series.fillna`, :meth:`DataFrame.ffill`, :meth:`Series.ffill`, :meth:`DataFrame.bfill`, :meth:`Series.bfill` in object-dtype cases. To opt in to the future version, use ``pd.set_option("future.no_silent_downcasting", True)`` (:issue:`54261`)
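
Two of the deprecations listed above, positional arguments in the ``to_*`` writers and the uppercase unit strings, can be avoided with spellings that pandas has long accepted. The following is only a minimal sketch on made-up data. It does not cover the downcasting or groupby deprecations.

```python
import pandas as pd

# Illustrative frame; the column names are arbitrary.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# Pass everything except the first (path or buffer) argument by keyword.
records = df.to_dict(orient="records")
csv_text = df.to_csv(index=False)  # no path given, so the CSV is returned as a string
table = df.to_string(index=False)

# Spell units in lowercase instead of the deprecated "T", "S", "L", "U", "N".
one_minute = pd.Timedelta(1, unit="min")
ninety_seconds = pd.to_timedelta(90, unit="s")
half_second = pd.Timedelta(500, unit="ms")

print(records)
print(one_minute, ninety_seconds, half_second)
```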
- Performance improvement in :func:`concat` with ``axis=1`` and objects with unaligned indexes (:issue:`55084`)
- Performance improvement in :meth:`DataFrame.to_dict` on converting DataFrame to dictionary (:issue:`50990`)
- Performance improvement in :meth:`DataFrame.groupby` when aggregating pyarrow timestamp and duration dtypes (:issue:`55031`)
- Performance improvement in :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` when indexed by a :class:`MultiIndex` (:issue:`54835`)
- Performance improvement in :meth:`Index.difference` (:issue:`55108`)
- Performance improvement when indexing with more than 4 keys (:issue:`54550`)
- Performance improvement when localizing time to UTC (:issue:`55241`)
- Bug in :class:`AbstractHolidayCalendar` where timezone data was not propagated when computing holiday observances (:issue:`54580`)
- Bug in :class:`pandas.core.window.Rolling` where duplicate datetimelike indexes are treated as consecutive rather than equal with ``closed='left'`` and ``closed='neither'`` (:issue:`20712`)
- Bug in :meth:`DataFrame.apply` where passing ``raw=True`` ignored ``args`` passed to the applied function (:issue:`55009`)
- Bug in :meth:`pandas.read_excel` with an ODS file without cached formatted cell for float values (:issue:`55219`)
- Bug in :meth:`Categorical.isin` raising ``InvalidIndexError`` for categorical containing overlapping :class:`Interval` values (:issue:`34974`)
- Bug in :func:`read_csv` with ``engine="pyarrow"`` causing rounding errors for large integers (:issue:`52505`)
- Bug in :class:`Interval` ``__repr__`` not displaying UTC offsets for :class:`Timestamp` bounds. Additionally the hour, minute and second components will now be shown. (:issue:`55015`)
- Bug in :meth:`IntervalIndex.get_indexer` with datetime or timedelta intervals incorrectly matching on integer targets (:issue:`47772`)
- Bug in :meth:`IntervalIndex.get_indexer` with timezone-aware datetime intervals incorrectly matching on a sequence of timezone-naive targets (:issue:`47772`)
- Bug in :meth:`Index.difference` not returning a unique set of values when ``other`` is empty or ``other`` is considered non-comparable (:issue:`55113`)
- Bug in setting :class:`Categorical` values into a :class:`DataFrame` with numpy dtypes raising ``RecursionError`` (:issue:`52927`)
- Bug in :func:`read_csv` where ``on_bad_lines="warn"`` would write to ``stderr`` instead of raising a Python warning. This now yields a :class:`.errors.ParserWarning` (:issue:`54296`)
- Bug in :func:`read_excel`, with ``engine="xlrd"`` (``xls`` files) erroring when the file contains NaNs/Infs (:issue:`54564`)
- Bug in :meth:`DataFrame.to_excel`, with ``OdsWriter`` (``ods`` files) writing boolean/string value (:issue:`54994`)
- Bug in :meth:`DataFrame.plot.box` with ``vert=False`` and a matplotlib ``Axes`` created with ``sharey=True`` (:issue:`54941`)
- Bug in :meth:`DataFrameGroupBy.idxmax`, :meth:`DataFrameGroupBy.idxmin`, :meth:`SeriesGroupBy.idxmax`, and :meth:`SeriesGroupBy.idxmin` not consistently raising when grouping with ``observed=False`` and unobserved categoricals (:issue:`10694`)
- Bug in :func:`concat` ignoring ``sort`` parameter when passed :class:`DatetimeIndex` indexes (:issue:`54769`)
- Bug in :func:`merge` returning columns in incorrect order when left and/or right is empty (:issue:`51929`)
- Bug in :func:`cut` incorrectly allowing cutting of timezone-aware datetimes with timezone-naive bins (:issue:`54964`)