These are the changes in pandas 2.2.0. See :ref:`release` for a full changelog including other versions of pandas.
{{ header }}
:func:`read_sql` and :meth:`~DataFrame.to_sql` now work with Apache Arrow ADBC drivers. Compared to traditional drivers used via SQLAlchemy, ADBC drivers should provide significant performance improvements, better type support and cleaner nullability handling.
.. code-block:: python

   import adbc_driver_postgresql.dbapi as pg_dbapi

   df = pd.DataFrame(
       [
           [1, 2, 3],
           [4, 5, 6],
       ],
       columns=['a', 'b', 'c']
   )
   uri = "postgresql://postgres:postgres@localhost/postgres"
   with pg_dbapi.connect(uri) as conn:
       df.to_sql("pandas_table", conn, index=False)

   # for roundtripping
   with pg_dbapi.connect(uri) as conn:
       df2 = pd.read_sql("pandas_table", conn)

The Arrow type system offers a wider array of types that can more closely match what databases like PostgreSQL can offer. To illustrate, note this (non-exhaustive) listing of types available in different databases and pandas backends:
numpy/pandas | arrow | postgres | sqlite |
---|---|---|---|
int16/Int16 | int16 | SMALLINT | INTEGER |
int32/Int32 | int32 | INTEGER | INTEGER |
int64/Int64 | int64 | BIGINT | INTEGER |
float32 | float32 | REAL | REAL |
float64 | float64 | DOUBLE PRECISION | REAL |
object | string | TEXT | TEXT |
bool | bool_ | BOOLEAN | |
datetime64[ns] | timestamp(us) | TIMESTAMP | |
datetime64[ns,tz] | timestamp(us,tz) | TIMESTAMPTZ | |
 | date32 | DATE | |
 | month_day_nano_interval | INTERVAL | |
 | binary | BINARY | BLOB |
 | decimal128 | DECIMAL [1] | |
 | list | ARRAY [1] | |
 | struct | | |

Footnotes

[1] Not implemented as of writing, but theoretically possible
If you are interested in preserving database types as best as possible
throughout the lifecycle of your DataFrame, you are encouraged to
use the ``dtype_backend="pyarrow"`` argument of :func:`~pandas.read_sql`:
.. code-block:: python

   # for roundtripping
   with pg_dbapi.connect(uri) as conn:
       df2 = pd.read_sql("pandas_table", conn, dtype_backend="pyarrow")

This will prevent your data from being converted to the traditional pandas/NumPy type system, which often converts SQL types in ways that make them impossible to round-trip.
For a full list of ADBC drivers and their development status, see the ADBC Driver Implementation Status documentation.
:meth:`ExtensionArray.to_numpy` will now convert to a suitable NumPy dtype instead
of ``object`` dtype for nullable extension dtypes.
Old behavior:
In [1]: ser = pd.Series([1, 2, 3], dtype="Int64")
In [2]: ser.to_numpy()
Out[2]: array([1, 2, 3], dtype=object)
New behavior:
.. ipython:: python

    ser = pd.Series([1, 2, 3], dtype="Int64")
    ser.to_numpy()

The default NumPy dtype (without any arguments) is determined as follows:
- float dtypes are cast to NumPy floats
- integer dtypes without missing values are cast to NumPy integer dtypes
- integer dtypes with missing values are cast to NumPy float dtypes and ``NaN`` is used as missing value indicator
- boolean dtypes without missing values are cast to NumPy bool dtype
- boolean dtypes with missing values keep object dtype
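
As a rough illustration of the rules above, the following sketch builds one small nullable Series per case and records the dtype that ``to_numpy()`` picks when called without arguments. The Series contents and the names ``examples`` and ``default_dtypes`` are made up for this illustration, and the dtypes noted in the comments assume pandas 2.2 or later.

```python
import numpy as np
import pandas as pd

# One nullable Series per rule in the list above (values are illustrative).
examples = {
    "float": pd.Series([1.5, None], dtype="Float64"),               # NumPy float
    "int, no missing": pd.Series([1, 2, 3], dtype="Int64"),         # NumPy integer
    "int, missing": pd.Series([1, None, 3], dtype="Int64"),         # NumPy float, NaN for NA
    "bool, no missing": pd.Series([True, False], dtype="boolean"),  # NumPy bool
    "bool, missing": pd.Series([True, None], dtype="boolean"),      # stays object
}

# Calling to_numpy() with no arguments lets pandas choose the NumPy dtype.
default_dtypes = {name: ser.to_numpy().dtype for name, ser in examples.items()}

for name, dtype in default_dtypes.items():
    print(f"{name:>16}: {dtype}")
```

If a specific dtype is needed regardless of missing values, passing ``dtype`` and ``na_value`` explicitly to ``to_numpy`` remains an option.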
The ``Series.struct`` accessor provides attributes and methods for processing
data with ``struct[pyarrow]`` dtype Series. For example,
:meth:`Series.struct.explode` converts PyArrow structured data to a pandas
DataFrame. (:issue:`54938`)

.. ipython:: python

    import pyarrow as pa
    series = pd.Series(
        [
            {"project": "pandas", "version": "2.2.0"},
            {"project": "numpy", "version": "1.25.2"},
            {"project": "pyarrow", "version": "13.0.0"},
        ],
        dtype=pd.ArrowDtype(
            pa.struct([
                ("project", pa.string()),
                ("version", pa.string()),
            ])
        ),
    )
    series.struct.explode()

The ``Series.list`` accessor provides attributes and methods for processing
data with ``list[pyarrow]`` dtype Series. For example,
:meth:`Series.list.__getitem__` allows indexing pyarrow lists in
a Series. (:issue:`55323`)

.. ipython:: python

    import pyarrow as pa
    series = pd.Series(
        [
            [1, 2, 3],
            [4, 5],
            [6],
        ],
        dtype=pd.ArrowDtype(
            pa.list_(pa.int64())
        ),
    )
    series.list[0]

Calamine engine for :func:`read_excel`
The ``calamine`` engine was added to :func:`read_excel`.
It uses ``python-calamine``, which provides Python bindings for the Rust library calamine.
This engine supports Excel files (``.xlsx``, ``.xlsm``, ``.xls``, ``.xlsb``) and OpenDocument spreadsheets (``.ods``) (:issue:`50395`).
There are two advantages of this engine:
- Calamine is often faster than other engines. Some benchmarks show results up to 5x faster than 'openpyxl', 20x faster than 'odf', 4x faster than 'pyxlsb', and 1.5x faster than 'xlrd'. However, 'openpyxl' and 'pyxlsb' are faster when reading only a few rows from large files, because they iterate over rows lazily.
- Calamine supports the recognition of datetime in ``.xlsb`` files, unlike 'pyxlsb', which is the only other engine in pandas that can read ``.xlsb`` files.
pd.read_excel("path_to_file.xlsb", engine="calamine")
For more, see :ref:`io.calamine` in the user guide on IO tools.
- :meth:`~DataFrame.to_sql` with method parameter set to ``multi`` works with Oracle on the backend
- :attr:`Series.attrs` / :attr:`DataFrame.attrs` now uses a deepcopy for propagating ``attrs`` (:issue:`54134`).
- :func:`get_dummies` now returns extension dtypes ``boolean`` or ``bool[pyarrow]`` that are compatible with the input dtype (:issue:`56273`)
- :func:`read_csv` now supports ``on_bad_lines`` parameter with ``engine="pyarrow"``. (:issue:`54480`)
- :func:`read_sas` returns ``datetime64`` dtypes with resolutions better matching those stored natively in SAS, and avoids returning object-dtype in cases that cannot be stored with ``datetime64[ns]`` dtype (:issue:`56127`)
- :func:`read_spss` now returns a :class:`DataFrame` that stores the metadata in :attr:`DataFrame.attrs`. (:issue:`54264`)
- :func:`tseries.api.guess_datetime_format` is now part of the public API (:issue:`54727`)
- :meth:`ExtensionArray._explode` interface method added to allow extension type implementations of the ``explode`` method (:issue:`54833`)
- :meth:`ExtensionArray.duplicated` added to allow extension type implementations of the ``duplicated`` method (:issue:`55255`)
- Allow passing ``read_only``, ``data_only`` and ``keep_links`` arguments to openpyxl using ``engine_kwargs`` of :func:`read_excel` (:issue:`55027`)
- DataFrame.apply now allows the usage of numba (via ``engine="numba"``) to JIT compile the passed function, allowing for potential speedups (:issue:`54666`)
- Implement masked algorithms for :meth:`Series.value_counts` (:issue:`54984`)
- Implemented :meth:`Series.str.extract` for :class:`ArrowDtype` (:issue:`56268`)
- Improved error message that appears in :meth:`DatetimeIndex.to_period` with frequencies which are not supported as period frequencies, such as "BMS" (:issue:`56243`)
- Improved error message when constructing :class:`Period` with invalid offsets such as "QS" (:issue:`55785`)
These are bug fixes that might have notable behavior changes.
:func:`merge` and :meth:`DataFrame.join` now consistently follow documented sort behavior
In previous versions of pandas, :func:`merge` and :meth:`DataFrame.join` did not always return a result that followed the documented sort behavior. pandas now follows the documented sort behavior in merge and join operations (:issue:`54611`, :issue:`56426`, :issue:`56443`).
As documented, ``sort=True`` sorts the join keys lexicographically in the resulting
:class:`DataFrame`. With ``sort=False``, the order of the join keys depends on the
join type (``how`` keyword):

- ``how="left"``: preserve the order of the left keys
- ``how="right"``: preserve the order of the right keys
- ``how="inner"``: preserve the order of the left keys
- ``how="outer"``: sort keys lexicographically

One example with changing behavior is inner joins with non-unique left join keys
and ``sort=False``:
.. ipython:: python

    left = pd.DataFrame({"a": [1, 2, 1]})
    right = pd.DataFrame({"a": [1, 2]})
    result = pd.merge(left, right, how="inner", on="a", sort=False)

Old Behavior
In [5]: result
Out[5]:
a
0 1
1 1
2 2
New Behavior
.. ipython:: python

    result

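
The order-preserving cases described in this section can also be checked side by side. The sketch below is only an illustration: the two frames and the ``key_order`` helper are hypothetical, and the orders noted in the comments assume pandas 2.2 or later.

```python
import pandas as pd

# Illustrative frames: every key appears in both, but in a different order.
left = pd.DataFrame({"a": [2, 1, 3]})
right = pd.DataFrame({"a": [3, 2, 1]})


def key_order(how: str, sort: bool = False) -> list:
    """Return the join keys of the merged result as a plain list."""
    merged = pd.merge(left, right, how=how, on="a", sort=sort)
    return merged["a"].tolist()


# With sort=False, "left" and "inner" follow the left keys,
# while "right" follows the right keys.
orders = {how: key_order(how) for how in ("left", "right", "inner")}

# With sort=True, the keys come back sorted.
sorted_left = key_order("left", sort=True)

for how, keys in orders.items():
    print(f"how={how!r:<8} sort=False -> {keys}")
print(f"how='left'   sort=True  -> {sorted_left}")
```

Code that relied on the previous ordering can pass ``sort=True`` or sort the result explicitly instead of depending on the join type.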
:func:`merge` and :meth:`DataFrame.join` no longer reorder levels when levels differ
In previous versions of pandas, :func:`merge` and :meth:`DataFrame.join` would reorder index levels when joining on two indexes with different levels (:issue:`34133`).
.. ipython:: python

    left = pd.DataFrame({"left": 1}, index=pd.MultiIndex.from_tuples([("x", 1), ("x", 2)], names=["A", "B"]))
    right = pd.DataFrame({"right": 2}, index=pd.MultiIndex.from_tuples([(1, 1), (2, 2)], names=["B", "C"]))
    result = left.join(right)

Old Behavior
In [5]: result
Out[5]:
left right
B A C
1 x 1 1 2
2 x 2 1 2
New Behavior
.. ipython:: python

    result

Some minimum supported versions of dependencies were updated. If installed, we now require:
Package | Minimum Version | Required | Changed |
---|---|---|---|
X | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
Package | Minimum Version | Changed |
---|---|---|
X |
See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
``check_exact`` now only takes effect for floating-point dtypes in :func:`testing.assert_frame_equal` and :func:`testing.assert_series_equal`. In particular, integer dtypes are always checked exactly (:issue:`55882`)
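
A minimal sketch of what this means in practice is shown below. The ``series_equal`` helper and the sample values are hypothetical, and the integer result in the last comment assumes pandas 2.2 or later with default arguments.

```python
import pandas as pd
import pandas.testing as tm


def series_equal(left, right, **kwargs) -> bool:
    """Return True if assert_series_equal accepts the pair, False otherwise."""
    try:
        tm.assert_series_equal(left, right, **kwargs)
    except AssertionError:
        return False
    return True


floats = pd.Series([1.0, 2.0])
nearly_floats = floats + 1e-9  # differs, but well inside the default tolerance

big_ints = pd.Series([100000])
off_by_one = pd.Series([100001])  # relative difference of about 1e-5

# Floats: tolerance-based by default, exact only when requested.
float_default = series_equal(floats, nearly_floats)
float_exact = series_equal(floats, nearly_floats, check_exact=True)

# Integers: compared exactly with the default arguments.
int_default = series_equal(big_ints, off_by_one)

print(float_default, float_exact, int_default)
```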
Deprecated the following frequency aliases (:issue:`9586`):
offsets | deprecated aliases | new aliases |
---|---|---|
:class:`MonthEnd` | M | ME |
:class:`BusinessMonthEnd` | BM | BME |
:class:`SemiMonthEnd` | SM | SME |
:class:`CustomBusinessMonthEnd` | CBM | CBME |
:class:`QuarterEnd` | Q | QE |
:class:`BQuarterEnd` | BQ | BQE |
:class:`YearEnd` | Y | YE |
:class:`BYearEnd` | BY | BYE |
For example:
Previous behavior:
In [8]: pd.date_range('2020-01-01', periods=3, freq='Q-NOV')
Out[8]:
DatetimeIndex(['2020-02-29', '2020-05-31', '2020-08-31'],
dtype='datetime64[ns]', freq='Q-NOV')
Future behavior:
.. ipython:: python

    pd.date_range('2020-01-01', periods=3, freq='QE-NOV')

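
When many frequency strings need updating, the renames in the table above can be applied mechanically. The helper below is a hypothetical sketch, not part of pandas: ``RENAMED_ALIASES`` copies the table, and ``modernize_freq`` assumes frequency strings of the form optional multiple, alias, optional ``-SUFFIX``. It is meant only for offset aliases passed to functions such as :func:`date_range`.

```python
import re

# Deprecated alias -> new alias, copied from the table above.
RENAMED_ALIASES = {
    "M": "ME",
    "BM": "BME",
    "SM": "SME",
    "CBM": "CBME",
    "Q": "QE",
    "BQ": "BQE",
    "Y": "YE",
    "BY": "BYE",
}

# Optional multiple, alphabetic alias, optional anchor suffix such as "-NOV".
_FREQ_PATTERN = re.compile(r"^(?P<mult>\d*)(?P<alias>[A-Za-z]+)(?P<suffix>-.*)?$")


def modernize_freq(freq: str) -> str:
    """Rewrite a deprecated offset alias to its new spelling, if it has one."""
    match = _FREQ_PATTERN.match(freq)
    if match is None:
        return freq  # leave anything unrecognized untouched
    alias = match.group("alias")
    new_alias = RENAMED_ALIASES.get(alias, alias)
    return f"{match.group('mult')}{new_alias}{match.group('suffix') or ''}"


print(modernize_freq("Q-NOV"))  # QE-NOV
print(modernize_freq("2BM"))    # 2BME
print(modernize_freq("D"))      # D
```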
Deprecated the automatic downcasting of object dtype results in a number of methods. These would silently change the dtype in a hard to predict manner since the behavior was value dependent. Additionally, pandas is moving away from silent dtype changes (:issue:`54710`, :issue:`54261`).
These methods are:
- :meth:`Series.replace` and :meth:`DataFrame.replace`
- :meth:`DataFrame.fillna`, :meth:`Series.fillna`
- :meth:`DataFrame.ffill`, :meth:`Series.ffill`
- :meth:`DataFrame.bfill`, :meth:`Series.bfill`
Explicitly call :meth:`DataFrame.infer_objects` to replicate the current behavior in the future.
result = result.infer_objects(copy=False)
Set the following option to opt into the future behavior:
In [9]: pd.set_option("future.no_silent_downcasting", True)
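
As a small illustration of the explicit route, the sketch below fills an object-dtype Series and then asks for inference itself, so the final dtype does not depend on silent downcasting. The sample data is made up; ``infer_objects()`` is called without ``copy=False`` here only to keep the example independent of that keyword, and on versions where the deprecation is active the ``fillna`` call may emit a ``FutureWarning``.

```python
import pandas as pd

# Object-dtype Series holding booleans and a missing value.
ser = pd.Series([True, None], dtype=object)

# Fill the gap; whether this step downcasts on its own depends on the
# pandas version and on the future.no_silent_downcasting option.
filled = ser.fillna(False)

# Request the inference explicitly so the outcome is the same either way.
result = filled.infer_objects()

print(result.dtype)
```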
- Changed :meth:`Timedelta.resolution_string` to return ``h``, ``min``, ``s``, ``ms``, ``us``, and ``ns`` instead of ``H``, ``T``, ``S``, ``L``, ``U``, and ``N``, for compatibility with respective deprecations in frequency aliases (:issue:`52536`)
- Deprecated :func:`pandas.api.types.is_interval` and :func:`pandas.api.types.is_period`, use ``isinstance(obj, pd.Interval)`` and ``isinstance(obj, pd.Period)`` instead (:issue:`55264`)
- Deprecated :func:`pd.core.internals.api.make_block`, use public APIs instead (:issue:`40226`)
- Deprecated :func:`read_gbq` and :meth:`DataFrame.to_gbq`. Use ``pandas_gbq.read_gbq`` and ``pandas_gbq.to_gbq`` instead https://pandas-gbq.readthedocs.io/en/latest/api.html (:issue:`55525`)
- Deprecated :meth:`.DataFrameGroupBy.fillna` and :meth:`.SeriesGroupBy.fillna`; use :meth:`.DataFrameGroupBy.ffill`, :meth:`.DataFrameGroupBy.bfill` for forward and backward filling or :meth:`.DataFrame.fillna` to fill with a single value (or the Series equivalents) (:issue:`55718`)
- Deprecated :meth:`Index.format`, use ``index.astype(str)`` or ``index.map(formatter)`` instead (:issue:`55413`)
- Deprecated :meth:`Series.ravel`, the underlying array is already 1D, so ravel is not necessary (:issue:`52511`)
- Deprecated :meth:`Series.view`, use :meth:`Series.astype` instead to change the dtype (:issue:`20251`)
- Deprecated ``core.internals`` members ``Block``, ``ExtensionBlock``, and ``DatetimeTZBlock``, use public APIs instead (:issue:`55139`)
- Deprecated ``year``, ``month``, ``quarter``, ``day``, ``hour``, ``minute``, and ``second`` keywords in the :class:`PeriodIndex` constructor, use :meth:`PeriodIndex.from_fields` instead (:issue:`55960`)
- Deprecated accepting a type as an argument in :meth:`Index.view`, call without any arguments instead (:issue:`55709`)
- Deprecated allowing non-integer ``periods`` argument in :func:`date_range`, :func:`timedelta_range`, :func:`period_range`, and :func:`interval_range` (:issue:`56036`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_clipboard`. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_csv` except ``path_or_buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_dict`. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_excel` except ``excel_writer``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_gbq` except ``destination_table``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_hdf` except ``path_or_buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_html` except ``buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_json` except ``path_or_buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_latex` except ``buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_markdown` except ``buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_parquet` except ``path``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_pickle` except ``path``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_string` except ``buf``. (:issue:`54229`)
- Deprecated allowing non-keyword arguments in :meth:`DataFrame.to_xml` except ``path_or_buffer``. (:issue:`54229`)
- Deprecated allowing passing :class:`BlockManager` objects to :class:`DataFrame` or :class:`SingleBlockManager` objects to :class:`Series` (:issue:`52419`)
- Deprecated behavior of :meth:`Index.insert` with an object-dtype index silently performing type inference on the result, explicitly call ``result.infer_objects(copy=False)`` for the old behavior instead (:issue:`51363`)
- Deprecated casting non-datetimelike values (mainly strings) in :meth:`Series.isin` and :meth:`Index.isin` with ``datetime64``, ``timedelta64``, and :class:`PeriodDtype` dtypes (:issue:`53111`)
- Deprecated downcasting behavior in :meth:`Series.where`, :meth:`DataFrame.where`, :meth:`Series.mask`, :meth:`DataFrame.mask`, :meth:`Series.clip`, :meth:`DataFrame.clip`; in a future version these will not infer object-dtype columns to non-object dtype, or all-round floats to integer dtype. Call ``result.infer_objects(copy=False)`` on the result for object inference, or explicitly cast floats to ints. To opt in to the future version, use ``pd.set_option("future.no_silent_downcasting", True)`` (:issue:`53656`)
- Deprecated dtype inference in :class:`Index`, :class:`Series` and :class:`DataFrame` constructors when giving a pandas input, call ``.infer_objects`` on the input to keep the current behavior (:issue:`56012`)
- Deprecated dtype inference when setting a :class:`Index` into a :class:`DataFrame`, cast explicitly instead (:issue:`56102`)
- Deprecated including the groups in computations when using :meth:`.DataFrameGroupBy.apply` and :meth:`.DataFrameGroupBy.resample`; pass ``include_groups=False`` to exclude the groups (:issue:`7155`)
- Deprecated indexing an :class:`Index` with a boolean indexer of length zero (:issue:`55820`)
- Deprecated not passing a tuple to :class:`.DataFrameGroupBy.get_group` or :class:`.SeriesGroupBy.get_group` when grouping by a length-1 list-like (:issue:`25971`)
- Deprecated string ``AS`` denoting frequency in :class:`YearBegin` and strings ``AS-DEC``, ``AS-JAN``, etc. denoting annual frequencies with various fiscal year starts (:issue:`54275`)
- Deprecated string ``A`` denoting frequency in :class:`YearEnd` and strings ``A-DEC``, ``A-JAN``, etc. denoting annual frequencies with various fiscal year ends (:issue:`54275`)
- Deprecated string ``BAS`` denoting frequency in :class:`BYearBegin` and strings ``BAS-DEC``, ``BAS-JAN``, etc. denoting annual frequencies with various fiscal year starts (:issue:`54275`)
- Deprecated string ``BA`` denoting frequency in :class:`BYearEnd` and strings ``BA-DEC``, ``BA-JAN``, etc. denoting annual frequencies with various fiscal year ends (:issue:`54275`)
- Deprecated strings ``H``, ``BH``, and ``CBH`` denoting frequencies in :class:`Hour`, :class:`BusinessHour`, :class:`CustomBusinessHour` (:issue:`52536`)
- Deprecated strings ``H``, ``S``, ``U``, and ``N`` denoting units in :func:`to_timedelta` (:issue:`52536`)
- Deprecated strings ``H``, ``T``, ``S``, ``L``, ``U``, and ``N`` denoting units in :class:`Timedelta` (:issue:`52536`)
- Deprecated strings ``T``, ``S``, ``L``, ``U``, and ``N`` denoting frequencies in :class:`Minute`, :class:`Second`, :class:`Milli`, :class:`Micro`, :class:`Nano` (:issue:`52536`)
- Deprecated the :class:`.BaseGrouper` attributes ``group_keys_seq`` and ``reconstructed_codes``; these will be removed in a future version of pandas (:issue:`56148`)
- Deprecated the :class:`.Grouping` attributes ``group_index``, ``result_index``, and ``group_arraylike``; these will be removed in a future version of pandas (:issue:`56148`)
- Deprecated the ``errors="ignore"`` option in :func:`to_datetime`, :func:`to_timedelta`, and :func:`to_numeric`; explicitly catch exceptions instead (:issue:`54467`)
- Deprecated the ``fastpath`` keyword in the :class:`Series` constructor (:issue:`20110`)
- Deprecated the ``kind`` keyword in :meth:`Series.resample` and :meth:`DataFrame.resample`, explicitly cast the object's ``index`` instead (:issue:`55895`)
- Deprecated the ``ordinal`` keyword in :class:`PeriodIndex`, use :meth:`PeriodIndex.from_ordinals` instead (:issue:`55960`)
- Deprecated the ``unit`` keyword in :class:`TimedeltaIndex` construction, use :func:`to_timedelta` instead (:issue:`55499`)
- Deprecated the behavior of :meth:`DataFrame.replace` and :meth:`Series.replace` with :class:`CategoricalDtype`; in a future version replace will change the values while preserving the categories. To change the categories, use ``ser.cat.rename_categories`` instead (:issue:`55147`)
- Deprecated the behavior of :meth:`Series.value_counts` and :meth:`Index.value_counts` with object dtype; in a future version these will not perform dtype inference on the resulting :class:`Index`, do ``result.index = result.index.infer_objects()`` to retain the old behavior (:issue:`56161`)
- Deprecated the default of ``observed=False`` in :meth:`DataFrame.pivot_table`; will be ``True`` in a future version (:issue:`56236`)
- Deprecated the extension test classes ``BaseNoReduceTests``, ``BaseBooleanReduceTests``, and ``BaseNumericReduceTests``, use ``BaseReduceTests`` instead (:issue:`54663`)
- Deprecated the option ``mode.data_manager`` and the ``ArrayManager``; only the ``BlockManager`` will be available in future versions (:issue:`55043`)
- Deprecated the previous implementation of :class:`DataFrame.stack`; specify ``future_stack=True`` to adopt the future version (:issue:`53515`)
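
Several of the replacements named in the list above are short enough to sketch together. The example below is only an illustration under stated assumptions: the helper ``to_numeric_or_original`` and all sample values are hypothetical, and the ``try``/``except`` is one possible way to "explicitly catch exceptions" in place of ``errors="ignore"``.

```python
import pandas as pd


def to_numeric_or_original(values):
    """Stand-in for errors="ignore": return the input unchanged if parsing fails."""
    try:
        return pd.to_numeric(values)
    except (ValueError, TypeError):
        return values


parsed = to_numeric_or_original(pd.Series(["1", "2", "3"]))      # parses cleanly
unparsed = to_numeric_or_original(pd.Series(["1", "two", "3"]))  # returned as-is

# isinstance checks take the place of is_interval / is_period.
is_interval = isinstance(pd.Interval(0, 1), pd.Interval)
is_period = isinstance(pd.Period("2024-01", freq="M"), pd.Period)

# astype(str) or map(formatter) take the place of Index.format.
idx = pd.Index([1.0, 2.5])
as_text = idx.astype(str).tolist()
two_decimals = idx.map("{:.2f}".format).tolist()

print(parsed.tolist(), unparsed.tolist())
print(is_interval, is_period)
print(as_text, two_decimals)
```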
- Performance improvement in :func:`.testing.assert_frame_equal` and :func:`.testing.assert_series_equal` (:issue:`55949`, :issue:`55971`)
- Performance improvement in :func:`concat` with ``axis=1`` and objects with unaligned indexes (:issue:`55084`)
- Performance improvement in :func:`get_dummies` (:issue:`56089`)
- Performance improvement in :func:`merge_asof` when ``by`` is not ``None`` (:issue:`55580`, :issue:`55678`)
- Performance improvement in :func:`read_stata` for files with many variables (:issue:`55515`)
- Performance improvement in :meth:`DataFrame.groupby` when aggregating pyarrow timestamp and duration dtypes (:issue:`55031`)
- Performance improvement in :meth:`DataFrame.join` when joining on unordered categorical indexes (:issue:`56345`)
- Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` when indexing with a :class:`MultiIndex` (:issue:`56062`)
- Performance improvement in :meth:`DataFrame.sort_index` and :meth:`Series.sort_index` when indexed by a :class:`MultiIndex` (:issue:`54835`)
- Performance improvement in :meth:`DataFrame.to_dict` on converting DataFrame to dictionary (:issue:`50990`)
- Performance improvement in :meth:`Index.difference` (:issue:`55108`)
- Performance improvement in :meth:`Index.sort_values` when index is already sorted (:issue:`56128`)
- Performance improvement in :meth:`MultiIndex.get_indexer` when ``method`` is not ``None`` (:issue:`55839`)
- Performance improvement in :meth:`Series.duplicated` for pyarrow dtypes (:issue:`55255`)
- Performance improvement in :meth:`Series.str.get_dummies` when dtype is ``"string[pyarrow]"`` or ``"string[pyarrow_numpy]"`` (:issue:`56110`)
- Performance improvement in :meth:`Series.str` methods (:issue:`55736`)
- Performance improvement in :meth:`Series.value_counts` and :meth:`Series.mode` for masked dtypes (:issue:`54984`, :issue:`55340`)
- Performance improvement in :meth:`.DataFrameGroupBy.nunique` and :meth:`.SeriesGroupBy.nunique` (:issue:`55972`)
- Performance improvement in :meth:`.SeriesGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.DataFrameGroupBy.idxmin` (:issue:`54234`)
- Performance improvement when indexing into a non-unique index (:issue:`55816`)
- Performance improvement when indexing with more than 4 keys (:issue:`54550`)
- Performance improvement when localizing time to UTC (:issue:`55241`)
- :meth:`Categorical.isin` raising ``InvalidIndexError`` for categorical containing overlapping :class:`Interval` values (:issue:`34974`)
- Bug in :meth:`CategoricalDtype.__eq__` returning false for unordered categorical data with mixed types (:issue:`55468`)
- Bug in :class:`DatetimeIndex` construction when passing both a ``tz`` and either ``dayfirst`` or ``yearfirst`` ignoring dayfirst/yearfirst (:issue:`55813`)
- Bug in :class:`DatetimeIndex` when passing an object-dtype ndarray of float objects and a ``tz`` incorrectly localizing the result (:issue:`55780`)
- Bug in :func:`Series.isin` with :class:`DatetimeTZDtype` dtype and comparison values that are all ``NaT`` incorrectly returning all-``False`` even if the series contains ``NaT`` entries (:issue:`56427`)
- Bug in :func:`concat` raising ``AttributeError`` when concatenating all-NA DataFrame with :class:`DatetimeTZDtype` dtype DataFrame. (:issue:`52093`)
- Bug in :func:`testing.assert_extension_array_equal` that could use the wrong unit when comparing resolutions (:issue:`55730`)
- Bug in :func:`to_datetime` and :class:`DatetimeIndex` when passing a list of mixed-string-and-numeric types incorrectly raising (:issue:`55780`)
- Bug in :func:`to_datetime` and :class:`DatetimeIndex` when passing mixed-type objects with a mix of timezones or mix of timezone-awareness failing to raise ``ValueError`` (:issue:`55693`)
- Bug in :meth:`DatetimeIndex.shift` with non-nanosecond resolution incorrectly returning with nanosecond resolution (:issue:`56117`)
- Bug in :meth:`DatetimeIndex.union` returning object dtype for tz-aware indexes with the same timezone but different units (:issue:`55238`)
- Bug in :meth:`Index.is_monotonic_increasing` and :meth:`Index.is_monotonic_decreasing` always caching :meth:`Index.is_unique` as ``True`` when first value in index is ``NaT`` (:issue:`55755`)
- Bug in :meth:`Index.view` to a datetime64 dtype with non-supported resolution incorrectly raising (:issue:`55710`)
- Bug in :meth:`Series.dt.round` with non-nanosecond resolution and ``NaT`` entries incorrectly raising ``OverflowError`` (:issue:`56158`)
- Bug in :meth:`Series.fillna` with non-nanosecond resolution dtypes and higher-resolution vector values returning incorrect (internally-corrupted) results (:issue:`56410`)
- Bug in :meth:`Tick.delta` with very large ticks raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
- Bug in :meth:`Timestamp.unit` being inferred incorrectly from an ISO8601 format string with minute or hour resolution and a timezone offset (:issue:`56208`)
- Bug in ``.astype`` converting from a higher-resolution ``datetime64`` dtype to a lower-resolution ``datetime64`` dtype (e.g. ``datetime64[us]->datetime64[ms]``) silently overflowing with values near the lower implementation bound (:issue:`55979`)
- Bug in adding or subtracting a :class:`Week` offset to a ``datetime64`` :class:`Series`, :class:`Index`, or :class:`DataFrame` column with non-nanosecond resolution returning incorrect results (:issue:`55583`)
- Bug in addition or subtraction of :class:`BusinessDay` offset with ``offset`` attribute to non-nanosecond :class:`Index`, :class:`Series`, or :class:`DataFrame` column giving incorrect results (:issue:`55608`)
- Bug in addition or subtraction of :class:`DateOffset` objects with microsecond components to ``datetime64`` :class:`Index`, :class:`Series`, or :class:`DataFrame` columns with non-nanosecond resolution (:issue:`55595`)
- Bug in addition or subtraction of very large :class:`Tick` objects with :class:`Timestamp` or :class:`Timedelta` objects raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
- Bug in creating a :class:`Index`, :class:`Series`, or :class:`DataFrame` with a non-nanosecond :class:`DatetimeTZDtype` and inputs that would be out of bounds with nanosecond resolution incorrectly raising ``OutOfBoundsDatetime`` (:issue:`54620`)
- Bug in creating a :class:`Index`, :class:`Series`, or :class:`DataFrame` with a non-nanosecond ``datetime64`` (or :class:`DatetimeTZDtype`) from mixed-numeric inputs treating those as nanoseconds instead of as multiples of the dtype's unit (which would happen with non-mixed numeric inputs) (:issue:`56004`)
- Bug in creating a :class:`Index`, :class:`Series`, or :class:`DataFrame` with a non-nanosecond ``datetime64`` dtype and inputs that would be out of bounds for a ``datetime64[ns]`` incorrectly raising ``OutOfBoundsDatetime`` (:issue:`55756`)
- Bug in parsing datetime strings with nanosecond resolution with non-ISO8601 formats incorrectly truncating sub-microsecond components (:issue:`56051`)
- Bug in parsing datetime strings with sub-second resolution and trailing zeros incorrectly inferring second or millisecond resolution (:issue:`55737`)
- Bug in the results of :func:`to_datetime` with a floating-dtype argument with ``unit`` not matching the pointwise results of :class:`Timestamp` (:issue:`56037`)
- Bug in :class:`Timedelta` construction raising ``OverflowError`` instead of ``OutOfBoundsTimedelta`` (:issue:`55503`)
- Bug in rendering (``__repr__``) of :class:`TimedeltaIndex` and :class:`Series` with timedelta64 values with non-nanosecond resolution entries that are all multiples of 24 hours failing to use the compact representation used in the nanosecond cases (:issue:`55405`)
- Bug in :class:`AbstractHolidayCalendar` where timezone data was not propagated when computing holiday observances (:issue:`54580`)
- Bug in :class:`Timestamp` construction with an ambiguous value and a ``pytz`` timezone failing to raise ``pytz.AmbiguousTimeError`` (:issue:`55657`)
- Bug in :meth:`Timestamp.tz_localize` with ``nonexistent="shift_forward"`` around UTC+0 during DST (:issue:`51501`)
- Bug in :func:`read_csv` with ``engine="pyarrow"`` causing rounding errors for large integers (:issue:`52505`)
- Bug in :meth:`Series.pow` not filling missing values correctly (:issue:`55512`)
- Bug in :meth:`DataFrame.astype` when called with ``str`` on unpickled array - the array might change in-place (:issue:`54654`)
- Bug in :meth:`DataFrame.astype` where ``errors="ignore"`` had no effect for extension types (:issue:`54654`)
- Bug in :meth:`Series.convert_dtypes` not converting all NA column to ``null[pyarrow]`` (:issue:`55346`)
- Bug in :func:`pandas.api.types.is_string_dtype` while checking object array with no elements is of the string dtype (:issue:`54661`)
- Bug in :meth:`DataFrame.apply` failing when ``engine="numba"`` and columns or index have ``StringDtype`` (:issue:`56189`)
- Bug in :meth:`DataFrame.reindex` not matching :class:`Index` with ``string[pyarrow_numpy]`` dtype (:issue:`56106`)
- Bug in :meth:`Index.str.cat` always casting result to object dtype (:issue:`56157`)
- Bug in :meth:`Series.__mul__` for :class:`ArrowDtype` with ``pyarrow.string`` dtype and ``string[pyarrow]`` for the pyarrow backend (:issue:`51970`)
- Bug in :meth:`Series.str.find` when ``start < 0`` for :class:`ArrowDtype` with ``pyarrow.string`` (:issue:`56411`)
- Bug in :meth:`Series.str.replace` when ``n < 0`` for :class:`ArrowDtype` with ``pyarrow.string`` (:issue:`56404`)
- Bug in :meth:`Series.str.startswith` and :meth:`Series.str.endswith` with arguments of type ``tuple[str, ...]`` for ``string[pyarrow]`` (:issue:`54942`)
- Bug in :class:`Interval` ``__repr__`` not displaying UTC offsets for :class:`Timestamp` bounds. Additionally the hour, minute and second components will now be shown. (:issue:`55015`)
- Bug in :meth:`IntervalIndex.factorize` and :meth:`Series.factorize` with :class:`IntervalDtype` with datetime64 or timedelta64 intervals not preserving non-nanosecond units (:issue:`56099`)
- Bug in :meth:`IntervalIndex.from_arrays` when passed ``datetime64`` or ``timedelta64`` arrays with mismatched resolutions constructing an invalid ``IntervalArray`` object (:issue:`55714`)
- Bug in :meth:`IntervalIndex.get_indexer` with datetime or timedelta intervals incorrectly matching on integer targets (:issue:`47772`)
- Bug in :meth:`IntervalIndex.get_indexer` with timezone-aware datetime intervals incorrectly matching on a sequence of timezone-naive targets (:issue:`47772`)
- Bug in setting values on a :class:`Series` with an :class:`IntervalIndex` using a slice incorrectly raising (:issue:`54722`)
- Bug in :meth:`DataFrame.loc` when setting :class:`Series` with extension dtype into NumPy dtype (:issue:`55604`)
- Bug in :meth:`Index.difference` not returning a unique set of values when ``other`` is empty or ``other`` is considered non-comparable (:issue:`55113`)
- Bug in setting :class:`Categorical` values into a :class:`DataFrame` with numpy dtypes raising ``RecursionError`` (:issue:`52927`)
- Bug in :meth:`DataFrame.update` not updating in-place for tz-aware datetime64 dtypes (:issue:`56227`)
- Bug in :meth:`MultiIndex.get_indexer` not raising ``ValueError`` when ``method`` provided and index is non-monotonic (:issue:`53452`)
- Bug in :func:`read_csv` where ``engine="python"`` did not respect ``chunksize`` arg when ``skiprows`` was specified. (:issue:`56323`)
- Bug in :func:`read_csv` where ``engine="python"`` was causing a ``TypeError`` when a callable ``skiprows`` and a chunk size were specified. (:issue:`55677`)
- Bug in :func:`read_csv` where ``on_bad_lines="warn"`` would write to ``stderr`` instead of raising a Python warning. This now yields a :class:`.errors.ParserWarning` (:issue:`54296`)
- Bug in :func:`read_csv` with ``engine="pyarrow"`` where ``quotechar`` was ignored (:issue:`52266`)
- Bug in :func:`read_csv` with ``engine="pyarrow"`` where ``usecols`` wasn't working with a csv with no headers (:issue:`54459`)
- Bug in :func:`read_excel`, with ``engine="xlrd"`` (``xls`` files) erroring when file contains NaNs/Infs (:issue:`54564`)
- Bug in :func:`read_json` not handling dtype conversion properly if ``infer_string`` is set (:issue:`56195`)
- Bug in :meth:`DataFrame.to_excel`, with ``OdsWriter`` (``ods`` files) writing boolean/string value (:issue:`54994`)
- Bug in :meth:`DataFrame.to_hdf` and :func:`read_hdf` with ``datetime64`` dtypes with non-nanosecond resolution failing to round-trip correctly (:issue:`55622`)
- Bug in :meth:`~pandas.read_excel` with ``engine="odf"`` (``ods`` files) when string contains annotation (:issue:`55200`)
- Bug in :meth:`~pandas.read_excel` with an ODS file without cached formatted cell for float values (:issue:`55219`)
- Bug where :meth:`DataFrame.to_json` would raise an ``OverflowError`` instead of a ``TypeError`` with unsupported NumPy types (:issue:`55403`)
- Bug in :class:`PeriodIndex` construction when more than one of ``data``, ``ordinal`` and ``**fields`` are passed failing to raise ``ValueError`` (:issue:`55961`)
- Bug in :class:`Period` addition silently wrapping around instead of raising ``OverflowError`` (:issue:`55503`)
- Bug in casting from :class:`PeriodDtype` with ``astype`` to ``datetime64`` or :class:`DatetimeTZDtype` with non-nanosecond unit incorrectly returning with nanosecond unit (:issue:`55958`)
- Bug in :meth:`DataFrame.plot.box` with ``vert=False`` and a matplotlib ``Axes`` created with ``sharey=True`` (:issue:`54941`)
- Bug in :meth:`DataFrame.plot.scatter` discarding string columns (:issue:`56142`)
- Bug in :meth:`Series.plot` when reusing an ``ax`` object failing to raise when a ``how`` keyword is passed (:issue:`55953`)
- Bug in :class:`.Rolling` where duplicate datetimelike indexes are treated as consecutive rather than equal with ``closed='left'`` and ``closed='neither'`` (:issue:`20712`)
- Bug in :meth:`.DataFrameGroupBy.idxmin`, :meth:`.DataFrameGroupBy.idxmax`, :meth:`.SeriesGroupBy.idxmin`, and :meth:`.SeriesGroupBy.idxmax` would not retain :class:`.Categorical` dtype when the index was a :class:`.CategoricalIndex` that contained NA values (:issue:`54234`)
- Bug in :meth:`.DataFrameGroupBy.transform` and :meth:`.SeriesGroupBy.transform` when ``observed=False`` and ``f="idxmin"`` or ``f="idxmax"`` would incorrectly raise on unobserved categories (:issue:`54234`)
- Bug in :meth:`.DataFrameGroupBy.value_counts` and :meth:`.SeriesGroupBy.value_counts` could result in incorrect sorting if the columns of the DataFrame or name of the Series are integers (:issue:`55951`)
- Bug in :meth:`.DataFrameGroupBy.value_counts` and :meth:`.SeriesGroupBy.value_counts` would not respect ``sort=False`` in :meth:`DataFrame.groupby` and :meth:`Series.groupby` (:issue:`55951`)
- Bug in :meth:`.DataFrameGroupBy.value_counts` and :meth:`.SeriesGroupBy.value_counts` would sort by proportions rather than frequencies when ``sort=True`` and ``normalize=True`` (:issue:`55951`)
- Bug in :meth:`DataFrame.asfreq` and :meth:`Series.asfreq` with a :class:`DatetimeIndex` with non-nanosecond resolution incorrectly converting to nanosecond resolution (:issue:`55958`)
- Bug in :meth:`DataFrame.ewm` when passed ``times`` with non-nanosecond ``datetime64`` or :class:`DatetimeTZDtype` dtype (:issue:`56262`)
- Bug in :meth:`DataFrame.resample` not respecting ``closed`` and ``label`` arguments for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55282`)
- Bug in :meth:`DataFrame.resample` when resampling on a :class:`ArrowDtype` of ``pyarrow.timestamp`` or ``pyarrow.duration`` type (:issue:`55989`)
- Bug in :meth:`DataFrame.resample` where bin edges were not correct for :class:`~pandas.tseries.offsets.BusinessDay` (:issue:`55281`)
- Bug in :meth:`DataFrame.resample` where bin edges were not correct for :class:`~pandas.tseries.offsets.MonthBegin` (:issue:`55271`)
- Bug in :func:`concat` ignoring ``sort`` parameter when passed :class:`DatetimeIndex` indexes (:issue:`54769`)
- Bug in :func:`concat` renaming :class:`Series` when ``ignore_index=False`` (:issue:`15047`)
- Bug in :func:`merge_asof` raising ``TypeError`` when ``by`` dtype is not ``object``, ``int64``, or ``uint64`` (:issue:`22794`)
- Bug in :func:`merge_asof` raising incorrect error for string dtype (:issue:`56444`)
- Bug in :func:`merge` returning columns in incorrect order when left and/or right is empty (:issue:`51929`)
- Bug in :meth:`DataFrame.melt` where an exception was raised if ``var_name`` was not a string (:issue:`55948`)
- Bug in :meth:`DataFrame.melt` where it would not preserve the datetime (:issue:`55254`)
- Bug in :meth:`DataFrame.pivot_table` where the row margin is incorrect when the columns have numeric names (:issue:`26568`)
- Bug in :meth:`SparseArray.take` when using a different fill value than the array's fill value (:issue:`55181`)
- Bug in :func:`DataFrame.describe` where the percentile 99.999% was rounded to 100% when formatting percentiles in the result (:issue:`55765`)
- Bug in :func:`cut` and :func:`qcut` with ``datetime64`` dtype values with non-nanosecond units incorrectly returning nanosecond-unit bins (:issue:`56101`)
- Bug in :func:`cut` incorrectly allowing cutting of timezone-aware datetimes with timezone-naive bins (:issue:`54964`)
- Bug in :func:`infer_freq` and :meth:`DatetimeIndex.inferred_freq` with weekly frequencies and non-nanosecond resolutions (:issue:`55609`)
- Bug in :meth:`DataFrame.apply` where passing ``raw=True`` ignored ``args`` passed to the applied function (:issue:`55009`)
- Bug in :meth:`DataFrame.from_dict` which would always sort the rows of the created :class:`DataFrame`. (:issue:`55683`)
- Bug in rendering ``inf`` values inside a :class:`DataFrame` with the ``use_inf_as_na`` option enabled (:issue:`55483`)
- Bug in rendering a :class:`Series` with a :class:`MultiIndex` when one of the index level's names is 0 not having that name displayed (:issue:`55415`)
- Bug in the error message when assigning an empty dataframe to a column (:issue:`55956`)
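To make the :meth:`DataFrame.apply` entry above concrete, here is a small sketch of forwarding positional ``args`` together with ``raw=True``. The frame contents and the helper ``add_offset`` are hypothetical and exist only for this example. With the fix in place, the extra argument should reach the applied function instead of being dropped.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})


def add_offset(row, offset):
    # With raw=True, `row` is a NumPy array rather than a Series.
    return row.sum() + offset


# `args` supplies the extra positional argument `offset` to `add_offset`.
result = df.apply(add_offset, axis=1, raw=True, args=(10,))
print(result.tolist())
```

Each row is summed and the offset added, so the two rows ``(1, 3)`` and ``(2, 4)`` give ``14`` and ``16``.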