These are the changes in pandas 3.0.0. See :ref:`release` for a full changelog including other versions of pandas.
{{ header }}
- :func:`DataFrame.to_excel` now raises a ``UserWarning`` when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)
- :func:`read_stata` now returns ``datetime64`` resolutions better matching those natively stored in the stata format (:issue:`55642`)
- :meth:`Styler.set_tooltips` provides an alternative method of storing tooltips by using the title attribute of td elements. (:issue:`56981`)
- Allow dictionaries to be passed to :meth:`pandas.Series.str.replace` via the ``pat`` parameter (:issue:`51748`)
- Support passing a :class:`Series` input to :func:`json_normalize` that retains the :class:`Series` :class:`Index` (:issue:`51452`)
- Users can globally disable any ``PerformanceWarning`` by setting the option ``mode.performance_warnings`` to ``False`` (:issue:`56920`)
These are bug fixes that might have notable behavior changes.
A number of bugs have been fixed due to improved handling of unobserved groups (:issue:`55738`). All remarks in this section equally impact :class:`.SeriesGroupBy`.
In previous versions of pandas, a single grouping with :meth:`.DataFrameGroupBy.apply` or :meth:`.DataFrameGroupBy.agg` would pass the unobserved groups to the provided function, resulting in ``0`` below.

.. ipython:: python

    df = pd.DataFrame(
        {
            "key1": pd.Categorical(list("aabb"), categories=list("abc")),
            "key2": [1, 1, 1, 2],
            "values": [1, 2, 3, 4],
        }
    )
    df
    gb = df.groupby("key1", observed=False)
    gb[["values"]].apply(lambda x: x.sum())

However this was not the case when using multiple groupings, resulting in ``NaN`` below.

.. code-block:: ipython

    In [1]: gb = df.groupby(["key1", "key2"], observed=False)
    In [2]: gb[["values"]].apply(lambda x: x.sum())
    Out[2]:
               values
    key1 key2
    a    1        3.0
         2        NaN
    b    1        3.0
         2        4.0
    c    1        NaN
         2        NaN

Now using multiple groupings will also pass the unobserved groups to the provided function.

.. ipython:: python

    gb = df.groupby(["key1", "key2"], observed=False)
    gb[["values"]].apply(lambda x: x.sum())

Similarly:
- In previous versions of pandas the method :meth:`.DataFrameGroupBy.sum` would result in ``0`` for unobserved groups, but :meth:`.DataFrameGroupBy.prod`, :meth:`.DataFrameGroupBy.all`, and :meth:`.DataFrameGroupBy.any` would all result in NA values. Now these methods result in ``1``, ``True``, and ``False`` respectively.
- :meth:`.DataFrameGroupBy.groups` did not include unobserved groups and now does.
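
The reductions listed above can be explored with a minimal sketch like the following. The frame, the column names and the category ``"c"`` are illustrative assumptions, and the values printed for ``prod``, ``all`` and ``any`` on the unobserved group depend on the installed pandas version.

```python
import pandas as pd

# "c" is a declared category that never appears in the data,
# so it forms an unobserved group.
df = pd.DataFrame(
    {
        "key": pd.Categorical(list("aabb"), categories=list("abc")),
        "values": [1, 2, 3, 4],
        "flags": [True, False, True, True],
    }
)

# observed=False keeps unobserved categories in the result.
gb = df.groupby("key", observed=False)

sums = gb["values"].sum()
prods = gb["values"].prod()
all_flags = gb["flags"].all()
any_flags = gb["flags"].any()

# Side-by-side view of the four reductions, one row per category.
summary = pd.DataFrame(
    {"sum": sums, "prod": prods, "all": all_flags, "any": any_flags}
)
print(summary)
```

The row for ``"c"`` is the one to inspect when comparing pandas versions.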
These improvements also fixed certain bugs in groupby:
- :meth:`.DataFrameGroupBy.nunique` would fail when there are multiple groupings, unobserved groups, and ``as_index=False`` (:issue:`52848`)
- :meth:`.DataFrameGroupBy.agg` would fail when there are multiple groupings, unobserved groups, and ``as_index=False`` (:issue:`36698`)
- :meth:`.DataFrameGroupBy.sum` would have incorrect values when there are multiple groupings, unobserved groups, and non-numeric data (:issue:`43891`)
- :meth:`.DataFrameGroupBy.groups` with ``sort=False`` would sort groups; they now occur in the order they are observed (:issue:`56966`)
- :meth:`.DataFrameGroupBy.value_counts` would produce incorrect results when used with some categorical and some non-categorical groupings and ``observed=False`` (:issue:`56016`)
Some minimum supported versions of dependencies were updated. If installed, we now require:
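
+---------+-----------------+----------+---------+
| Package | Minimum Version | Required | Changed |
+=========+=================+==========+=========+
| numpy   | 1.23.5          | X        | X       |
+---------+-----------------+----------+---------+
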
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
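
+-------------+---------------------+
| Package     | New Minimum Version |
+=============+=====================+
| fastparquet | 2023.04.0           |
+-------------+---------------------+
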
See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
- 3rd party ``py.path`` objects are no longer explicitly supported in IO methods. Use :py:class:`pathlib.Path` objects instead (:issue:`57091`)
- :attr:`MultiIndex.codes`, :attr:`MultiIndex.levels`, and :attr:`MultiIndex.names` now return a ``tuple`` instead of a ``FrozenList`` (:issue:`53531`)
- :func:`read_table`'s ``parse_dates`` argument defaults to ``None`` to improve consistency with :func:`read_csv` (:issue:`57476`)
- Made ``dtype`` a required argument in :meth:`ExtensionArray._from_sequence_of_strings` (:issue:`56519`)
- Updated :meth:`DataFrame.to_excel` so that the output spreadsheet has no styling. Custom styling can still be done using :meth:`Styler.to_excel` (:issue:`54154`)
- pickle and HDF (``.h5``) files created with Python 2 are no longer explicitly supported (:issue:`57387`)
- pickled objects from pandas versions older than ``1.0.0`` are no longer supported (:issue:`57155`)
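
As a rough illustration of the first two items above, the sketch below passes a :py:class:`pathlib.Path` to the IO methods and reads ``MultiIndex.names`` and ``MultiIndex.levels`` in a way that does not depend on the container type. The file name and data are made up for the example.

```python
import tempfile
from pathlib import Path

import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

with tempfile.TemporaryDirectory() as tmp:
    # pathlib.Path objects are accepted by the IO readers and writers.
    csv_path = Path(tmp) / "example.csv"  # hypothetical file name
    df.to_csv(csv_path, index=False)
    round_tripped = pd.read_csv(csv_path)

mi = pd.MultiIndex.from_arrays([[1, 2], ["x", "y"]], names=["num", "letter"])

# list(...) works whether the attribute is a FrozenList or a tuple,
# so this code does not rely on either container.
names = list(mi.names)
level_values = [list(level) for level in mi.levels]
```

Converting to ``list`` at the point of use is one way to keep code working across versions; code that called ``FrozenList``-specific methods would need to be adjusted.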
The ``copy`` keyword argument in the following methods is deprecated and
will be removed in a future version:
- :meth:`DataFrame.truncate` / :meth:`Series.truncate`
- :meth:`DataFrame.tz_convert` / :meth:`Series.tz_convert`
- :meth:`DataFrame.tz_localize` / :meth:`Series.tz_localize`
- :meth:`DataFrame.infer_objects` / :meth:`Series.infer_objects`
- :meth:`DataFrame.align` / :meth:`Series.align`
- :meth:`DataFrame.astype` / :meth:`Series.astype`
- :meth:`DataFrame.reindex` / :meth:`Series.reindex`
- :meth:`DataFrame.reindex_like` / :meth:`Series.reindex_like`
Copy-on-Write utilizes a lazy copy mechanism that defers copying the data until
necessary. Use ``.copy`` to trigger an eager copy. The ``copy`` keyword has no effect
starting with 3.0, so it can be safely removed from your code.
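
A minimal sketch of what this looks like in user code, with illustrative data: the ``copy`` keyword is simply left out of the method call, and ``.copy()`` is called explicitly where an independent object is wanted.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# No ``copy`` keyword: the method returns a new object either way.
converted = df.astype("float64")

# Explicit eager copy, independent of the original from here on.
snapshot = df.copy()
snapshot.iloc[0, 0] = 99

print(df.iloc[0, 0], snapshot.iloc[0, 0])
```

Modifying ``snapshot`` leaves ``df`` untouched, which is the guarantee an eager copy is meant to provide.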
- Deprecated :meth:`Timestamp.utcfromtimestamp`, use ``Timestamp.fromtimestamp(ts, "UTC")`` instead (:issue:`56680`)
- Deprecated :meth:`Timestamp.utcnow`, use ``Timestamp.now("UTC")`` instead (:issue:`56680`)
- Deprecated allowing non-keyword arguments in :meth:`Series.to_markdown` except ``buf``. (:issue:`57280`)
- Deprecated allowing non-keyword arguments in :meth:`Series.to_string` except ``buf``. (:issue:`57280`)
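
The replacements named above can be sketched as follows. The epoch value and the Series are illustrative; only the calls recommended in the list are used.

```python
import pandas as pd

epoch_seconds = 0  # illustrative POSIX timestamp

# Instead of Timestamp.utcfromtimestamp(epoch_seconds).
# The result is timezone-aware.
from_epoch = pd.Timestamp.fromtimestamp(epoch_seconds, "UTC")

# Instead of Timestamp.utcnow().
now_utc = pd.Timestamp.now("UTC")

s = pd.Series([1, 2], name="x")

# ``buf`` may stay positional; everything else is passed by keyword.
text = s.to_string(None, index=False)
print(from_epoch, text, sep="\n")
```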
- :func:`read_excel`, :func:`read_json`, :func:`read_html`, and :func:`read_xml` no longer accept raw string or byte representation of the data. That type of data must be wrapped in a :py:class:`StringIO` or :py:class:`BytesIO` (:issue:`53767`)
- :meth:`Series.dt.to_pydatetime` now returns a :class:`Series` of :py:class:`datetime.datetime` objects (:issue:`52459`)
- :meth:`SeriesGroupBy.agg` no longer pins the name of the group to the input passed to the provided ``func`` (:issue:`51703`)
- All arguments except ``name`` in :meth:`Index.rename` are now keyword only (:issue:`56493`)
- All arguments except the first ``path``-like argument in IO writers are now keyword only (:issue:`54229`)
- All arguments in :meth:`Index.sort_values` are now keyword only (:issue:`56493`)
- All arguments in :meth:`Series.to_dict` are now keyword only (:issue:`56493`)
- Changed the default value of ``observed`` in :meth:`DataFrame.groupby` and :meth:`Series.groupby` to ``True`` (:issue:`51811`)
- Enforced deprecation disallowing parsing datetimes with mixed time zones unless user passes ``utc=True`` to :func:`to_datetime` (:issue:`57275`)
- Enforced deprecation of ``axis=None`` acting the same as ``axis=0`` in the DataFrame reductions ``sum``, ``prod``, ``std``, ``var``, and ``sem``; passing ``axis=None`` will now reduce over both axes, which is particularly relevant when calling e.g. ``numpy.sum(df)`` (:issue:`21597`)
- Enforced silent-downcasting deprecation for :ref:`all relevant methods <whatsnew_220.silent_downcasting>` (:issue:`54710`)
- In :meth:`DataFrame.stack`, the default value of ``future_stack`` is now ``True``; specifying ``False`` will raise a ``FutureWarning`` (:issue:`55448`)
- Methods ``apply``, ``agg``, and ``transform`` will no longer replace NumPy functions (e.g. ``np.sum``) and built-in functions (e.g. ``min``) with the equivalent pandas implementation; use string aliases (e.g. ``"sum"`` and ``"min"``) if you desire to use the pandas implementation (:issue:`53974`)
- Passing both ``freq`` and ``fill_value`` in :meth:`DataFrame.shift` and :meth:`Series.shift` and :meth:`.DataFrameGroupBy.shift` now raises a ``ValueError`` (:issue:`54818`)
- Removed :meth:`DateOffset.is_anchored` and :meth:`offsets.Tick.is_anchored` (:issue:`56594`)
- Removed ``DataFrame.applymap``, ``Styler.applymap`` and ``Styler.applymap_index`` (:issue:`52364`)
- Removed ``DataFrame.bool`` and ``Series.bool`` (:issue:`51756`)
- Removed ``DataFrame.first`` and ``DataFrame.last`` (:issue:`53710`)
- Removed ``DataFrame.swapaxes`` and ``Series.swapaxes`` (:issue:`51946`)
- Removed ``DataFrameGroupBy.grouper`` and ``SeriesGroupBy.grouper`` (:issue:`56521`)
- Removed ``DataFrameGroupBy.fillna`` and ``SeriesGroupBy.fillna`` (:issue:`55719`)
- Removed ``Index.format``, use :meth:`Index.astype` with ``str`` or :meth:`Index.map` with a ``formatter`` function instead (:issue:`55439`)
- Removed ``Resampler.fillna`` (:issue:`55719`)
- Removed ``Series.__int__`` and ``Series.__float__``. Call ``int(Series.iloc[0])`` or ``float(Series.iloc[0])`` instead. (:issue:`51131`)
- Removed ``Series.ravel`` (:issue:`56053`)
- Removed ``Series.view`` (:issue:`56054`)
- Removed ``StataReader.close`` (:issue:`49228`)
- Removed ``_data`` from :class:`DataFrame`, :class:`Series`, :class:`.arrays.ArrowExtensionArray` (:issue:`52003`)
- Removed ``axis`` argument from :meth:`DataFrame.groupby`, :meth:`Series.groupby`, :meth:`DataFrame.rolling`, :meth:`Series.rolling`, :meth:`DataFrame.resample`, and :meth:`Series.resample` (:issue:`51203`)
- Removed ``axis`` argument from all groupby operations (:issue:`50405`)
- Removed ``convert_dtype`` from :meth:`Series.apply` (:issue:`52257`)
- Removed ``method``, ``limit``, ``fill_axis`` and ``broadcast_axis`` keywords from :meth:`DataFrame.align` (:issue:`51968`)
- Removed ``pandas.api.types.is_interval`` and ``pandas.api.types.is_period``, use ``isinstance(obj, pd.Interval)`` and ``isinstance(obj, pd.Period)`` instead (:issue:`55264`)
- Removed ``pandas.io.sql.execute`` (:issue:`50185`)
- Removed ``pandas.value_counts``, use :meth:`Series.value_counts` instead (:issue:`53493`)
- Removed ``read_gbq`` and ``DataFrame.to_gbq``. Use ``pandas_gbq.read_gbq`` and ``pandas_gbq.to_gbq`` instead https://pandas-gbq.readthedocs.io/en/latest/api.html (:issue:`55525`)
- Removed ``use_nullable_dtypes`` from :func:`read_parquet` (:issue:`51853`)
- Removed ``year``, ``month``, ``quarter``, ``day``, ``hour``, ``minute``, and ``second`` keywords in the :class:`PeriodIndex` constructor, use :meth:`PeriodIndex.from_fields` instead (:issue:`55960`)
- Removed deprecated argument ``obj`` in :meth:`.DataFrameGroupBy.get_group` and :meth:`.SeriesGroupBy.get_group` (:issue:`53545`)
- Removed deprecated behavior of :meth:`Series.agg` using :meth:`Series.apply` (:issue:`53325`)
- Removed deprecated keyword ``method`` on :meth:`Series.fillna`, :meth:`DataFrame.fillna` (:issue:`57760`)
- Removed option ``mode.use_inf_as_na``, convert inf entries to ``NaN`` before instead (:issue:`51684`)
- Removed support for :class:`DataFrame` in :meth:`DataFrame.from_records` (:issue:`51697`)
- Removed support for ``errors="ignore"`` in :func:`to_datetime`, :func:`to_timedelta` and :func:`to_numeric` (:issue:`55734`)
- Removed support for ``slice`` in :meth:`DataFrame.take` (:issue:`51539`)
- Removed the ``ArrayManager`` (:issue:`55043`)
- Removed the ``fastpath`` argument from the :class:`Series` constructor (:issue:`55466`)
- Removed the ``is_boolean``, ``is_integer``, ``is_floating``, ``holds_integer``, ``is_numeric``, ``is_categorical``, ``is_object``, and ``is_interval`` attributes of :class:`Index` (:issue:`50042`)
- Removed the ``ordinal`` keyword in :class:`PeriodIndex`, use :meth:`PeriodIndex.from_ordinals` instead (:issue:`55960`)
- Removed unused arguments ``*args`` and ``**kwargs`` in :class:`Resampler` methods (:issue:`50977`)
- Unrecognized timezones when parsing strings to datetimes now raises a ``ValueError`` (:issue:`51477`)
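
Several of the removals above come with a suggested replacement. The sketch below strings a few of them together on made-up data: wrapping raw JSON text in :py:class:`StringIO`, passing ``observed`` explicitly, using a string alias in ``agg``, indexing a single element before calling ``int``, and using ``isinstance`` in place of ``is_interval`` and ``is_period``. It is meant as an illustration of the migration patterns, not an exhaustive guide.

```python
from io import StringIO

import pandas as pd

# Raw JSON text is wrapped in StringIO before being passed to read_json.
raw_json = (
    '[{"key": "a", "value": 1},'
    ' {"key": "a", "value": 2},'
    ' {"key": "b", "value": 3}]'
)
df = pd.read_json(StringIO(raw_json))

# ``observed`` is spelled out rather than relying on the default, and the
# string alias "sum" selects the pandas implementation of the reduction.
totals = df.groupby("key", observed=True)["value"].agg("sum")

# Instead of int(series), pick the element explicitly.
first_total = int(totals.iloc[0])

# isinstance replaces pandas.api.types.is_interval / is_period.
interval = pd.Interval(0, 1)
is_interval = isinstance(interval, pd.Interval)
is_period = isinstance(interval, pd.Period)

print(totals, first_total, is_interval, is_period, sep="\n")
```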
- :meth:`Series.str.extract` returns a :class:`RangeIndex` columns instead of an :class:`Index` column when possible (:issue:`57542`)
- Performance improvement in :class:`DataFrame` when ``data`` is a ``dict`` and ``columns`` is specified (:issue:`24368`)
- Performance improvement in :meth:`DataFrame.join` for sorted but non-unique indexes (:issue:`56941`)
- Performance improvement in :meth:`DataFrame.join` when left and/or right are non-unique and ``how`` is ``"left"``, ``"right"``, or ``"inner"`` (:issue:`56817`)
- Performance improvement in :meth:`DataFrame.join` with ``how="left"`` or ``how="right"`` and ``sort=True`` (:issue:`56919`)
- Performance improvement in :meth:`DataFrameGroupBy.ffill`, :meth:`DataFrameGroupBy.bfill`, :meth:`SeriesGroupBy.ffill`, and :meth:`SeriesGroupBy.bfill` (:issue:`56902`)
- Performance improvement in :meth:`Index.join` by propagating cached attributes in cases where the result matches one of the inputs (:issue:`57023`)
- Performance improvement in :meth:`Index.take` when ``indices`` is a full range indexer from zero to length of index (:issue:`56806`)
- Performance improvement in :meth:`MultiIndex.equals` for equal length indexes (:issue:`56990`)
- Performance improvement in :meth:`RangeIndex.__getitem__` with a boolean mask returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57588`)
- Performance improvement in :meth:`RangeIndex.append` when appending the same index (:issue:`57252`)
- Performance improvement in :meth:`RangeIndex.join` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57651`)
- Performance improvement in :meth:`RangeIndex.reindex` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57647`)
- Performance improvement in :meth:`RangeIndex.take` returning a :class:`RangeIndex` instead of a :class:`Index` when possible. (:issue:`57445`)
- Performance improvement in ``DataFrameGroupBy.__len__`` and ``SeriesGroupBy.__len__`` (:issue:`57595`)
- Performance improvement in indexing operations for string dtypes (:issue:`56997`)
- Fixed bug in :meth:`DataFrame.join` inconsistently setting result index name (:issue:`55815`)
- Fixed bug in :meth:`DataFrame.to_string` that raised ``StopIteration`` with nested DataFrames. (:issue:`16098`)
- Fixed bug in :meth:`DataFrame.update` bool dtype being converted to object (:issue:`55509`)
- Fixed bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)
- Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
- Bug in ``np.matmul`` with :class:`Index` inputs raising a ``TypeError`` (:issue:`57079`)
- Bug in :meth:`Series.astype` might modify read-only array inplace when casting to a string dtype (:issue:`57212`)
- Bug in :meth:`Series.reindex` not maintaining ``float32`` type when a ``reindex`` introduces a missing value (:issue:`45857`)
- Bug in :meth:`Series.value_counts` would not respect ``sort=False`` for series having ``string`` dtype (:issue:`55224`)
- Bug in :func:`interval_range` where start and end numeric types were always cast to 64 bit (:issue:`57268`)
- Bug in :meth:`DataFrame.to_excel` when writing empty :class:`DataFrame` with :class:`MultiIndex` on both axes (:issue:`57696`)
- Bug in :meth:`.DataFrameGroupBy.groups` and :meth:`.SeriesGroupBy.groups` that would not respect groupby argument ``dropna`` (:issue:`55919`)
- Bug in :meth:`.DataFrameGroupBy.quantile` when ``interpolation="nearest"`` is inconsistent with :meth:`DataFrame.quantile` (:issue:`47942`)
- Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)
- Fixed bug in :meth:`api.types.is_datetime64_any_dtype` where a custom :class:`ExtensionDtype` would return ``False`` for array-likes (:issue:`57055`)
- Bug in :class:`DataFrame` when passing a ``dict`` with a NA scalar and ``columns`` that would always return ``np.nan`` (:issue:`57205`)
- Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when ``"%Y" == "%H%M"`` (:issue:`57452`)
- Bug in :meth:`DataFrame.sort_index` when passing ``axis="columns"`` and ``ignore_index=True`` and ``ascending=False`` not returning a :class:`RangeIndex` columns (:issue:`57293`)
- Bug in :meth:`DataFrame.where` where using a non-bool type array in the function would raise a ``ValueError`` instead of a ``TypeError`` (:issue:`56330`)
- Bug in the DataFrame Interchange Protocol implementation, which was returning incorrect results for data buffers' associated dtype, for string and datetime columns (:issue:`54781`)