These are the changes in pandas 1.4.0. See :ref:`release` for a full changelog including other versions of pandas.
{{ header }}

- :class:`DataFrameGroupBy` operations with ``as_index=False`` now correctly retain ``ExtensionDtype`` dtypes for columns being grouped on (:issue:`41373`)
- Add support for assigning values to ``by`` argument in :meth:`DataFrame.plot.hist` and :meth:`DataFrame.plot.box` (:issue:`15079`)
- :meth:`Series.sample`, :meth:`DataFrame.sample`, and :meth:`.GroupBy.sample` now accept a ``np.random.Generator`` as input to ``random_state``. A generator will be more performant, especially with ``replace=False`` (:issue:`38100`)
- Additional options added to :meth:`.Styler.bar` to control alignment and display, with keyword only arguments (:issue:`26070`, :issue:`36419`)
- :meth:`Styler.bar` now validates the input argument ``width`` and ``height`` (:issue:`42511`)
- :meth:`Series.ewm`, :meth:`DataFrame.ewm`, now support a ``method`` argument with a ``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`. See :ref:`Window Overview <window.overview>` for performance and functional benefits (:issue:`42273`)
- Added ``sparse_index`` and ``sparse_columns`` keyword arguments to :meth:`.Styler.to_html` (:issue:`41946`)
- Added keyword argument ``environment`` to :meth:`.Styler.to_latex` also allowing a specific "longtable" entry with a separate jinja2 template (:issue:`41866`)
- :meth:`.GroupBy.cummin` and :meth:`.GroupBy.cummax` now support the argument ``skipna`` (:issue:`34047`)

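
As a rough illustration of the ``random_state`` enhancement above, the sketch below passes a seeded ``np.random.Generator`` to :meth:`DataFrame.sample` and to a grouped sample. The frame contents and the names ``df``, ``rng``, ``sampled`` and ``per_group`` are made up for this example.

```python
import numpy as np
import pandas as pd

# Illustrative data only; not taken from the release notes.
df = pd.DataFrame({"group": ["a", "a", "b", "b", "b"], "value": range(5)})

# A seeded Generator makes the draws reproducible.
rng = np.random.default_rng(seed=0)

# Sample three distinct rows without replacement.
sampled = df.sample(n=3, replace=False, random_state=rng)

# The same Generator can also drive a grouped sample (one row per group).
per_group = df.groupby("group").sample(n=1, random_state=rng)

print(sampled)
print(per_group)
```

Reusing one generator across calls advances its state, so successive samples differ while the whole script stays reproducible.
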
These are bug fixes that might have notable behavior changes.
Some minimum supported versions of dependencies were updated. If installed, we now require:
Package | Minimum Version | Required | Changed |
---|---|---|---|
numpy | 1.18.5 | X | X |
pytz | 2020.1 | X | X |
python-dateutil | 2.8.1 | X | X |
bottleneck | 1.3.1 | | X |
numexpr | 2.7.1 | | X |
pytest (dev) | 6.0 | | |
mypy (dev) | 0.910 | | X |
For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.
Package | Minimum Version | Changed |
---|---|---|
beautifulsoup4 | 4.8.2 | X |
fastparquet | 0.4.0 | |
fsspec | 0.7.4 | |
gcsfs | 0.6.0 | |
lxml | 4.5.0 | X |
matplotlib | 3.3.2 | X |
numba | 0.50.1 | X |
openpyxl | 3.0.2 | X |
pyarrow | 0.17.0 | |
pymysql | 0.10.1 | X |
pytables | 3.6.1 | X |
s3fs | 0.4.0 | |
scipy | 1.4.1 | X |
sqlalchemy | 1.3.11 | X |
tabulate | 0.8.7 | |
xarray | 0.15.1 | X |
xlrd | 2.0.1 | X |
xlsxwriter | 1.2.2 | X |
xlwt | 1.3.0 | |
pandas-gbq | 0.14.0 | X |
See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
- :meth:`Index.get_indexer_for` no longer accepts keyword arguments (other than 'target'); in the past these would be silently ignored if the index was not unique (:issue:`42310`)
- Deprecated :meth:`Index.is_type_compatible` (:issue:`42113`)
- Deprecated ``method`` argument in :meth:`Index.get_loc`, use ``index.get_indexer([label], method=...)`` instead (:issue:`42269`)
- Deprecated treating integer keys in :meth:`Series.__setitem__` as positional when the index is a :class:`Float64Index` not containing the key, a :class:`IntervalIndex` with no entries containing the key, or a :class:`MultiIndex` with leading :class:`Float64Index` level not containing the key (:issue:`33469`)
- Deprecated treating ``numpy.datetime64`` objects as UTC times when passed to the :class:`Timestamp` constructor along with a timezone. In a future version, these will be treated as wall-times. To retain the old behavior, use ``Timestamp(dt64).tz_localize("UTC").tz_convert(tz)`` (:issue:`24559`)
- Deprecated ignoring missing labels when indexing with a sequence of labels on a level of a MultiIndex (:issue:`42351`)
- Creating an empty Series without a dtype will now raise a more visible ``FutureWarning`` instead of a ``DeprecationWarning`` (:issue:`30017`)
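
The two replacement idioms named in the deprecations above can be sketched as follows. The index values, the timestamp and the time zone are arbitrary choices for illustration; only ``get_indexer`` and the ``tz_localize``/``tz_convert`` chain come from the notes.

```python
import numpy as np
import pandas as pd

# Instead of index.get_loc(label, method=...), look the label up
# through get_indexer, which returns an array of positions.
index = pd.Index([10, 20, 30])
pad_pos = index.get_indexer([25], method="pad")[0]          # last label <= 25
nearest_pos = index.get_indexer([26], method="nearest")[0]  # closest label

# Interpret a naive datetime64 explicitly as UTC, then convert.
dt64 = np.datetime64("2021-07-01T12:00:00")
tz = "America/Los_Angeles"  # illustrative time zone
as_utc = pd.Timestamp(dt64).tz_localize("UTC").tz_convert(tz)

# For contrast: interpret the same value as a wall time in that zone.
as_wall = pd.Timestamp(dt64).tz_localize(tz)

print(pad_pos, nearest_pos)
print(as_utc, as_wall)
```

Spelling out the interpretation this way gives the same result regardless of which behavior the :class:`Timestamp` constructor applies.
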
- Performance improvement in :meth:`.GroupBy.sample`, especially when ``weights`` argument provided (:issue:`34483`)
- Performance improvement in :meth:`.GroupBy.transform` for user-defined functions (:issue:`41598`)
- Performance improvement in constructing :class:`DataFrame` objects (:issue:`42631`)
- Performance improvement in :meth:`GroupBy.shift` when ``fill_value`` argument is provided (:issue:`26615`)
- Performance improvement in :meth:`DataFrame.corr` for ``method=pearson`` on data without missing values (:issue:`40956`)
- Bug in setting dtype-incompatible values into a :class:`Categorical` (or ``Series`` or ``DataFrame`` backed by ``Categorical``) raising ``ValueError`` instead of ``TypeError`` (:issue:`41919`)
- Bug in :meth:`Categorical.searchsorted` when passing a dtype-incompatible value raising ``KeyError`` instead of ``TypeError`` (:issue:`41919`)
- Bug in :meth:`Series.where` with ``CategoricalDtype`` when passing a dtype-incompatible value raising ``ValueError`` instead of ``TypeError`` (:issue:`41919`)
- Bug in :meth:`Categorical.fillna` when passing a dtype-incompatible value raising ``ValueError`` instead of ``TypeError`` (:issue:`41919`)
- Bug in :meth:`Categorical.fillna` with a tuple-like category raising ``ValueError`` instead of ``TypeError`` when filling with a non-category tuple (:issue:`41919`)
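
A minimal sketch of the first Categorical fix above, assuming a pandas version that includes it: assigning a value that is not among the categories is expected to raise ``TypeError``. The category labels are invented for the example.

```python
import pandas as pd

cat = pd.Categorical(["a", "b", "a"])

# "z" is not one of the categories, so the assignment is rejected.
try:
    cat[0] = "z"
    raised = None
except TypeError as err:
    raised = err

print(type(raised).__name__, raised)

# Registering the category first makes the same assignment valid.
cat = cat.add_categories(["z"])
cat[0] = "z"
print(list(cat))
```
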
- Bug in :class:`DataFrame` constructor unnecessarily copying non-datetimelike 2D object arrays (:issue:`39272`)
- Bug in :meth:`DataFrame.rank` raising ``ValueError`` with ``object`` columns and ``method="first"`` (:issue:`41931`)
- Bug in :meth:`DataFrame.rank` treating missing values and extreme values as equal (for example ``np.nan`` and ``np.inf``), causing incorrect results when ``na_option="bottom"`` or ``na_option="top"`` is used (:issue:`41931`)
- Bug in ``numexpr`` engine still being used when the option ``compute.use_numexpr`` is set to ``False`` (:issue:`32556`)
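
To make the :meth:`DataFrame.rank` fix above concrete, the sketch below ranks a column holding both ``np.inf`` and ``np.nan``. With the fix in place, the missing value should be ranked separately from the infinite one. The column name and values are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [np.inf, np.nan, 1.0]})

# Missing values ranked last: 1.0 < inf < NaN.
bottom = df.rank(na_option="bottom")

# Missing values ranked first: NaN < 1.0 < inf.
top = df.rank(na_option="top")

print(bottom["x"].tolist())
print(top["x"].tolist())
```
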
- Bug in :class:`UInt64Index` constructor when passing a list containing both positive integers small enough to cast to int64 and integers too large to hold in int64 (:issue:`42201`)
- Bug in :meth:`DataFrame.truncate` and :meth:`Series.truncate` when the object's Index has a length greater than one but only one unique value (:issue:`42365`)
- Bug in :meth:`Series.loc` with a :class:`MultiIndex` whose first level contains only ``np.nan`` values (:issue:`42055`)
- Bug in indexing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` when passing a string, where the return type depended on whether the index was monotonic (:issue:`24892`)
- Bug in indexing on a :class:`MultiIndex` failing to drop scalar levels when the indexer is a tuple containing a datetime-like string (:issue:`42476`)
- Bug in :meth:`DataFrame.sort_values` and :meth:`Series.sort_values` failing to raise or incorrectly raising ``ValueError`` when passing an ``ascending`` value (:issue:`41634`)
- Bug in updating values of :class:`pandas.Series` using a boolean index created by using :meth:`pandas.DataFrame.pop` (:issue:`42530`)
- Bug in :meth:`Index.get_indexer_non_unique` when index contains multiple ``np.nan`` (:issue:`35392`)
- Bug in :meth:`MultiIndex.get_loc` where the first level is a :class:`DatetimeIndex` and a string key is passed (:issue:`42465`)
- Bug in :meth:`MultiIndex.reindex` when passing a ``level`` that corresponds to an ``ExtensionDtype`` level (:issue:`42043`)
- Bug in :meth:`MultiIndex.get_loc` raising ``TypeError`` instead of ``KeyError`` on nested tuple (:issue:`42440`)
- Bug in :func:`read_excel` attempting to read chart sheets from .xlsx files (:issue:`41448`)
- Bug in :func:`json_normalize` where ``errors=ignore`` could fail to ignore missing values of ``meta`` when ``record_path`` has a length greater than one (:issue:`41876`)
- Bug in :func:`read_csv` with multi-header input and arguments referencing column names as tuples (:issue:`42446`)
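
The :func:`json_normalize` case above can be sketched as follows, assuming a pandas version that contains the fix. The records are invented: the second one lacks the ``name`` key requested in ``meta``, and ``record_path`` has two levels.

```python
import pandas as pd

# Hypothetical input; the second record has no "name" key.
data = [
    {"id": 1, "name": "x", "info": {"items": [{"v": 1}, {"v": 2}]}},
    {"id": 2, "info": {"items": [{"v": 3}]}},
]

result = pd.json_normalize(
    data,
    record_path=["info", "items"],  # path of length two
    meta=["id", "name"],
    errors="ignore",  # fill missing meta values with NaN instead of raising
)

print(result)
```

Each nested item becomes one row, and the row that comes from the second record carries a missing value in ``name``.
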
- Fixed bug in :meth:`SeriesGroupBy.apply` where passing an unrecognized string argument failed to raise ``TypeError`` when the underlying ``Series`` is empty (:issue:`42021`)
- Bug in :meth:`Series.rolling.apply`, :meth:`DataFrame.rolling.apply`, :meth:`Series.expanding.apply` and :meth:`DataFrame.expanding.apply` with ``engine="numba"`` where ``*args`` were being cached with the user passed function (:issue:`42287`)
- Bug in :meth:`DataFrame.groupby.rolling.var` where the rolling variance was calculated only on the first group (:issue:`42442`)
- Bug in :meth:`GroupBy.shift` that would return the grouping columns if ``fill_value`` was not ``None`` (:issue:`41556`)
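
A small sketch of the :meth:`GroupBy.shift` behavior described above, assuming a pandas version with the fix: the shifted result should contain only the non-grouping columns, with ``fill_value`` filling the gap at the start of each group. The column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# Shift within each group and fill the first slot of every group with 0.
shifted = df.groupby("key").shift(1, fill_value=0)

print(shifted)
```
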
- Improved error message when creating a :class:`DataFrame` column from a multi-dimensional :class:`numpy.ndarray` (:issue:`42463`)
- :func:`concat` creating :class:`MultiIndex` with duplicate level entries when concatenating a :class:`DataFrame` with duplicates in :class:`Index` and multiple keys (:issue:`42651`)
- Bug in :func:`pandas.cut` on :class:`Series` with duplicate indices (:issue:`42185`) and non-exact :class:`pandas.CategoricalIndex` (:issue:`42425`)
- Bug in :meth:`CustomBusinessMonthBegin.__add__` (:meth:`CustomBusinessMonthEnd.__add__`) not applying the extra ``offset`` parameter when beginning (end) of the target month is already a business day (:issue:`41356`)
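
As a hedged illustration of the :func:`pandas.cut` entry above, the sketch below bins a :class:`Series` whose index contains a duplicate label. The values, bin edges and variable names are assumptions made for this example, and it does not reproduce the exact conditions of the original report.

```python
import pandas as pd

# Index label 0 appears twice on purpose.
s = pd.Series([1, 5, 9], index=[0, 0, 1])

# Two right-closed bins: (0, 4] and (4, 10].
binned = pd.cut(s, bins=[0, 4, 10])

print(binned)
```

The result keeps the original index, duplicates included, and assigns each value to its bin.
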