What's new in 1.5.0 (??)

These are the changes in pandas 1.5.0. See :ref:`release` for a full changelog including other versions of pandas.

Enhancements

Styler

New method :meth:`.Styler.to_string` for alternative customisable output methods (:issue:`44502`)

Various bug fixes, see below.

enhancement2

Other enhancements

:meth:`MultiIndex.to_frame` now supports the argument allow_duplicates and raises on duplicate labels if it is missing or False (:issue:`45245`)
:class:`StringArray` now accepts array-likes containing nan-likes (None, np.nan) for the values parameter in its constructor in addition to strings and :attr:`pandas.NA`. (:issue:`40839`)
Improved the rendering of categories in :class:`CategoricalIndex` (:issue:`45218`)
:meth:`to_numeric` now preserves float64 arrays when downcasting would generate values not representable in float32 (:issue:`43693`)
:meth:`Series.reset_index` and :meth:`DataFrame.reset_index` now support the argument allow_duplicates (:issue:`44410`)
:meth:`.GroupBy.min` and :meth:`.GroupBy.max` now supports Numba execution with the engine keyword (:issue:`45428`)

Notable bug fixes

These are bug fixes that might have notable behavior changes.

notable_bug_fix1

notable_bug_fix2

Backwards incompatible API changes

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package	Minimum Version	Required	Changed
mypy (dev)	0.931		X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package	Minimum Version	Changed
		X

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

read_xml now supports `dtype`, `converters`, and `parse_dates`

Similar to other IO methods, :func:`pandas.read_xml` now supports assigning specific dtypes to columns, apply converter methods, and parse dates (:issue:`43567`).

.. ipython:: python

    xml_dates = """<?xml version='1.0' encoding='utf-8'?>
    <data>
      <row>
        <shape>square</shape>
        <degrees>00360</degrees>
        <sides>4.0</sides>
        <date>2020-01-01</date>
       </row>
      <row>
        <shape>circle</shape>
        <degrees>00360</degrees>
        <sides/>
        <date>2021-01-01</date>
      </row>
      <row>
        <shape>triangle</shape>
        <degrees>00180</degrees>
        <sides>3.0</sides>
        <date>2022-01-01</date>
      </row>
    </data>"""

    df = pd.read_xml(
        xml_dates,
        dtype={'sides': 'Int64'},
        converters={'degrees': str},
        parse_dates=['date']
    )
    df
    df.dtypes

read_xml now supports large XML using iterparse

For very large XML files that can range in hundreds of megabytes to gigabytes, :func:`pandas.read_xml` now supports parsing such sizeable files using lxml's iterparse and etree's iterparse which are memory-efficient methods to iterate through XML trees and extract specific elements and attributes without holding entire tree in memory (:issue:`#45442`).

In [1]: df = pd.read_xml(
...      "/path/to/downloaded/enwikisource-latest-pages-articles.xml",
...      iterparse = {"page": ["title", "ns", "id"]})
...  )
df
Out[2]:
                                                     title   ns        id
0                                       Gettysburg Address    0     21450
1                                                Main Page    0     42950
2                            Declaration by United Nations    0      8435
3             Constitution of the United States of America    0      8435
4                     Declaration of Independence (Israel)    0     17858
...                                                    ...  ...       ...
3578760               Page:Black cat 1897 07 v2 n10.pdf/17  104    219649
3578761               Page:Black cat 1897 07 v2 n10.pdf/43  104    219649
3578762               Page:Black cat 1897 07 v2 n10.pdf/44  104    219649
3578763      The History of Tom Jones, a Foundling/Book IX    0  12084291
3578764  Page:Shakespeare of Stratford (1926) Yale.djvu/91  104     21450

[3578765 rows x 3 columns]

Other API changes

Deprecations

In a future version, integer slicing on a :class:`Series` with a :class:`Int64Index` or :class:`RangeIndex` will be treated as label-based, not positional. This will make the behavior consistent with other :meth:`Series.__getitem__` and :meth:`Series.__setitem__` behaviors (:issue:`45162`).

For example:

.. ipython:: python

   ser = pd.Series([1, 2, 3, 4, 5], index=[2, 3, 5, 7, 11])

In the old behavior, ser[2:4] treats the slice as positional:

Old behavior:

In [3]: ser[2:4]
Out[3]:
5    3
7    4
dtype: int64

In a future version, this will be treated as label-based:

Future behavior:

In [4]: ser.loc[2:4]
Out[4]:
2    1
3    2
dtype: int64

To retain the old behavior, use series.iloc[i:j]. To get the future behavior, use series.loc[i:j].

Slicing on a :class:`DataFrame` will not be affected.

Other Deprecations

Deprecated the keyword line_terminator in :meth:`DataFrame.to_csv` and :meth:`Series.to_csv`, use lineterminator instead; this is for consistency with :func:`read_csv` and the standard library 'csv' module (:issue:`9568`)
Deprecated behavior of :meth:`SparseArray.astype`, :meth:`Series.astype`, and :meth:`DataFrame.astype` with :class:`SparseDtype` when passing a non-sparse dtype. In a future version, this will cast to that non-sparse dtype instead of wrapping it in a :class:`SparseDtype` (:issue:`34457`)
Deprecated behavior of :meth:`DatetimeIndex.intersection` and :meth:`DatetimeIndex.symmetric_difference` (union behavior was already deprecated in version 1.3.0) with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`, :issue:`45357`)
Deprecated :meth:`DataFrame.iteritems`, :meth:`Series.iteritems`, :meth:`HDFStore.iteritems` in favor of :meth:`DataFrame.items`, :meth:`Series.items`, :meth:`HDFStore.items` (:issue:`45321`)
Deprecated :meth:`Series.is_monotonic` and :meth:`Index.is_monotonic` in favor of :meth:`Series.is_monotonic_increasing` and :meth:`Index.is_monotonic_increasing` (:issue:`45422`, :issue:`21335`)
Deprecated the __array_wrap__ method of DataFrame and Series, rely on standard numpy ufuncs instead (:issue:`45451`)

Performance improvements

Performance improvement in :meth:`.GroupBy.transform` for some user-defined DataFrame -> Series functions (:issue:`45387`)
Performance improvement in :meth:`DataFrame.duplicated` when subset consists of only one column (:issue:`45236`)
Performance improvement in :meth:`.GroupBy.transform` when broadcasting values for user-defined functions (:issue:`45708`)

Bug fixes

Categorical

Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains NaN values incorrectly raising instead of casting to float64 (:issue:`45362`)

Datetimelike

Bug in :meth:`DataFrame.quantile` with datetime-like dtypes and no rows incorrectly returning float64 dtype instead of retaining datetime-like dtype (:issue:`41544`)
Bug in :func:`to_datetime` with sequences of np.str_ objects incorrectly raising (:issue:`32264`)
Bug in :class:`Timestamp` construction when passing datetime components as positional arguments and tzinfo as a keyword argument incorrectly raising (:issue:`31929`)
Bug in :meth:`Index.astype` when casting from object dtype to timedelta64[ns] dtype incorrectly casting np.datetime64("NaT") values to np.timedelta64("NaT") instead of raising (:issue:`45722`)

Timedelta

Timezones

Numeric

Bug in operations with array-likes with dtype="boolean" and :attr:`NA` incorrectly altering the array in-place (:issue:`45421`)
Bug in multiplying a :class:`Series` with IntegerDtype or FloatingDtype by an arraylike with timedelta64[ns] dtype incorrectly raising (:issue:`45622`)

Conversion

Bug in :meth:`DataFrame.astype` not preserving subclasses (:issue:`40810`)
Bug in constructing a :class:`Series` from a float-containing list or a floating-dtype ndarray-like (e.g. dask.Array) and an integer dtype raising instead of casting like we would with an np.ndarray (:issue:`40110`)
Bug in :meth:`Float64Index.astype` to unsigned integer dtype incorrectly casting to np.int64 dtype (:issue:`45309`)
Bug in :meth:`Series.astype` and :meth:`DataFrame.astype` from floating dtype to unsigned integer dtype failing to raise in the presence of negative values (:issue:`45151`)
Bug in :func:`array` with FloatingDtype and values containing float-castable strings incorrectly raising (:issue:`45424`)

Strings

Interval

Bug in :meth:`IntervalArray.__setitem__` when setting np.nan into an integer-backed array raising ValueError instead of TypeError (:issue:`45484`)

Indexing

Bug in :meth:`loc.__getitem__` with a list of keys causing an internal inconsistency that could lead to a disconnect between frame.at[x, y] vs frame[y].loc[x] (:issue:`22372`)
Bug in :meth:`DataFrame.iloc` where indexing a single row on a :class:`DataFrame` with a single ExtensionDtype column gave a copy instead of a view on the underlying data (:issue:`45241`)
Bug in setting a NA value (None or np.nan) into a :class:`Series` with int-based :class:`IntervalDtype` incorrectly casting to object dtype instead of a float-based :class:`IntervalDtype` (:issue:`45568`)
Bug in :meth:`Series.__setitem__` with a non-integer :class:`Index` when using an integer key to set a value that cannot be set inplace where a ValueError was raised insead of casting to a common dtype (:issue:`45070`)
Bug in :meth:`Series.loc.__setitem__` and :meth:`Series.loc.__getitem__` not raising when using multiple keys without using a :class:`MultiIndex` (:issue:`13831`)
Bug when setting a value too large for a :class:`Series` dtype failing to coerce to a common type (:issue:`26049`, :issue:`32878`)
Bug in :meth:`loc.__setitem__` treating range keys as positional instead of label-based (:issue:`45479`)
Bug in :meth:`Series.__setitem__` when setting boolean dtype values containing NA incorrectly raising instead of casting to boolean dtype (:issue:`45462`)
Bug in :meth:`Series.__setitem__` where setting :attr:`NA` into a numeric-dtpye :class:`Series` would incorrectly upcast to object-dtype rather than treating the value as np.nan (:issue:`44199`)
Bug in :meth:`DataFrame.mask` with inplace=True and ExtensionDtype columns incorrectly raising (:issue:`45577`)
Bug in getting a column from a DataFrame with an object-dtype row index with datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (:issue:`42950`)
Bug in indexing on a :class:`DatetimeIndex` with a np.str_ key incorrectly raising (:issue:`45580`)
Bug in :meth:`CategoricalIndex.get_indexer` when index contains NaN values, resulting in elements that are in target but not present in the index to be mapped to the index of the NaN element, instead of -1 (:issue:`45361`)

Missing

MultiIndex

I/O

Bug in :meth:`DataFrame.to_stata` where no error is raised if the :class:`DataFrame` contains -np.inf (:issue:`45350`)
Bug in :meth:`DataFrame.info` where a new line at the end of the output is omitted when called on an empty :class:`DataFrame` (:issue:`45494`)
Bug in :func:`read_csv` not recognizing line break for on_bad_lines="warn" for engine="c" (:issue:`41710`)
Bug in :func:`read_parquet` when engine="pyarrow" which caused partial write to disk when column of unsupported datatype was passed (:issue:`44914`)

Period

Plotting

Bug in :meth:`DataFrame.plot.barh` that prevented labeling the x-axis and xlabel updating the y-axis label (:issue:`45144`)
Bug in :meth:`DataFrame.plot.box` that prevented labeling the x-axis (:issue:`45463`)
Bug in :meth:`DataFrame.boxplot` that prevented passing in xlabel and ylabel (:issue:`45463`)
Bug in :meth:`DataFrame.boxplot` that prevented specifying vert=False (:issue:`36918`)

Groupby/resample/rolling

Bug in :meth:`DataFrame.resample` ignoring closed="right" on :class:`TimedeltaIndex` (:issue:`45414`)
Bug in :meth:`.DataFrameGroupBy.transform` fails when the input DataFrame has multiple columns (:issue:`27469`)

Reshaping

Bug in :func:`concat` between a :class:`Series` with integer dtype and another with :class:`CategoricalDtype` with integer categories and containing NaN values casting to object dtype instead of float64 (:issue:`45359`)
Bug in :func:`get_dummies` that selected object and categorical dtypes but not string (:issue:`44965`)

Sparse

Bug in :meth:`Series.where` and :meth:`DataFrame.where` with SparseDtype failing to retain the array's fill_value (:issue:`45691`)

ExtensionArray

Bug in :meth:`IntegerArray.searchsorted` and :meth:`FloatingArray.searchsorted` returning inconsistent results when acting on np.nan (:issue:`45255`)

Styler

Minor bug when attempting to apply styling functions to an empty DataFrame subset (:issue:`45313`)

Other

Bug in :meth:`Series.asof` and :meth:`DataFrame.asof` incorrectly casting bool-dtype results to float64 dtype (:issue:`16063`)

Files

v1.5.0.rst

Latest commit

History

v1.5.0.rst

File metadata and controls

What's new in 1.5.0 (??)

Enhancements

Styler

enhancement2

Other enhancements

Notable bug fixes

notable_bug_fix1

notable_bug_fix2

Backwards incompatible API changes

Increased minimum versions for dependencies

read_xml now supports dtype, converters, and parse_dates

read_xml now supports large XML using iterparse

Other API changes

Deprecations

Other Deprecations

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

I/O

Period

Plotting

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

Styler

Other

Contributors

read_xml now supports `dtype`, `converters`, and `parse_dates`