Skip to content

Latest commit

 

History

History
388 lines (294 loc) · 16.1 KB

v1.5.0.rst

File metadata and controls

388 lines (294 loc) · 16.1 KB

What's new in 1.5.0 (??)

These are the changes in pandas 1.5.0. See :ref:`release` for a full changelog including other versions of pandas.

{{ header }}

Enhancements

Styler

enhancement2

Other enhancements

Notable bug fixes

These are bug fixes that might have notable behavior changes.

notable_bug_fix1

notable_bug_fix2

Backwards incompatible API changes

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

Package Minimum Version Required Changed
mypy (dev) 0.931   X

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

Package Minimum Version Changed
    X

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

read_xml now supports dtype, converters, and parse_dates

Similar to other IO methods, :func:`pandas.read_xml` now supports assigning specific dtypes to columns, apply converter methods, and parse dates (:issue:`43567`).

.. ipython:: python

    xml_dates = """<?xml version='1.0' encoding='utf-8'?>
    <data>
      <row>
        <shape>square</shape>
        <degrees>00360</degrees>
        <sides>4.0</sides>
        <date>2020-01-01</date>
       </row>
      <row>
        <shape>circle</shape>
        <degrees>00360</degrees>
        <sides/>
        <date>2021-01-01</date>
      </row>
      <row>
        <shape>triangle</shape>
        <degrees>00180</degrees>
        <sides>3.0</sides>
        <date>2022-01-01</date>
      </row>
    </data>"""

    df = pd.read_xml(
        xml_dates,
        dtype={'sides': 'Int64'},
        converters={'degrees': str},
        parse_dates=['date']
    )
    df
    df.dtypes

read_xml now supports large XML using iterparse

For very large XML files that can range in hundreds of megabytes to gigabytes, :func:`pandas.read_xml` now supports parsing such sizeable files using lxml's iterparse and etree's iterparse which are memory-efficient methods to iterate through XML trees and extract specific elements and attributes without holding entire tree in memory (:issue:`#45442`).

In [1]: df = pd.read_xml(
...      "/path/to/downloaded/enwikisource-latest-pages-articles.xml",
...      iterparse = {"page": ["title", "ns", "id"]})
...  )
df
Out[2]:
                                                     title   ns        id
0                                       Gettysburg Address    0     21450
1                                                Main Page    0     42950
2                            Declaration by United Nations    0      8435
3             Constitution of the United States of America    0      8435
4                     Declaration of Independence (Israel)    0     17858
...                                                    ...  ...       ...
3578760               Page:Black cat 1897 07 v2 n10.pdf/17  104    219649
3578761               Page:Black cat 1897 07 v2 n10.pdf/43  104    219649
3578762               Page:Black cat 1897 07 v2 n10.pdf/44  104    219649
3578763      The History of Tom Jones, a Foundling/Book IX    0  12084291
3578764  Page:Shakespeare of Stratford (1926) Yale.djvu/91  104     21450

[3578765 rows x 3 columns]

Other API changes

Deprecations

In a future version, integer slicing on a :class:`Series` with a :class:`Int64Index` or :class:`RangeIndex` will be treated as label-based, not positional. This will make the behavior consistent with other :meth:`Series.__getitem__` and :meth:`Series.__setitem__` behaviors (:issue:`45162`).

For example:

.. ipython:: python

   ser = pd.Series([1, 2, 3, 4, 5], index=[2, 3, 5, 7, 11])

In the old behavior, ser[2:4] treats the slice as positional:

Old behavior:

In [3]: ser[2:4]
Out[3]:
5    3
7    4
dtype: int64

In a future version, this will be treated as label-based:

Future behavior:

In [4]: ser.loc[2:4]
Out[4]:
2    1
3    2
dtype: int64

To retain the old behavior, use series.iloc[i:j]. To get the future behavior, use series.loc[i:j].

Slicing on a :class:`DataFrame` will not be affected.

Other Deprecations

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

  • Bug in operations with array-likes with dtype="boolean" and :attr:`NA` incorrectly altering the array in-place (:issue:`45421`)
  • Bug in multiplying a :class:`Series` with IntegerDtype or FloatingDtype by an arraylike with timedelta64[ns] dtype incorrectly raising (:issue:`45622`)

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

I/O

Period

Plotting

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

Styler

  • Minor bug when attempting to apply styling functions to an empty DataFrame subset (:issue:`45313`)

Other

Contributors