What's new in 1.3.0 (??)

These are the changes in pandas 1.3.0. See :ref:`release` for a full changelog including other versions of pandas.

{{ header }}

Enhancements

Custom HTTP(s) headers when reading csv or json files

When reading from a remote URL that is not handled by fsspec (i.e. HTTP and HTTPS), the dictionary passed to storage_options is used to create the headers included in the request. This can be used to control the User-Agent header or to send other custom headers (:issue:`36688`). For example:

.. ipython:: python

    headers = {"User-Agent": "pandas"}
    df = pd.read_csv(
        "https://download.bls.gov/pub/time.series/cu/cu.item",
        sep="\t",
        storage_options=headers
    )
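
To see the header actually reach a server without depending on an external site, the following minimal sketch starts a throwaway local HTTP server that echoes back the User-Agent it receives. The handler class ``EchoUserAgent``, the ``/echo.csv`` path and the header value are hypothetical names chosen for illustration, and the sketch assumes no HTTP proxy intercepts requests to 127.0.0.1.

```python
import http.server
import threading

import pandas as pd


class EchoUserAgent(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: replies with a one-row CSV holding the received User-Agent."""

    def do_GET(self):
        body = f"user_agent\n{self.headers.get('User-Agent')}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the example output quiet.
        pass


# Port 0 asks the operating system for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), EchoUserAgent)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_port}/echo.csv"
# The storage_options dictionary is sent as the request headers.
echoed = pd.read_csv(url, storage_options={"User-Agent": "pandas-docs-example"})

server.shutdown()
server.server_close()
print(echoed)
```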

:class:`Rolling` and :class:`Expanding` now support a method argument with a 'table' option that performs the windowing operation over an entire :class:`DataFrame`. See :ref:`window.overview` for performance and functional benefits (:issue:`15095`, :issue:`38995`).

Control of index with group_keys in :meth:`DataFrame.resample`

The argument group_keys has been added to the method :meth:`DataFrame.resample`. As with :meth:`DataFrame.groupby`, this argument controls whether each group is added to the index in the resample when :meth:`.Resampler.apply` is used.

.. warning::

   Not specifying the group_keys argument will retain the previous behavior and emit a warning. In a future version of pandas, not specifying group_keys will default to the same behavior as group_keys=False.

.. ipython:: python

    df = pd.DataFrame(
        {'a': range(6)},
        index=pd.date_range("2021-01-01", periods=6, freq="8H")
    )
    df.resample("D", group_keys=True).apply(lambda x: x)
    df.resample("D", group_keys=False).apply(lambda x: x)

Previously, the resulting index would depend upon the values returned by apply, as seen in the following example.

>>> # pandas 1.2
>>> df.resample("D").apply(lambda x: x)
                     a
2021-01-01 00:00:00  0
2021-01-01 08:00:00  1
2021-01-01 16:00:00  2
2021-01-02 00:00:00  3
2021-01-02 08:00:00  4
2021-01-02 16:00:00  5
>>> df.resample("D").apply(lambda x: x.reset_index())
                           index  a
2021-01-01 0 2021-01-01 00:00:00  0
           1 2021-01-01 08:00:00  1
           2 2021-01-01 16:00:00  2
2021-01-02 0 2021-01-02 00:00:00  3
           1 2021-01-02 08:00:00  4
           2 2021-01-02 16:00:00  5
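
The effect of group_keys on the resulting index can also be checked programmatically. The sketch below assumes a pandas version in which :meth:`DataFrame.resample` accepts group_keys. It builds the index with ``pd.Timedelta(hours=8)`` instead of a frequency string only to avoid alias differences between versions, and the variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": range(6)},
    index=pd.date_range("2021-01-01", periods=6, freq=pd.Timedelta(hours=8)),
)

# With group_keys=True the daily bin is prepended as an extra index level.
with_keys = df.resample("D", group_keys=True).apply(lambda x: x)

# With group_keys=False the original index is kept unchanged.
without_keys = df.resample("D", group_keys=False).apply(lambda x: x)

print(with_keys.index.nlevels)
print(without_keys.index.equals(df.index))
```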

Other enhancements

Notable bug fixes

These are bug fixes that might have notable behavior changes.

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

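+-----------------+-----------------+----------+---------+
| Package         | Minimum Version | Required | Changed |
+=================+=================+==========+=========+
| numpy           | 1.16.5          |    X     |         |
+-----------------+-----------------+----------+---------+
| pytz            | 2017.3          |    X     |         |
+-----------------+-----------------+----------+---------+
| python-dateutil | 2.7.3           |    X     |         |
+-----------------+-----------------+----------+---------+
| bottleneck      | 1.2.1           |          |         |
+-----------------+-----------------+----------+---------+
| numexpr         | 2.6.8           |          |         |
+-----------------+-----------------+----------+---------+
| pytest (dev)    | 5.0.1           |          |         |
+-----------------+-----------------+----------+---------+
| mypy (dev)      | 0.790           |          |    X    |
+-----------------+-----------------+----------+---------+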

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

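+-----------------+-----------------+---------+
| Package         | Minimum Version | Changed |
+=================+=================+=========+
| beautifulsoup4  | 4.6.0           |         |
+-----------------+-----------------+---------+
| fastparquet     | 0.3.2           |         |
+-----------------+-----------------+---------+
| fsspec          | 0.7.4           |         |
+-----------------+-----------------+---------+
| gcsfs           | 0.6.0           |         |
+-----------------+-----------------+---------+
| lxml            | 4.3.0           |         |
+-----------------+-----------------+---------+
| matplotlib      | 2.2.3           |         |
+-----------------+-----------------+---------+
| numba           | 0.46.0          |         |
+-----------------+-----------------+---------+
| openpyxl        | 2.6.0           |         |
+-----------------+-----------------+---------+
| pyarrow         | 0.15.0          |         |
+-----------------+-----------------+---------+
| pymysql         | 0.7.11          |         |
+-----------------+-----------------+---------+
| pytables        | 3.5.1           |         |
+-----------------+-----------------+---------+
| s3fs            | 0.4.0           |         |
+-----------------+-----------------+---------+
| scipy           | 1.2.0           |         |
+-----------------+-----------------+---------+
| sqlalchemy      | 1.2.8           |         |
+-----------------+-----------------+---------+
| tabulate        | 0.8.7           |    X    |
+-----------------+-----------------+---------+
| xarray          | 0.12.0          |         |
+-----------------+-----------------+---------+
| xlrd            | 1.2.0           |         |
+-----------------+-----------------+---------+
| xlsxwriter      | 1.0.2           |         |
+-----------------+-----------------+---------+
| xlwt            | 1.3.0           |         |
+-----------------+-----------------+---------+
| pandas-gbq      | 0.12.0          |         |
+-----------------+-----------------+---------+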

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
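
To compare an environment against the minimum versions listed above, the installed versions can be looked up with the standard library. This is only a sketch. The helper ``installed_version`` is a hypothetical name, and the package list is a sample taken from the required-dependency table.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Distribution names as listed in the required-dependency table.
for name in ("numpy", "pytz", "python-dateutil", "bottleneck", "numexpr"):
    print(f"{name}: {installed_version(name)}")
```

The helper only reports what is installed. Comparing the result with a minimum version is left to the reader.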

Other API changes

  • Partially initialized :class:`CategoricalDtype` objects (i.e. those with categories=None) will no longer compare as equal to fully initialized dtype objects.
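
A minimal sketch of the comparison rule described above. The variable names are illustrative, and the results assume a pandas version that includes this change.

```python
import pandas as pd

partial = pd.CategoricalDtype()                    # categories=None
full = pd.CategoricalDtype(categories=["a", "b"])  # fully initialized

# A partially initialized dtype no longer equals a fully initialized one.
print(partial == full)

# It still equals another partially initialized dtype and the string alias.
print(partial == pd.CategoricalDtype())
print(partial == "category")
```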

Deprecations

:meth:`~DataFrame.groupby` no longer ignores group_keys for transform-like apply

If group_keys=True is specified when calling :meth:`~DataFrame.groupby`, functions passed to apply that return like-indexed outputs will have the group keys added to the result index. Previous versions of pandas would add the group keys only when the result from the applied function had a different index than the input. If group_keys is not specified, the group keys will not be added for like-indexed outputs.

Previous behavior:

>>> # pandas 1.2
>>> df = pd.DataFrame({"A": [1, 2, 2], "B": [1, 2, 3]})
>>> df
   A  B
0  1  1
1  2  2
2  2  3
>>> df.groupby("A").apply(lambda x: x.rename(np.exp))  # Different index
            A  B
A
1 1.000000  1  1
2 2.718282  2  2
  7.389056  2  3

>>> df.groupby("A").apply(lambda x: x)  # Same index
   A  B
0  1  1
1  2  2
2  2  3

In the future, this behavior will change to always respect group_keys, which defaults to True.

New behavior:

.. ipython:: python

   df = pd.DataFrame({"A": [1, 2, 2], "B": [1, 2, 3]})
   df.groupby("A", group_keys=True).apply(lambda x: x)
   df.groupby("A", group_keys=True).apply(lambda x: x.rename(np.exp))

A warning will be issued if the result would change from pandas 1.2.

.. ipython:: python
   :okwarning:

   df.groupby("A").apply(lambda x: x)
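
Passing group_keys explicitly makes the intended index unambiguous for like-indexed results. The sketch below selects column ``"B"`` before calling apply so that the grouping column itself is not passed to the function. That selection and the variable names are illustrative choices, and the results assume a pandas version that respects group_keys for transform-like apply.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 2], "B": [1, 2, 3]})

# group_keys=True: the group key "A" becomes an outer index level.
keyed = df.groupby("A", group_keys=True)["B"].apply(lambda s: s)

# group_keys=False: the result keeps the original index.
unkeyed = df.groupby("A", group_keys=False)["B"].apply(lambda s: s)

print(keyed.index.nlevels)
print(unkeyed.index.equals(df.index))
```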


Other Deprecations

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

I/O

Period

Plotting

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

Other

Contributors