What's new in 1.3.0 (??)

These are the changes in pandas 1.3.0. See :ref:`release` for a full changelog including other versions of pandas.

{{ header }}

Enhancements

Custom HTTP(s) headers when reading csv or json files

When reading from a remote URL that is not handled by fsspec (i.e. HTTP and HTTPS), the dictionary passed to ``storage_options`` is used to create the headers included in the request. This can be used to control the ``User-Agent`` header or to send other custom headers (:issue:`36688`). For example:

.. ipython:: python

    headers = {"User-Agent": "pandas"}
    df = pd.read_csv(
        "https://download.bls.gov/pub/time.series/cu/cu.item",
        sep="\t",
        storage_options=headers
    )
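To check which headers are actually sent without depending on an external site, the request can be pointed at a local server. The following is only a sketch: the ``EchoUserAgent`` handler, the ``/echo.csv`` path, and the ``pandas-demo`` agent string are hypothetical names chosen for illustration, and the server simply returns the ``User-Agent`` it received as a small CSV.

```python
import http.server
import threading

import pandas as pd


class EchoUserAgent(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: replies with a CSV holding the User-Agent it received."""

    def do_GET(self):
        agent = self.headers.get("User-Agent", "")
        body = f"header,value\nUser-Agent,{agent}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the example quiet


# Port 0 asks the operating system for any free port.
server = http.server.HTTPServer(("127.0.0.1", 0), EchoUserAgent)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_address[1]}/echo.csv"
try:
    # The dictionary is forwarded as HTTP request headers.
    echoed = pd.read_csv(url, storage_options={"User-Agent": "pandas-demo"})
finally:
    server.shutdown()
    server.server_close()
```

The ``value`` column of ``echoed`` then holds whatever ``User-Agent`` the server saw, which makes it easy to confirm that the custom header was applied.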

:class:`Rolling` and :class:`Expanding` now support a ``method`` argument with a ``'table'`` option that performs the windowing operation over an entire :class:`DataFrame`. See :ref:`window.overview` for performance and functional benefits (:issue:`15095`).

Other enhancements

Notable bug fixes

These are bug fixes that might have notable behavior changes.

Assigning with DataFrame.__setitem__ consistently creates a new array

Assigning values with ``DataFrame.__setitem__`` now consistently assigns a new array, rather than mutating inplace (:issue:`33457`, :issue:`35271`, :issue:`35266`).

Previously, DataFrame.__setitem__ would sometimes operate inplace on the underlying array, and sometimes assign a new array. Fixing this inconsistency can have behavior-changing implications for workloads that relied on inplace mutation. The two most common cases are creating a DataFrame from an array and slicing a DataFrame.

Previous Behavior

The array would be mutated inplace for some dtypes, like NumPy's int64 dtype.

>>> import pandas as pd
>>> import numpy as np
>>> a = np.array([1, 2, 3])
>>> df = pd.DataFrame(a, columns=['a'])
>>> df['a'] = 0
>>> a  # mutated inplace
array([0, 0, 0])

But not others, like :class:`Int64Dtype`.

>>> import pandas as pd
>>> import numpy as np
>>> a = pd.array([1, 2, 3], dtype="Int64")
>>> df = pd.DataFrame(a, columns=['a'])
>>> df['a'] = 0
>>> a  # not mutated
<IntegerArray>
[1, 2, 3]
Length: 3, dtype: Int64

New Behavior

In pandas 1.3.0, DataFrame.__setitem__ consistently sets on a new array rather than mutating the existing array inplace.

For NumPy's int64 dtype:

.. ipython:: python

   import pandas as pd
   import numpy as np
   a = np.array([1, 2, 3])
   df = pd.DataFrame(a, columns=['a'])
   df['a'] = 0
   a  # not mutated

For :class:`Int64Dtype`:

.. ipython:: python

   import pandas as pd
   import numpy as np
   a = pd.array([1, 2, 3], dtype="Int64")
   df = pd.DataFrame(a, columns=['a'])
   df['a'] = 0
   a  # not mutated

This also affects cases where a second Series or DataFrame is a view on a first DataFrame.

.. code-block:: python

   df = pd.DataFrame({"A": [1, 2, 3]})
   df2 = df[['A']]
   df['A'] = np.array([0, 0, 0])

Previously, whether ``df2`` was mutated depended on the dtype of the array being assigned to. Now, a new array is consistently assigned, so ``df2`` is not mutated.
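As a quick check of the new behavior, the same statements can be run and both frames inspected afterwards. This is a minimal sketch; the names ``new_values`` and ``old_values`` are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3]})
df2 = df[["A"]]

# Assigning through __setitem__ sets a new array on df.
df["A"] = np.array([0, 0, 0])

new_values = df["A"].tolist()   # values now held by df
old_values = df2["A"].tolist()  # values still held by df2
```

Under the behavior described above, ``new_values`` reflects the assignment while ``old_values`` keeps the original data.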

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

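+-----------------+-----------------+----------+---------+
| Package         | Minimum Version | Required | Changed |
+=================+=================+==========+=========+
| numpy           | 1.16.5          |    X     |         |
+-----------------+-----------------+----------+---------+
| pytz            | 2017.3          |    X     |         |
+-----------------+-----------------+----------+---------+
| python-dateutil | 2.7.3           |    X     |         |
+-----------------+-----------------+----------+---------+
| bottleneck      | 1.2.1           |          |         |
+-----------------+-----------------+----------+---------+
| numexpr         | 2.6.8           |          |         |
+-----------------+-----------------+----------+---------+
| pytest (dev)    | 5.0.1           |          |         |
+-----------------+-----------------+----------+---------+
| mypy (dev)      | 0.782           |          |         |
+-----------------+-----------------+----------+---------+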

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

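+-----------------+-----------------+---------+
| Package         | Minimum Version | Changed |
+=================+=================+=========+
| beautifulsoup4  | 4.6.0           |         |
+-----------------+-----------------+---------+
| fastparquet     | 0.3.2           |         |
+-----------------+-----------------+---------+
| fsspec          | 0.7.4           |         |
+-----------------+-----------------+---------+
| gcsfs           | 0.6.0           |         |
+-----------------+-----------------+---------+
| lxml            | 4.3.0           |         |
+-----------------+-----------------+---------+
| matplotlib      | 2.2.3           |         |
+-----------------+-----------------+---------+
| numba           | 0.46.0          |         |
+-----------------+-----------------+---------+
| openpyxl        | 2.6.0           |         |
+-----------------+-----------------+---------+
| pyarrow         | 0.15.0          |         |
+-----------------+-----------------+---------+
| pymysql         | 0.7.11          |         |
+-----------------+-----------------+---------+
| pytables        | 3.5.1           |         |
+-----------------+-----------------+---------+
| s3fs            | 0.4.0           |         |
+-----------------+-----------------+---------+
| scipy           | 1.2.0           |         |
+-----------------+-----------------+---------+
| sqlalchemy      | 1.2.8           |         |
+-----------------+-----------------+---------+
| tabulate        | 0.8.7           |    X    |
+-----------------+-----------------+---------+
| xarray          | 0.12.0          |         |
+-----------------+-----------------+---------+
| xlrd            | 1.2.0           |         |
+-----------------+-----------------+---------+
| xlsxwriter      | 1.0.2           |         |
+-----------------+-----------------+---------+
| xlwt            | 1.3.0           |         |
+-----------------+-----------------+---------+
| pandas-gbq      | 0.12.0          |         |
+-----------------+-----------------+---------+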

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.
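To compare an environment against the required minimums listed above, installed versions can be read with the standard library. The sketch below is not part of pandas: ``meets_minimum`` is a hypothetical helper, and it compares only the leading numeric components of each version string, which is a rough approximation rather than full version parsing.

```python
import re
from importlib import metadata


def meets_minimum(package, minimum):
    """Hypothetical helper: True/False if installed, None if the package is missing."""
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

    def as_tuple(version):
        # Rough parse: keep up to three leading numeric components.
        return tuple(int(n) for n in re.findall(r"\d+", version)[:3])

    return as_tuple(installed) >= as_tuple(minimum)


# Minimum versions of the required dependencies, copied from the table above.
required = {"numpy": "1.16.5", "pytz": "2017.3", "python-dateutil": "2.7.3"}
status = {name: meets_minimum(name, minimum) for name, minimum in required.items()}
```

For a full report of installed dependency versions, ``pd.show_versions()`` is also available.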

Other API changes

  • Partially initialized :class:`CategoricalDtype` objects (i.e. those with ``categories=None``) will no longer compare as equal to fully initialized dtype objects.
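The comparison change can be illustrated with a small sketch. The variable names are chosen here for illustration, and the comparisons against another partially initialized dtype and against the string ``"category"`` are included for context, on the assumption that they continue to compare equal.

```python
import pandas as pd

partial = pd.CategoricalDtype()         # categories=None
full = pd.CategoricalDtype(["a", "b"])  # fully initialized

# Partially initialized vs. fully initialized: no longer equal.
partial_equals_full = partial == full

# Assumed unchanged: partial dtypes still match each other and "category".
partial_equals_partial = partial == pd.CategoricalDtype()
partial_equals_string = partial == "category"
```

Code that needs to test whether something is categorical regardless of its categories can compare against the string ``"category"`` rather than against a partially initialized dtype object.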

Deprecations

  • Deprecated allowing scalars to be passed to the :class:`Categorical` constructor (:issue:`38433`)
  • Deprecated allowing subclass-specific keyword arguments in the :class:`Index` constructor; use the specific subclass directly instead (:issue:`14093`, :issue:`21311`, :issue:`22315`, :issue:`26974`)
  • Deprecated ``astype`` of datetimelike (``timedelta64[ns]``, ``datetime64[ns]``, ``Datetime64TZDtype``, ``PeriodDtype``) to integer dtypes; use ``values.view(...)`` instead (:issue:`38544`)
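The deprecations above suggest the following replacements. This is a sketch of one possible migration, not an exhaustive guide: the sample values are made up, and ``tz`` is used only as one example of a subclass-specific keyword.

```python
import numpy as np
import pandas as pd

# Wrap a scalar in a list rather than passing it bare to Categorical.
cat = pd.Categorical(["a"])

# Pass subclass-specific keywords (here ``tz``) to the subclass itself
# instead of to the generic Index constructor.
idx = pd.DatetimeIndex(["2021-01-01"], tz="UTC")

# View tz-naive datetime64[ns] values as integers instead of using astype.
ser = pd.Series(np.array(["2021-01-01"], dtype="datetime64[ns]"))
as_int = ser.values.view("int64")  # nanoseconds since the Unix epoch
```

Here ``ser.values`` is a NumPy array, so ``view`` reinterprets the underlying 64-bit values without converting them.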

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

I/O

Period

Plotting

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

Other

Contributors