What's new in 1.3.0 (??)

These are the changes in pandas 1.3.0. See :ref:`release` for a full changelog including other versions of pandas.

{{ header }}

Warning

When reading new Excel 2007+ (.xlsx) files, the default argument engine=None to :func:`~pandas.read_excel` will now result in using the openpyxl engine in all cases when the option :attr:`io.excel.xlsx.reader` is set to "auto". Previously, some cases would use the xlrd engine instead. See :ref:`What's new 1.2.0 <whatsnew_120>` for background on this change.
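As a minimal sketch, the reader option named in this warning can be inspected and temporarily overridden through the pandas options API. The override to ``"openpyxl"`` below is only an illustration; passing ``engine="openpyxl"`` directly to :func:`~pandas.read_excel` is an alternative.

```python
import pandas as pd

# Default reader setting for .xlsx files; "auto" lets pandas pick the engine.
default_reader = pd.get_option("io.excel.xlsx.reader")
print(default_reader)

# Temporarily pin the reader for a block of code (illustrative override).
with pd.option_context("io.excel.xlsx.reader", "openpyxl"):
    pinned_reader = pd.get_option("io.excel.xlsx.reader")
    print(pinned_reader)

# Outside the context manager the previous value is restored.
restored_reader = pd.get_option("io.excel.xlsx.reader")
print(restored_reader)
```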

Enhancements

Custom HTTP(s) headers when reading csv or json files

When reading from a remote URL that is not handled by fsspec (i.e. HTTP and HTTPS), the dictionary passed to ``storage_options`` is used to create the headers included in the request. This can be used to control the User-Agent header or to send other custom headers (:issue:`36688`). For example:

.. ipython:: python

    headers = {"User-Agent": "pandas"}
    df = pd.read_csv(
        "https://download.bls.gov/pub/time.series/cu/cu.item",
        sep="\t",
        storage_options=headers
    )
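To see which headers are actually sent, one option is to point :func:`~pandas.read_csv` at a throwaway local server that echoes the request's User-Agent back as CSV. This is only a sketch: the ``EchoUserAgent`` handler, the ``/echo.csv`` path, and the ``"pandas-example"`` header value are hypothetical names made up for the illustration, and it assumes loopback connections are allowed.

```python
import http.server
import threading

import pandas as pd


class EchoUserAgent(http.server.BaseHTTPRequestHandler):
    """Hypothetical handler: returns the request's User-Agent as a CSV body."""

    def do_GET(self):
        body = f"header,value\nUser-Agent,{self.headers.get('User-Agent')}\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/csv")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

    def log_message(self, *args):
        pass  # keep the example output quiet


# Port 0 asks the OS for any free port on the loopback interface.
server = http.server.HTTPServer(("127.0.0.1", 0), EchoUserAgent)
threading.Thread(target=server.serve_forever, daemon=True).start()

url = f"http://127.0.0.1:{server.server_address[1]}/echo.csv"
try:
    # The dict passed as storage_options becomes the HTTP request headers.
    df = pd.read_csv(url, storage_options={"User-Agent": "pandas-example"})
finally:
    server.shutdown()
    server.server_close()

print(df)
```

The ``value`` column of the resulting frame holds whatever User-Agent the server received, which makes it easy to confirm the custom header was applied.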

Read and write XML documents

We added I/O support to read and render shallow versions of XML documents with :func:`pandas.read_xml` and :meth:`DataFrame.to_xml`. When lxml is used as the parser, both XPath 1.0 and XSLT 1.0 are available. (:issue:`27554`)

In [1]: xml = """<?xml version='1.0' encoding='utf-8'?>
   ...: <data>
   ...:  <row>
   ...:     <shape>square</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides>4.0</sides>
   ...:  </row>
   ...:  <row>
   ...:     <shape>circle</shape>
   ...:     <degrees>360</degrees>
   ...:     <sides/>
   ...:  </row>
   ...:  <row>
   ...:     <shape>triangle</shape>
   ...:     <degrees>180</degrees>
   ...:     <sides>3.0</sides>
   ...:  </row>
   ...:  </data>"""

In [2]: df = pd.read_xml(xml)
In [3]: df
Out[3]:
      shape  degrees  sides
0    square      360    4.0
1    circle      360    NaN
2  triangle      180    3.0

In [4]: df.to_xml()
Out[4]:
<?xml version='1.0' encoding='utf-8'?>
<data>
  <row>
    <index>0</index>
    <shape>square</shape>
    <degrees>360</degrees>
    <sides>4.0</sides>
  </row>
  <row>
    <index>1</index>
    <shape>circle</shape>
    <degrees>360</degrees>
    <sides/>
  </row>
  <row>
    <index>2</index>
    <shape>triangle</shape>
    <degrees>180</degrees>
    <sides>3.0</sides>
  </row>
</data>

For more, see :ref:`io.xml` in the user guide on IO tools.
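The following is a small sketch of a few optional arguments, assuming the built-in ``etree`` parser so that lxml is not needed. The ``xpath`` expression and the ``shapes`` and ``shape_row`` element names are chosen for illustration only.

```python
import io

import pandas as pd

xml = """<?xml version='1.0' encoding='utf-8'?>
<data>
  <row><shape>square</shape><degrees>360</degrees><sides>4.0</sides></row>
  <row><shape>circle</shape><degrees>360</degrees><sides/></row>
</data>"""

# Select the repeating <row> elements; each child element becomes a column.
df = pd.read_xml(io.StringIO(xml), xpath=".//row", parser="etree")
print(df)

# Write the frame back out without the index, using custom element names.
out = df.to_xml(
    index=False, root_name="shapes", row_name="shape_row", parser="etree"
)
print(out)
```

The empty ``<sides/>`` element is read as a missing value, matching the session above.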

Other enhancements

Notable bug fixes

These are bug fixes that might have notable behavior changes.

:meth:`~pandas.DataFrame.combine_first` will now preserve dtypes (:issue:`7509`)

.. ipython:: python

   df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])
   df1
   df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])
   df2
   combined = df1.combine_first(df2)

pandas 1.2.x

In [1]: combined.dtypes
Out[1]:
A    float64
B    float64
C    float64
dtype: object

pandas 1.3.0

.. ipython:: python

   combined.dtypes
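As a standalone sketch of the same example, the script below checks which columns keep an integer dtype. The explanation in the comments is an interpretation of the result, not a description of the internal implementation.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({"B": [4, 5, 6], "C": [1, 2, 3]}, index=[2, 3, 4])
combined = df1.combine_first(df2)

# "B" has a value for every row of the combined index, so it can stay integer.
# "A" and "C" have missing values on the non-overlapping rows, so they are float.
print(combined.dtypes)
print(combined)
```

Where both frames define ``B`` (index ``2``), the value from ``df1`` takes precedence.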


Try operating inplace when setting values with loc and iloc

When setting an entire column using loc or iloc, pandas will try to insert the values into the existing data rather than create an entirely new array.

.. ipython:: python

   df = pd.DataFrame(range(3), columns=["A"], dtype="float64")
   values = df.values
   new = np.array([5, 6, 7], dtype="int64")
   df.loc[[0, 1, 2], "A"] = new

In both the new and old behavior, the data in values is overwritten, but in the old behavior the dtype of df["A"] changed to int64.

pandas 1.2.x

In [1]: df.dtypes
Out[1]:
A    int64
dtype: object
In [2]: np.shares_memory(df["A"].values, new)
Out[2]: False
In [3]: np.shares_memory(df["A"].values, values)
Out[3]: False

In pandas 1.3.0, ``df`` continues to share data with ``values``, and the dtype of ``df["A"]`` remains ``float64``.

pandas 1.3.0

.. ipython:: python

   df.dtypes
   np.shares_memory(df["A"], new)
   np.shares_memory(df["A"], values)


Consistent Casting With Setting Into Boolean Series

Setting non-boolean values into a :class:`Series` with ``dtype=bool`` now consistently casts to ``dtype=object`` (:issue:`38709`)

.. ipython:: python

   orig = pd.Series([True, False])
   ser = orig.copy()
   ser.iloc[1] = np.nan
   ser2 = orig.copy()
   ser2.iloc[1] = 2.0

pandas 1.2.x

In [1]: ser
Out[1]:
0    1.0
1    NaN
dtype: float64

In [2]: ser2
Out[2]:
0    True
1     2.0
dtype: object

pandas 1.3.0

.. ipython:: python

   ser
   ser2

Increased minimum versions for dependencies

Some minimum supported versions of dependencies were updated. If installed, we now require:

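+-----------------+-----------------+----------+---------+
| Package         | Minimum Version | Required | Changed |
+=================+=================+==========+=========+
| numpy           | 1.16.5          |    X     |         |
+-----------------+-----------------+----------+---------+
| pytz            | 2017.3          |    X     |         |
+-----------------+-----------------+----------+---------+
| python-dateutil | 2.7.3           |    X     |         |
+-----------------+-----------------+----------+---------+
| bottleneck      | 1.2.1           |          |         |
+-----------------+-----------------+----------+---------+
| numexpr         | 2.6.8           |          |         |
+-----------------+-----------------+----------+---------+
| pytest (dev)    | 5.0.1           |          |         |
+-----------------+-----------------+----------+---------+
| mypy (dev)      | 0.800           |          |    X    |
+-----------------+-----------------+----------+---------+
| setuptools      | 38.6.0          |          |    X    |
+-----------------+-----------------+----------+---------+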

For optional libraries the general recommendation is to use the latest version. The following table lists the lowest version per library that is currently being tested throughout the development of pandas. Optional libraries below the lowest tested version may still work, but are not considered supported.

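+-----------------+-----------------+---------+
| Package         | Minimum Version | Changed |
+=================+=================+=========+
| beautifulsoup4  | 4.6.0           |         |
+-----------------+-----------------+---------+
| fastparquet     | 0.3.2           |         |
+-----------------+-----------------+---------+
| fsspec          | 0.7.4           |         |
+-----------------+-----------------+---------+
| gcsfs           | 0.6.0           |         |
+-----------------+-----------------+---------+
| lxml            | 4.3.0           |         |
+-----------------+-----------------+---------+
| matplotlib      | 2.2.3           |         |
+-----------------+-----------------+---------+
| numba           | 0.46.0          |         |
+-----------------+-----------------+---------+
| openpyxl        | 3.0.0           |    X    |
+-----------------+-----------------+---------+
| pyarrow         | 0.15.0          |         |
+-----------------+-----------------+---------+
| pymysql         | 0.7.11          |         |
+-----------------+-----------------+---------+
| pytables        | 3.5.1           |         |
+-----------------+-----------------+---------+
| s3fs            | 0.4.0           |         |
+-----------------+-----------------+---------+
| scipy           | 1.2.0           |         |
+-----------------+-----------------+---------+
| sqlalchemy      | 1.2.8           |         |
+-----------------+-----------------+---------+
| tabulate        | 0.8.7           |    X    |
+-----------------+-----------------+---------+
| xarray          | 0.12.0          |         |
+-----------------+-----------------+---------+
| xlrd            | 1.2.0           |         |
+-----------------+-----------------+---------+
| xlsxwriter      | 1.0.2           |         |
+-----------------+-----------------+---------+
| xlwt            | 1.3.0           |         |
+-----------------+-----------------+---------+
| pandas-gbq      | 0.12.0          |         |
+-----------------+-----------------+---------+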

See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more.

Other API changes

Deprecations

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

  • Bug in different tzinfo objects representing UTC not being treated as equivalent (:issue:`39216`)
  • Bug in dateutil.tz.gettz("UTC") not being recognized as equivalent to other UTC-representing tzinfos (:issue:`39276`)
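
One way to observe the equivalence described in these two entries is to compare timezone-aware dtypes built from different UTC objects. This is a minimal sketch using public API only; the particular UTC representations chosen here are an assumption about which cases are of interest.

```python
import datetime

import dateutil.tz
import pandas as pd

# Three different objects that all represent UTC.
utc_variants = [datetime.timezone.utc, dateutil.tz.tzutc(), "UTC"]

# Timezone-aware dtypes built from each variant should compare equal.
dtypes = [pd.DatetimeTZDtype(tz=tz) for tz in utc_variants]
all_equal = all(dtype == dtypes[0] for dtype in dtypes)
print(all_equal)

# Indexes with the same instants but different UTC objects should match too.
left = pd.date_range("2021-01-01", periods=3, tz=datetime.timezone.utc)
right = pd.date_range("2021-01-01", periods=3, tz="UTC")
same_index = left.equals(right)
print(same_index)
```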

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

I/O

Period

Plotting

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

Other

Contributors