What's new in 1.0.0 (??)

Warning

Starting with the 1.x series of releases, pandas only supports Python 3.6.1 and higher.

New Deprecation Policy

Starting with 1.0.0, pandas will adopt a variant of SemVer.

Historically, pandas has used a "rolling" deprecation policy, with occasional outright breaking API changes. Where possible, we would deprecate the behavior we'd like to change, giving an option to adopt the new behavior (via a keyword or an alternative method), and issuing a warning for users of the old behavior. Sometimes, a deprecation was not possible, and we would make an outright API breaking change.

We'll continue to introduce deprecations in major and minor releases (e.g. 1.0.0, 1.1.0, ...). Those deprecations will be enforced in the next major release.

Note that behavior changes and API breaking changes are not identical. API breaking changes will only be released in major versions. If we consider a behavior to be a bug, and fixing that bug induces a behavior change, we'll release that change in a minor release. This is a sometimes difficult judgment call that we'll do our best on.

This doesn't mean that pandas' pace of development will slow down. In the 2019 Pandas User Survey, about 95% of the respondents said they considered pandas "stable enough". This indicates there's an appetite for new features, even if it comes at the cost of breaking the API. The difference is that API breaking changes will now be accompanied by a bump in the major version number (e.g. pandas 1.5.1 -> 2.0.0).

See :ref:`policies.version` for more.
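
One practical way to prepare for deprecations that will be enforced in a later major release is to surface them early, for example in a test suite. The sketch below uses only the standard library ``warnings`` module. ``legacy_call`` is a hypothetical stand-in for any call that emits a deprecation warning, and the choice of ``FutureWarning`` is an assumption based on common pandas practice rather than something this policy states.

```python
import warnings


def legacy_call():
    # Hypothetical stand-in for a deprecated API: it warns, then still works.
    warnings.warn("legacy_call is deprecated", FutureWarning, stacklevel=2)
    return 42


def raises_on_deprecation(func):
    """Return True if calling ``func`` triggers a FutureWarning."""
    with warnings.catch_warnings():
        # Escalate FutureWarning to an exception inside this block only.
        warnings.simplefilter("error", FutureWarning)
        try:
            func()
        except FutureWarning:
            return True
    return False


uses_deprecated_api = raises_on_deprecation(legacy_call)
```

Because the filter is scoped by ``warnings.catch_warnings()``, code outside the check keeps its normal warning behavior.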

{{ header }}

These are the changes in pandas 1.0.0. See :ref:`release` for a full changelog including other versions of pandas.

Enhancements

Dedicated string data type

We've added :class:`StringDtype`, an extension type dedicated to string data. Previously, strings were typically stored in object-dtype NumPy arrays.

Warning

StringDtype is currently considered experimental. The implementation and parts of the API may change without warning.

The 'string' extension type solves several issues with object-dtype NumPy arrays:

  1. You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.
  2. object dtype breaks dtype-specific operations like :meth:`DataFrame.select_dtypes`. There isn't a clear way to select just text while excluding non-text, but still object-dtype columns.
  3. When reading code, the contents of an object dtype array are less clear than string.

.. ipython:: python

   pd.Series(['abc', None, 'def'], dtype=pd.StringDtype())

You can use the alias "string" as well.

.. ipython:: python

   s = pd.Series(['abc', None, 'def'], dtype="string")
   s

The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.

.. ipython:: python

   s.str.upper()
   s.str.split('b', expand=True).dtypes

String accessor methods that return integers will return a value with :class:`Int64Dtype`:

.. ipython:: python

   s.str.count("a")

We recommend explicitly using the string data type when working with strings. See :ref:`text.types` for more.
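
As a minimal sketch of following that recommendation with existing data, an object-dtype Series can be converted with ``astype("string")``. The variable names here are illustrative only, and because the dtype is experimental the details may differ between pandas versions.

```python
import pandas as pd

# An existing object-dtype Series, as produced by most older code paths.
raw = pd.Series(["abc", None, "def"], dtype=object)

# Opt in to the dedicated string dtype.
text = raw.astype("string")

dtype_name = str(text.dtype)         # name of the extension dtype
missing_mask = text.isna().tolist()  # None is treated as missing
upper = text.str.upper()             # result keeps the string dtype
counts = text.str.count("a")         # integer result uses a nullable dtype
```

Converting once, close to where the data is loaded, keeps later ``.str`` operations on the dedicated dtype.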

Other enhancements

Build Changes

Pandas has added a pyproject.toml file and will no longer include cythonized files in the source distribution uploaded to PyPI (:issue:`28341`, :issue:`20775`). If you're installing a built distribution (wheel) or via conda, this shouldn't have any effect on you. If you're building pandas from source, you should no longer need to install Cython into your build environment before calling pip install pandas.

Backwards incompatible API changes

Avoid using names from MultiIndex.levels

As part of a larger refactor to :class:`MultiIndex` the level names are now stored separately from the levels (:issue:`27242`). We recommend using :attr:`MultiIndex.names` to access the names, and :meth:`Index.set_names` to update the names.

For backwards compatibility, you can still access the names via the levels.

.. ipython:: python

   mi = pd.MultiIndex.from_product([[1, 2], ['a', 'b']], names=['x', 'y'])
   mi.levels[0].name

However, it is no longer possible to update the names of the MultiIndex via the name of the level. The following will silently fail to update the name of the MultiIndex:

.. ipython:: python

   mi.levels[0].name = "new name"
   mi.names

To update, use MultiIndex.set_names, which returns a new MultiIndex.

.. ipython:: python

   mi2 = mi.set_names("new name", level=0)
   mi2.names

pandas 0.25.x

.. code-block:: ipython

   In [1]: pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])
   Out[1]:
   IntervalArray([(0, 1], (2, 3]],
                 closed='right',
                 dtype='interval[int64]')

pandas 1.0.0

.. ipython:: python

   pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])

Other API changes

Documentation Improvements

Deprecations

Removed SparseSeries and SparseDataFrame

SparseSeries, SparseDataFrame and the DataFrame.to_sparse method have been removed (:issue:`28425`). We recommend using a Series or DataFrame with sparse values instead. See :ref:`sparse.migration` for help with migrating existing code.
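
As a rough sketch of what "a Series or DataFrame with sparse values" looks like in practice, the example below converts a dense Series with :class:`SparseDtype` and builds a DataFrame from a ``pd.arrays.SparseArray``. The data and variable names are made up for illustration; see the migration guide for the recommended path for existing code.

```python
import pandas as pd

dense = pd.Series([0, 0, 1, 0])

# Convert a dense Series to sparse storage; zeros are the fill value here.
sparse = dense.astype(pd.SparseDtype("int64", fill_value=0))

# The .sparse accessor exposes sparse-specific attributes.
density = sparse.sparse.density        # fraction of values actually stored
round_trip = sparse.sparse.to_dense()  # back to a regular Series

# A DataFrame can hold sparse columns directly.
frame = pd.DataFrame({"a": pd.arrays.SparseArray([0, 0, 1, 0], fill_value=0)})
frame_density = frame.sparse.density
```

The result is an ordinary Series or DataFrame, so the rest of the pandas API continues to apply; only the underlying storage is sparse.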

Removal of prior version deprecations/changes

Matplotlib unit registration

Previously, pandas would register converters with matplotlib as a side effect of importing pandas (:issue:`18720`). This changed the output of plots made via matplotlib after pandas was imported, even if you were using matplotlib directly rather than :meth:`~DataFrame.plot`.

To use pandas formatters with a matplotlib plot, specify:

>>> import pandas as pd
>>> pd.options.plotting.matplotlib.register_converters = True

Note that plots created by :meth:`DataFrame.plot` and :meth:`Series.plot` do register the converters automatically. The only behavior change is when plotting a date-like object via matplotlib.pyplot.plot or matplotlib.Axes.plot. See :ref:`plotting.formatters` for more.

Other removals

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

  • Constructor for :class:`MultiIndex` now verifies that the given sortorder is compatible with the actual lexsort_depth if the verify_integrity parameter is True (the default) (:issue:`28735`)
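
The check described above can be illustrated with a small sketch. The levels and codes are made up for illustration, and the exception type (``ValueError``) is an assumption based on how pandas reports other integrity failures, not something stated in this entry.

```python
import pandas as pd

levels = [["a", "b"], [1, 2]]

# Codes in lexicographic order: claiming sortorder=2 is consistent.
sorted_mi = pd.MultiIndex(levels=levels, codes=[[0, 1], [0, 1]], sortorder=2)

# First-level codes are out of order, so the same claim is rejected.
try:
    pd.MultiIndex(levels=levels, codes=[[1, 0], [0, 1]], sortorder=2)
except ValueError as err:
    rejected = True
    message = str(err)
else:
    rejected = False
    message = ""
```

Passing ``verify_integrity=False`` skips this validation, so it is best reserved for inputs already known to be consistent.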

I/O

Plotting

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

Other

Contributors