
What's new in 1.0.0 (??)

New Deprecation Policy

Starting with version 1.0.0, pandas will adopt a variant of SemVer.

Historically, pandas has used a "rolling" deprecation policy, with occasional outright breaking API changes. Where possible, we would deprecate the behavior we'd like to change, giving an option to adopt the new behavior (via a keyword or an alternative method), and issuing a warning for users of the old behavior. Sometimes, a deprecation was not possible, and we would make an outright API breaking change.

We'll continue to introduce deprecations in major and minor releases (e.g. 1.0.0, 1.1.0, ...). Those deprecations will be enforced in the next major release.

Note that behavior changes and API breaking changes are not identical. API breaking changes will only be released in major versions. If we consider a behavior to be a bug, and fixing that bug induces a behavior change, we'll release that change in a minor release. This is a sometimes difficult judgment call that we'll do our best on.

This doesn't mean that pandas' pace of development will slow down. In the 2019 Pandas User Survey, about 95% of the respondents said they considered pandas "stable enough". This indicates there's an appetite for new features, even if it comes at the cost of breaking the API. The difference is that API breaking changes will now be accompanied by a bump in the major version number (e.g. pandas 1.5.1 -> 2.0.0).

See :ref:`policies.version` for more.
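
As a rough illustration of the deprecation cycle described above, the sketch below keeps an old keyword working while warning its users. The function ``summarize`` and both keyword names are hypothetical and are not part of pandas; only the standard library ``warnings`` module is real.

```python
import warnings


def summarize(values, skipna=True, skip_missing=None):
    """Hypothetical function illustrating one deprecation cycle.

    ``skip_missing`` is the made-up old keyword, ``skipna`` its replacement.
    """
    if skip_missing is not None:
        # The old spelling still works, but warns so that callers can
        # migrate before it is removed in the next major release.
        warnings.warn(
            "'skip_missing' is deprecated and will be removed in a future "
            "major version; use 'skipna' instead.",
            FutureWarning,
            stacklevel=2,
        )
        skipna = skip_missing
    if skipna:
        values = [v for v in values if v is not None]
    return sum(values)
```

Callers of the old keyword get the same result plus a ``FutureWarning``; callers of the new keyword see no warning.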

Warning

The minimum supported Python version will be bumped to 3.6 in a future release.

{{ header }}

These are the changes in pandas 1.0.0. See :ref:`release` for a full changelog including other versions of pandas.

Enhancements

Dedicated string data type

We've added :class:`StringDtype`, an extension type dedicated to string data. Previously, strings were typically stored in object-dtype NumPy arrays.

Warning

StringDtype is currently considered experimental. The implementation and parts of the API may change without warning.

The string extension type solves several issues with object-dtype NumPy arrays:

  1. You can accidentally store a mixture of strings and non-strings in an object dtype array. A StringArray can only store strings.
  2. object dtype breaks dtype-specific operations like :meth:`DataFrame.select_dtypes`. There is no clear way to select only text columns while excluding columns that are object-dtype but hold non-text data.
  3. When reading code, the contents of an object dtype array are less clear than those of a string array.

.. ipython:: python

   pd.Series(['abc', None, 'def'], dtype=pd.StringDtype())

You can use the alias "string" as well.

.. ipython:: python

   s = pd.Series(['abc', None, 'def'], dtype="string")
   s

The usual string accessor methods work. Where appropriate, the return type of the Series or columns of a DataFrame will also have string dtype.

.. ipython:: python

   s.str.upper()
   s.str.split('b', expand=True).dtypes

We recommend explicitly using the string data type when working with strings. See :ref:`text.types` for more.

Changes to the unique-method

The method :meth:`pandas.unique` now supports the keyword return_inverse. If it is passed, the output becomes a tuple whose second component is an ndarray that maps each position in the original values to the location of that value in the returned unique values.

.. ipython:: python

    idx = pd.Index([1, 0, 0, 1])
    uniques, inverse = pd.unique(idx, return_inverse=True)
    uniques
    inverse
    reconstruct = pd.Index(uniques[inverse])
    reconstruct.equals(idx)
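
For comparison, NumPy's ``np.unique`` accepts a keyword of the same name. The sketch below uses NumPy only, to show the property an inverse array satisfies: indexing the uniques with it rebuilds the input. Note that ``np.unique`` returns its unique values sorted, which is an assumption-free fact about NumPy and says nothing about the ordering pandas uses.

```python
import numpy as np

values = np.array([1, 0, 0, 1])

# np.unique returns the sorted unique values; inverse[i] is the position
# of values[i] within `uniques`.
uniques, inverse = np.unique(values, return_inverse=True)

# Indexing the uniques with the inverse reconstructs the original array.
reconstructed = uniques[inverse]
```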

Other enhancements

Build Changes

Pandas has added a pyproject.toml file and will no longer include cythonized files in the source distribution uploaded to PyPI (:issue:`28341`, :issue:`20775`). If you're installing a built distribution (wheel) or via conda, this shouldn't have any effect on you. If you're building pandas from source, you should no longer need to install Cython into your build environment before calling pip install pandas.

Backwards incompatible API changes

pandas 0.25.x

.. code-block:: python

   In [1]: pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])
   Out[1]:
   IntervalArray([(0, 1], (2, 3]],
                 closed='right',
                 dtype='interval[int64]')

pandas 1.0.0

.. ipython:: python

   pd.arrays.IntervalArray.from_tuples([(0, 1), (2, 3)])


Other API changes

Documentation Improvements

Deprecations

  • Index.set_value has been deprecated. For an index idx, an array arr, a label idx_val present in idx, and a new value val, idx.set_value(arr, idx_val, val) is equivalent to arr[idx.get_loc(idx_val)] = val, which should be used instead (:issue:`28621`).
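
A minimal sketch of the recommended replacement, using made-up labels and values:

```python
import numpy as np
import pandas as pd

idx = pd.Index(["a", "b", "c"])
arr = np.array([1, 2, 3])

# Replacement for the deprecated idx.set_value(arr, "b", 20):
# look up the integer position of the label, then assign in place.
arr[idx.get_loc("b")] = 20
```

This relies on ``idx`` holding unique labels, so that ``get_loc`` returns a single integer position.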

Removed SparseSeries and SparseDataFrame

SparseSeries, SparseDataFrame and the DataFrame.to_sparse method have been removed (:issue:`28425`). We recommend using a Series or DataFrame with sparse values instead. See :ref:`sparse.migration` for help with migrating existing code.
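
As a small sketch of the recommended pattern (the data is illustrative), a regular Series can hold sparse values through ``pd.arrays.SparseArray``, with sparse-specific functionality available on the ``.sparse`` accessor:

```python
import pandas as pd

# A regular Series backed by a SparseArray takes the place of SparseSeries.
s = pd.Series(pd.arrays.SparseArray([0, 0, 1, 0]))

# The .sparse accessor exposes sparse-specific attributes and methods.
density = s.sparse.density   # fraction of values that differ from the fill value
dense = s.sparse.to_dense()  # convert back to an ordinary dense Series
```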

Removal of prior version deprecations/changes

Performance improvements

Bug fixes

Categorical

Datetimelike

Timedelta

Timezones

Numeric

Conversion

Strings

Interval

Indexing

Missing

MultiIndex

  • Constructor for :class:`MultiIndex` now verifies that the given sortorder is compatible with the actual lexsort_depth if the verify_integrity parameter is True (the default) (:issue:`28735`)
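
The check can be sketched as follows, assuming pandas 1.0.0 or later. The levels and codes are made up so that the rows are deliberately not lexsorted:

```python
import pandas as pd

levels = [["a", "b"], [1, 2]]
codes = [[1, 0], [0, 1]]  # rows are ("b", 1), ("a", 2): not lexsorted

try:
    # sortorder=2 claims both levels are sorted, which the codes contradict.
    pd.MultiIndex(levels=levels, codes=codes, sortorder=2)
except ValueError as err:
    message = str(err)
else:
    message = None
```

With verify_integrity left at its default, the constructor is expected to raise a ``ValueError`` here, so ``message`` holds the error text.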

I/O

Plotting

Groupby/resample/rolling

Reshaping

Sparse

ExtensionArray

Other

Contributors