diff --git a/doc/source/whatsnew/v0.24.0.rst b/doc/source/whatsnew/v0.24.0.rst index 868c5e280d2c3..3685a24d60e74 100644 --- a/doc/source/whatsnew/v0.24.0.rst +++ b/doc/source/whatsnew/v0.24.0.rst @@ -10,92 +10,16 @@ What's New in 0.24.0 (January XX, 2019) {{ header }} - These are the changes in pandas 0.24.0. See :ref:`release` for a full changelog including other versions of pandas. -.. _whatsnew_0240.enhancements: +Highlights include -New features -~~~~~~~~~~~~ -- :func:`merge` now directly allows merge between objects of type ``DataFrame`` and named ``Series``, without the need to convert the ``Series`` object into a ``DataFrame`` beforehand (:issue:`21220`) -- ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`) -- ``FrozenList`` has gained the ``.union()`` and ``.difference()`` methods. This functionality greatly simplifies groupby's that rely on explicitly excluding certain columns. See :ref:`Splitting an object into groups ` for more information (:issue:`15475`, :issue:`15506`). -- :func:`DataFrame.to_parquet` now accepts ``index`` as an argument, allowing - the user to override the engine's default behavior to include or omit the - dataframe's indexes from the resulting Parquet file. (:issue:`20768`) -- :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (:issue:`22684`) -- :func:`DataFrame.to_string` now accepts ``decimal`` as an argument, allowing the user to specify which decimal separator should be used in the output. (:issue:`23614`) -- :func:`read_feather` now accepts ``columns`` as an argument, allowing the user to specify which columns should be read. (:issue:`24025`) -- :func:`DataFrame.to_html` now accepts ``render_links`` as an argument, allowing the user to generate HTML with links to any URLs that appear in the DataFrame. 
- See the :ref:`section on writing HTML ` in the IO docs for example usage. (:issue:`2679`) -- :func:`pandas.read_csv` now supports pandas extension types as an argument to ``dtype``, allowing the user to use pandas extension types when reading CSVs. (:issue:`23228`) -- :meth:`DataFrame.shift` :meth:`Series.shift`, :meth:`ExtensionArray.shift`, :meth:`SparseArray.shift`, :meth:`Period.shift`, :meth:`GroupBy.shift`, :meth:`Categorical.shift`, :meth:`NDFrame.shift` and :meth:`Block.shift` now accept `fill_value` as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (:issue:`15486`) - -.. _whatsnew_0240.values_api: - -Accessing the values in a Series or Index -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -:attr:`Series.array` and :attr:`Index.array` have been added for extracting the array backing a -``Series`` or ``Index``. (:issue:`19954`, :issue:`23623`) - -.. ipython:: python - - idx = pd.period_range('2000', periods=4) - idx.array - pd.Series(idx).array - -Historically, this would have been done with ``series.values``, but with -``.values`` it was unclear whether the returned value would be the actual array, -some transformation of it, or one of pandas custom arrays (like -``Categorical``). For example, with :class:`PeriodIndex`, ``.values`` generates -a new ndarray of period objects each time. - -.. ipython:: python - - id(idx.values) - id(idx.values) - -If you need an actual NumPy array, use :meth:`Series.to_numpy` or :meth:`Index.to_numpy`. - -.. ipython:: python - - idx.to_numpy() - pd.Series(idx).to_numpy() - -For Series and Indexes backed by normal NumPy arrays, :attr:`Series.array` will return a -new :class:`arrays.PandasArray`, which is a thin (no-copy) wrapper around a -:class:`numpy.ndarray`. :class:`arrays.PandasArray` isn't especially useful on its own, -but it does provide the same interface as any extension array defined in pandas or by -a third-party library. - -.. 
ipython:: python - - ser = pd.Series([1, 2, 3]) - ser.array - ser.to_numpy() - -We haven't removed or deprecated :attr:`Series.values` or :attr:`DataFrame.values`, but we -highly recommend and using ``.array`` or ``.to_numpy()`` instead. - -See :ref:`Dtypes ` and :ref:`Attributes and Underlying Data ` for more. - -.. _whatsnew_0240.enhancements.extension_array_operators: - -``ExtensionArray`` operator support -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison -operators (:issue:`19577`). There are two approaches for providing operator support for an ``ExtensionArray``: - -1. Define each of the operators on your ``ExtensionArray`` subclass. -2. Use an operator implementation from pandas that depends on operators that are already defined - on the underlying elements (scalars) of the ``ExtensionArray``. - -See the :ref:`ExtensionArray Operator Support -` documentation section for details on both -ways of adding operator support. +* :ref:`Optional Nullable Integer Support ` +* :ref:`New APIs for accessing the array backing a Series or Index ` +* :ref:`A new top-level method for creating arrays ` +* :ref:`Store Interval and Period data in a Series or DataFrame ` +* :ref:`Support for joining on two MultiIndexes ` .. _whatsnew_0240.enhancements.intna: @@ -162,6 +86,57 @@ Reduction and groupby operations such as ``sum`` work. See :ref:`integer_na` for more. + +.. _whatsnew_0240.values_api: + +Accessing the values in a Series or Index +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +:attr:`Series.array` and :attr:`Index.array` have been added for extracting the array backing a +``Series`` or ``Index``. (:issue:`19954`, :issue:`23623`) + +.. 
ipython:: python + + idx = pd.period_range('2000', periods=4) + idx.array + pd.Series(idx).array + +Historically, this would have been done with ``series.values``, but with +``.values`` it was unclear whether the returned value would be the actual array, +some transformation of it, or one of pandas custom arrays (like +``Categorical``). For example, with :class:`PeriodIndex`, ``.values`` generates +a new ndarray of period objects each time. + +.. ipython:: python + + id(idx.values) + id(idx.values) + +If you need an actual NumPy array, use :meth:`Series.to_numpy` or :meth:`Index.to_numpy`. + +.. ipython:: python + + idx.to_numpy() + pd.Series(idx).to_numpy() + +For Series and Indexes backed by normal NumPy arrays, :attr:`Series.array` will return a +new :class:`arrays.PandasArray`, which is a thin (no-copy) wrapper around a +:class:`numpy.ndarray`. :class:`arrays.PandasArray` isn't especially useful on its own, +but it does provide the same interface as any extension array defined in pandas or by +a third-party library. + +.. ipython:: python + + ser = pd.Series([1, 2, 3]) + ser.array + ser.to_numpy() + +We haven't removed or deprecated :attr:`Series.values` or :attr:`DataFrame.values`, but we +highly recommend using ``.array`` or ``.to_numpy()`` instead. + +See :ref:`Dtypes ` and :ref:`Attributes and Underlying Data ` for more. + + .. _whatsnew_0240.enhancements.array: Array @@ -200,46 +175,6 @@ is a float. pd.array([1, 2, np.nan]) -.. _whatsnew_0240.enhancements.read_html: - -``read_html`` Enhancements -^^^^^^^^^^^^^^^^^^^^^^^^^^ - -:func:`read_html` previously ignored ``colspan`` and ``rowspan`` attributes. -Now it understands them, treating them as sequences of cells with the same -value. (:issue:`17054`) - -.. ipython:: python - - result = pd.read_html(""" - - - - - - - - - - -
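-  <table><thead><tr><th>A</th><th>B</th><th>C</th></tr></thead>
-  <tbody><tr><td colspan="2">1</td><td>2</td></tr></tbody></table>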
""") - -*Previous Behavior*: - -.. code-block:: ipython - - In [13]: result - Out [13]: - [ A B C - 0 1 2 NaN] - -*New Behavior*: - -.. ipython:: python - - result - .. _whatsnew_0240.enhancements.interval: @@ -286,27 +221,6 @@ from the ``Series``: for more. -New ``Styler.pipe()`` method -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The :class:`~pandas.io.formats.style.Styler` class has gained a -:meth:`~pandas.io.formats.style.Styler.pipe` method. This provides a -convenient way to apply users' predefined styling functions, and can help reduce -"boilerplate" when using DataFrame styling functionality repeatedly within a notebook. (:issue:`23229`) - -.. ipython:: python - - df = pd.DataFrame({'N': [1250, 1500, 1750], 'X': [0.25, 0.35, 0.50]}) - - def format_and_align(styler): - return (styler.format({'N': '{:,}', 'X': '{:.1%}'}) - .set_properties(**{'text-align': 'right'})) - - df.style.pipe(format_and_align).set_caption('Summary of results.') - -Similar methods already exist for other classes in pandas, including :meth:`DataFrame.pipe`, -:meth:`pandas.core.groupby.GroupBy.pipe`, and :meth:`pandas.core.resample.Resampler.pipe`. - - .. _whatsnew_0240.enhancements.join_with_two_multiindexes: Joining with two multi-indexes @@ -343,6 +257,83 @@ For earlier versions this can be done using the following. on=['key'], how='inner').set_index(['key', 'X', 'Y']) +.. _whatsnew_0240.enhancements.extension_array_operators: + +``ExtensionArray`` operator support +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A ``Series`` based on an ``ExtensionArray`` now supports arithmetic and comparison +operators (:issue:`19577`). There are two approaches for providing operator support for an ``ExtensionArray``: + +1. Define each of the operators on your ``ExtensionArray`` subclass. +2. Use an operator implementation from pandas that depends on operators that are already defined + on the underlying elements (scalars) of the ``ExtensionArray``. 
+ +See the :ref:`ExtensionArray Operator Support +` documentation section for details on both +ways of adding operator support. + +.. _whatsnew_0240.enhancements.read_html: + +``read_html`` Enhancements +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +:func:`read_html` previously ignored ``colspan`` and ``rowspan`` attributes. +Now it understands them, treating them as sequences of cells with the same +value. (:issue:`17054`) + +.. ipython:: python + + result = pd.read_html(""" + + + + + + + + + + + +
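+  <table><thead><tr><th>A</th><th>B</th><th>C</th></tr></thead>
+  <tbody><tr><td colspan="2">1</td><td>2</td></tr></tbody></table>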
""") + +*Previous Behavior*: + +.. code-block:: ipython + + In [13]: result + Out [13]: + [ A B C + 0 1 2 NaN] + +*New Behavior*: + +.. ipython:: python + + result + + +New ``Styler.pipe()`` method +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The :class:`~pandas.io.formats.style.Styler` class has gained a +:meth:`~pandas.io.formats.style.Styler.pipe` method. This provides a +convenient way to apply users' predefined styling functions, and can help reduce +"boilerplate" when using DataFrame styling functionality repeatedly within a notebook. (:issue:`23229`) + +.. ipython:: python + + df = pd.DataFrame({'N': [1250, 1500, 1750], 'X': [0.25, 0.35, 0.50]}) + + def format_and_align(styler): + return (styler.format({'N': '{:,}', 'X': '{:.1%}'}) + .set_properties(**{'text-align': 'right'})) + + df.style.pipe(format_and_align).set_caption('Summary of results.') + +Similar methods already exist for other classes in pandas, including :meth:`DataFrame.pipe`, +:meth:`pandas.core.groupby.GroupBy.pipe`, and :meth:`pandas.core.resample.Resampler.pipe`. + .. _whatsnew_0240.enhancements.rename_axis: Renaming names in a MultiIndex @@ -366,12 +357,24 @@ Example: See the :ref:`Advanced documentation on renaming` for more details. - .. _whatsnew_0240.enhancements.other: Other Enhancements ^^^^^^^^^^^^^^^^^^ +- :func:`merge` now directly allows merge between objects of type ``DataFrame`` and named ``Series``, without the need to convert the ``Series`` object into a ``DataFrame`` beforehand (:issue:`21220`) +- ``ExcelWriter`` now accepts ``mode`` as a keyword argument, enabling append to existing workbooks when using the ``openpyxl`` engine (:issue:`3441`) +- ``FrozenList`` has gained the ``.union()`` and ``.difference()`` methods. This functionality greatly simplifies groupby's that rely on explicitly excluding certain columns. See :ref:`Splitting an object into groups ` for more information (:issue:`15475`, :issue:`15506`). 
+- :func:`DataFrame.to_parquet` now accepts ``index`` as an argument, allowing + the user to override the engine's default behavior to include or omit the + dataframe's indexes from the resulting Parquet file. (:issue:`20768`) +- :meth:`DataFrame.corr` and :meth:`Series.corr` now accept a callable for generic calculation methods of correlation, e.g. histogram intersection (:issue:`22684`) +- :func:`DataFrame.to_string` now accepts ``decimal`` as an argument, allowing the user to specify which decimal separator should be used in the output. (:issue:`23614`) +- :func:`read_feather` now accepts ``columns`` as an argument, allowing the user to specify which columns should be read. (:issue:`24025`) +- :func:`DataFrame.to_html` now accepts ``render_links`` as an argument, allowing the user to generate HTML with links to any URLs that appear in the DataFrame. + See the :ref:`section on writing HTML ` in the IO docs for example usage. (:issue:`2679`) +- :func:`pandas.read_csv` now supports pandas extension types as an argument to ``dtype``, allowing the user to use pandas extension types when reading CSVs. (:issue:`23228`) +- :meth:`DataFrame.shift` :meth:`Series.shift`, :meth:`ExtensionArray.shift`, :meth:`SparseArray.shift`, :meth:`Period.shift`, :meth:`GroupBy.shift`, :meth:`Categorical.shift`, :meth:`NDFrame.shift` and :meth:`Block.shift` now accept `fill_value` as an argument, allowing the user to specify a value which will be used instead of NA/NaT in the empty periods. (:issue:`15486`) - :func:`to_datetime` now supports the ``%Z`` and ``%z`` directive when passed into ``format`` (:issue:`13486`) - :func:`Series.mode` and :func:`DataFrame.mode` now support the ``dropna`` parameter which can be used to specify whether ``NaN``/``NaT`` values should be considered (:issue:`17534`) - :func:`DataFrame.to_csv` and :func:`Series.to_csv` now support the ``compression`` keyword when a file handle is passed. 
(:issue:`21227`) @@ -427,42 +430,8 @@ Other Enhancements Backwards incompatible API changes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -- A newly constructed empty :class:`DataFrame` with integer as the ``dtype`` will now only be cast to ``float64`` if ``index`` is specified (:issue:`22858`) -- :meth:`Series.str.cat` will now raise if ``others`` is a ``set`` (:issue:`23009`) -- Passing scalar values to :class:`DatetimeIndex` or :class:`TimedeltaIndex` will now raise ``TypeError`` instead of ``ValueError`` (:issue:`23539`) -- ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`) -- :func:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`) -- The column order of the resultant :class:`DataFrame` from :meth:`MultiIndex.to_frame` is now guaranteed to match the :attr:`MultiIndex.names` order. (:issue:`22420`) -- Incorrectly passing a :class:`DatetimeIndex` to :meth:`MultiIndex.from_tuples`, rather than a sequence of tuples, now raises a ``TypeError`` rather than a ``ValueError`` (:issue:`24024`) -- :func:`pd.offsets.generate_range` argument ``time_rule`` has been removed; use ``offset`` instead (:issue:`24157`) -- In 0.23.x, pandas would raise a ``ValueError`` on a merge of a numeric column (e.g. ``int`` dtyped column) and an ``object`` dtyped column (:issue:`9780`). 
We have re-enabled the ability to merge ``object`` and other dtypes; pandas will still raise on a merge between a numeric and an ``object`` dtyped column that is composed only of strings (:issue:`21681`) - -Percentage change on groupby -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -Fixed a bug where calling :func:`pancas.core.groupby.SeriesGroupBy.pct_change` or :func:`pandas.core.groupby.DataFrameGroupBy.pct_change` would previously work across groups when calculating the percent change, where it now correctly works per group (:issue:`21200`, :issue:`21235`). - -.. ipython:: python - - df = pd.DataFrame({'grp': ['a', 'a', 'b'], 'foo': [1.0, 1.1, 2.2]}) - df - -Previous behavior: +Pandas 0.24.0 includes a number of API breaking changes. -.. code-block:: ipython - - In [1]: df.groupby('grp').pct_change() - Out[1]: - foo - 0 NaN - 1 0.1 - 2 1.0 - -New behavior: - -.. ipython:: python - - df.groupby('grp').pct_change() .. _whatsnew_0240.api_breaking.deps: @@ -716,8 +685,8 @@ is the case with :attr:`Period.end_time`, for example .. _whatsnew_0240.api_breaking.datetime_unique: -Datetime w/tz and unique -^^^^^^^^^^^^^^^^^^^^^^^^ +Series.unique for Timezone-Aware Data +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ The return type of :meth:`Series.unique` for datetime with timezone values has changed from an :class:`numpy.ndarray` of :class:`Timestamp` objects to a :class:`arrays.DatetimeArray` (:issue:`24024`). @@ -1086,57 +1055,6 @@ broadcast. (:issue:`23000`) df + arr[[0], :] # 1 row, 2 columns df + arr[:, [1]] # 1 column, 3 rows - -.. _whatsnew_0240.api.extension: - -ExtensionType Changes -^^^^^^^^^^^^^^^^^^^^^ - - **Equality and Hashability** - -Pandas now requires that extension dtypes be hashable. The base class implements -a default ``__eq__`` and ``__hash__``. If you have a parametrized dtype, you should -update the ``ExtensionDtype._metadata`` tuple to match the signature of your -``__init__`` method. See :class:`pandas.api.extensions.ExtensionDtype` for more (:issue:`22476`). 
- -**Reshaping changes** - -- :meth:`~pandas.api.types.ExtensionArray.dropna` has been added (:issue:`21185`) -- :meth:`~pandas.api.types.ExtensionArray.repeat` has been added (:issue:`24349`) -- The ``ExtensionArray`` constructor, ``_from_sequence`` now take the keyword arg ``copy=False`` (:issue:`21185`) -- :meth:`pandas.api.extensions.ExtensionArray.shift` added as part of the basic ``ExtensionArray`` interface (:issue:`22387`). -- :meth:`~pandas.api.types.ExtensionArray.searchsorted` has been added (:issue:`24350`) -- Support for reduction operations such as ``sum``, ``mean`` via opt-in base class method override (:issue:`22762`) -- :func:`ExtensionArray.isna` is allowed to return an ``ExtensionArray`` (:issue:`22325`). - -**Dtype changes** - -- ``ExtensionDtype`` has gained the ability to instantiate from string dtypes, e.g. ``decimal`` would instantiate a registered ``DecimalDtype``; furthermore - the ``ExtensionDtype`` has gained the method ``construct_array_type`` (:issue:`21185`) -- Added ``ExtensionDtype._is_numeric`` for controlling whether an extension dtype is considered numeric (:issue:`22290`). -- Added :meth:`pandas.api.types.register_extension_dtype` to register an extension type with pandas (:issue:`22664`) -- Updated the ``.type`` attribute for ``PeriodDtype``, ``DatetimeTZDtype``, and ``IntervalDtype`` to be instances of the dtype (``Period``, ``Timestamp``, and ``Interval`` respectively) (:issue:`22938`) - -**Other changes** - -- A default repr for :class:`pandas.api.extensions.ExtensionArray` is now provided (:issue:`23601`). -- An ``ExtensionArray`` with a boolean dtype now works correctly as a boolean indexer. 
:meth:`pandas.api.types.is_bool_dtype` now properly considers them boolean (:issue:`22326`) - -**Bug Fixes** - -- Bug in :meth:`Series.get` for ``Series`` using ``ExtensionArray`` and integer index (:issue:`21257`) -- :meth:`~Series.shift` now dispatches to :meth:`ExtensionArray.shift` (:issue:`22386`) -- :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`) -- :meth:`Series.combine()` with scalar argument now works for any function type (:issue:`21248`) -- :meth:`Series.astype` and :meth:`DataFrame.astype` now dispatch to :meth:`ExtensionArray.astype` (:issue:`21185`). -- Slicing a single row of a ``DataFrame`` with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`) -- Bug when concatenating multiple ``Series`` with different extension dtypes not casting to object dtype (:issue:`22994`) -- Series backed by an ``ExtensionArray`` now work with :func:`util.hash_pandas_object` (:issue:`23066`) -- :meth:`DataFrame.stack` no longer converts to object dtype for DataFrames where each column has the same extension dtype. The output Series will have the same dtype as the columns (:issue:`23077`). -- :meth:`Series.unstack` and :meth:`DataFrame.unstack` no longer convert extension arrays to object-dtype ndarrays. Each column in the output ``DataFrame`` will now have the same dtype as the input (:issue:`23077`). -- Bug when grouping :meth:`Dataframe.groupby()` and aggregating on ``ExtensionArray`` it was not returning the actual ``ExtensionArray`` dtype (:issue:`23227`). -- Bug in :func:`pandas.merge` when merging on an extension array-backed column (:issue:`23020`). - .. 
_whatsnew_0240.api.incompatibilities: Series and Index Data-Dtype Incompatibilities @@ -1212,6 +1130,15 @@ Datetimelike API Changes Other API Changes ^^^^^^^^^^^^^^^^^ +- A newly constructed empty :class:`DataFrame` with integer as the ``dtype`` will now only be cast to ``float64`` if ``index`` is specified (:issue:`22858`) +- :meth:`Series.str.cat` will now raise if ``others`` is a ``set`` (:issue:`23009`) +- Passing scalar values to :class:`DatetimeIndex` or :class:`TimedeltaIndex` will now raise ``TypeError`` instead of ``ValueError`` (:issue:`23539`) +- ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`) +- :func:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`) +- The column order of the resultant :class:`DataFrame` from :meth:`MultiIndex.to_frame` is now guaranteed to match the :attr:`MultiIndex.names` order. (:issue:`22420`) +- Incorrectly passing a :class:`DatetimeIndex` to :meth:`MultiIndex.from_tuples`, rather than a sequence of tuples, now raises a ``TypeError`` rather than a ``ValueError`` (:issue:`24024`) +- :func:`pd.offsets.generate_range` argument ``time_rule`` has been removed; use ``offset`` instead (:issue:`24157`) +- In 0.23.x, pandas would raise a ``ValueError`` on a merge of a numeric column (e.g. ``int`` dtyped column) and an ``object`` dtyped column (:issue:`9780`). We have re-enabled the ability to merge ``object`` and other dtypes; pandas will still raise on a merge between a numeric and an ``object`` dtyped column that is composed only of strings (:issue:`21681`) - Accessing a level of a ``MultiIndex`` with a duplicate name (e.g. in :meth:`~MultiIndex.get_level_values`) now raises a ``ValueError`` instead of a ``KeyError`` (:issue:`21678`). 
- Invalid construction of ``IntervalDtype`` will now always raise a ``TypeError`` rather than a ``ValueError`` if the subdtype is invalid (:issue:`21185`) @@ -1233,6 +1160,58 @@ Other API Changes - :func:`Series.to_list` and :func:`Index.to_list` are now aliases of ``Series.tolist`` respectively ``Index.tolist`` (:issue:`8826`) - The result of ``SparseSeries.unstack`` is now a :class:`DataFrame` with sparse values, rather than a :class:`SparseDataFrame` (:issue:`24372`). + +.. _whatsnew_0240.api.extension: + +ExtensionType Changes +^^^^^^^^^^^^^^^^^^^^^ + +**Equality and Hashability** + +Pandas now requires that extension dtypes be hashable. The base class implements +a default ``__eq__`` and ``__hash__``. If you have a parametrized dtype, you should +update the ``ExtensionDtype._metadata`` tuple to match the signature of your +``__init__`` method. See :class:`pandas.api.extensions.ExtensionDtype` for more (:issue:`22476`). + +**Reshaping changes** + +- :meth:`~pandas.api.types.ExtensionArray.dropna` has been added (:issue:`21185`) +- :meth:`~pandas.api.types.ExtensionArray.repeat` has been added (:issue:`24349`) +- The ``ExtensionArray`` constructor, ``_from_sequence``, now takes the keyword arg ``copy=False`` (:issue:`21185`) +- :meth:`pandas.api.extensions.ExtensionArray.shift` added as part of the basic ``ExtensionArray`` interface (:issue:`22387`). +- :meth:`~pandas.api.types.ExtensionArray.searchsorted` has been added (:issue:`24350`) +- Support for reduction operations such as ``sum``, ``mean`` via opt-in base class method override (:issue:`22762`) +- :func:`ExtensionArray.isna` is allowed to return an ``ExtensionArray`` (:issue:`22325`). + +**Dtype changes** + +- ``ExtensionDtype`` has gained the ability to instantiate from string dtypes, e.g.
``decimal`` would instantiate a registered ``DecimalDtype``; furthermore + the ``ExtensionDtype`` has gained the method ``construct_array_type`` (:issue:`21185`) +- Added ``ExtensionDtype._is_numeric`` for controlling whether an extension dtype is considered numeric (:issue:`22290`). +- Added :meth:`pandas.api.types.register_extension_dtype` to register an extension type with pandas (:issue:`22664`) +- Updated the ``.type`` attribute for ``PeriodDtype``, ``DatetimeTZDtype``, and ``IntervalDtype`` to be instances of the dtype (``Period``, ``Timestamp``, and ``Interval`` respectively) (:issue:`22938`) + +**Other changes** + +- A default repr for :class:`pandas.api.extensions.ExtensionArray` is now provided (:issue:`23601`). +- An ``ExtensionArray`` with a boolean dtype now works correctly as a boolean indexer. :meth:`pandas.api.types.is_bool_dtype` now properly considers them boolean (:issue:`22326`) + +**Bug Fixes** + +- Bug in :meth:`Series.get` for ``Series`` using ``ExtensionArray`` and integer index (:issue:`21257`) +- :meth:`~Series.shift` now dispatches to :meth:`ExtensionArray.shift` (:issue:`22386`) +- :meth:`Series.combine()` works correctly with :class:`~pandas.api.extensions.ExtensionArray` inside of :class:`Series` (:issue:`20825`) +- :meth:`Series.combine()` with scalar argument now works for any function type (:issue:`21248`) +- :meth:`Series.astype` and :meth:`DataFrame.astype` now dispatch to :meth:`ExtensionArray.astype` (:issue:`21185`). +- Slicing a single row of a ``DataFrame`` with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`) +- Bug when concatenating multiple ``Series`` with different extension dtypes not casting to object dtype (:issue:`22994`) +- Series backed by an ``ExtensionArray`` now work with :func:`util.hash_pandas_object` (:issue:`23066`) +- :meth:`DataFrame.stack` no longer converts to object dtype for DataFrames where each column has the same extension dtype. 
The output Series will have the same dtype as the columns (:issue:`23077`). +- :meth:`Series.unstack` and :meth:`DataFrame.unstack` no longer convert extension arrays to object-dtype ndarrays. Each column in the output ``DataFrame`` will now have the same dtype as the input (:issue:`23077`). +- Bug in :meth:`DataFrame.groupby` where aggregating on an ``ExtensionArray`` did not return the actual ``ExtensionArray`` dtype (:issue:`23227`). +- Bug in :func:`pandas.merge` when merging on an extension array-backed column (:issue:`23020`). + + .. _whatsnew_0240.deprecations: Deprecations @@ -1800,6 +1779,7 @@ Groupby/Resample/Rolling - Calling :meth:`pandas.core.groupby.GroupBy.rank` with empty groups and ``pct=True`` was raising a ``ZeroDivisionError`` (:issue:`22519`) - Bug in :meth:`DataFrame.resample` when resampling ``NaT`` in ``TimeDeltaIndex`` (:issue:`13223`). - Bug in :meth:`DataFrame.groupby` did not respect the ``observed`` argument when selecting a column and instead always used ``observed=False`` (:issue:`23970`) +- Bug in :func:`pandas.core.groupby.SeriesGroupBy.pct_change` and :func:`pandas.core.groupby.DataFrameGroupBy.pct_change` where the percent change was calculated across groups; it is now correctly calculated per group (:issue:`21200`, :issue:`21235`). - Bug preventing hash table creation with very large number (2^32) of rows (:issue:`22805`) Reshaping