diff --git a/doc/source/whatsnew/v1.5.0.rst b/doc/source/whatsnew/v1.5.0.rst index 358d9447b131d..fc3b01d35861b 100644 --- a/doc/source/whatsnew/v1.5.0.rst +++ b/doc/source/whatsnew/v1.5.0.rst @@ -457,7 +457,7 @@ Performance improvements .. _whatsnew_150.bug_fixes: Bug fixes -~~~~~~~~~ +- Bug in :func:`groupby.last()` that caused numpy arrays ordered with last with pd.NA to have dytype:Int64 type rather than dtype:Bool (:issue:`46409`) Categorical ^^^^^^^^^^^ @@ -477,12 +477,17 @@ Datetimelike Timedelta ^^^^^^^^^ +<<<<<<< HEAD - Bug in :func:`astype_nansafe` astype("timedelta64[ns]") fails when np.nan is included (:issue:`45798`) +- +======= +- Bug FIXED in :func:`astype_nansafe` astype("timedelta64[ns]") fails when np.nan is included (:issue:`45798`) +- +>>>>>>> refs/remotes/origin/shiny-new-feature Time Zones ^^^^^^^^^^ - -- Numeric ^^^^^^^ diff --git a/doc/source/whatsnew/v1.5.0_BACKUP_28826.rst b/doc/source/whatsnew/v1.5.0_BACKUP_28826.rst new file mode 100644 index 0000000000000..fc3b01d35861b --- /dev/null +++ b/doc/source/whatsnew/v1.5.0_BACKUP_28826.rst @@ -0,0 +1,652 @@ +.. _whatsnew_150: + +What's new in 1.5.0 (??) +------------------------ + +These are the changes in pandas 1.5.0. See :ref:`release` for a full changelog +including other versions of pandas. + +{{ header }} + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.enhancements: + +Enhancements +~~~~~~~~~~~~ + +.. _whatsnew_150.enhancements.styler: + +Styler +^^^^^^ + + - New method :meth:`.Styler.to_string` for alternative customisable output methods (:issue:`44502`) + - Added the ability to render ``border`` and ``border-{side}`` CSS properties in Excel (:issue:`42276`) + - Added a new method :meth:`.Styler.concat` which allows adding customised footer rows to visualise additional calculations on the data, e.g. totals and counts etc. (:issue:`43875`, :issue:`46186`) + - :meth:`.Styler.highlight_null` now accepts ``color`` consistently with other builtin methods and deprecates ``null_color`` although this remains backwards compatible (:issue:`45907`) + +.. _whatsnew_150.enhancements.resample_group_keys: + +Control of index with ``group_keys`` in :meth:`DataFrame.resample` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The argument ``group_keys`` has been added to the method :meth:`DataFrame.resample`. +As with :meth:`DataFrame.groupby`, this argument controls the whether each group is added +to the index in the resample when :meth:`.Resampler.apply` is used. + +.. warning:: + Not specifying the ``group_keys`` argument will retain the + previous behavior and emit a warning if the result will change + by specifying ``group_keys=False``. In a future version + of pandas, not specifying ``group_keys`` will default to + the same behavior as ``group_keys=False``. + +.. ipython:: python + + df = pd.DataFrame( + {'a': range(6)}, + index=pd.date_range("2021-01-01", periods=6, freq="8H") + ) + df.resample("D", group_keys=True).apply(lambda x: x) + df.resample("D", group_keys=False).apply(lambda x: x) + +Previously, the resulting index would depend upon the values returned by ``apply``, +as seen in the following example. + +.. code-block:: ipython + + In [1]: # pandas 1.3 + In [2]: df.resample("D").apply(lambda x: x) + Out[2]: + a + 2021-01-01 00:00:00 0 + 2021-01-01 08:00:00 1 + 2021-01-01 16:00:00 2 + 2021-01-02 00:00:00 3 + 2021-01-02 08:00:00 4 + 2021-01-02 16:00:00 5 + + In [3]: df.resample("D").apply(lambda x: x.reset_index()) + Out[3]: + index a + 2021-01-01 0 2021-01-01 00:00:00 0 + 1 2021-01-01 08:00:00 1 + 2 2021-01-01 16:00:00 2 + 2021-01-02 0 2021-01-02 00:00:00 3 + 1 2021-01-02 08:00:00 4 + 2 2021-01-02 16:00:00 5 + +.. _whatsnew_150.enhancements.other: + +Other enhancements +^^^^^^^^^^^^^^^^^^ +- :meth:`MultiIndex.to_frame` now supports the argument ``allow_duplicates`` and raises on duplicate labels if it is missing or False (:issue:`45245`) +- :class:`StringArray` now accepts array-likes containing nan-likes (``None``, ``np.nan``) for the ``values`` parameter in its constructor in addition to strings and :attr:`pandas.NA`. (:issue:`40839`) +- Improved the rendering of ``categories`` in :class:`CategoricalIndex` (:issue:`45218`) +- :meth:`to_numeric` now preserves float64 arrays when downcasting would generate values not representable in float32 (:issue:`43693`) +- :meth:`Series.reset_index` and :meth:`DataFrame.reset_index` now support the argument ``allow_duplicates`` (:issue:`44410`) +- :meth:`.GroupBy.min` and :meth:`.GroupBy.max` now supports `Numba `_ execution with the ``engine`` keyword (:issue:`45428`) +- :func:`read_csv` now supports ``defaultdict`` as a ``dtype`` parameter (:issue:`41574`) +- :meth:`DataFrame.rolling` and :meth:`Series.rolling` now support a ``step`` parameter with fixed-length windows (:issue:`15354`) +- Implemented a ``bool``-dtype :class:`Index`, passing a bool-dtype array-like to ``pd.Index`` will now retain ``bool`` dtype instead of casting to ``object`` (:issue:`45061`) +- Implemented a complex-dtype :class:`Index`, passing a complex-dtype array-like to ``pd.Index`` will now retain complex dtype instead of casting to ``object`` (:issue:`45845`) +- :class:`Series` and :class:`DataFrame` with ``IntegerDtype`` now supports bitwise operations (:issue:`34463`) +- Add ``milliseconds`` field support for :class:`~pandas.DateOffset` (:issue:`43371`) +- :meth:`DataFrame.reset_index` now accepts a ``names`` argument which renames the index names (:issue:`6878`) +- :meth:`pd.concat` now raises when ``levels`` is given but ``keys`` is None (:issue:`46653`) +- :meth:`pd.concat` now raises when ``levels`` contains duplicate values (:issue:`46653`) +- Added ``numeric_only`` argument to :meth:`DataFrame.corr`, :meth:`DataFrame.corrwith`, and :meth:`DataFrame.cov` (:issue:`46560`) + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.notable_bug_fixes: + +Notable bug fixes +~~~~~~~~~~~~~~~~~ + +These are bug fixes that might have notable behavior changes. + +.. _whatsnew_150.notable_bug_fixes.notable_bug_fix1: + +Styler +^^^^^^ + +- Fixed bug in :class:`CSSToExcelConverter` leading to ``TypeError`` when border color provided without border style for ``xlsxwriter`` engine (:issue:`42276`) + +.. _whatsnew_150.notable_bug_fixes.groupby_transform_dropna: + +Using ``dropna=True`` with ``groupby`` transforms +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A transform is an operation whose result has the same size as its input. When the +result is a :class:`DataFrame` or :class:`Series`, it is also required that the +index of the result matches that of the input. In pandas 1.4, using +:meth:`.DataFrameGroupBy.transform` or :meth:`.SeriesGroupBy.transform` with null +values in the groups and ``dropna=True`` gave incorrect results. Demonstrated by the +examples below, the incorrect results either contained incorrect values, or the result +did not have the same index as the input. + +.. ipython:: python + + df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [2, 3, 4]}) + +*Old behavior*: + +.. code-block:: ipython + + In [3]: # Value in the last row should be np.nan + df.groupby('a', dropna=True).transform('sum') + Out[3]: + b + 0 5 + 1 5 + 2 5 + + In [3]: # Should have one additional row with the value np.nan + df.groupby('a', dropna=True).transform(lambda x: x.sum()) + Out[3]: + b + 0 5 + 1 5 + + In [3]: # The value in the last row is np.nan interpreted as an integer + df.groupby('a', dropna=True).transform('ffill') + Out[3]: + b + 0 2 + 1 3 + 2 -9223372036854775808 + + In [3]: # Should have one additional row with the value np.nan + df.groupby('a', dropna=True).transform(lambda x: x) + Out[3]: + b + 0 2 + 1 3 + +*New behavior*: + +.. ipython:: python + + df.groupby('a', dropna=True).transform('sum') + df.groupby('a', dropna=True).transform(lambda x: x.sum()) + df.groupby('a', dropna=True).transform('ffill') + df.groupby('a', dropna=True).transform(lambda x: x) + +.. _whatsnew_150.notable_bug_fixes.visualization: + +Styler +^^^^^^ + +- Fix showing "None" as ylabel in :meth:`Series.plot` when not setting ylabel (:issue:`46129`) + +.. _whatsnew_150.notable_bug_fixes.notable_bug_fix2: + +notable_bug_fix2 +^^^^^^^^^^^^^^^^ + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.api_breaking: + +Backwards incompatible API changes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. _whatsnew_150.api_breaking.read_xml_dtypes: + +read_xml now supports ``dtype``, ``converters``, and ``parse_dates`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Similar to other IO methods, :func:`pandas.read_xml` now supports assigning specific dtypes to columns, +apply converter methods, and parse dates (:issue:`43567`). + +.. ipython:: python + + xml_dates = """ + + + square + 00360 + 4.0 + 2020-01-01 + + + circle + 00360 + + 2021-01-01 + + + triangle + 00180 + 3.0 + 2022-01-01 + + """ + + df = pd.read_xml( + xml_dates, + dtype={'sides': 'Int64'}, + converters={'degrees': str}, + parse_dates=['date'] + ) + df + df.dtypes + +.. _whatsnew_150.read_xml_iterparse: + +read_xml now supports large XML using ``iterparse`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +For very large XML files that can range in hundreds of megabytes to gigabytes, :func:`pandas.read_xml` +now supports parsing such sizeable files using `lxml's iterparse`_ and `etree's iterparse`_ +which are memory-efficient methods to iterate through XML trees and extract specific elements +and attributes without holding entire tree in memory (:issue:`#45442`). + +.. code-block:: ipython + + In [1]: df = pd.read_xml( + ... "/path/to/downloaded/enwikisource-latest-pages-articles.xml", + ... iterparse = {"page": ["title", "ns", "id"]}) + ... ) + df + Out[2]: + title ns id + 0 Gettysburg Address 0 21450 + 1 Main Page 0 42950 + 2 Declaration by United Nations 0 8435 + 3 Constitution of the United States of America 0 8435 + 4 Declaration of Independence (Israel) 0 17858 + ... ... ... ... + 3578760 Page:Black cat 1897 07 v2 n10.pdf/17 104 219649 + 3578761 Page:Black cat 1897 07 v2 n10.pdf/43 104 219649 + 3578762 Page:Black cat 1897 07 v2 n10.pdf/44 104 219649 + 3578763 The History of Tom Jones, a Foundling/Book IX 0 12084291 + 3578764 Page:Shakespeare of Stratford (1926) Yale.djvu/91 104 21450 + + [3578765 rows x 3 columns] + + +.. _`lxml's iterparse`: https://lxml.de/3.2/parsing.html#iterparse-and-iterwalk +.. _`etree's iterparse`: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse + +.. _whatsnew_150.api_breaking.api_breaking2: + +api_breaking_change2 +^^^^^^^^^^^^^^^^^^^^ + +.. _whatsnew_150.api_breaking.deps: + +Increased minimum versions for dependencies +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Some minimum supported versions of dependencies were updated. +If installed, we now require: + ++-----------------+-----------------+----------+---------+ +| Package | Minimum Version | Required | Changed | ++=================+=================+==========+=========+ +| mypy (dev) | 0.941 | | X | ++-----------------+-----------------+----------+---------+ + + +For `optional libraries `_ the general recommendation is to use the latest version. +The following table lists the lowest version per library that is currently being tested throughout the development of pandas. +Optional libraries below the lowest tested version may still work, but are not considered supported. + ++-----------------+-----------------+---------+ +| Package | Minimum Version | Changed | ++=================+=================+=========+ +| | | X | ++-----------------+-----------------+---------+ + +See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more. + +.. _whatsnew_150.api_breaking.other: + +Other API changes +^^^^^^^^^^^^^^^^^ + +- BigQuery I/O methods :func:`read_gbq` and :meth:`DataFrame.to_gbq` default to + ``auth_local_webserver = True``. Google has deprecated the + ``auth_local_webserver = False`` `"out of band" (copy-paste) flow + `_. + The ``auth_local_webserver = False`` option is planned to stop working in + October 2022. (:issue:`46312`) +- + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.deprecations: + +Deprecations +~~~~~~~~~~~~ + +.. _whatsnew_150.deprecations.int_slicing_series: + +In a future version, integer slicing on a :class:`Series` with a :class:`Int64Index` or :class:`RangeIndex` will be treated as *label-based*, not positional. This will make the behavior consistent with other :meth:`Series.__getitem__` and :meth:`Series.__setitem__` behaviors (:issue:`45162`). + +For example: + +.. ipython:: python + + ser = pd.Series([1, 2, 3, 4, 5], index=[2, 3, 5, 7, 11]) + +In the old behavior, ``ser[2:4]`` treats the slice as positional: + +*Old behavior*: + +.. code-block:: ipython + + In [3]: ser[2:4] + Out[3]: + 5 3 + 7 4 + dtype: int64 + +In a future version, this will be treated as label-based: + +*Future behavior*: + +.. code-block:: ipython + + In [4]: ser.loc[2:4] + Out[4]: + 2 1 + 3 2 + dtype: int64 + +To retain the old behavior, use ``series.iloc[i:j]``. To get the future behavior, +use ``series.loc[i:j]``. + +Slicing on a :class:`DataFrame` will not be affected. + +.. _whatsnew_150.deprecations.excel_writer_attributes: + +:class:`ExcelWriter` attributes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +All attributes of :class:`ExcelWriter` were previously documented as not +public. However some third party Excel engines documented accessing +``ExcelWriter.book`` or ``ExcelWriter.sheets``, and users were utilizing these +and possibly other attributes. Previously these attributes were not safe to use; +e.g. modifications to ``ExcelWriter.book`` would not update ``ExcelWriter.sheets`` +and conversely. In order to support this, pandas has made some attributes public +and improved their implementations so that they may now be safely used. (:issue:`45572`) + +The following attributes are now public and considered safe to access. + + - ``book`` + - ``check_extension`` + - ``close`` + - ``date_format`` + - ``datetime_format`` + - ``engine`` + - ``if_sheet_exists`` + - ``sheets`` + - ``supported_extensions`` + +The following attributes have been deprecated. They now raise a ``FutureWarning`` +when accessed and will be removed in a future version. Users should be aware +that their usage is considered unsafe, and can lead to unexpected results. + + - ``cur_sheet`` + - ``handles`` + - ``path`` + - ``save`` + - ``write_cells`` + +See the documentation of :class:`ExcelWriter` for further details. + +.. _whatsnew_150.deprecations.group_keys_in_apply: + +Using ``group_keys`` with transformers in :meth:`.GroupBy.apply` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In previous versions of pandas, if it was inferred that the function passed to +:meth:`.GroupBy.apply` was a transformer (i.e. the resulting index was equal to +the input index), the ``group_keys`` argument of :meth:`DataFrame.groupby` and +:meth:`Series.groupby` was ignored and the group keys would never be added to +the index of the result. In the future, the group keys will be added to the index +when the user specifies ``group_keys=True``. + +As ``group_keys=True`` is the default value of :meth:`DataFrame.groupby` and +:meth:`Series.groupby`, not specifying ``group_keys`` with a transformer will +raise a ``FutureWarning``. This can be silenced and the previous behavior +retained by specifying ``group_keys=False``. + +.. _whatsnew_150.deprecations.other: + +Other Deprecations +^^^^^^^^^^^^^^^^^^ +- Deprecated the keyword ``line_terminator`` in :meth:`DataFrame.to_csv` and :meth:`Series.to_csv`, use ``lineterminator`` instead; this is for consistency with :func:`read_csv` and the standard library 'csv' module (:issue:`9568`) +- Deprecated behavior of :meth:`SparseArray.astype`, :meth:`Series.astype`, and :meth:`DataFrame.astype` with :class:`SparseDtype` when passing a non-sparse ``dtype``. In a future version, this will cast to that non-sparse dtype instead of wrapping it in a :class:`SparseDtype` (:issue:`34457`) +- Deprecated behavior of :meth:`DatetimeIndex.intersection` and :meth:`DatetimeIndex.symmetric_difference` (``union`` behavior was already deprecated in version 1.3.0) with mixed time zones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`, :issue:`45357`) +- Deprecated :meth:`DataFrame.iteritems`, :meth:`Series.iteritems`, :meth:`HDFStore.iteritems` in favor of :meth:`DataFrame.items`, :meth:`Series.items`, :meth:`HDFStore.items` (:issue:`45321`) +- Deprecated :meth:`Series.is_monotonic` and :meth:`Index.is_monotonic` in favor of :meth:`Series.is_monotonic_increasing` and :meth:`Index.is_monotonic_increasing` (:issue:`45422`, :issue:`21335`) +- Deprecated behavior of :meth:`DatetimeIndex.astype`, :meth:`TimedeltaIndex.astype`, :meth:`PeriodIndex.astype` when converting to an integer dtype other than ``int64``. In a future version, these will convert to exactly the specified dtype (instead of always ``int64``) and will raise if the conversion overflows (:issue:`45034`) +- Deprecated the ``__array_wrap__`` method of DataFrame and Series, rely on standard numpy ufuncs instead (:issue:`45451`) +- Deprecated treating float-dtype data as wall-times when passed with a timezone to :class:`Series` or :class:`DatetimeIndex` (:issue:`45573`) +- Deprecated the behavior of :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``timedelta64[ns]`` dtype and incompatible fill value; in a future version this will cast to a common dtype (usually object) instead of raising, matching the behavior of other dtypes (:issue:`45746`) +- Deprecated the ``warn`` parameter in :func:`infer_freq` (:issue:`45947`) +- Deprecated allowing non-keyword arguments in :meth:`ExtensionArray.argsort` (:issue:`46134`) +- Deprecated treating all-bool ``object``-dtype columns as bool-like in :meth:`DataFrame.any` and :meth:`DataFrame.all` with ``bool_only=True``, explicitly cast to bool instead (:issue:`46188`) +- Deprecated behavior of method :meth:`DataFrame.quantile`, attribute ``numeric_only`` will default False. Including datetime/timedelta columns in the result (:issue:`7308`). +- Deprecated :attr:`Timedelta.freq` and :attr:`Timedelta.is_populated` (:issue:`46430`) +- Deprecated :attr:`Timedelta.delta` (:issue:`46476`) +- Deprecated passing arguments as positional in :meth:`DataFrame.any` and :meth:`Series.any` (:issue:`44802`) +- Deprecated the ``closed`` argument in :meth:`interval_range` in favor of ``inclusive`` argument; In a future version passing ``closed`` will raise (:issue:`40245`) +- Deprecated the methods :meth:`DataFrame.mad`, :meth:`Series.mad`, and the corresponding groupby methods (:issue:`11787`) + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.performance: + +Performance improvements +~~~~~~~~~~~~~~~~~~~~~~~~ +- Performance improvement in :meth:`DataFrame.corrwith` for column-wise (axis=0) Pearson and Spearman correlation when other is a :class:`Series` (:issue:`46174`) +- Performance improvement in :meth:`.GroupBy.transform` for some user-defined DataFrame -> Series functions (:issue:`45387`) +- Performance improvement in :meth:`DataFrame.duplicated` when subset consists of only one column (:issue:`45236`) +- Performance improvement in :meth:`.GroupBy.diff` (:issue:`16706`) +- Performance improvement in :meth:`.GroupBy.transform` when broadcasting values for user-defined functions (:issue:`45708`) +- Performance improvement in :meth:`.GroupBy.transform` for user-defined functions when only a single group exists (:issue:`44977`) +- Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` for tuple-based indexing of a :class:`MultiIndex` (:issue:`45681`, :issue:`46040`, :issue:`46330`) +- Performance improvement in :attr:`MultiIndex.values` when the MultiIndex contains levels of type DatetimeIndex, TimedeltaIndex or ExtensionDtypes (:issue:`46288`) +- Performance improvement in :func:`merge` when left and/or right are empty (:issue:`45838`) +- Performance improvement in :meth:`DataFrame.join` when left and/or right are empty (:issue:`46015`) +- Performance improvement in :meth:`DataFrame.reindex` and :meth:`Series.reindex` when target is a :class:`MultiIndex` (:issue:`46235`) +- Performance improvement when setting values in a pyarrow backed string array (:issue:`46400`) +- Performance improvement in :func:`factorize` (:issue:`46109`) +- Performance improvement in :class:`DataFrame` and :class:`Series` constructors for extension dtype scalars (:issue:`45854`) + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.bug_fixes: + +Bug fixes +- Bug in :func:`groupby.last()` that caused numpy arrays ordered with last with pd.NA to have dytype:Int64 type rather than dtype:Bool (:issue:`46409`) + +Categorical +^^^^^^^^^^^ +- Bug in :meth:`Categorical.view` not accepting integer dtypes (:issue:`25464`) +- Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains ``NaN`` values incorrectly raising instead of casting to ``float64`` (:issue:`45362`) +- + +Datetimelike +^^^^^^^^^^^^ +- Bug in :meth:`DataFrame.quantile` with datetime-like dtypes and no rows incorrectly returning ``float64`` dtype instead of retaining datetime-like dtype (:issue:`41544`) +- Bug in :func:`to_datetime` with sequences of ``np.str_`` objects incorrectly raising (:issue:`32264`) +- Bug in :class:`Timestamp` construction when passing datetime components as positional arguments and ``tzinfo`` as a keyword argument incorrectly raising (:issue:`31929`) +- Bug in :meth:`Index.astype` when casting from object dtype to ``timedelta64[ns]`` dtype incorrectly casting ``np.datetime64("NaT")`` values to ``np.timedelta64("NaT")`` instead of raising (:issue:`45722`) +- Bug in :meth:`SeriesGroupBy.value_counts` index when passing categorical column (:issue:`44324`) +- Bug in :meth:`DatetimeIndex.tz_localize` localizing to UTC failing to make a copy of the underlying data (:issue:`46460`) +- + +Timedelta +^^^^^^^^^ +<<<<<<< HEAD +- Bug in :func:`astype_nansafe` astype("timedelta64[ns]") fails when np.nan is included (:issue:`45798`) +- +======= +- Bug FIXED in :func:`astype_nansafe` astype("timedelta64[ns]") fails when np.nan is included (:issue:`45798`) +- +>>>>>>> refs/remotes/origin/shiny-new-feature + +Time Zones +^^^^^^^^^^ +- + +Numeric +^^^^^^^ +- Bug in operations with array-likes with ``dtype="boolean"`` and :attr:`NA` incorrectly altering the array in-place (:issue:`45421`) +- Bug in division, ``pow`` and ``mod`` operations on array-likes with ``dtype="boolean"`` not being like their ``np.bool_`` counterparts (:issue:`46063`) +- Bug in multiplying a :class:`Series` with ``IntegerDtype`` or ``FloatingDtype`` by an array-like with ``timedelta64[ns]`` dtype incorrectly raising (:issue:`45622`) +- + +Conversion +^^^^^^^^^^ +- Bug in :meth:`DataFrame.astype` not preserving subclasses (:issue:`40810`) +- Bug in constructing a :class:`Series` from a float-containing list or a floating-dtype ndarray-like (e.g. ``dask.Array``) and an integer dtype raising instead of casting like we would with an ``np.ndarray`` (:issue:`40110`) +- Bug in :meth:`Float64Index.astype` to unsigned integer dtype incorrectly casting to ``np.int64`` dtype (:issue:`45309`) +- Bug in :meth:`Series.astype` and :meth:`DataFrame.astype` from floating dtype to unsigned integer dtype failing to raise in the presence of negative values (:issue:`45151`) +- Bug in :func:`array` with ``FloatingDtype`` and values containing float-castable strings incorrectly raising (:issue:`45424`) +- Bug when comparing string and datetime64ns objects causing ``OverflowError`` exception. (:issue:`45506`) +- Bug in metaclass of generic abstract dtypes causing :meth:`DataFrame.apply` and :meth:`Series.apply` to raise for the built-in function ``type`` (:issue:`46684`) + +Strings +^^^^^^^ +- Bug in :meth:`str.startswith` and :meth:`str.endswith` when using other series as parameter _pat_. Now raises ``TypeError`` (:issue:`3485`) +- + +Interval +^^^^^^^^ +- Bug in :meth:`IntervalArray.__setitem__` when setting ``np.nan`` into an integer-backed array raising ``ValueError`` instead of ``TypeError`` (:issue:`45484`) +- + +Indexing +^^^^^^^^ +- Bug in :meth:`loc.__getitem__` with a list of keys causing an internal inconsistency that could lead to a disconnect between ``frame.at[x, y]`` vs ``frame[y].loc[x]`` (:issue:`22372`) +- Bug in :meth:`DataFrame.iloc` where indexing a single row on a :class:`DataFrame` with a single ExtensionDtype column gave a copy instead of a view on the underlying data (:issue:`45241`) +- Bug in :meth:`Series.align` does not create :class:`MultiIndex` with union of levels when both MultiIndexes intersections are identical (:issue:`45224`) +- Bug in setting a NA value (``None`` or ``np.nan``) into a :class:`Series` with int-based :class:`IntervalDtype` incorrectly casting to object dtype instead of a float-based :class:`IntervalDtype` (:issue:`45568`) +- Bug in indexing setting values into an ``ExtensionDtype`` column with ``df.iloc[:, i] = values`` with ``values`` having the same dtype as ``df.iloc[:, i]`` incorrectly inserting a new array instead of setting in-place (:issue:`33457`) +- Bug in :meth:`Series.__setitem__` with a non-integer :class:`Index` when using an integer key to set a value that cannot be set inplace where a ``ValueError`` was raised instead of casting to a common dtype (:issue:`45070`) +- Bug in :meth:`Series.__setitem__` when setting incompatible values into a ``PeriodDtype`` or ``IntervalDtype`` :class:`Series` raising when indexing with a boolean mask but coercing when indexing with otherwise-equivalent indexers; these now consistently coerce, along with :meth:`Series.mask` and :meth:`Series.where` (:issue:`45768`) +- Bug in :meth:`DataFrame.where` with multiple columns with datetime-like dtypes failing to downcast results consistent with other dtypes (:issue:`45837`) +- Bug in :meth:`Series.loc.__setitem__` and :meth:`Series.loc.__getitem__` not raising when using multiple keys without using a :class:`MultiIndex` (:issue:`13831`) +- Bug in :meth:`Index.reindex` raising ``AssertionError`` when ``level`` was specified but no :class:`MultiIndex` was given; level is ignored now (:issue:`35132`) +- Bug when setting a value too large for a :class:`Series` dtype failing to coerce to a common type (:issue:`26049`, :issue:`32878`) +- Bug in :meth:`loc.__setitem__` treating ``range`` keys as positional instead of label-based (:issue:`45479`) +- Bug in :meth:`Series.__setitem__` when setting ``boolean`` dtype values containing ``NA`` incorrectly raising instead of casting to ``boolean`` dtype (:issue:`45462`) +- Bug in :meth:`Series.__setitem__` where setting :attr:`NA` into a numeric-dtpye :class:`Series` would incorrectly upcast to object-dtype rather than treating the value as ``np.nan`` (:issue:`44199`) +- Bug in :meth:`Series.__setitem__` with ``datetime64[ns]`` dtype, an all-``False`` boolean mask, and an incompatible value incorrectly casting to ``object`` instead of retaining ``datetime64[ns]`` dtype (:issue:`45967`) +- Bug in :meth:`Index.__getitem__` raising ``ValueError`` when indexer is from boolean dtype with ``NA`` (:issue:`45806`) +- Bug in :meth:`Series.mask` with ``inplace=True`` or setting values with a boolean mask with small integer dtypes incorrectly raising (:issue:`45750`) +- Bug in :meth:`DataFrame.mask` with ``inplace=True`` and ``ExtensionDtype`` columns incorrectly raising (:issue:`45577`) +- Bug in getting a column from a DataFrame with an object-dtype row index with datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (:issue:`42950`) +- Bug in :meth:`DataFrame.__getattribute__` raising ``AttributeError`` if columns have ``"string"`` dtype (:issue:`46185`) +- Bug in indexing on a :class:`DatetimeIndex` with a ``np.str_`` key incorrectly raising (:issue:`45580`) +- Bug in :meth:`CategoricalIndex.get_indexer` when index contains ``NaN`` values, resulting in elements that are in target but not present in the index to be mapped to the index of the NaN element, instead of -1 (:issue:`45361`) +- Bug in setting large integer values into :class:`Series` with ``float32`` or ``float16`` dtype incorrectly altering these values instead of coercing to ``float64`` dtype (:issue:`45844`) +- Bug in :meth:`Series.asof` and :meth:`DataFrame.asof` incorrectly casting bool-dtype results to ``float64`` dtype (:issue:`16063`) +- + +Missing +^^^^^^^ +- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``downcast`` keyword not being respected in some cases where there are no NA values present (:issue:`45423`) +- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with :class:`IntervalDtype` and incompatible value raising instead of casting to a common (usually object) dtype (:issue:`45796`) +- Bug in :meth:`DataFrame.interpolate` with object-dtype column not returning a copy with ``inplace=False`` (:issue:`45791`) +- + +MultiIndex +^^^^^^^^^^ +- Bug in :meth:`DataFrame.loc` returning empty result when slicing a :class:`MultiIndex` with a negative step size and non-null start/stop values (:issue:`46156`) +- Bug in :meth:`DataFrame.loc` raising when slicing a :class:`MultiIndex` with a negative step size other than -1 (:issue:`46156`) +- Bug in :meth:`DataFrame.loc` raising when slicing a :class:`MultiIndex` with a negative step size and slicing a non-int labeled index level (:issue:`46156`) +- Bug in :meth:`Series.to_numpy` where multiindexed Series could not be converted to numpy arrays when an ``na_value`` was supplied (:issue:`45774`) +- Bug in :class:`MultiIndex.equals` not commutative when only one side has extension array dtype (:issue:`46026`) +- Bug in :meth:`MultiIndex.from_tuples` cannot construct Index of empty tuples (:issue:`45608`) + +I/O +^^^ +- Bug in :meth:`DataFrame.to_stata` where no error is raised if the :class:`DataFrame` contains ``-np.inf`` (:issue:`45350`) +- Bug in :func:`read_excel` results in an infinite loop with certain ``skiprows`` callables (:issue:`45585`) +- Bug in :meth:`DataFrame.info` where a new line at the end of the output is omitted when called on an empty :class:`DataFrame` (:issue:`45494`) +- Bug in :func:`read_csv` not recognizing line break for ``on_bad_lines="warn"`` for ``engine="c"`` (:issue:`41710`) +- Bug in :meth:`DataFrame.to_csv` not respecting ``float_format`` for ``Float64`` dtype (:issue:`45991`) +- Bug in :func:`read_csv` not respecting a specified converter to index columns in all cases (:issue:`40589`) +- Bug in :func:`read_parquet` when ``engine="pyarrow"`` which caused partial write to disk when column of unsupported datatype was passed (:issue:`44914`) +- Bug in :func:`DataFrame.to_excel` and :class:`ExcelWriter` would raise when writing an empty DataFrame to a ``.ods`` file (:issue:`45793`) +- Bug in :func:`read_html` where elements surrounding ``
`` were joined without a space between them (:issue:`29528`) +- Bug in Parquet roundtrip for Interval dtype with ``datetime64[ns]`` subtype (:issue:`45881`) +- Bug in :func:`read_excel` when reading a ``.ods`` file with newlines between xml elements (:issue:`45598`) +- Bug in :func:`read_parquet` when ``engine="fastparquet"`` where the file was not closed on error (:issue:`46555`) + +Period +^^^^^^ +- Bug in subtraction of :class:`Period` from :class:`PeriodArray` returning wrong results (:issue:`45999`) +- Bug in :meth:`Period.strftime` and :meth:`PeriodIndex.strftime`, directives ``%l`` and ``%u`` were giving wrong results (:issue:`46252`) +- + +Plotting +^^^^^^^^ +- Bug in :meth:`DataFrame.plot.barh` that prevented labeling the x-axis and ``xlabel`` updating the y-axis label (:issue:`45144`) +- Bug in :meth:`DataFrame.plot.box` that prevented labeling the x-axis (:issue:`45463`) +- Bug in :meth:`DataFrame.boxplot` that prevented passing in ``xlabel`` and ``ylabel`` (:issue:`45463`) +- Bug in :meth:`DataFrame.boxplot` that prevented specifying ``vert=False`` (:issue:`36918`) +- Bug in :meth:`DataFrame.plot.scatter` that prevented specifying ``norm`` (:issue:`45809`) +- The function :meth:`DataFrame.plot.scatter` now accepts ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` for consistency to other plotting functions (:issue:`44670`) + +Groupby/resample/rolling +^^^^^^^^^^^^^^^^^^^^^^^^ +- Bug in :meth:`DataFrame.resample` ignoring ``closed="right"`` on :class:`TimedeltaIndex` (:issue:`45414`) +- Bug in :meth:`.DataFrameGroupBy.transform` fails when ``func="size"`` and the input DataFrame has multiple columns (:issue:`27469`) +- Bug in :meth:`.DataFrameGroupBy.size` and :meth:`.DataFrameGroupBy.transform` with ``func="size"`` produced incorrect results when ``axis=1`` (:issue:`45715`) +- Bug in :meth:`.ExponentialMovingWindow.mean` with ``axis=1`` and ``engine='numba'`` when the :class:`DataFrame` has more columns than rows (:issue:`46086`) +- Bug when using ``engine="numba"`` would return the same jitted function when modifying ``engine_kwargs`` (:issue:`46086`) +- Bug in :meth:`.DataFrameGroupby.transform` fails when ``axis=1`` and ``func`` is ``"first"`` or ``"last"`` (:issue:`45986`) +- Bug in :meth:`DataFrameGroupby.cumsum` with ``skipna=False`` giving incorrect results (:issue:`46216`) +- Bug in :meth:`.GroupBy.cumsum` with ``timedelta64[ns]`` dtype failing to recognize ``NaT`` as a null value (:issue:`46216`) +- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` with nullable dtypes incorrectly altering the original data in place (:issue:`46220`) +- Bug in :meth:`GroupBy.cummax` with ``int64`` dtype with leading value being the smallest possible int64 (:issue:`46382`) +- Bug in :meth:`GroupBy.max` with empty groups and ``uint64`` dtype incorrectly raising ``RuntimeError`` (:issue:`46408`) +- Bug in :meth:`.GroupBy.apply` would fail when ``func`` was a string and args or kwargs were supplied (:issue:`46479`) +- + +Reshaping +^^^^^^^^^ +- Bug in :func:`concat` between a :class:`Series` with integer dtype and another with :class:`CategoricalDtype` with integer categories and containing ``NaN`` values casting to object dtype instead of ``float64`` (:issue:`45359`) +- Bug in :func:`get_dummies` that selected object and categorical dtypes but not string (:issue:`44965`) +- Bug in :meth:`DataFrame.align` when aligning a :class:`MultiIndex` to a :class:`Series` with another :class:`MultiIndex` (:issue:`46001`) +- Bug in concanenation with ``IntegerDtype``, or ``FloatingDtype`` arrays where the resulting dtype did not mirror the behavior of the non-nullable dtypes (:issue:`46379`) +- Bug in :func:`concat` with identical key leads to error when indexing :class:`MultiIndex` (:issue:`46519`) +- Bug in :meth:`DataFrame.join` with a list when using suffixes to join DataFrames with duplicate column names (:issue:`46396`) +- + +Sparse +^^^^^^ +- Bug in :meth:`Series.where` and :meth:`DataFrame.where` with ``SparseDtype`` failing to retain the array's ``fill_value`` (:issue:`45691`) +- + +ExtensionArray +^^^^^^^^^^^^^^ +- Bug in :meth:`IntegerArray.searchsorted` and :meth:`FloatingArray.searchsorted` returning inconsistent results when acting on ``np.nan`` (:issue:`45255`) +- + +Styler +^^^^^^ +- Bug when attempting to apply styling functions to an empty DataFrame subset (:issue:`45313`) +- + +Metadata +^^^^^^^^ +- Fixed metadata propagation in :meth:`DataFrame.melt` (:issue:`28283`) +- Fixed metadata propagation in :meth:`DataFrame.explode` (:issue:`28283`) +- + +Other +^^^^^ + +.. ***DO NOT USE THIS SECTION*** + +- +- + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.contributors: + +Contributors +~~~~~~~~~~~~ diff --git a/doc/source/whatsnew/v1.5.0_BASE_28826.rst b/doc/source/whatsnew/v1.5.0_BASE_28826.rst new file mode 100644 index 0000000000000..72490fc67bcc0 --- /dev/null +++ b/doc/source/whatsnew/v1.5.0_BASE_28826.rst @@ -0,0 +1,632 @@ +.. _whatsnew_150: + +What's new in 1.5.0 (??) +------------------------ + +These are the changes in pandas 1.5.0. See :ref:`release` for a full changelog +including other versions of pandas. + +{{ header }} + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.enhancements: + +Enhancements +~~~~~~~~~~~~ + +.. _whatsnew_150.enhancements.styler: + +Styler +^^^^^^ + + - New method :meth:`.Styler.to_string` for alternative customisable output methods (:issue:`44502`) + - Added the ability to render ``border`` and ``border-{side}`` CSS properties in Excel (:issue:`42276`) + - Added a new method :meth:`.Styler.concat` which allows adding customised footer rows to visualise additional calculations on the data, e.g. totals and counts etc. (:issue:`43875`, :issue:`46186`) + - :meth:`.Styler.highlight_null` now accepts ``color`` consistently with other builtin methods and deprecates ``null_color`` although this remains backwards compatible (:issue:`45907`) + +.. _whatsnew_150.enhancements.resample_group_keys: + +Control of index with ``group_keys`` in :meth:`DataFrame.resample` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The argument ``group_keys`` has been added to the method :meth:`DataFrame.resample`. +As with :meth:`DataFrame.groupby`, this argument controls the whether each group is added +to the index in the resample when :meth:`.Resampler.apply` is used. + +.. warning:: + Not specifying the ``group_keys`` argument will retain the + previous behavior and emit a warning if the result will change + by specifying ``group_keys=False``. In a future version + of pandas, not specifying ``group_keys`` will default to + the same behavior as ``group_keys=False``. + +.. ipython:: python + + df = pd.DataFrame( + {'a': range(6)}, + index=pd.date_range("2021-01-01", periods=6, freq="8H") + ) + df.resample("D", group_keys=True).apply(lambda x: x) + df.resample("D", group_keys=False).apply(lambda x: x) + +Previously, the resulting index would depend upon the values returned by ``apply``, +as seen in the following example. + +.. code-block:: ipython + + In [1]: # pandas 1.3 + In [2]: df.resample("D").apply(lambda x: x) + Out[2]: + a + 2021-01-01 00:00:00 0 + 2021-01-01 08:00:00 1 + 2021-01-01 16:00:00 2 + 2021-01-02 00:00:00 3 + 2021-01-02 08:00:00 4 + 2021-01-02 16:00:00 5 + + In [3]: df.resample("D").apply(lambda x: x.reset_index()) + Out[3]: + index a + 2021-01-01 0 2021-01-01 00:00:00 0 + 1 2021-01-01 08:00:00 1 + 2 2021-01-01 16:00:00 2 + 2021-01-02 0 2021-01-02 00:00:00 3 + 1 2021-01-02 08:00:00 4 + 2 2021-01-02 16:00:00 5 + +.. _whatsnew_150.enhancements.other: + +Other enhancements +^^^^^^^^^^^^^^^^^^ +- :meth:`MultiIndex.to_frame` now supports the argument ``allow_duplicates`` and raises on duplicate labels if it is missing or False (:issue:`45245`) +- :class:`StringArray` now accepts array-likes containing nan-likes (``None``, ``np.nan``) for the ``values`` parameter in its constructor in addition to strings and :attr:`pandas.NA`. (:issue:`40839`) +- Improved the rendering of ``categories`` in :class:`CategoricalIndex` (:issue:`45218`) +- :meth:`to_numeric` now preserves float64 arrays when downcasting would generate values not representable in float32 (:issue:`43693`) +- :meth:`Series.reset_index` and :meth:`DataFrame.reset_index` now support the argument ``allow_duplicates`` (:issue:`44410`) +- :meth:`.GroupBy.min` and :meth:`.GroupBy.max` now supports `Numba `_ execution with the ``engine`` keyword (:issue:`45428`) +- :func:`read_csv` now supports ``defaultdict`` as a ``dtype`` parameter (:issue:`41574`) +- :meth:`DataFrame.rolling` and :meth:`Series.rolling` now support a ``step`` parameter with fixed-length windows (:issue:`15354`) +- Implemented a ``bool``-dtype :class:`Index`, passing a bool-dtype array-like to ``pd.Index`` will now retain ``bool`` dtype instead of casting to ``object`` (:issue:`45061`) +- Implemented a complex-dtype :class:`Index`, passing a complex-dtype array-like to ``pd.Index`` will now retain complex dtype instead of casting to ``object`` (:issue:`45845`) +- :class:`Series` and :class:`DataFrame` with ``IntegerDtype`` now supports bitwise operations (:issue:`34463`) +- Add ``milliseconds`` field support for :class:`~pandas.DateOffset` (:issue:`43371`) +- :meth:`DataFrame.reset_index` now accepts a ``names`` argument which renames the index names (:issue:`6878`) + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.notable_bug_fixes: + +Notable bug fixes +~~~~~~~~~~~~~~~~~ + +These are bug fixes that might have notable behavior changes. + +.. _whatsnew_150.notable_bug_fixes.notable_bug_fix1: + +Styler +^^^^^^ + +- Fixed bug in :class:`CSSToExcelConverter` leading to ``TypeError`` when border color provided without border style for ``xlsxwriter`` engine (:issue:`42276`) + +.. _whatsnew_150.notable_bug_fixes.groupby_transform_dropna: + +Using ``dropna=True`` with ``groupby`` transforms +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A transform is an operation whose result has the same size as its input. When the +result is a :class:`DataFrame` or :class:`Series`, it is also required that the +index of the result matches that of the input. In pandas 1.4, using +:meth:`.DataFrameGroupBy.transform` or :meth:`.SeriesGroupBy.transform` with null +values in the groups and ``dropna=True`` gave incorrect results. Demonstrated by the +examples below, the incorrect results either contained incorrect values, or the result +did not have the same index as the input. + +.. ipython:: python + + df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [2, 3, 4]}) + +*Old behavior*: + +.. code-block:: ipython + + In [3]: # Value in the last row should be np.nan + df.groupby('a', dropna=True).transform('sum') + Out[3]: + b + 0 5 + 1 5 + 2 5 + + In [3]: # Should have one additional row with the value np.nan + df.groupby('a', dropna=True).transform(lambda x: x.sum()) + Out[3]: + b + 0 5 + 1 5 + + In [3]: # The value in the last row is np.nan interpreted as an integer + df.groupby('a', dropna=True).transform('ffill') + Out[3]: + b + 0 2 + 1 3 + 2 -9223372036854775808 + + In [3]: # Should have one additional row with the value np.nan + df.groupby('a', dropna=True).transform(lambda x: x) + Out[3]: + b + 0 2 + 1 3 + +*New behavior*: + +.. ipython:: python + + df.groupby('a', dropna=True).transform('sum') + df.groupby('a', dropna=True).transform(lambda x: x.sum()) + df.groupby('a', dropna=True).transform('ffill') + df.groupby('a', dropna=True).transform(lambda x: x) + +.. _whatsnew_150.notable_bug_fixes.visualization: + +Styler +^^^^^^ + +- Fix showing "None" as ylabel in :meth:`Series.plot` when not setting ylabel (:issue:`46129`) + +.. _whatsnew_150.notable_bug_fixes.notable_bug_fix2: + +notable_bug_fix2 +^^^^^^^^^^^^^^^^ + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.api_breaking: + +Backwards incompatible API changes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. _whatsnew_150.api_breaking.read_xml_dtypes: + +read_xml now supports ``dtype``, ``converters``, and ``parse_dates`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Similar to other IO methods, :func:`pandas.read_xml` now supports assigning specific dtypes to columns, +apply converter methods, and parse dates (:issue:`43567`). + +.. ipython:: python + + xml_dates = """ + + + square + 00360 + 4.0 + 2020-01-01 + + + circle + 00360 + + 2021-01-01 + + + triangle + 00180 + 3.0 + 2022-01-01 + + """ + + df = pd.read_xml( + xml_dates, + dtype={'sides': 'Int64'}, + converters={'degrees': str}, + parse_dates=['date'] + ) + df + df.dtypes + +.. _whatsnew_150.read_xml_iterparse: + +read_xml now supports large XML using ``iterparse`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +For very large XML files that can range in hundreds of megabytes to gigabytes, :func:`pandas.read_xml` +now supports parsing such sizeable files using `lxml's iterparse`_ and `etree's iterparse`_ +which are memory-efficient methods to iterate through XML trees and extract specific elements +and attributes without holding entire tree in memory (:issue:`#45442`). + +.. code-block:: ipython + + In [1]: df = pd.read_xml( + ... "/path/to/downloaded/enwikisource-latest-pages-articles.xml", + ... iterparse = {"page": ["title", "ns", "id"]}) + ... ) + df + Out[2]: + title ns id + 0 Gettysburg Address 0 21450 + 1 Main Page 0 42950 + 2 Declaration by United Nations 0 8435 + 3 Constitution of the United States of America 0 8435 + 4 Declaration of Independence (Israel) 0 17858 + ... ... ... ... + 3578760 Page:Black cat 1897 07 v2 n10.pdf/17 104 219649 + 3578761 Page:Black cat 1897 07 v2 n10.pdf/43 104 219649 + 3578762 Page:Black cat 1897 07 v2 n10.pdf/44 104 219649 + 3578763 The History of Tom Jones, a Foundling/Book IX 0 12084291 + 3578764 Page:Shakespeare of Stratford (1926) Yale.djvu/91 104 21450 + + [3578765 rows x 3 columns] + + +.. _`lxml's iterparse`: https://lxml.de/3.2/parsing.html#iterparse-and-iterwalk +.. _`etree's iterparse`: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse + +.. _whatsnew_150.api_breaking.api_breaking2: + +api_breaking_change2 +^^^^^^^^^^^^^^^^^^^^ + +.. _whatsnew_150.api_breaking.deps: + +Increased minimum versions for dependencies +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Some minimum supported versions of dependencies were updated. +If installed, we now require: + ++-----------------+-----------------+----------+---------+ +| Package | Minimum Version | Required | Changed | ++=================+=================+==========+=========+ +| mypy (dev) | 0.941 | | X | ++-----------------+-----------------+----------+---------+ + + +For `optional libraries `_ the general recommendation is to use the latest version. +The following table lists the lowest version per library that is currently being tested throughout the development of pandas. +Optional libraries below the lowest tested version may still work, but are not considered supported. + ++-----------------+-----------------+---------+ +| Package | Minimum Version | Changed | ++=================+=================+=========+ +| | | X | ++-----------------+-----------------+---------+ + +See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more. + +.. _whatsnew_150.api_breaking.other: + +Other API changes +^^^^^^^^^^^^^^^^^ + +- BigQuery I/O methods :func:`read_gbq` and :meth:`DataFrame.to_gbq` default to + ``auth_local_webserver = True``. Google has deprecated the + ``auth_local_webserver = False`` `"out of band" (copy-paste) flow + `_. + The ``auth_local_webserver = False`` option is planned to stop working in + October 2022. (:issue:`46312`) +- + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.deprecations: + +Deprecations +~~~~~~~~~~~~ + +.. _whatsnew_150.deprecations.int_slicing_series: + +In a future version, integer slicing on a :class:`Series` with a :class:`Int64Index` or :class:`RangeIndex` will be treated as *label-based*, not positional. This will make the behavior consistent with other :meth:`Series.__getitem__` and :meth:`Series.__setitem__` behaviors (:issue:`45162`). + +For example: + +.. ipython:: python + + ser = pd.Series([1, 2, 3, 4, 5], index=[2, 3, 5, 7, 11]) + +In the old behavior, ``ser[2:4]`` treats the slice as positional: + +*Old behavior*: + +.. code-block:: ipython + + In [3]: ser[2:4] + Out[3]: + 5 3 + 7 4 + dtype: int64 + +In a future version, this will be treated as label-based: + +*Future behavior*: + +.. code-block:: ipython + + In [4]: ser.loc[2:4] + Out[4]: + 2 1 + 3 2 + dtype: int64 + +To retain the old behavior, use ``series.iloc[i:j]``. To get the future behavior, +use ``series.loc[i:j]``. + +Slicing on a :class:`DataFrame` will not be affected. + +.. _whatsnew_150.deprecations.excel_writer_attributes: + +:class:`ExcelWriter` attributes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +All attributes of :class:`ExcelWriter` were previously documented as not +public. However some third party Excel engines documented accessing +``ExcelWriter.book`` or ``ExcelWriter.sheets``, and users were utilizing these +and possibly other attributes. Previously these attributes were not safe to use; +e.g. modifications to ``ExcelWriter.book`` would not update ``ExcelWriter.sheets`` +and conversely. In order to support this, pandas has made some attributes public +and improved their implementations so that they may now be safely used. (:issue:`45572`) + +The following attributes are now public and considered safe to access. + + - ``book`` + - ``check_extension`` + - ``close`` + - ``date_format`` + - ``datetime_format`` + - ``engine`` + - ``if_sheet_exists`` + - ``sheets`` + - ``supported_extensions`` + +The following attributes have been deprecated. They now raise a ``FutureWarning`` +when accessed and will be removed in a future version. Users should be aware +that their usage is considered unsafe, and can lead to unexpected results. + + - ``cur_sheet`` + - ``handles`` + - ``path`` + - ``save`` + - ``write_cells`` + +See the documentation of :class:`ExcelWriter` for further details. + +.. _whatsnew_150.deprecations.group_keys_in_apply: + +Using ``group_keys`` with transformers in :meth:`.GroupBy.apply` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In previous versions of pandas, if it was inferred that the function passed to +:meth:`.GroupBy.apply` was a transformer (i.e. the resulting index was equal to +the input index), the ``group_keys`` argument of :meth:`DataFrame.groupby` and +:meth:`Series.groupby` was ignored and the group keys would never be added to +the index of the result. In the future, the group keys will be added to the index +when the user specifies ``group_keys=True``. + +As ``group_keys=True`` is the default value of :meth:`DataFrame.groupby` and +:meth:`Series.groupby`, not specifying ``group_keys`` with a transformer will +raise a ``FutureWarning``. This can be silenced and the previous behavior +retained by specifying ``group_keys=False``. + +.. _whatsnew_150.deprecations.other: + +Other Deprecations +^^^^^^^^^^^^^^^^^^ +- Deprecated the keyword ``line_terminator`` in :meth:`DataFrame.to_csv` and :meth:`Series.to_csv`, use ``lineterminator`` instead; this is for consistency with :func:`read_csv` and the standard library 'csv' module (:issue:`9568`) +- Deprecated behavior of :meth:`SparseArray.astype`, :meth:`Series.astype`, and :meth:`DataFrame.astype` with :class:`SparseDtype` when passing a non-sparse ``dtype``. In a future version, this will cast to that non-sparse dtype instead of wrapping it in a :class:`SparseDtype` (:issue:`34457`) +- Deprecated behavior of :meth:`DatetimeIndex.intersection` and :meth:`DatetimeIndex.symmetric_difference` (``union`` behavior was already deprecated in version 1.3.0) with mixed time zones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`, :issue:`45357`) +- Deprecated :meth:`DataFrame.iteritems`, :meth:`Series.iteritems`, :meth:`HDFStore.iteritems` in favor of :meth:`DataFrame.items`, :meth:`Series.items`, :meth:`HDFStore.items` (:issue:`45321`) +- Deprecated :meth:`Series.is_monotonic` and :meth:`Index.is_monotonic` in favor of :meth:`Series.is_monotonic_increasing` and :meth:`Index.is_monotonic_increasing` (:issue:`45422`, :issue:`21335`) +- Deprecated behavior of :meth:`DatetimeIndex.astype`, :meth:`TimedeltaIndex.astype`, :meth:`PeriodIndex.astype` when converting to an integer dtype other than ``int64``. In a future version, these will convert to exactly the specified dtype (instead of always ``int64``) and will raise if the conversion overflows (:issue:`45034`) +- Deprecated the ``__array_wrap__`` method of DataFrame and Series, rely on standard numpy ufuncs instead (:issue:`45451`) +- Deprecated treating float-dtype data as wall-times when passed with a timezone to :class:`Series` or :class:`DatetimeIndex` (:issue:`45573`) +- Deprecated the behavior of :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``timedelta64[ns]`` dtype and incompatible fill value; in a future version this will cast to a common dtype (usually object) instead of raising, matching the behavior of other dtypes (:issue:`45746`) +- Deprecated the ``warn`` parameter in :func:`infer_freq` (:issue:`45947`) +- Deprecated allowing non-keyword arguments in :meth:`ExtensionArray.argsort` (:issue:`46134`) +- Deprecated treating all-bool ``object``-dtype columns as bool-like in :meth:`DataFrame.any` and :meth:`DataFrame.all` with ``bool_only=True``, explicitly cast to bool instead (:issue:`46188`) +- Deprecated behavior of method :meth:`DataFrame.quantile`, attribute ``numeric_only`` will default False. Including datetime/timedelta columns in the result (:issue:`7308`). +- Deprecated :attr:`Timedelta.freq` and :attr:`Timedelta.is_populated` (:issue:`46430`) +- Deprecated :attr:`Timedelta.delta` (:issue:`46476`) +- Deprecated the ``closed`` argument in :meth:`interval_range` in favor of ``inclusive`` argument; In a future version passing ``closed`` will raise (:issue:`40245`) +- + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.performance: + +Performance improvements +~~~~~~~~~~~~~~~~~~~~~~~~ +- Performance improvement in :meth:`DataFrame.corrwith` for column-wise (axis=0) Pearson and Spearman correlation when other is a :class:`Series` (:issue:`46174`) +- Performance improvement in :meth:`.GroupBy.transform` for some user-defined DataFrame -> Series functions (:issue:`45387`) +- Performance improvement in :meth:`DataFrame.duplicated` when subset consists of only one column (:issue:`45236`) +- Performance improvement in :meth:`.GroupBy.diff` (:issue:`16706`) +- Performance improvement in :meth:`.GroupBy.transform` when broadcasting values for user-defined functions (:issue:`45708`) +- Performance improvement in :meth:`.GroupBy.transform` for user-defined functions when only a single group exists (:issue:`44977`) +- Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` for tuple-based indexing of a :class:`MultiIndex` (:issue:`45681`, :issue:`46040`, :issue:`46330`) +- Performance improvement in :attr:`MultiIndex.values` when the MultiIndex contains levels of type DatetimeIndex, TimedeltaIndex or ExtensionDtypes (:issue:`46288`) +- Performance improvement in :func:`merge` when left and/or right are empty (:issue:`45838`) +- Performance improvement in :meth:`DataFrame.join` when left and/or right are empty (:issue:`46015`) +- Performance improvement in :meth:`DataFrame.reindex` and :meth:`Series.reindex` when target is a :class:`MultiIndex` (:issue:`46235`) +- Performance improvement when setting values in a pyarrow backed string array (:issue:`46400`) +- Performance improvement in :func:`factorize` (:issue:`46109`) +- Performance improvement in :class:`DataFrame` and :class:`Series` constructors for extension dtype scalars (:issue:`45854`) + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.bug_fixes: + +Bug fixes +- Bug in :func:`_call_cython_op` that caused numpy arrays ordered with last with pd.NA to have dytype:Int64 type rather than dtype:Bool (:issue:`46409`) + +Categorical +^^^^^^^^^^^ +- Bug in :meth:`Categorical.view` not accepting integer dtypes (:issue:`25464`) +- Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains ``NaN`` values incorrectly raising instead of casting to ``float64`` (:issue:`45362`) +- + +Datetimelike +^^^^^^^^^^^^ +- Bug in :meth:`DataFrame.quantile` with datetime-like dtypes and no rows incorrectly returning ``float64`` dtype instead of retaining datetime-like dtype (:issue:`41544`) +- Bug in :func:`to_datetime` with sequences of ``np.str_`` objects incorrectly raising (:issue:`32264`) +- Bug in :class:`Timestamp` construction when passing datetime components as positional arguments and ``tzinfo`` as a keyword argument incorrectly raising (:issue:`31929`) +- Bug in :meth:`Index.astype` when casting from object dtype to ``timedelta64[ns]`` dtype incorrectly casting ``np.datetime64("NaT")`` values to ``np.timedelta64("NaT")`` instead of raising (:issue:`45722`) +- Bug in :meth:`SeriesGroupBy.value_counts` index when passing categorical column (:issue:`44324`) +- Bug in :meth:`DatetimeIndex.tz_localize` localizing to UTC failing to make a copy of the underlying data (:issue:`46460`) +- + +Timedelta +^^^^^^^^^ +- Bug FIXED in :func:`astype_nansafe` astype("timedelta64[ns]") fails when np.nan is included (:issue:`45798`) +- + +Time Zones +^^^^^^^^^^ +- + +Numeric +^^^^^^^ +- Bug in operations with array-likes with ``dtype="boolean"`` and :attr:`NA` incorrectly altering the array in-place (:issue:`45421`) +- Bug in division, ``pow`` and ``mod`` operations on array-likes with ``dtype="boolean"`` not being like their ``np.bool_`` counterparts (:issue:`46063`) +- Bug in multiplying a :class:`Series` with ``IntegerDtype`` or ``FloatingDtype`` by an array-like with ``timedelta64[ns]`` dtype incorrectly raising (:issue:`45622`) +- + +Conversion +^^^^^^^^^^ +- Bug in :meth:`DataFrame.astype` not preserving subclasses (:issue:`40810`) +- Bug in constructing a :class:`Series` from a float-containing list or a floating-dtype ndarray-like (e.g. ``dask.Array``) and an integer dtype raising instead of casting like we would with an ``np.ndarray`` (:issue:`40110`) +- Bug in :meth:`Float64Index.astype` to unsigned integer dtype incorrectly casting to ``np.int64`` dtype (:issue:`45309`) +- Bug in :meth:`Series.astype` and :meth:`DataFrame.astype` from floating dtype to unsigned integer dtype failing to raise in the presence of negative values (:issue:`45151`) +- Bug in :func:`array` with ``FloatingDtype`` and values containing float-castable strings incorrectly raising (:issue:`45424`) +- Bug when comparing string and datetime64ns objects causing ``OverflowError`` exception. (:issue:`45506`) + +Strings +^^^^^^^ +- Bug in :meth:`str.startswith` and :meth:`str.endswith` when using other series as parameter _pat_. Now raises ``TypeError`` (:issue:`3485`) +- + +Interval +^^^^^^^^ +- Bug in :meth:`IntervalArray.__setitem__` when setting ``np.nan`` into an integer-backed array raising ``ValueError`` instead of ``TypeError`` (:issue:`45484`) +- + +Indexing +^^^^^^^^ +- Bug in :meth:`loc.__getitem__` with a list of keys causing an internal inconsistency that could lead to a disconnect between ``frame.at[x, y]`` vs ``frame[y].loc[x]`` (:issue:`22372`) +- Bug in :meth:`DataFrame.iloc` where indexing a single row on a :class:`DataFrame` with a single ExtensionDtype column gave a copy instead of a view on the underlying data (:issue:`45241`) +- Bug in :meth:`Series.align` does not create :class:`MultiIndex` with union of levels when both MultiIndexes intersections are identical (:issue:`45224`) +- Bug in setting a NA value (``None`` or ``np.nan``) into a :class:`Series` with int-based :class:`IntervalDtype` incorrectly casting to object dtype instead of a float-based :class:`IntervalDtype` (:issue:`45568`) +- Bug in indexing setting values into an ``ExtensionDtype`` column with ``df.iloc[:, i] = values`` with ``values`` having the same dtype as ``df.iloc[:, i]`` incorrectly inserting a new array instead of setting in-place (:issue:`33457`) +- Bug in :meth:`Series.__setitem__` with a non-integer :class:`Index` when using an integer key to set a value that cannot be set inplace where a ``ValueError`` was raised instead of casting to a common dtype (:issue:`45070`) +- Bug in :meth:`Series.__setitem__` when setting incompatible values into a ``PeriodDtype`` or ``IntervalDtype`` :class:`Series` raising when indexing with a boolean mask but coercing when indexing with otherwise-equivalent indexers; these now consistently coerce, along with :meth:`Series.mask` and :meth:`Series.where` (:issue:`45768`) +- Bug in :meth:`DataFrame.where` with multiple columns with datetime-like dtypes failing to downcast results consistent with other dtypes (:issue:`45837`) +- Bug in :meth:`Series.loc.__setitem__` and :meth:`Series.loc.__getitem__` not raising when using multiple keys without using a :class:`MultiIndex` (:issue:`13831`) +- Bug in :meth:`Index.reindex` raising ``AssertionError`` when ``level`` was specified but no :class:`MultiIndex` was given; level is ignored now (:issue:`35132`) +- Bug when setting a value too large for a :class:`Series` dtype failing to coerce to a common type (:issue:`26049`, :issue:`32878`) +- Bug in :meth:`loc.__setitem__` treating ``range`` keys as positional instead of label-based (:issue:`45479`) +- Bug in :meth:`Series.__setitem__` when setting ``boolean`` dtype values containing ``NA`` incorrectly raising instead of casting to ``boolean`` dtype (:issue:`45462`) +- Bug in :meth:`Series.__setitem__` where setting :attr:`NA` into a numeric-dtpye :class:`Series` would incorrectly upcast to object-dtype rather than treating the value as ``np.nan`` (:issue:`44199`) +- Bug in :meth:`Series.__setitem__` with ``datetime64[ns]`` dtype, an all-``False`` boolean mask, and an incompatible value incorrectly casting to ``object`` instead of retaining ``datetime64[ns]`` dtype (:issue:`45967`) +- Bug in :meth:`Index.__getitem__` raising ``ValueError`` when indexer is from boolean dtype with ``NA`` (:issue:`45806`) +- Bug in :meth:`Series.mask` with ``inplace=True`` or setting values with a boolean mask with small integer dtypes incorrectly raising (:issue:`45750`) +- Bug in :meth:`DataFrame.mask` with ``inplace=True`` and ``ExtensionDtype`` columns incorrectly raising (:issue:`45577`) +- Bug in getting a column from a DataFrame with an object-dtype row index with datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (:issue:`42950`) +- Bug in :meth:`DataFrame.__getattribute__` raising ``AttributeError`` if columns have ``"string"`` dtype (:issue:`46185`) +- Bug in indexing on a :class:`DatetimeIndex` with a ``np.str_`` key incorrectly raising (:issue:`45580`) +- Bug in :meth:`CategoricalIndex.get_indexer` when index contains ``NaN`` values, resulting in elements that are in target but not present in the index to be mapped to the index of the NaN element, instead of -1 (:issue:`45361`) +- Bug in setting large integer values into :class:`Series` with ``float32`` or ``float16`` dtype incorrectly altering these values instead of coercing to ``float64`` dtype (:issue:`45844`) +- Bug in :meth:`Series.asof` and :meth:`DataFrame.asof` incorrectly casting bool-dtype results to ``float64`` dtype (:issue:`16063`) +- + +Missing +^^^^^^^ +- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``downcast`` keyword not being respected in some cases where there are no NA values present (:issue:`45423`) +- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with :class:`IntervalDtype` and incompatible value raising instead of casting to a common (usually object) dtype (:issue:`45796`) +- Bug in :meth:`DataFrame.interpolate` with object-dtype column not returning a copy with ``inplace=False`` (:issue:`45791`) +- + +MultiIndex +^^^^^^^^^^ +- Bug in :meth:`DataFrame.loc` returning empty result when slicing a :class:`MultiIndex` with a negative step size and non-null start/stop values (:issue:`46156`) +- Bug in :meth:`DataFrame.loc` raising when slicing a :class:`MultiIndex` with a negative step size other than -1 (:issue:`46156`) +- Bug in :meth:`DataFrame.loc` raising when slicing a :class:`MultiIndex` with a negative step size and slicing a non-int labeled index level (:issue:`46156`) +- Bug in :meth:`Series.to_numpy` where multiindexed Series could not be converted to numpy arrays when an ``na_value`` was supplied (:issue:`45774`) +- Bug in :class:`MultiIndex.equals` not commutative when only one side has extension array dtype (:issue:`46026`) +- Bug in :meth:`MultiIndex.from_tuples` cannot construct Index of empty tuples (:issue:`45608`) + +I/O +^^^ +- Bug in :meth:`DataFrame.to_stata` where no error is raised if the :class:`DataFrame` contains ``-np.inf`` (:issue:`45350`) +- Bug in :func:`read_excel` results in an infinite loop with certain ``skiprows`` callables (:issue:`45585`) +- Bug in :meth:`DataFrame.info` where a new line at the end of the output is omitted when called on an empty :class:`DataFrame` (:issue:`45494`) +- Bug in :func:`read_csv` not recognizing line break for ``on_bad_lines="warn"`` for ``engine="c"`` (:issue:`41710`) +- Bug in :meth:`DataFrame.to_csv` not respecting ``float_format`` for ``Float64`` dtype (:issue:`45991`) +- Bug in :func:`read_csv` not respecting a specified converter to index columns in all cases (:issue:`40589`) +- Bug in :func:`read_parquet` when ``engine="pyarrow"`` which caused partial write to disk when column of unsupported datatype was passed (:issue:`44914`) +- Bug in :func:`DataFrame.to_excel` and :class:`ExcelWriter` would raise when writing an empty DataFrame to a ``.ods`` file (:issue:`45793`) +- Bug in Parquet roundtrip for Interval dtype with ``datetime64[ns]`` subtype (:issue:`45881`) +- Bug in :func:`read_excel` when reading a ``.ods`` file with newlines between xml elements (:issue:`45598`) +- Bug in :func:`read_parquet` when ``engine="fastparquet"`` where the file was not closed on error (:issue:`46555`) + +Period +^^^^^^ +- Bug in subtraction of :class:`Period` from :class:`PeriodArray` returning wrong results (:issue:`45999`) +- Bug in :meth:`Period.strftime` and :meth:`PeriodIndex.strftime`, directives ``%l`` and ``%u`` were giving wrong results (:issue:`46252`) +- + +Plotting +^^^^^^^^ +- Bug in :meth:`DataFrame.plot.barh` that prevented labeling the x-axis and ``xlabel`` updating the y-axis label (:issue:`45144`) +- Bug in :meth:`DataFrame.plot.box` that prevented labeling the x-axis (:issue:`45463`) +- Bug in :meth:`DataFrame.boxplot` that prevented passing in ``xlabel`` and ``ylabel`` (:issue:`45463`) +- Bug in :meth:`DataFrame.boxplot` that prevented specifying ``vert=False`` (:issue:`36918`) +- Bug in :meth:`DataFrame.plot.scatter` that prevented specifying ``norm`` (:issue:`45809`) + +Groupby/resample/rolling +^^^^^^^^^^^^^^^^^^^^^^^^ +- Bug in :meth:`DataFrame.resample` ignoring ``closed="right"`` on :class:`TimedeltaIndex` (:issue:`45414`) +- Bug in :meth:`.DataFrameGroupBy.transform` fails when ``func="size"`` and the input DataFrame has multiple columns (:issue:`27469`) +- Bug in :meth:`.DataFrameGroupBy.size` and :meth:`.DataFrameGroupBy.transform` with ``func="size"`` produced incorrect results when ``axis=1`` (:issue:`45715`) +- Bug in :meth:`.ExponentialMovingWindow.mean` with ``axis=1`` and ``engine='numba'`` when the :class:`DataFrame` has more columns than rows (:issue:`46086`) +- Bug when using ``engine="numba"`` would return the same jitted function when modifying ``engine_kwargs`` (:issue:`46086`) +- Bug in :meth:`.DataFrameGroupby.transform` fails when ``axis=1`` and ``func`` is ``"first"`` or ``"last"`` (:issue:`45986`) +- Bug in :meth:`DataFrameGroupby.cumsum` with ``skipna=False`` giving incorrect results (:issue:`46216`) +- Bug in :meth:`.GroupBy.cumsum` with ``timedelta64[ns]`` dtype failing to recognize ``NaT`` as a null value (:issue:`46216`) +- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` with nullable dtypes incorrectly altering the original data in place (:issue:`46220`) +- Bug in :meth:`GroupBy.cummax` with ``int64`` dtype with leading value being the smallest possible int64 (:issue:`46382`) +- Bug in :meth:`GroupBy.max` with empty groups and ``uint64`` dtype incorrectly raising ``RuntimeError`` (:issue:`46408`) +- Bug in :meth:`.GroupBy.apply` would fail when ``func`` was a string and args or kwargs were supplied (:issue:`46479`) +- + +Reshaping +^^^^^^^^^ +- Bug in :func:`concat` between a :class:`Series` with integer dtype and another with :class:`CategoricalDtype` with integer categories and containing ``NaN`` values casting to object dtype instead of ``float64`` (:issue:`45359`) +- Bug in :func:`get_dummies` that selected object and categorical dtypes but not string (:issue:`44965`) +- Bug in :meth:`DataFrame.align` when aligning a :class:`MultiIndex` to a :class:`Series` with another :class:`MultiIndex` (:issue:`46001`) +- Bug in concanenation with ``IntegerDtype``, or ``FloatingDtype`` arrays where the resulting dtype did not mirror the behavior of the non-nullable dtypes (:issue:`46379`) +- + +Sparse +^^^^^^ +- Bug in :meth:`Series.where` and :meth:`DataFrame.where` with ``SparseDtype`` failing to retain the array's ``fill_value`` (:issue:`45691`) +- + +ExtensionArray +^^^^^^^^^^^^^^ +- Bug in :meth:`IntegerArray.searchsorted` and :meth:`FloatingArray.searchsorted` returning inconsistent results when acting on ``np.nan`` (:issue:`45255`) +- + +Styler +^^^^^^ +- Bug when attempting to apply styling functions to an empty DataFrame subset (:issue:`45313`) +- + +Other +^^^^^ + +.. ***DO NOT USE THIS SECTION*** + +- +- + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.contributors: + +Contributors +~~~~~~~~~~~~ diff --git a/doc/source/whatsnew/v1.5.0_LOCAL_28826.rst b/doc/source/whatsnew/v1.5.0_LOCAL_28826.rst new file mode 100644 index 0000000000000..fe655c1aa4bc6 --- /dev/null +++ b/doc/source/whatsnew/v1.5.0_LOCAL_28826.rst @@ -0,0 +1,632 @@ +.. _whatsnew_150: + +What's new in 1.5.0 (??) +------------------------ + +These are the changes in pandas 1.5.0. See :ref:`release` for a full changelog +including other versions of pandas. + +{{ header }} + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.enhancements: + +Enhancements +~~~~~~~~~~~~ + +.. _whatsnew_150.enhancements.styler: + +Styler +^^^^^^ + + - New method :meth:`.Styler.to_string` for alternative customisable output methods (:issue:`44502`) + - Added the ability to render ``border`` and ``border-{side}`` CSS properties in Excel (:issue:`42276`) + - Added a new method :meth:`.Styler.concat` which allows adding customised footer rows to visualise additional calculations on the data, e.g. totals and counts etc. (:issue:`43875`, :issue:`46186`) + - :meth:`.Styler.highlight_null` now accepts ``color`` consistently with other builtin methods and deprecates ``null_color`` although this remains backwards compatible (:issue:`45907`) + +.. _whatsnew_150.enhancements.resample_group_keys: + +Control of index with ``group_keys`` in :meth:`DataFrame.resample` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The argument ``group_keys`` has been added to the method :meth:`DataFrame.resample`. +As with :meth:`DataFrame.groupby`, this argument controls the whether each group is added +to the index in the resample when :meth:`.Resampler.apply` is used. + +.. warning:: + Not specifying the ``group_keys`` argument will retain the + previous behavior and emit a warning if the result will change + by specifying ``group_keys=False``. In a future version + of pandas, not specifying ``group_keys`` will default to + the same behavior as ``group_keys=False``. + +.. ipython:: python + + df = pd.DataFrame( + {'a': range(6)}, + index=pd.date_range("2021-01-01", periods=6, freq="8H") + ) + df.resample("D", group_keys=True).apply(lambda x: x) + df.resample("D", group_keys=False).apply(lambda x: x) + +Previously, the resulting index would depend upon the values returned by ``apply``, +as seen in the following example. + +.. code-block:: ipython + + In [1]: # pandas 1.3 + In [2]: df.resample("D").apply(lambda x: x) + Out[2]: + a + 2021-01-01 00:00:00 0 + 2021-01-01 08:00:00 1 + 2021-01-01 16:00:00 2 + 2021-01-02 00:00:00 3 + 2021-01-02 08:00:00 4 + 2021-01-02 16:00:00 5 + + In [3]: df.resample("D").apply(lambda x: x.reset_index()) + Out[3]: + index a + 2021-01-01 0 2021-01-01 00:00:00 0 + 1 2021-01-01 08:00:00 1 + 2 2021-01-01 16:00:00 2 + 2021-01-02 0 2021-01-02 00:00:00 3 + 1 2021-01-02 08:00:00 4 + 2 2021-01-02 16:00:00 5 + +.. _whatsnew_150.enhancements.other: + +Other enhancements +^^^^^^^^^^^^^^^^^^ +- :meth:`MultiIndex.to_frame` now supports the argument ``allow_duplicates`` and raises on duplicate labels if it is missing or False (:issue:`45245`) +- :class:`StringArray` now accepts array-likes containing nan-likes (``None``, ``np.nan``) for the ``values`` parameter in its constructor in addition to strings and :attr:`pandas.NA`. (:issue:`40839`) +- Improved the rendering of ``categories`` in :class:`CategoricalIndex` (:issue:`45218`) +- :meth:`to_numeric` now preserves float64 arrays when downcasting would generate values not representable in float32 (:issue:`43693`) +- :meth:`Series.reset_index` and :meth:`DataFrame.reset_index` now support the argument ``allow_duplicates`` (:issue:`44410`) +- :meth:`.GroupBy.min` and :meth:`.GroupBy.max` now supports `Numba `_ execution with the ``engine`` keyword (:issue:`45428`) +- :func:`read_csv` now supports ``defaultdict`` as a ``dtype`` parameter (:issue:`41574`) +- :meth:`DataFrame.rolling` and :meth:`Series.rolling` now support a ``step`` parameter with fixed-length windows (:issue:`15354`) +- Implemented a ``bool``-dtype :class:`Index`, passing a bool-dtype array-like to ``pd.Index`` will now retain ``bool`` dtype instead of casting to ``object`` (:issue:`45061`) +- Implemented a complex-dtype :class:`Index`, passing a complex-dtype array-like to ``pd.Index`` will now retain complex dtype instead of casting to ``object`` (:issue:`45845`) +- :class:`Series` and :class:`DataFrame` with ``IntegerDtype`` now supports bitwise operations (:issue:`34463`) +- Add ``milliseconds`` field support for :class:`~pandas.DateOffset` (:issue:`43371`) +- :meth:`DataFrame.reset_index` now accepts a ``names`` argument which renames the index names (:issue:`6878`) + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.notable_bug_fixes: + +Notable bug fixes +~~~~~~~~~~~~~~~~~ + +These are bug fixes that might have notable behavior changes. + +.. _whatsnew_150.notable_bug_fixes.notable_bug_fix1: + +Styler +^^^^^^ + +- Fixed bug in :class:`CSSToExcelConverter` leading to ``TypeError`` when border color provided without border style for ``xlsxwriter`` engine (:issue:`42276`) + +.. _whatsnew_150.notable_bug_fixes.groupby_transform_dropna: + +Using ``dropna=True`` with ``groupby`` transforms +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A transform is an operation whose result has the same size as its input. When the +result is a :class:`DataFrame` or :class:`Series`, it is also required that the +index of the result matches that of the input. In pandas 1.4, using +:meth:`.DataFrameGroupBy.transform` or :meth:`.SeriesGroupBy.transform` with null +values in the groups and ``dropna=True`` gave incorrect results. Demonstrated by the +examples below, the incorrect results either contained incorrect values, or the result +did not have the same index as the input. + +.. ipython:: python + + df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [2, 3, 4]}) + +*Old behavior*: + +.. code-block:: ipython + + In [3]: # Value in the last row should be np.nan + df.groupby('a', dropna=True).transform('sum') + Out[3]: + b + 0 5 + 1 5 + 2 5 + + In [3]: # Should have one additional row with the value np.nan + df.groupby('a', dropna=True).transform(lambda x: x.sum()) + Out[3]: + b + 0 5 + 1 5 + + In [3]: # The value in the last row is np.nan interpreted as an integer + df.groupby('a', dropna=True).transform('ffill') + Out[3]: + b + 0 2 + 1 3 + 2 -9223372036854775808 + + In [3]: # Should have one additional row with the value np.nan + df.groupby('a', dropna=True).transform(lambda x: x) + Out[3]: + b + 0 2 + 1 3 + +*New behavior*: + +.. ipython:: python + + df.groupby('a', dropna=True).transform('sum') + df.groupby('a', dropna=True).transform(lambda x: x.sum()) + df.groupby('a', dropna=True).transform('ffill') + df.groupby('a', dropna=True).transform(lambda x: x) + +.. _whatsnew_150.notable_bug_fixes.visualization: + +Styler +^^^^^^ + +- Fix showing "None" as ylabel in :meth:`Series.plot` when not setting ylabel (:issue:`46129`) + +.. _whatsnew_150.notable_bug_fixes.notable_bug_fix2: + +notable_bug_fix2 +^^^^^^^^^^^^^^^^ + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.api_breaking: + +Backwards incompatible API changes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. _whatsnew_150.api_breaking.read_xml_dtypes: + +read_xml now supports ``dtype``, ``converters``, and ``parse_dates`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Similar to other IO methods, :func:`pandas.read_xml` now supports assigning specific dtypes to columns, +apply converter methods, and parse dates (:issue:`43567`). + +.. ipython:: python + + xml_dates = """ + + + square + 00360 + 4.0 + 2020-01-01 + + + circle + 00360 + + 2021-01-01 + + + triangle + 00180 + 3.0 + 2022-01-01 + + """ + + df = pd.read_xml( + xml_dates, + dtype={'sides': 'Int64'}, + converters={'degrees': str}, + parse_dates=['date'] + ) + df + df.dtypes + +.. _whatsnew_150.read_xml_iterparse: + +read_xml now supports large XML using ``iterparse`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +For very large XML files that can range in hundreds of megabytes to gigabytes, :func:`pandas.read_xml` +now supports parsing such sizeable files using `lxml's iterparse`_ and `etree's iterparse`_ +which are memory-efficient methods to iterate through XML trees and extract specific elements +and attributes without holding entire tree in memory (:issue:`#45442`). + +.. code-block:: ipython + + In [1]: df = pd.read_xml( + ... "/path/to/downloaded/enwikisource-latest-pages-articles.xml", + ... iterparse = {"page": ["title", "ns", "id"]}) + ... ) + df + Out[2]: + title ns id + 0 Gettysburg Address 0 21450 + 1 Main Page 0 42950 + 2 Declaration by United Nations 0 8435 + 3 Constitution of the United States of America 0 8435 + 4 Declaration of Independence (Israel) 0 17858 + ... ... ... ... + 3578760 Page:Black cat 1897 07 v2 n10.pdf/17 104 219649 + 3578761 Page:Black cat 1897 07 v2 n10.pdf/43 104 219649 + 3578762 Page:Black cat 1897 07 v2 n10.pdf/44 104 219649 + 3578763 The History of Tom Jones, a Foundling/Book IX 0 12084291 + 3578764 Page:Shakespeare of Stratford (1926) Yale.djvu/91 104 21450 + + [3578765 rows x 3 columns] + + +.. _`lxml's iterparse`: https://lxml.de/3.2/parsing.html#iterparse-and-iterwalk +.. _`etree's iterparse`: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse + +.. _whatsnew_150.api_breaking.api_breaking2: + +api_breaking_change2 +^^^^^^^^^^^^^^^^^^^^ + +.. _whatsnew_150.api_breaking.deps: + +Increased minimum versions for dependencies +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Some minimum supported versions of dependencies were updated. +If installed, we now require: + ++-----------------+-----------------+----------+---------+ +| Package | Minimum Version | Required | Changed | ++=================+=================+==========+=========+ +| mypy (dev) | 0.941 | | X | ++-----------------+-----------------+----------+---------+ + + +For `optional libraries `_ the general recommendation is to use the latest version. +The following table lists the lowest version per library that is currently being tested throughout the development of pandas. +Optional libraries below the lowest tested version may still work, but are not considered supported. + ++-----------------+-----------------+---------+ +| Package | Minimum Version | Changed | ++=================+=================+=========+ +| | | X | ++-----------------+-----------------+---------+ + +See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more. + +.. _whatsnew_150.api_breaking.other: + +Other API changes +^^^^^^^^^^^^^^^^^ + +- BigQuery I/O methods :func:`read_gbq` and :meth:`DataFrame.to_gbq` default to + ``auth_local_webserver = True``. Google has deprecated the + ``auth_local_webserver = False`` `"out of band" (copy-paste) flow + `_. + The ``auth_local_webserver = False`` option is planned to stop working in + October 2022. (:issue:`46312`) +- + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.deprecations: + +Deprecations +~~~~~~~~~~~~ + +.. _whatsnew_150.deprecations.int_slicing_series: + +In a future version, integer slicing on a :class:`Series` with a :class:`Int64Index` or :class:`RangeIndex` will be treated as *label-based*, not positional. This will make the behavior consistent with other :meth:`Series.__getitem__` and :meth:`Series.__setitem__` behaviors (:issue:`45162`). + +For example: + +.. ipython:: python + + ser = pd.Series([1, 2, 3, 4, 5], index=[2, 3, 5, 7, 11]) + +In the old behavior, ``ser[2:4]`` treats the slice as positional: + +*Old behavior*: + +.. code-block:: ipython + + In [3]: ser[2:4] + Out[3]: + 5 3 + 7 4 + dtype: int64 + +In a future version, this will be treated as label-based: + +*Future behavior*: + +.. code-block:: ipython + + In [4]: ser.loc[2:4] + Out[4]: + 2 1 + 3 2 + dtype: int64 + +To retain the old behavior, use ``series.iloc[i:j]``. To get the future behavior, +use ``series.loc[i:j]``. + +Slicing on a :class:`DataFrame` will not be affected. + +.. _whatsnew_150.deprecations.excel_writer_attributes: + +:class:`ExcelWriter` attributes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +All attributes of :class:`ExcelWriter` were previously documented as not +public. However some third party Excel engines documented accessing +``ExcelWriter.book`` or ``ExcelWriter.sheets``, and users were utilizing these +and possibly other attributes. Previously these attributes were not safe to use; +e.g. modifications to ``ExcelWriter.book`` would not update ``ExcelWriter.sheets`` +and conversely. In order to support this, pandas has made some attributes public +and improved their implementations so that they may now be safely used. (:issue:`45572`) + +The following attributes are now public and considered safe to access. + + - ``book`` + - ``check_extension`` + - ``close`` + - ``date_format`` + - ``datetime_format`` + - ``engine`` + - ``if_sheet_exists`` + - ``sheets`` + - ``supported_extensions`` + +The following attributes have been deprecated. They now raise a ``FutureWarning`` +when accessed and will be removed in a future version. Users should be aware +that their usage is considered unsafe, and can lead to unexpected results. + + - ``cur_sheet`` + - ``handles`` + - ``path`` + - ``save`` + - ``write_cells`` + +See the documentation of :class:`ExcelWriter` for further details. + +.. _whatsnew_150.deprecations.group_keys_in_apply: + +Using ``group_keys`` with transformers in :meth:`.GroupBy.apply` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In previous versions of pandas, if it was inferred that the function passed to +:meth:`.GroupBy.apply` was a transformer (i.e. the resulting index was equal to +the input index), the ``group_keys`` argument of :meth:`DataFrame.groupby` and +:meth:`Series.groupby` was ignored and the group keys would never be added to +the index of the result. In the future, the group keys will be added to the index +when the user specifies ``group_keys=True``. + +As ``group_keys=True`` is the default value of :meth:`DataFrame.groupby` and +:meth:`Series.groupby`, not specifying ``group_keys`` with a transformer will +raise a ``FutureWarning``. This can be silenced and the previous behavior +retained by specifying ``group_keys=False``. + +.. _whatsnew_150.deprecations.other: + +Other Deprecations +^^^^^^^^^^^^^^^^^^ +- Deprecated the keyword ``line_terminator`` in :meth:`DataFrame.to_csv` and :meth:`Series.to_csv`, use ``lineterminator`` instead; this is for consistency with :func:`read_csv` and the standard library 'csv' module (:issue:`9568`) +- Deprecated behavior of :meth:`SparseArray.astype`, :meth:`Series.astype`, and :meth:`DataFrame.astype` with :class:`SparseDtype` when passing a non-sparse ``dtype``. In a future version, this will cast to that non-sparse dtype instead of wrapping it in a :class:`SparseDtype` (:issue:`34457`) +- Deprecated behavior of :meth:`DatetimeIndex.intersection` and :meth:`DatetimeIndex.symmetric_difference` (``union`` behavior was already deprecated in version 1.3.0) with mixed time zones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`, :issue:`45357`) +- Deprecated :meth:`DataFrame.iteritems`, :meth:`Series.iteritems`, :meth:`HDFStore.iteritems` in favor of :meth:`DataFrame.items`, :meth:`Series.items`, :meth:`HDFStore.items` (:issue:`45321`) +- Deprecated :meth:`Series.is_monotonic` and :meth:`Index.is_monotonic` in favor of :meth:`Series.is_monotonic_increasing` and :meth:`Index.is_monotonic_increasing` (:issue:`45422`, :issue:`21335`) +- Deprecated behavior of :meth:`DatetimeIndex.astype`, :meth:`TimedeltaIndex.astype`, :meth:`PeriodIndex.astype` when converting to an integer dtype other than ``int64``. In a future version, these will convert to exactly the specified dtype (instead of always ``int64``) and will raise if the conversion overflows (:issue:`45034`) +- Deprecated the ``__array_wrap__`` method of DataFrame and Series, rely on standard numpy ufuncs instead (:issue:`45451`) +- Deprecated treating float-dtype data as wall-times when passed with a timezone to :class:`Series` or :class:`DatetimeIndex` (:issue:`45573`) +- Deprecated the behavior of :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``timedelta64[ns]`` dtype and incompatible fill value; in a future version this will cast to a common dtype (usually object) instead of raising, matching the behavior of other dtypes (:issue:`45746`) +- Deprecated the ``warn`` parameter in :func:`infer_freq` (:issue:`45947`) +- Deprecated allowing non-keyword arguments in :meth:`ExtensionArray.argsort` (:issue:`46134`) +- Deprecated treating all-bool ``object``-dtype columns as bool-like in :meth:`DataFrame.any` and :meth:`DataFrame.all` with ``bool_only=True``, explicitly cast to bool instead (:issue:`46188`) +- Deprecated behavior of method :meth:`DataFrame.quantile`, attribute ``numeric_only`` will default False. Including datetime/timedelta columns in the result (:issue:`7308`). +- Deprecated :attr:`Timedelta.freq` and :attr:`Timedelta.is_populated` (:issue:`46430`) +- Deprecated :attr:`Timedelta.delta` (:issue:`46476`) +- Deprecated the ``closed`` argument in :meth:`interval_range` in favor of ``inclusive`` argument; In a future version passing ``closed`` will raise (:issue:`40245`) +- + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.performance: + +Performance improvements +~~~~~~~~~~~~~~~~~~~~~~~~ +- Performance improvement in :meth:`DataFrame.corrwith` for column-wise (axis=0) Pearson and Spearman correlation when other is a :class:`Series` (:issue:`46174`) +- Performance improvement in :meth:`.GroupBy.transform` for some user-defined DataFrame -> Series functions (:issue:`45387`) +- Performance improvement in :meth:`DataFrame.duplicated` when subset consists of only one column (:issue:`45236`) +- Performance improvement in :meth:`.GroupBy.diff` (:issue:`16706`) +- Performance improvement in :meth:`.GroupBy.transform` when broadcasting values for user-defined functions (:issue:`45708`) +- Performance improvement in :meth:`.GroupBy.transform` for user-defined functions when only a single group exists (:issue:`44977`) +- Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` for tuple-based indexing of a :class:`MultiIndex` (:issue:`45681`, :issue:`46040`, :issue:`46330`) +- Performance improvement in :attr:`MultiIndex.values` when the MultiIndex contains levels of type DatetimeIndex, TimedeltaIndex or ExtensionDtypes (:issue:`46288`) +- Performance improvement in :func:`merge` when left and/or right are empty (:issue:`45838`) +- Performance improvement in :meth:`DataFrame.join` when left and/or right are empty (:issue:`46015`) +- Performance improvement in :meth:`DataFrame.reindex` and :meth:`Series.reindex` when target is a :class:`MultiIndex` (:issue:`46235`) +- Performance improvement when setting values in a pyarrow backed string array (:issue:`46400`) +- Performance improvement in :func:`factorize` (:issue:`46109`) +- Performance improvement in :class:`DataFrame` and :class:`Series` constructors for extension dtype scalars (:issue:`45854`) + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.bug_fixes: + +Bug fixes +- Bug in :func:`groupby.last()` that caused numpy arrays ordered with last with pd.NA to have dytype:Int64 type rather than dtype:Bool (:issue:`46409`) + +Categorical +^^^^^^^^^^^ +- Bug in :meth:`Categorical.view` not accepting integer dtypes (:issue:`25464`) +- Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains ``NaN`` values incorrectly raising instead of casting to ``float64`` (:issue:`45362`) +- + +Datetimelike +^^^^^^^^^^^^ +- Bug in :meth:`DataFrame.quantile` with datetime-like dtypes and no rows incorrectly returning ``float64`` dtype instead of retaining datetime-like dtype (:issue:`41544`) +- Bug in :func:`to_datetime` with sequences of ``np.str_`` objects incorrectly raising (:issue:`32264`) +- Bug in :class:`Timestamp` construction when passing datetime components as positional arguments and ``tzinfo`` as a keyword argument incorrectly raising (:issue:`31929`) +- Bug in :meth:`Index.astype` when casting from object dtype to ``timedelta64[ns]`` dtype incorrectly casting ``np.datetime64("NaT")`` values to ``np.timedelta64("NaT")`` instead of raising (:issue:`45722`) +- Bug in :meth:`SeriesGroupBy.value_counts` index when passing categorical column (:issue:`44324`) +- Bug in :meth:`DatetimeIndex.tz_localize` localizing to UTC failing to make a copy of the underlying data (:issue:`46460`) +- + +Timedelta +^^^^^^^^^ +- Bug in :func:`astype_nansafe` astype("timedelta64[ns]") fails when np.nan is included (:issue:`45798`) +- + +Time Zones +^^^^^^^^^^ +- + +Numeric +^^^^^^^ +- Bug in operations with array-likes with ``dtype="boolean"`` and :attr:`NA` incorrectly altering the array in-place (:issue:`45421`) +- Bug in division, ``pow`` and ``mod`` operations on array-likes with ``dtype="boolean"`` not being like their ``np.bool_`` counterparts (:issue:`46063`) +- Bug in multiplying a :class:`Series` with ``IntegerDtype`` or ``FloatingDtype`` by an array-like with ``timedelta64[ns]`` dtype incorrectly raising (:issue:`45622`) +- + +Conversion +^^^^^^^^^^ +- Bug in :meth:`DataFrame.astype` not preserving subclasses (:issue:`40810`) +- Bug in constructing a :class:`Series` from a float-containing list or a floating-dtype ndarray-like (e.g. ``dask.Array``) and an integer dtype raising instead of casting like we would with an ``np.ndarray`` (:issue:`40110`) +- Bug in :meth:`Float64Index.astype` to unsigned integer dtype incorrectly casting to ``np.int64`` dtype (:issue:`45309`) +- Bug in :meth:`Series.astype` and :meth:`DataFrame.astype` from floating dtype to unsigned integer dtype failing to raise in the presence of negative values (:issue:`45151`) +- Bug in :func:`array` with ``FloatingDtype`` and values containing float-castable strings incorrectly raising (:issue:`45424`) +- Bug when comparing string and datetime64ns objects causing ``OverflowError`` exception. (:issue:`45506`) + +Strings +^^^^^^^ +- Bug in :meth:`str.startswith` and :meth:`str.endswith` when using other series as parameter _pat_. Now raises ``TypeError`` (:issue:`3485`) +- + +Interval +^^^^^^^^ +- Bug in :meth:`IntervalArray.__setitem__` when setting ``np.nan`` into an integer-backed array raising ``ValueError`` instead of ``TypeError`` (:issue:`45484`) +- + +Indexing +^^^^^^^^ +- Bug in :meth:`loc.__getitem__` with a list of keys causing an internal inconsistency that could lead to a disconnect between ``frame.at[x, y]`` vs ``frame[y].loc[x]`` (:issue:`22372`) +- Bug in :meth:`DataFrame.iloc` where indexing a single row on a :class:`DataFrame` with a single ExtensionDtype column gave a copy instead of a view on the underlying data (:issue:`45241`) +- Bug in :meth:`Series.align` does not create :class:`MultiIndex` with union of levels when both MultiIndexes intersections are identical (:issue:`45224`) +- Bug in setting a NA value (``None`` or ``np.nan``) into a :class:`Series` with int-based :class:`IntervalDtype` incorrectly casting to object dtype instead of a float-based :class:`IntervalDtype` (:issue:`45568`) +- Bug in indexing setting values into an ``ExtensionDtype`` column with ``df.iloc[:, i] = values`` with ``values`` having the same dtype as ``df.iloc[:, i]`` incorrectly inserting a new array instead of setting in-place (:issue:`33457`) +- Bug in :meth:`Series.__setitem__` with a non-integer :class:`Index` when using an integer key to set a value that cannot be set inplace where a ``ValueError`` was raised instead of casting to a common dtype (:issue:`45070`) +- Bug in :meth:`Series.__setitem__` when setting incompatible values into a ``PeriodDtype`` or ``IntervalDtype`` :class:`Series` raising when indexing with a boolean mask but coercing when indexing with otherwise-equivalent indexers; these now consistently coerce, along with :meth:`Series.mask` and :meth:`Series.where` (:issue:`45768`) +- Bug in :meth:`DataFrame.where` with multiple columns with datetime-like dtypes failing to downcast results consistent with other dtypes (:issue:`45837`) +- Bug in :meth:`Series.loc.__setitem__` and :meth:`Series.loc.__getitem__` not raising when using multiple keys without using a :class:`MultiIndex` (:issue:`13831`) +- Bug in :meth:`Index.reindex` raising ``AssertionError`` when ``level`` was specified but no :class:`MultiIndex` was given; level is ignored now (:issue:`35132`) +- Bug when setting a value too large for a :class:`Series` dtype failing to coerce to a common type (:issue:`26049`, :issue:`32878`) +- Bug in :meth:`loc.__setitem__` treating ``range`` keys as positional instead of label-based (:issue:`45479`) +- Bug in :meth:`Series.__setitem__` when setting ``boolean`` dtype values containing ``NA`` incorrectly raising instead of casting to ``boolean`` dtype (:issue:`45462`) +- Bug in :meth:`Series.__setitem__` where setting :attr:`NA` into a numeric-dtpye :class:`Series` would incorrectly upcast to object-dtype rather than treating the value as ``np.nan`` (:issue:`44199`) +- Bug in :meth:`Series.__setitem__` with ``datetime64[ns]`` dtype, an all-``False`` boolean mask, and an incompatible value incorrectly casting to ``object`` instead of retaining ``datetime64[ns]`` dtype (:issue:`45967`) +- Bug in :meth:`Index.__getitem__` raising ``ValueError`` when indexer is from boolean dtype with ``NA`` (:issue:`45806`) +- Bug in :meth:`Series.mask` with ``inplace=True`` or setting values with a boolean mask with small integer dtypes incorrectly raising (:issue:`45750`) +- Bug in :meth:`DataFrame.mask` with ``inplace=True`` and ``ExtensionDtype`` columns incorrectly raising (:issue:`45577`) +- Bug in getting a column from a DataFrame with an object-dtype row index with datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (:issue:`42950`) +- Bug in :meth:`DataFrame.__getattribute__` raising ``AttributeError`` if columns have ``"string"`` dtype (:issue:`46185`) +- Bug in indexing on a :class:`DatetimeIndex` with a ``np.str_`` key incorrectly raising (:issue:`45580`) +- Bug in :meth:`CategoricalIndex.get_indexer` when index contains ``NaN`` values, resulting in elements that are in target but not present in the index to be mapped to the index of the NaN element, instead of -1 (:issue:`45361`) +- Bug in setting large integer values into :class:`Series` with ``float32`` or ``float16`` dtype incorrectly altering these values instead of coercing to ``float64`` dtype (:issue:`45844`) +- Bug in :meth:`Series.asof` and :meth:`DataFrame.asof` incorrectly casting bool-dtype results to ``float64`` dtype (:issue:`16063`) +- + +Missing +^^^^^^^ +- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``downcast`` keyword not being respected in some cases where there are no NA values present (:issue:`45423`) +- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with :class:`IntervalDtype` and incompatible value raising instead of casting to a common (usually object) dtype (:issue:`45796`) +- Bug in :meth:`DataFrame.interpolate` with object-dtype column not returning a copy with ``inplace=False`` (:issue:`45791`) +- + +MultiIndex +^^^^^^^^^^ +- Bug in :meth:`DataFrame.loc` returning empty result when slicing a :class:`MultiIndex` with a negative step size and non-null start/stop values (:issue:`46156`) +- Bug in :meth:`DataFrame.loc` raising when slicing a :class:`MultiIndex` with a negative step size other than -1 (:issue:`46156`) +- Bug in :meth:`DataFrame.loc` raising when slicing a :class:`MultiIndex` with a negative step size and slicing a non-int labeled index level (:issue:`46156`) +- Bug in :meth:`Series.to_numpy` where multiindexed Series could not be converted to numpy arrays when an ``na_value`` was supplied (:issue:`45774`) +- Bug in :class:`MultiIndex.equals` not commutative when only one side has extension array dtype (:issue:`46026`) +- Bug in :meth:`MultiIndex.from_tuples` cannot construct Index of empty tuples (:issue:`45608`) + +I/O +^^^ +- Bug in :meth:`DataFrame.to_stata` where no error is raised if the :class:`DataFrame` contains ``-np.inf`` (:issue:`45350`) +- Bug in :func:`read_excel` results in an infinite loop with certain ``skiprows`` callables (:issue:`45585`) +- Bug in :meth:`DataFrame.info` where a new line at the end of the output is omitted when called on an empty :class:`DataFrame` (:issue:`45494`) +- Bug in :func:`read_csv` not recognizing line break for ``on_bad_lines="warn"`` for ``engine="c"`` (:issue:`41710`) +- Bug in :meth:`DataFrame.to_csv` not respecting ``float_format`` for ``Float64`` dtype (:issue:`45991`) +- Bug in :func:`read_csv` not respecting a specified converter to index columns in all cases (:issue:`40589`) +- Bug in :func:`read_parquet` when ``engine="pyarrow"`` which caused partial write to disk when column of unsupported datatype was passed (:issue:`44914`) +- Bug in :func:`DataFrame.to_excel` and :class:`ExcelWriter` would raise when writing an empty DataFrame to a ``.ods`` file (:issue:`45793`) +- Bug in Parquet roundtrip for Interval dtype with ``datetime64[ns]`` subtype (:issue:`45881`) +- Bug in :func:`read_excel` when reading a ``.ods`` file with newlines between xml elements (:issue:`45598`) +- Bug in :func:`read_parquet` when ``engine="fastparquet"`` where the file was not closed on error (:issue:`46555`) + +Period +^^^^^^ +- Bug in subtraction of :class:`Period` from :class:`PeriodArray` returning wrong results (:issue:`45999`) +- Bug in :meth:`Period.strftime` and :meth:`PeriodIndex.strftime`, directives ``%l`` and ``%u`` were giving wrong results (:issue:`46252`) +- + +Plotting +^^^^^^^^ +- Bug in :meth:`DataFrame.plot.barh` that prevented labeling the x-axis and ``xlabel`` updating the y-axis label (:issue:`45144`) +- Bug in :meth:`DataFrame.plot.box` that prevented labeling the x-axis (:issue:`45463`) +- Bug in :meth:`DataFrame.boxplot` that prevented passing in ``xlabel`` and ``ylabel`` (:issue:`45463`) +- Bug in :meth:`DataFrame.boxplot` that prevented specifying ``vert=False`` (:issue:`36918`) +- Bug in :meth:`DataFrame.plot.scatter` that prevented specifying ``norm`` (:issue:`45809`) + +Groupby/resample/rolling +^^^^^^^^^^^^^^^^^^^^^^^^ +- Bug in :meth:`DataFrame.resample` ignoring ``closed="right"`` on :class:`TimedeltaIndex` (:issue:`45414`) +- Bug in :meth:`.DataFrameGroupBy.transform` fails when ``func="size"`` and the input DataFrame has multiple columns (:issue:`27469`) +- Bug in :meth:`.DataFrameGroupBy.size` and :meth:`.DataFrameGroupBy.transform` with ``func="size"`` produced incorrect results when ``axis=1`` (:issue:`45715`) +- Bug in :meth:`.ExponentialMovingWindow.mean` with ``axis=1`` and ``engine='numba'`` when the :class:`DataFrame` has more columns than rows (:issue:`46086`) +- Bug when using ``engine="numba"`` would return the same jitted function when modifying ``engine_kwargs`` (:issue:`46086`) +- Bug in :meth:`.DataFrameGroupby.transform` fails when ``axis=1`` and ``func`` is ``"first"`` or ``"last"`` (:issue:`45986`) +- Bug in :meth:`DataFrameGroupby.cumsum` with ``skipna=False`` giving incorrect results (:issue:`46216`) +- Bug in :meth:`.GroupBy.cumsum` with ``timedelta64[ns]`` dtype failing to recognize ``NaT`` as a null value (:issue:`46216`) +- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` with nullable dtypes incorrectly altering the original data in place (:issue:`46220`) +- Bug in :meth:`GroupBy.cummax` with ``int64`` dtype with leading value being the smallest possible int64 (:issue:`46382`) +- Bug in :meth:`GroupBy.max` with empty groups and ``uint64`` dtype incorrectly raising ``RuntimeError`` (:issue:`46408`) +- Bug in :meth:`.GroupBy.apply` would fail when ``func`` was a string and args or kwargs were supplied (:issue:`46479`) +- + +Reshaping +^^^^^^^^^ +- Bug in :func:`concat` between a :class:`Series` with integer dtype and another with :class:`CategoricalDtype` with integer categories and containing ``NaN`` values casting to object dtype instead of ``float64`` (:issue:`45359`) +- Bug in :func:`get_dummies` that selected object and categorical dtypes but not string (:issue:`44965`) +- Bug in :meth:`DataFrame.align` when aligning a :class:`MultiIndex` to a :class:`Series` with another :class:`MultiIndex` (:issue:`46001`) +- Bug in concanenation with ``IntegerDtype``, or ``FloatingDtype`` arrays where the resulting dtype did not mirror the behavior of the non-nullable dtypes (:issue:`46379`) +- + +Sparse +^^^^^^ +- Bug in :meth:`Series.where` and :meth:`DataFrame.where` with ``SparseDtype`` failing to retain the array's ``fill_value`` (:issue:`45691`) +- + +ExtensionArray +^^^^^^^^^^^^^^ +- Bug in :meth:`IntegerArray.searchsorted` and :meth:`FloatingArray.searchsorted` returning inconsistent results when acting on ``np.nan`` (:issue:`45255`) +- + +Styler +^^^^^^ +- Bug when attempting to apply styling functions to an empty DataFrame subset (:issue:`45313`) +- + +Other +^^^^^ + +.. ***DO NOT USE THIS SECTION*** + +- +- + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.contributors: + +Contributors +~~~~~~~~~~~~ diff --git a/doc/source/whatsnew/v1.5.0_REMOTE_28826.rst b/doc/source/whatsnew/v1.5.0_REMOTE_28826.rst new file mode 100644 index 0000000000000..5fdec123ec6be --- /dev/null +++ b/doc/source/whatsnew/v1.5.0_REMOTE_28826.rst @@ -0,0 +1,647 @@ +.. _whatsnew_150: + +What's new in 1.5.0 (??) +------------------------ + +These are the changes in pandas 1.5.0. See :ref:`release` for a full changelog +including other versions of pandas. + +{{ header }} + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.enhancements: + +Enhancements +~~~~~~~~~~~~ + +.. _whatsnew_150.enhancements.styler: + +Styler +^^^^^^ + + - New method :meth:`.Styler.to_string` for alternative customisable output methods (:issue:`44502`) + - Added the ability to render ``border`` and ``border-{side}`` CSS properties in Excel (:issue:`42276`) + - Added a new method :meth:`.Styler.concat` which allows adding customised footer rows to visualise additional calculations on the data, e.g. totals and counts etc. (:issue:`43875`, :issue:`46186`) + - :meth:`.Styler.highlight_null` now accepts ``color`` consistently with other builtin methods and deprecates ``null_color`` although this remains backwards compatible (:issue:`45907`) + +.. _whatsnew_150.enhancements.resample_group_keys: + +Control of index with ``group_keys`` in :meth:`DataFrame.resample` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The argument ``group_keys`` has been added to the method :meth:`DataFrame.resample`. +As with :meth:`DataFrame.groupby`, this argument controls the whether each group is added +to the index in the resample when :meth:`.Resampler.apply` is used. + +.. warning:: + Not specifying the ``group_keys`` argument will retain the + previous behavior and emit a warning if the result will change + by specifying ``group_keys=False``. In a future version + of pandas, not specifying ``group_keys`` will default to + the same behavior as ``group_keys=False``. + +.. ipython:: python + + df = pd.DataFrame( + {'a': range(6)}, + index=pd.date_range("2021-01-01", periods=6, freq="8H") + ) + df.resample("D", group_keys=True).apply(lambda x: x) + df.resample("D", group_keys=False).apply(lambda x: x) + +Previously, the resulting index would depend upon the values returned by ``apply``, +as seen in the following example. + +.. code-block:: ipython + + In [1]: # pandas 1.3 + In [2]: df.resample("D").apply(lambda x: x) + Out[2]: + a + 2021-01-01 00:00:00 0 + 2021-01-01 08:00:00 1 + 2021-01-01 16:00:00 2 + 2021-01-02 00:00:00 3 + 2021-01-02 08:00:00 4 + 2021-01-02 16:00:00 5 + + In [3]: df.resample("D").apply(lambda x: x.reset_index()) + Out[3]: + index a + 2021-01-01 0 2021-01-01 00:00:00 0 + 1 2021-01-01 08:00:00 1 + 2 2021-01-01 16:00:00 2 + 2021-01-02 0 2021-01-02 00:00:00 3 + 1 2021-01-02 08:00:00 4 + 2 2021-01-02 16:00:00 5 + +.. _whatsnew_150.enhancements.other: + +Other enhancements +^^^^^^^^^^^^^^^^^^ +- :meth:`MultiIndex.to_frame` now supports the argument ``allow_duplicates`` and raises on duplicate labels if it is missing or False (:issue:`45245`) +- :class:`StringArray` now accepts array-likes containing nan-likes (``None``, ``np.nan``) for the ``values`` parameter in its constructor in addition to strings and :attr:`pandas.NA`. (:issue:`40839`) +- Improved the rendering of ``categories`` in :class:`CategoricalIndex` (:issue:`45218`) +- :meth:`to_numeric` now preserves float64 arrays when downcasting would generate values not representable in float32 (:issue:`43693`) +- :meth:`Series.reset_index` and :meth:`DataFrame.reset_index` now support the argument ``allow_duplicates`` (:issue:`44410`) +- :meth:`.GroupBy.min` and :meth:`.GroupBy.max` now supports `Numba `_ execution with the ``engine`` keyword (:issue:`45428`) +- :func:`read_csv` now supports ``defaultdict`` as a ``dtype`` parameter (:issue:`41574`) +- :meth:`DataFrame.rolling` and :meth:`Series.rolling` now support a ``step`` parameter with fixed-length windows (:issue:`15354`) +- Implemented a ``bool``-dtype :class:`Index`, passing a bool-dtype array-like to ``pd.Index`` will now retain ``bool`` dtype instead of casting to ``object`` (:issue:`45061`) +- Implemented a complex-dtype :class:`Index`, passing a complex-dtype array-like to ``pd.Index`` will now retain complex dtype instead of casting to ``object`` (:issue:`45845`) +- :class:`Series` and :class:`DataFrame` with ``IntegerDtype`` now supports bitwise operations (:issue:`34463`) +- Add ``milliseconds`` field support for :class:`~pandas.DateOffset` (:issue:`43371`) +- :meth:`DataFrame.reset_index` now accepts a ``names`` argument which renames the index names (:issue:`6878`) +- :meth:`pd.concat` now raises when ``levels`` is given but ``keys`` is None (:issue:`46653`) +- :meth:`pd.concat` now raises when ``levels`` contains duplicate values (:issue:`46653`) +- Added ``numeric_only`` argument to :meth:`DataFrame.corr`, :meth:`DataFrame.corrwith`, and :meth:`DataFrame.cov` (:issue:`46560`) + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.notable_bug_fixes: + +Notable bug fixes +~~~~~~~~~~~~~~~~~ + +These are bug fixes that might have notable behavior changes. + +.. _whatsnew_150.notable_bug_fixes.notable_bug_fix1: + +Styler +^^^^^^ + +- Fixed bug in :class:`CSSToExcelConverter` leading to ``TypeError`` when border color provided without border style for ``xlsxwriter`` engine (:issue:`42276`) + +.. _whatsnew_150.notable_bug_fixes.groupby_transform_dropna: + +Using ``dropna=True`` with ``groupby`` transforms +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +A transform is an operation whose result has the same size as its input. When the +result is a :class:`DataFrame` or :class:`Series`, it is also required that the +index of the result matches that of the input. In pandas 1.4, using +:meth:`.DataFrameGroupBy.transform` or :meth:`.SeriesGroupBy.transform` with null +values in the groups and ``dropna=True`` gave incorrect results. Demonstrated by the +examples below, the incorrect results either contained incorrect values, or the result +did not have the same index as the input. + +.. ipython:: python + + df = pd.DataFrame({'a': [1, 1, np.nan], 'b': [2, 3, 4]}) + +*Old behavior*: + +.. code-block:: ipython + + In [3]: # Value in the last row should be np.nan + df.groupby('a', dropna=True).transform('sum') + Out[3]: + b + 0 5 + 1 5 + 2 5 + + In [3]: # Should have one additional row with the value np.nan + df.groupby('a', dropna=True).transform(lambda x: x.sum()) + Out[3]: + b + 0 5 + 1 5 + + In [3]: # The value in the last row is np.nan interpreted as an integer + df.groupby('a', dropna=True).transform('ffill') + Out[3]: + b + 0 2 + 1 3 + 2 -9223372036854775808 + + In [3]: # Should have one additional row with the value np.nan + df.groupby('a', dropna=True).transform(lambda x: x) + Out[3]: + b + 0 2 + 1 3 + +*New behavior*: + +.. ipython:: python + + df.groupby('a', dropna=True).transform('sum') + df.groupby('a', dropna=True).transform(lambda x: x.sum()) + df.groupby('a', dropna=True).transform('ffill') + df.groupby('a', dropna=True).transform(lambda x: x) + +.. _whatsnew_150.notable_bug_fixes.visualization: + +Styler +^^^^^^ + +- Fix showing "None" as ylabel in :meth:`Series.plot` when not setting ylabel (:issue:`46129`) + +.. _whatsnew_150.notable_bug_fixes.notable_bug_fix2: + +notable_bug_fix2 +^^^^^^^^^^^^^^^^ + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.api_breaking: + +Backwards incompatible API changes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. _whatsnew_150.api_breaking.read_xml_dtypes: + +read_xml now supports ``dtype``, ``converters``, and ``parse_dates`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Similar to other IO methods, :func:`pandas.read_xml` now supports assigning specific dtypes to columns, +apply converter methods, and parse dates (:issue:`43567`). + +.. ipython:: python + + xml_dates = """ + + + square + 00360 + 4.0 + 2020-01-01 + + + circle + 00360 + + 2021-01-01 + + + triangle + 00180 + 3.0 + 2022-01-01 + + """ + + df = pd.read_xml( + xml_dates, + dtype={'sides': 'Int64'}, + converters={'degrees': str}, + parse_dates=['date'] + ) + df + df.dtypes + +.. _whatsnew_150.read_xml_iterparse: + +read_xml now supports large XML using ``iterparse`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +For very large XML files that can range in hundreds of megabytes to gigabytes, :func:`pandas.read_xml` +now supports parsing such sizeable files using `lxml's iterparse`_ and `etree's iterparse`_ +which are memory-efficient methods to iterate through XML trees and extract specific elements +and attributes without holding entire tree in memory (:issue:`#45442`). + +.. code-block:: ipython + + In [1]: df = pd.read_xml( + ... "/path/to/downloaded/enwikisource-latest-pages-articles.xml", + ... iterparse = {"page": ["title", "ns", "id"]}) + ... ) + df + Out[2]: + title ns id + 0 Gettysburg Address 0 21450 + 1 Main Page 0 42950 + 2 Declaration by United Nations 0 8435 + 3 Constitution of the United States of America 0 8435 + 4 Declaration of Independence (Israel) 0 17858 + ... ... ... ... + 3578760 Page:Black cat 1897 07 v2 n10.pdf/17 104 219649 + 3578761 Page:Black cat 1897 07 v2 n10.pdf/43 104 219649 + 3578762 Page:Black cat 1897 07 v2 n10.pdf/44 104 219649 + 3578763 The History of Tom Jones, a Foundling/Book IX 0 12084291 + 3578764 Page:Shakespeare of Stratford (1926) Yale.djvu/91 104 21450 + + [3578765 rows x 3 columns] + + +.. _`lxml's iterparse`: https://lxml.de/3.2/parsing.html#iterparse-and-iterwalk +.. _`etree's iterparse`: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse + +.. _whatsnew_150.api_breaking.api_breaking2: + +api_breaking_change2 +^^^^^^^^^^^^^^^^^^^^ + +.. _whatsnew_150.api_breaking.deps: + +Increased minimum versions for dependencies +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Some minimum supported versions of dependencies were updated. +If installed, we now require: + ++-----------------+-----------------+----------+---------+ +| Package | Minimum Version | Required | Changed | ++=================+=================+==========+=========+ +| mypy (dev) | 0.941 | | X | ++-----------------+-----------------+----------+---------+ + + +For `optional libraries `_ the general recommendation is to use the latest version. +The following table lists the lowest version per library that is currently being tested throughout the development of pandas. +Optional libraries below the lowest tested version may still work, but are not considered supported. + ++-----------------+-----------------+---------+ +| Package | Minimum Version | Changed | ++=================+=================+=========+ +| | | X | ++-----------------+-----------------+---------+ + +See :ref:`install.dependencies` and :ref:`install.optional_dependencies` for more. + +.. _whatsnew_150.api_breaking.other: + +Other API changes +^^^^^^^^^^^^^^^^^ + +- BigQuery I/O methods :func:`read_gbq` and :meth:`DataFrame.to_gbq` default to + ``auth_local_webserver = True``. Google has deprecated the + ``auth_local_webserver = False`` `"out of band" (copy-paste) flow + `_. + The ``auth_local_webserver = False`` option is planned to stop working in + October 2022. (:issue:`46312`) +- + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.deprecations: + +Deprecations +~~~~~~~~~~~~ + +.. _whatsnew_150.deprecations.int_slicing_series: + +In a future version, integer slicing on a :class:`Series` with a :class:`Int64Index` or :class:`RangeIndex` will be treated as *label-based*, not positional. This will make the behavior consistent with other :meth:`Series.__getitem__` and :meth:`Series.__setitem__` behaviors (:issue:`45162`). + +For example: + +.. ipython:: python + + ser = pd.Series([1, 2, 3, 4, 5], index=[2, 3, 5, 7, 11]) + +In the old behavior, ``ser[2:4]`` treats the slice as positional: + +*Old behavior*: + +.. code-block:: ipython + + In [3]: ser[2:4] + Out[3]: + 5 3 + 7 4 + dtype: int64 + +In a future version, this will be treated as label-based: + +*Future behavior*: + +.. code-block:: ipython + + In [4]: ser.loc[2:4] + Out[4]: + 2 1 + 3 2 + dtype: int64 + +To retain the old behavior, use ``series.iloc[i:j]``. To get the future behavior, +use ``series.loc[i:j]``. + +Slicing on a :class:`DataFrame` will not be affected. + +.. _whatsnew_150.deprecations.excel_writer_attributes: + +:class:`ExcelWriter` attributes +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +All attributes of :class:`ExcelWriter` were previously documented as not +public. However some third party Excel engines documented accessing +``ExcelWriter.book`` or ``ExcelWriter.sheets``, and users were utilizing these +and possibly other attributes. Previously these attributes were not safe to use; +e.g. modifications to ``ExcelWriter.book`` would not update ``ExcelWriter.sheets`` +and conversely. In order to support this, pandas has made some attributes public +and improved their implementations so that they may now be safely used. (:issue:`45572`) + +The following attributes are now public and considered safe to access. + + - ``book`` + - ``check_extension`` + - ``close`` + - ``date_format`` + - ``datetime_format`` + - ``engine`` + - ``if_sheet_exists`` + - ``sheets`` + - ``supported_extensions`` + +The following attributes have been deprecated. They now raise a ``FutureWarning`` +when accessed and will be removed in a future version. Users should be aware +that their usage is considered unsafe, and can lead to unexpected results. + + - ``cur_sheet`` + - ``handles`` + - ``path`` + - ``save`` + - ``write_cells`` + +See the documentation of :class:`ExcelWriter` for further details. + +.. _whatsnew_150.deprecations.group_keys_in_apply: + +Using ``group_keys`` with transformers in :meth:`.GroupBy.apply` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In previous versions of pandas, if it was inferred that the function passed to +:meth:`.GroupBy.apply` was a transformer (i.e. the resulting index was equal to +the input index), the ``group_keys`` argument of :meth:`DataFrame.groupby` and +:meth:`Series.groupby` was ignored and the group keys would never be added to +the index of the result. In the future, the group keys will be added to the index +when the user specifies ``group_keys=True``. + +As ``group_keys=True`` is the default value of :meth:`DataFrame.groupby` and +:meth:`Series.groupby`, not specifying ``group_keys`` with a transformer will +raise a ``FutureWarning``. This can be silenced and the previous behavior +retained by specifying ``group_keys=False``. + +.. _whatsnew_150.deprecations.other: + +Other Deprecations +^^^^^^^^^^^^^^^^^^ +- Deprecated the keyword ``line_terminator`` in :meth:`DataFrame.to_csv` and :meth:`Series.to_csv`, use ``lineterminator`` instead; this is for consistency with :func:`read_csv` and the standard library 'csv' module (:issue:`9568`) +- Deprecated behavior of :meth:`SparseArray.astype`, :meth:`Series.astype`, and :meth:`DataFrame.astype` with :class:`SparseDtype` when passing a non-sparse ``dtype``. In a future version, this will cast to that non-sparse dtype instead of wrapping it in a :class:`SparseDtype` (:issue:`34457`) +- Deprecated behavior of :meth:`DatetimeIndex.intersection` and :meth:`DatetimeIndex.symmetric_difference` (``union`` behavior was already deprecated in version 1.3.0) with mixed time zones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`, :issue:`45357`) +- Deprecated :meth:`DataFrame.iteritems`, :meth:`Series.iteritems`, :meth:`HDFStore.iteritems` in favor of :meth:`DataFrame.items`, :meth:`Series.items`, :meth:`HDFStore.items` (:issue:`45321`) +- Deprecated :meth:`Series.is_monotonic` and :meth:`Index.is_monotonic` in favor of :meth:`Series.is_monotonic_increasing` and :meth:`Index.is_monotonic_increasing` (:issue:`45422`, :issue:`21335`) +- Deprecated behavior of :meth:`DatetimeIndex.astype`, :meth:`TimedeltaIndex.astype`, :meth:`PeriodIndex.astype` when converting to an integer dtype other than ``int64``. In a future version, these will convert to exactly the specified dtype (instead of always ``int64``) and will raise if the conversion overflows (:issue:`45034`) +- Deprecated the ``__array_wrap__`` method of DataFrame and Series, rely on standard numpy ufuncs instead (:issue:`45451`) +- Deprecated treating float-dtype data as wall-times when passed with a timezone to :class:`Series` or :class:`DatetimeIndex` (:issue:`45573`) +- Deprecated the behavior of :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``timedelta64[ns]`` dtype and incompatible fill value; in a future version this will cast to a common dtype (usually object) instead of raising, matching the behavior of other dtypes (:issue:`45746`) +- Deprecated the ``warn`` parameter in :func:`infer_freq` (:issue:`45947`) +- Deprecated allowing non-keyword arguments in :meth:`ExtensionArray.argsort` (:issue:`46134`) +- Deprecated treating all-bool ``object``-dtype columns as bool-like in :meth:`DataFrame.any` and :meth:`DataFrame.all` with ``bool_only=True``, explicitly cast to bool instead (:issue:`46188`) +- Deprecated behavior of method :meth:`DataFrame.quantile`, attribute ``numeric_only`` will default False. Including datetime/timedelta columns in the result (:issue:`7308`). +- Deprecated :attr:`Timedelta.freq` and :attr:`Timedelta.is_populated` (:issue:`46430`) +- Deprecated :attr:`Timedelta.delta` (:issue:`46476`) +- Deprecated passing arguments as positional in :meth:`DataFrame.any` and :meth:`Series.any` (:issue:`44802`) +- Deprecated the ``closed`` argument in :meth:`interval_range` in favor of ``inclusive`` argument; In a future version passing ``closed`` will raise (:issue:`40245`) +- Deprecated the methods :meth:`DataFrame.mad`, :meth:`Series.mad`, and the corresponding groupby methods (:issue:`11787`) + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.performance: + +Performance improvements +~~~~~~~~~~~~~~~~~~~~~~~~ +- Performance improvement in :meth:`DataFrame.corrwith` for column-wise (axis=0) Pearson and Spearman correlation when other is a :class:`Series` (:issue:`46174`) +- Performance improvement in :meth:`.GroupBy.transform` for some user-defined DataFrame -> Series functions (:issue:`45387`) +- Performance improvement in :meth:`DataFrame.duplicated` when subset consists of only one column (:issue:`45236`) +- Performance improvement in :meth:`.GroupBy.diff` (:issue:`16706`) +- Performance improvement in :meth:`.GroupBy.transform` when broadcasting values for user-defined functions (:issue:`45708`) +- Performance improvement in :meth:`.GroupBy.transform` for user-defined functions when only a single group exists (:issue:`44977`) +- Performance improvement in :meth:`DataFrame.loc` and :meth:`Series.loc` for tuple-based indexing of a :class:`MultiIndex` (:issue:`45681`, :issue:`46040`, :issue:`46330`) +- Performance improvement in :attr:`MultiIndex.values` when the MultiIndex contains levels of type DatetimeIndex, TimedeltaIndex or ExtensionDtypes (:issue:`46288`) +- Performance improvement in :func:`merge` when left and/or right are empty (:issue:`45838`) +- Performance improvement in :meth:`DataFrame.join` when left and/or right are empty (:issue:`46015`) +- Performance improvement in :meth:`DataFrame.reindex` and :meth:`Series.reindex` when target is a :class:`MultiIndex` (:issue:`46235`) +- Performance improvement when setting values in a pyarrow backed string array (:issue:`46400`) +- Performance improvement in :func:`factorize` (:issue:`46109`) +- Performance improvement in :class:`DataFrame` and :class:`Series` constructors for extension dtype scalars (:issue:`45854`) + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.bug_fixes: + +Bug fixes +- Bug in :func:`_call_cython_op` that caused numpy arrays ordered with last with pd.NA to have dytype:Int64 type rather than dtype:Bool (:issue:`46409`) + +Categorical +^^^^^^^^^^^ +- Bug in :meth:`Categorical.view` not accepting integer dtypes (:issue:`25464`) +- Bug in :meth:`CategoricalIndex.union` when the index's categories are integer-dtype and the index contains ``NaN`` values incorrectly raising instead of casting to ``float64`` (:issue:`45362`) +- + +Datetimelike +^^^^^^^^^^^^ +- Bug in :meth:`DataFrame.quantile` with datetime-like dtypes and no rows incorrectly returning ``float64`` dtype instead of retaining datetime-like dtype (:issue:`41544`) +- Bug in :func:`to_datetime` with sequences of ``np.str_`` objects incorrectly raising (:issue:`32264`) +- Bug in :class:`Timestamp` construction when passing datetime components as positional arguments and ``tzinfo`` as a keyword argument incorrectly raising (:issue:`31929`) +- Bug in :meth:`Index.astype` when casting from object dtype to ``timedelta64[ns]`` dtype incorrectly casting ``np.datetime64("NaT")`` values to ``np.timedelta64("NaT")`` instead of raising (:issue:`45722`) +- Bug in :meth:`SeriesGroupBy.value_counts` index when passing categorical column (:issue:`44324`) +- Bug in :meth:`DatetimeIndex.tz_localize` localizing to UTC failing to make a copy of the underlying data (:issue:`46460`) +- + +Timedelta +^^^^^^^^^ +- Bug FIXED in :func:`astype_nansafe` astype("timedelta64[ns]") fails when np.nan is included (:issue:`45798`) +- + +Time Zones +^^^^^^^^^^ +- + +Numeric +^^^^^^^ +- Bug in operations with array-likes with ``dtype="boolean"`` and :attr:`NA` incorrectly altering the array in-place (:issue:`45421`) +- Bug in division, ``pow`` and ``mod`` operations on array-likes with ``dtype="boolean"`` not being like their ``np.bool_`` counterparts (:issue:`46063`) +- Bug in multiplying a :class:`Series` with ``IntegerDtype`` or ``FloatingDtype`` by an array-like with ``timedelta64[ns]`` dtype incorrectly raising (:issue:`45622`) +- + +Conversion +^^^^^^^^^^ +- Bug in :meth:`DataFrame.astype` not preserving subclasses (:issue:`40810`) +- Bug in constructing a :class:`Series` from a float-containing list or a floating-dtype ndarray-like (e.g. ``dask.Array``) and an integer dtype raising instead of casting like we would with an ``np.ndarray`` (:issue:`40110`) +- Bug in :meth:`Float64Index.astype` to unsigned integer dtype incorrectly casting to ``np.int64`` dtype (:issue:`45309`) +- Bug in :meth:`Series.astype` and :meth:`DataFrame.astype` from floating dtype to unsigned integer dtype failing to raise in the presence of negative values (:issue:`45151`) +- Bug in :func:`array` with ``FloatingDtype`` and values containing float-castable strings incorrectly raising (:issue:`45424`) +- Bug when comparing string and datetime64ns objects causing ``OverflowError`` exception. (:issue:`45506`) +- Bug in metaclass of generic abstract dtypes causing :meth:`DataFrame.apply` and :meth:`Series.apply` to raise for the built-in function ``type`` (:issue:`46684`) + +Strings +^^^^^^^ +- Bug in :meth:`str.startswith` and :meth:`str.endswith` when using other series as parameter _pat_. Now raises ``TypeError`` (:issue:`3485`) +- + +Interval +^^^^^^^^ +- Bug in :meth:`IntervalArray.__setitem__` when setting ``np.nan`` into an integer-backed array raising ``ValueError`` instead of ``TypeError`` (:issue:`45484`) +- + +Indexing +^^^^^^^^ +- Bug in :meth:`loc.__getitem__` with a list of keys causing an internal inconsistency that could lead to a disconnect between ``frame.at[x, y]`` vs ``frame[y].loc[x]`` (:issue:`22372`) +- Bug in :meth:`DataFrame.iloc` where indexing a single row on a :class:`DataFrame` with a single ExtensionDtype column gave a copy instead of a view on the underlying data (:issue:`45241`) +- Bug in :meth:`Series.align` does not create :class:`MultiIndex` with union of levels when both MultiIndexes intersections are identical (:issue:`45224`) +- Bug in setting a NA value (``None`` or ``np.nan``) into a :class:`Series` with int-based :class:`IntervalDtype` incorrectly casting to object dtype instead of a float-based :class:`IntervalDtype` (:issue:`45568`) +- Bug in indexing setting values into an ``ExtensionDtype`` column with ``df.iloc[:, i] = values`` with ``values`` having the same dtype as ``df.iloc[:, i]`` incorrectly inserting a new array instead of setting in-place (:issue:`33457`) +- Bug in :meth:`Series.__setitem__` with a non-integer :class:`Index` when using an integer key to set a value that cannot be set inplace where a ``ValueError`` was raised instead of casting to a common dtype (:issue:`45070`) +- Bug in :meth:`Series.__setitem__` when setting incompatible values into a ``PeriodDtype`` or ``IntervalDtype`` :class:`Series` raising when indexing with a boolean mask but coercing when indexing with otherwise-equivalent indexers; these now consistently coerce, along with :meth:`Series.mask` and :meth:`Series.where` (:issue:`45768`) +- Bug in :meth:`DataFrame.where` with multiple columns with datetime-like dtypes failing to downcast results consistent with other dtypes (:issue:`45837`) +- Bug in :meth:`Series.loc.__setitem__` and :meth:`Series.loc.__getitem__` not raising when using multiple keys without using a :class:`MultiIndex` (:issue:`13831`) +- Bug in :meth:`Index.reindex` raising ``AssertionError`` when ``level`` was specified but no :class:`MultiIndex` was given; level is ignored now (:issue:`35132`) +- Bug when setting a value too large for a :class:`Series` dtype failing to coerce to a common type (:issue:`26049`, :issue:`32878`) +- Bug in :meth:`loc.__setitem__` treating ``range`` keys as positional instead of label-based (:issue:`45479`) +- Bug in :meth:`Series.__setitem__` when setting ``boolean`` dtype values containing ``NA`` incorrectly raising instead of casting to ``boolean`` dtype (:issue:`45462`) +- Bug in :meth:`Series.__setitem__` where setting :attr:`NA` into a numeric-dtpye :class:`Series` would incorrectly upcast to object-dtype rather than treating the value as ``np.nan`` (:issue:`44199`) +- Bug in :meth:`Series.__setitem__` with ``datetime64[ns]`` dtype, an all-``False`` boolean mask, and an incompatible value incorrectly casting to ``object`` instead of retaining ``datetime64[ns]`` dtype (:issue:`45967`) +- Bug in :meth:`Index.__getitem__` raising ``ValueError`` when indexer is from boolean dtype with ``NA`` (:issue:`45806`) +- Bug in :meth:`Series.mask` with ``inplace=True`` or setting values with a boolean mask with small integer dtypes incorrectly raising (:issue:`45750`) +- Bug in :meth:`DataFrame.mask` with ``inplace=True`` and ``ExtensionDtype`` columns incorrectly raising (:issue:`45577`) +- Bug in getting a column from a DataFrame with an object-dtype row index with datetime-like values: the resulting Series now preserves the exact object-dtype Index from the parent DataFrame (:issue:`42950`) +- Bug in :meth:`DataFrame.__getattribute__` raising ``AttributeError`` if columns have ``"string"`` dtype (:issue:`46185`) +- Bug in indexing on a :class:`DatetimeIndex` with a ``np.str_`` key incorrectly raising (:issue:`45580`) +- Bug in :meth:`CategoricalIndex.get_indexer` when index contains ``NaN`` values, resulting in elements that are in target but not present in the index to be mapped to the index of the NaN element, instead of -1 (:issue:`45361`) +- Bug in setting large integer values into :class:`Series` with ``float32`` or ``float16`` dtype incorrectly altering these values instead of coercing to ``float64`` dtype (:issue:`45844`) +- Bug in :meth:`Series.asof` and :meth:`DataFrame.asof` incorrectly casting bool-dtype results to ``float64`` dtype (:issue:`16063`) +- + +Missing +^^^^^^^ +- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with ``downcast`` keyword not being respected in some cases where there are no NA values present (:issue:`45423`) +- Bug in :meth:`Series.fillna` and :meth:`DataFrame.fillna` with :class:`IntervalDtype` and incompatible value raising instead of casting to a common (usually object) dtype (:issue:`45796`) +- Bug in :meth:`DataFrame.interpolate` with object-dtype column not returning a copy with ``inplace=False`` (:issue:`45791`) +- + +MultiIndex +^^^^^^^^^^ +- Bug in :meth:`DataFrame.loc` returning empty result when slicing a :class:`MultiIndex` with a negative step size and non-null start/stop values (:issue:`46156`) +- Bug in :meth:`DataFrame.loc` raising when slicing a :class:`MultiIndex` with a negative step size other than -1 (:issue:`46156`) +- Bug in :meth:`DataFrame.loc` raising when slicing a :class:`MultiIndex` with a negative step size and slicing a non-int labeled index level (:issue:`46156`) +- Bug in :meth:`Series.to_numpy` where multiindexed Series could not be converted to numpy arrays when an ``na_value`` was supplied (:issue:`45774`) +- Bug in :class:`MultiIndex.equals` not commutative when only one side has extension array dtype (:issue:`46026`) +- Bug in :meth:`MultiIndex.from_tuples` cannot construct Index of empty tuples (:issue:`45608`) + +I/O +^^^ +- Bug in :meth:`DataFrame.to_stata` where no error is raised if the :class:`DataFrame` contains ``-np.inf`` (:issue:`45350`) +- Bug in :func:`read_excel` results in an infinite loop with certain ``skiprows`` callables (:issue:`45585`) +- Bug in :meth:`DataFrame.info` where a new line at the end of the output is omitted when called on an empty :class:`DataFrame` (:issue:`45494`) +- Bug in :func:`read_csv` not recognizing line break for ``on_bad_lines="warn"`` for ``engine="c"`` (:issue:`41710`) +- Bug in :meth:`DataFrame.to_csv` not respecting ``float_format`` for ``Float64`` dtype (:issue:`45991`) +- Bug in :func:`read_csv` not respecting a specified converter to index columns in all cases (:issue:`40589`) +- Bug in :func:`read_parquet` when ``engine="pyarrow"`` which caused partial write to disk when column of unsupported datatype was passed (:issue:`44914`) +- Bug in :func:`DataFrame.to_excel` and :class:`ExcelWriter` would raise when writing an empty DataFrame to a ``.ods`` file (:issue:`45793`) +- Bug in :func:`read_html` where elements surrounding ``
`` were joined without a space between them (:issue:`29528`) +- Bug in Parquet roundtrip for Interval dtype with ``datetime64[ns]`` subtype (:issue:`45881`) +- Bug in :func:`read_excel` when reading a ``.ods`` file with newlines between xml elements (:issue:`45598`) +- Bug in :func:`read_parquet` when ``engine="fastparquet"`` where the file was not closed on error (:issue:`46555`) + +Period +^^^^^^ +- Bug in subtraction of :class:`Period` from :class:`PeriodArray` returning wrong results (:issue:`45999`) +- Bug in :meth:`Period.strftime` and :meth:`PeriodIndex.strftime`, directives ``%l`` and ``%u`` were giving wrong results (:issue:`46252`) +- + +Plotting +^^^^^^^^ +- Bug in :meth:`DataFrame.plot.barh` that prevented labeling the x-axis and ``xlabel`` updating the y-axis label (:issue:`45144`) +- Bug in :meth:`DataFrame.plot.box` that prevented labeling the x-axis (:issue:`45463`) +- Bug in :meth:`DataFrame.boxplot` that prevented passing in ``xlabel`` and ``ylabel`` (:issue:`45463`) +- Bug in :meth:`DataFrame.boxplot` that prevented specifying ``vert=False`` (:issue:`36918`) +- Bug in :meth:`DataFrame.plot.scatter` that prevented specifying ``norm`` (:issue:`45809`) +- The function :meth:`DataFrame.plot.scatter` now accepts ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` for consistency to other plotting functions (:issue:`44670`) + +Groupby/resample/rolling +^^^^^^^^^^^^^^^^^^^^^^^^ +- Bug in :meth:`DataFrame.resample` ignoring ``closed="right"`` on :class:`TimedeltaIndex` (:issue:`45414`) +- Bug in :meth:`.DataFrameGroupBy.transform` fails when ``func="size"`` and the input DataFrame has multiple columns (:issue:`27469`) +- Bug in :meth:`.DataFrameGroupBy.size` and :meth:`.DataFrameGroupBy.transform` with ``func="size"`` produced incorrect results when ``axis=1`` (:issue:`45715`) +- Bug in :meth:`.ExponentialMovingWindow.mean` with ``axis=1`` and ``engine='numba'`` when the :class:`DataFrame` has more columns than rows (:issue:`46086`) +- Bug when using ``engine="numba"`` would return the same jitted function when modifying ``engine_kwargs`` (:issue:`46086`) +- Bug in :meth:`.DataFrameGroupby.transform` fails when ``axis=1`` and ``func`` is ``"first"`` or ``"last"`` (:issue:`45986`) +- Bug in :meth:`DataFrameGroupby.cumsum` with ``skipna=False`` giving incorrect results (:issue:`46216`) +- Bug in :meth:`.GroupBy.cumsum` with ``timedelta64[ns]`` dtype failing to recognize ``NaT`` as a null value (:issue:`46216`) +- Bug in :meth:`GroupBy.cummin` and :meth:`GroupBy.cummax` with nullable dtypes incorrectly altering the original data in place (:issue:`46220`) +- Bug in :meth:`GroupBy.cummax` with ``int64`` dtype with leading value being the smallest possible int64 (:issue:`46382`) +- Bug in :meth:`GroupBy.max` with empty groups and ``uint64`` dtype incorrectly raising ``RuntimeError`` (:issue:`46408`) +- Bug in :meth:`.GroupBy.apply` would fail when ``func`` was a string and args or kwargs were supplied (:issue:`46479`) +- + +Reshaping +^^^^^^^^^ +- Bug in :func:`concat` between a :class:`Series` with integer dtype and another with :class:`CategoricalDtype` with integer categories and containing ``NaN`` values casting to object dtype instead of ``float64`` (:issue:`45359`) +- Bug in :func:`get_dummies` that selected object and categorical dtypes but not string (:issue:`44965`) +- Bug in :meth:`DataFrame.align` when aligning a :class:`MultiIndex` to a :class:`Series` with another :class:`MultiIndex` (:issue:`46001`) +- Bug in concanenation with ``IntegerDtype``, or ``FloatingDtype`` arrays where the resulting dtype did not mirror the behavior of the non-nullable dtypes (:issue:`46379`) +- Bug in :func:`concat` with identical key leads to error when indexing :class:`MultiIndex` (:issue:`46519`) +- Bug in :meth:`DataFrame.join` with a list when using suffixes to join DataFrames with duplicate column names (:issue:`46396`) +- + +Sparse +^^^^^^ +- Bug in :meth:`Series.where` and :meth:`DataFrame.where` with ``SparseDtype`` failing to retain the array's ``fill_value`` (:issue:`45691`) +- + +ExtensionArray +^^^^^^^^^^^^^^ +- Bug in :meth:`IntegerArray.searchsorted` and :meth:`FloatingArray.searchsorted` returning inconsistent results when acting on ``np.nan`` (:issue:`45255`) +- + +Styler +^^^^^^ +- Bug when attempting to apply styling functions to an empty DataFrame subset (:issue:`45313`) +- + +Metadata +^^^^^^^^ +- Fixed metadata propagation in :meth:`DataFrame.melt` (:issue:`28283`) +- Fixed metadata propagation in :meth:`DataFrame.explode` (:issue:`28283`) +- + +Other +^^^^^ + +.. ***DO NOT USE THIS SECTION*** + +- +- + +.. --------------------------------------------------------------------------- +.. _whatsnew_150.contributors: + +Contributors +~~~~~~~~~~~~ diff --git a/pandas/core/groupby/ops.py b/pandas/core/groupby/ops.py index 09954bd6be4e4..42364bebfd659 100644 --- a/pandas/core/groupby/ops.py +++ b/pandas/core/groupby/ops.py @@ -571,6 +571,9 @@ def _call_cython_op( **kwargs, ) + if self.how == "last" and dtype == bool: + result = values + if self.kind == "aggregate": # i.e. counts is defined. Locations where count