Commit 657f9ce

Merge remote-tracking branch 'upstream/master' into deprecate-nonkeyword-args-reset_index
2 parents 5bdd6b7 + 3457359 commit 657f9ce

78 files changed, +1553 -867 lines changed

asv_bench/benchmarks/io/style.py

+4-4
@@ -20,19 +20,19 @@ def setup(self, cols, rows):

     def time_apply_render(self, cols, rows):
         self._style_apply()
-        self.st._render_html()
+        self.st._render_html(True, True)

     def peakmem_apply_render(self, cols, rows):
         self._style_apply()
-        self.st._render_html()
+        self.st._render_html(True, True)

     def time_classes_render(self, cols, rows):
         self._style_classes()
-        self.st._render_html()
+        self.st._render_html(True, True)

     def peakmem_classes_render(self, cols, rows):
         self._style_classes()
-        self.st._render_html()
+        self.st._render_html(True, True)

     def time_format_render(self, cols, rows):
         self._style_format()

ci/deps/actions-37-minimum_versions.yaml

+1-1
@@ -17,7 +17,7 @@ dependencies:
 - bottleneck=1.2.1
 - jinja2=2.10
 - numba=0.46.0
-- numexpr=2.6.8
+- numexpr=2.7.0
 - numpy=1.17.3
 - openpyxl=3.0.0
 - pytables=3.5.1

doc/source/getting_started/install.rst

+1-1
@@ -234,7 +234,7 @@ Recommended dependencies

 * `numexpr <https://github.com/pydata/numexpr>`__: for accelerating certain numerical operations.
   ``numexpr`` uses multiple cores as well as smart chunking and caching to achieve large speedups.
-  If installed, must be Version 2.6.8 or higher.
+  If installed, must be Version 2.7.0 or higher.

 * `bottleneck <https://github.com/pydata/bottleneck>`__: for accelerating certain types of ``nan``
   evaluations. ``bottleneck`` uses specialized cython routines to achieve large speedups. If installed,

doc/source/user_guide/options.rst

+5
@@ -482,6 +482,11 @@ plotting.backend matplotlib Change the plotting backend
                                                      like Bokeh, Altair, etc.
 plotting.matplotlib.register_converters True         Register custom converters with
                                                      matplotlib. Set to False to de-register.
+styler.sparse.index                     True         "Sparsify" MultiIndex display for rows
+                                                     in Styler output (don't display repeated
+                                                     elements in outer levels within groups).
+styler.sparse.columns                   True         "Sparsify" MultiIndex display for columns
+                                                     in Styler output.
 ======================================= ============ ==================================
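
A minimal sketch of how the two options documented in this table might be read and temporarily overridden. It assumes a pandas version in which `styler.sparse.index` and `styler.sparse.columns` are registered (1.3 or later), and it deliberately stops short of rendering a Styler, which would also need jinja2. The helper name `sparse_index_inside_context` is hypothetical.

```python
import pandas as pd


def sparse_index_inside_context(value: bool) -> bool:
    """Return the value of styler.sparse.index as seen inside a context block."""
    # option_context restores the previous value when the block exits.
    with pd.option_context("styler.sparse.index", value):
        return pd.get_option("styler.sparse.index")


# Read the current settings (the table lists True as the default for both).
print(pd.get_option("styler.sparse.index"))
print(pd.get_option("styler.sparse.columns"))

# Override only the row-index option for the duration of the block.
print(sparse_index_inside_context(False))
```

Using `option_context` rather than `set_option` keeps the change local, so the index and columns can be sparsified independently without affecting later output.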

doc/source/whatsnew/v1.2.0.rst

+3
@@ -381,6 +381,7 @@ this pathological behavior (:issue:`37827`):
 *New behavior*:

 .. ipython:: python
+    :okwarning:

     df.mean()

@@ -394,6 +395,7 @@ instead of casting to a NumPy array which may have different semantics (:issue:`
 :issue:`28949`, :issue:`21020`).

 .. ipython:: python
+    :okwarning:

     ser = pd.Series([0, 1], dtype="category", name="A")
     df = ser.to_frame()
@@ -411,6 +413,7 @@ instead of casting to a NumPy array which may have different semantics (:issue:`
 *New behavior*:

 .. ipython:: python
+    :okwarning:

     df.any()

doc/source/whatsnew/v1.3.0.rst

+82-6
@@ -139,6 +139,7 @@ precision, and perform HTML escaping (:issue:`40437` :issue:`40134`). There have
 properly format HTML and eliminate some inconsistencies (:issue:`39942` :issue:`40356` :issue:`39807` :issue:`39889` :issue:`39627`)

 :class:`.Styler` has also been compatible with non-unique index or columns, at least for as many features as are fully compatible, others made only partially compatible (:issue:`41269`).
+One also has greater control of the display through separate sparsification of the index or columns, using the new 'styler' options context (:issue:`41142`).

 Documentation has also seen major revisions in light of new features (:issue:`39720` :issue:`39317` :issue:`40493`)

@@ -197,7 +198,7 @@ Other enhancements
 - Improved integer type mapping from pandas to SQLAlchemy when using :meth:`DataFrame.to_sql` (:issue:`35076`)
 - :func:`to_numeric` now supports downcasting of nullable ``ExtensionDtype`` objects (:issue:`33013`)
 - Add support for dict-like names in :class:`MultiIndex.set_names` and :class:`MultiIndex.rename` (:issue:`20421`)
-- :func:`pandas.read_excel` can now auto detect .xlsb files (:issue:`35416`)
+- :func:`pandas.read_excel` can now auto detect .xlsb files and older .xls files (:issue:`35416`, :issue:`41225`)
 - :class:`pandas.ExcelWriter` now accepts an ``if_sheet_exists`` parameter to control the behaviour of append mode when writing to existing sheets (:issue:`40230`)
 - :meth:`.Rolling.sum`, :meth:`.Expanding.sum`, :meth:`.Rolling.mean`, :meth:`.Expanding.mean`, :meth:`.ExponentialMovingWindow.mean`, :meth:`.Rolling.median`, :meth:`.Expanding.median`, :meth:`.Rolling.max`, :meth:`.Expanding.max`, :meth:`.Rolling.min`, and :meth:`.Expanding.min` now support ``Numba`` execution with the ``engine`` keyword (:issue:`38895`, :issue:`41267`)
 - :meth:`DataFrame.apply` can now accept NumPy unary operators as strings, e.g. ``df.apply("sqrt")``, which was already the case for :meth:`Series.apply` (:issue:`39116`)
@@ -333,6 +334,31 @@ values as measured by ``np.allclose``. Now no such casting occurs.

     df.groupby('key').agg(lambda x: x.sum())

+``float`` result for :meth:`.GroupBy.mean`, :meth:`.GroupBy.median`, and :meth:`.GroupBy.var`
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Previously, these methods could result in different dtypes depending on the input values.
+Now, these methods will always return a float dtype. (:issue:`41137`)
+
+.. ipython:: python
+
+    df = pd.DataFrame({'a': [True], 'b': [1], 'c': [1.0]})
+
+*pandas 1.2.x*
+
+.. code-block:: ipython
+
+    In [5]: df.groupby(df.index).mean()
+    Out[5]:
+          a  b    c
+    0  True  1  1.0
+
+*pandas 1.3.0*
+
+.. ipython:: python
+
+    df.groupby(df.index).mean()
+
 Try operating inplace when setting values with ``loc`` and ``iloc``
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
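
The dtype change described in the added whatsnew section can be checked with a short script. This is a sketch that assumes pandas 1.3 or later and reuses the frame from the hunk above; the helper name `group_mean_dtypes` is hypothetical, and no output is claimed for older versions.

```python
import pandas as pd


def group_mean_dtypes(frame: pd.DataFrame) -> dict:
    """Group by the index, take the mean, and report each result column's dtype."""
    result = frame.groupby(frame.index).mean()
    return {name: str(dtype) for name, dtype in result.dtypes.items()}


# Same data as in the whatsnew example: one bool, one int and one float column.
df = pd.DataFrame({"a": [True], "b": [1], "c": [1.0]})
print(group_mean_dtypes(df))
```

Per the whatsnew text, the integer column `b` should come back as a float dtype instead of being cast back to the input dtype.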

@@ -547,7 +573,7 @@ If installed, we now require:
 +-----------------+-----------------+----------+---------+
 | bottleneck      | 1.2.1           |          |         |
 +-----------------+-----------------+----------+---------+
-| numexpr         | 2.6.8           |          |         |
+| numexpr         | 2.7.0           |          |    X    |
 +-----------------+-----------------+----------+---------+
 | pytest (dev)    | 6.0             |          |    X    |
 +-----------------+-----------------+----------+---------+
@@ -648,9 +674,52 @@ Deprecations
 - Deprecated setting :attr:`Categorical._codes`, create a new :class:`Categorical` with the desired codes instead (:issue:`40606`)
 - Deprecated behavior of :meth:`DatetimeIndex.union` with mixed timezones; in a future version both will be cast to UTC instead of object dtype (:issue:`39328`)
 - Deprecated using ``usecols`` with out of bounds indices for ``read_csv`` with ``engine="c"`` (:issue:`25623`)
-- Deprecated passing arguments as positional in (:issue:`41485`) :
-  - :meth:`DataFrame.interpolate` (other than ``"method"``) and :meth:`Series.interpolate`
-  - :meth:`DataFrame.reset_index` (other than ``"level"``) and :meth:`Series.reset_index`
+- Deprecated special treatment of lists with first element a Categorical in the :class:`DataFrame` constructor; pass as ``pd.DataFrame({col: categorical, ...})`` instead (:issue:`38845`)
+- Deprecated passing arguments as positional (except for ``"method"``) in :meth:`DataFrame.interpolate` and :meth:`Series.interpolate` (:issue:`41485`)
+- Deprecated passing arguments (apart from ``value``) as positional in :meth:`DataFrame.fillna` and :meth:`Series.fillna` (:issue:`41485`)
+- Deprecated passing arguments as positional in :meth:`DataFrame.reset_index` (other than ``"level"``) and :meth:`Series.reset_index` (:issue:`41485`)
+- Deprecated construction of :class:`Series` or :class:`DataFrame` with ``DatetimeTZDtype`` data and ``datetime64[ns]`` dtype. Use ``Series(data).dt.tz_localize(None)`` instead (:issue:`41555`,:issue:`33401`)
+
+.. _whatsnew_130.deprecations.nuisance_columns:
+
+Deprecated Dropping Nuisance Columns in DataFrame Reductions and DataFrameGroupBy Operations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+The default of calling a reduction (.min, .max, .sum, ...) on a :class:`DataFrame` with
+``numeric_only=None`` (the default, columns on which the reduction raises ``TypeError``
+are silently ignored and dropped from the result.
+
+This behavior is deprecated. In a future version, the ``TypeError`` will be raised,
+and users will need to select only valid columns before calling the function.
+
+For example:
+
+.. ipython:: python
+
+    df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})
+    df
+
+*Old behavior*:
+
+.. code-block:: ipython
+
+    In [3]: df.prod()
+    Out[3]:
+    Out[3]:
+    A    24
+    dtype: int64
+
+*Future behavior*:
+
+.. code-block:: ipython
+
+    In [4]: df.prod()
+    ...
+    TypeError: 'DatetimeArray' does not implement reduction 'prod'
+
+    In [5]: df[["A"]].prod()
+    Out[5]:
+    A    24
+    dtype: int64

 .. ---------------------------------------------------------------------------
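
The nuisance-column deprecation asks callers to select valid columns before reducing. The sketch below shows two ways that might be done, using the frame from the whatsnew example; the helper names are hypothetical, and selecting by dtype is one option rather than something the diff prescribes.

```python
import pandas as pd


def prod_of_named_columns(frame: pd.DataFrame, columns: list) -> pd.Series:
    """Reduce only the explicitly listed columns."""
    return frame[columns].prod()


def prod_of_numeric_columns(frame: pd.DataFrame) -> pd.Series:
    """Reduce only the columns with a numeric dtype, chosen via select_dtypes."""
    return frame.select_dtypes(include="number").prod()


df = pd.DataFrame({"A": [1, 2, 3, 4], "B": pd.date_range("2016-01-01", periods=4)})

# The datetime column "B" is excluded up front, so no column is silently dropped.
print(prod_of_named_columns(df, ["A"]))
print(prod_of_numeric_columns(df))
```

Both variants behave the same before and after the deprecation is enforced, because the reduction never sees a column it cannot handle.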
@@ -751,6 +820,8 @@ Conversion
 - Bug in :func:`factorize` where, when given an array with a numeric numpy dtype lower than int64, uint64 and float64, the unique values did not keep their original dtype (:issue:`41132`)
 - Bug in :class:`DataFrame` construction with a dictionary containing an arraylike with ``ExtensionDtype`` and ``copy=True`` failing to make a copy (:issue:`38939`)
 - Bug in :meth:`qcut` raising error when taking ``Float64DType`` as input (:issue:`40730`)
+- Bug in :class:`DataFrame` and :class:`Series` construction with ``datetime64[ns]`` data and ``dtype=object`` resulting in ``datetime`` objects instead of :class:`Timestamp` objects (:issue:`41599`)
+- Bug in :class:`DataFrame` and :class:`Series` construction with ``timedelta64[ns]`` data and ``dtype=object`` resulting in ``np.timedelta64`` objects instead of :class:`Timedelta` objects (:issue:`41599`)

 Strings
 ^^^^^^^
@@ -810,6 +881,7 @@ Missing
 - Bug in :func:`isna`, and :meth:`Series.isna`, :meth:`Index.isna`, :meth:`DataFrame.isna` (and the corresponding ``notna`` functions) not recognizing ``Decimal("NaN")`` objects (:issue:`39409`)
 - Bug in :meth:`DataFrame.fillna` not accepting dictionary for ``downcast`` keyword (:issue:`40809`)
 - Bug in :func:`isna` not returning a copy of the mask for nullable types, causing any subsequent mask modification to change the original array (:issue:`40935`)
+- Bug in :class:`DataFrame` construction with float data containing ``NaN`` and an integer ``dtype`` casting instead of retaining the ``NaN`` (:issue:`26919`)

 MultiIndex
 ^^^^^^^^^^
@@ -846,12 +918,15 @@ I/O
 - Bug in :func:`read_excel` dropping empty values from single-column spreadsheets (:issue:`39808`)
 - Bug in :func:`read_excel` loading trailing empty rows/columns for some filetypes (:issue:`41167`)
 - Bug in :func:`read_excel` raising ``AttributeError`` with ``MultiIndex`` header followed by two empty rows and no index, and bug affecting :func:`read_excel`, :func:`read_csv`, :func:`read_table`, :func:`read_fwf`, and :func:`read_clipboard` where one blank row after a ``MultiIndex`` header with no index would be dropped (:issue:`40442`)
-- Bug in :meth:`DataFrame.to_string` misplacing the truncation column when ``index=False`` (:issue:`40907`)
+- Bug in :meth:`DataFrame.to_string` misplacing the truncation column when ``index=False`` (:issue:`40904`)
+- Bug in :meth:`DataFrame.to_string` adding an extra dot and misaligning the truncation row when ``index=False`` (:issue:`40904`)
 - Bug in :func:`read_orc` always raising ``AttributeError`` (:issue:`40918`)
 - Bug in :func:`read_csv` and :func:`read_table` silently ignoring ``prefix`` if ``names`` and ``prefix`` are defined, now raising ``ValueError`` (:issue:`39123`)
 - Bug in :func:`read_csv` and :func:`read_excel` not respecting dtype for duplicated column name when ``mangle_dupe_cols`` is set to ``True`` (:issue:`35211`)
 - Bug in :func:`read_csv` and :func:`read_table` misinterpreting arguments when ``sys.setprofile`` had been previously called (:issue:`41069`)
 - Bug in the conversion from pyarrow to pandas (e.g. for reading Parquet) with nullable dtypes and a pyarrow array whose data buffer size is not a multiple of dtype size (:issue:`40896`)
+- Bug in :func:`read_excel` would raise an error when pandas could not determine the file type, even when user specified the ``engine`` argument (:issue:`41225`)
+-

 Period
 ^^^^^^
@@ -975,6 +1050,7 @@ Other
 - Bug in :meth:`DataFrame.equals`, :meth:`Series.equals`, :meth:`Index.equals` with object-dtype containing ``np.datetime64("NaT")`` or ``np.timedelta64("NaT")`` (:issue:`39650`)
 - Bug in :func:`pandas.util.show_versions` where console JSON output was not proper JSON (:issue:`39701`)
 - Bug in :meth:`DataFrame.convert_dtypes` incorrectly raised ValueError when called on an empty DataFrame (:issue:`40393`)
+- Bug in :meth:`DataFrame.agg()` not sorting the aggregated axis in the order of the provided aggragation functions when one or more aggregation function fails to produce results (:issue:`33634`)
 - Bug in :meth:`DataFrame.clip` not interpreting missing values as no threshold (:issue:`40420`)
 - Bug in :class:`Series` backed by :class:`DatetimeArray` or :class:`TimedeltaArray` sometimes failing to set the array's ``freq`` to ``None`` (:issue:`41425`)

environment.yml

+1-1
@@ -81,7 +81,7 @@ dependencies:
 - ipython>=7.11.1
 - jinja2<3.0.0 # pandas.Styler
 - matplotlib>=2.2.2 # pandas.plotting, Series.plot, DataFrame.plot
-- numexpr>=2.6.8
+- numexpr>=2.7.0
 - scipy>=1.2
 - numba>=0.46.0

pandas/_libs/reduction.pyx

+5-8
@@ -1,4 +1,3 @@
-from copy import copy

 from libc.stdlib cimport (
     free,
@@ -307,13 +306,11 @@ cpdef inline extract_result(object res):
         # Preserve EA
         res = res._values
         if res.ndim == 1 and len(res) == 1:
+            # see test_agg_lambda_with_timezone, test_resampler_grouper.py::test_apply
             res = res[0]
-    if hasattr(res, 'values') and is_array(res.values):
-        res = res.values
     if is_array(res):
-        if res.ndim == 0:
-            res = res.item()
-        elif res.ndim == 1 and len(res) == 1:
+        if res.ndim == 1 and len(res) == 1:
+            # see test_resampler_grouper.py::test_apply
             res = res[0]
     return res

@@ -386,7 +383,7 @@ def apply_frame_axis0(object frame, object f, object names,
             # Need to infer if low level index slider will cause segfaults
             require_slow_apply = i == 0 and piece is chunk
             try:
-                if not piece.index is chunk.index:
+                if piece.index is not chunk.index:
                     mutated = True
             except AttributeError:
                 # `piece` might not have an index, could be e.g. an int
@@ -397,7 +394,7 @@ def apply_frame_axis0(object frame, object f, object names,
                 try:
                     piece = piece.copy(deep="all")
                 except (TypeError, AttributeError):
-                    piece = copy(piece)
+                    pass

             results.append(piece)
pandas/compat/_optional.py

+1-1
@@ -17,7 +17,7 @@
     "gcsfs": "0.6.0",
     "lxml.etree": "4.3.0",
     "matplotlib": "2.2.3",
-    "numexpr": "2.6.8",
+    "numexpr": "2.7.0",
     "odfpy": "1.3.0",
     "openpyxl": "3.0.0",
     "pandas_gbq": "0.12.0",
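
To illustrate what a minimum-version table like this is used for, here is a stdlib-only sketch of a version gate. It is not how pandas performs the check (pandas has its own `Version` helper, visible in the `ops.py` hunk of this commit); `MINIMUM_VERSIONS`, `parse_version` and `meets_minimum` are hypothetical names, and the parser assumes purely numeric dotted versions.

```python
# Hypothetical minimum-version table; the numexpr value is the one set in this commit.
MINIMUM_VERSIONS = {"numexpr": "2.7.0"}


def parse_version(text: str) -> tuple:
    """Turn '2.7.0' into (2, 7, 0). Assumes every component is an integer."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(name: str, installed: str) -> bool:
    """Compare component-wise so that '2.10.1' counts as newer than '2.7.0'."""
    return parse_version(installed) >= parse_version(MINIMUM_VERSIONS[name])


print(meets_minimum("numexpr", "2.6.8"))   # the previous minimum no longer passes
print(meets_minimum("numexpr", "2.10.1"))  # a plain string comparison would get this wrong
```

Comparing tuples of integers instead of raw strings avoids the lexicographic trap where `"2.10.1" < "2.7.0"`.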

pandas/core/apply.py

+11-3
@@ -376,12 +376,10 @@ def agg_list_like(self) -> FrameOrSeriesUnion:
             raise ValueError("no results")

         try:
-            return concat(results, keys=keys, axis=1, sort=False)
+            concatenated = concat(results, keys=keys, axis=1, sort=False)
         except TypeError as err:
-
             # we are concatting non-NDFrame objects,
             # e.g. a list of scalars
-
             from pandas import Series

             result = Series(results, index=keys, name=obj.name)
@@ -390,6 +388,16 @@ def agg_list_like(self) -> FrameOrSeriesUnion:
                     "cannot combine transform and aggregation operations"
                 ) from err
             return result
+        else:
+            # Concat uses the first index to determine the final indexing order.
+            # The union of a shorter first index with the other indices causes
+            # the index sorting to be different from the order of the aggregating
+            # functions. Reindex if this is the case.
+            index_size = concatenated.index.size
+            full_ordered_index = next(
+                result.index for result in results if result.index.size == index_size
+            )
+            return concatenated.reindex(full_ordered_index, copy=False)

     def agg_dict_like(self) -> FrameOrSeriesUnion:
         """
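
The reindexing step added in the `else:` branch can be tried in isolation. The sketch below copies the logic of the hunk into a standalone function using public pandas calls only; `concat_in_requested_order` and the sample data are hypothetical, and the `copy=False` argument from the diff is left out for brevity.

```python
import pandas as pd


def concat_in_requested_order(results: list, keys: list) -> pd.DataFrame:
    """Concatenate per-column results, then restore the order of the aggregation functions."""
    concatenated = pd.concat(results, keys=keys, axis=1, sort=False)
    # The union index may start with the labels of a shorter first result.
    index_size = concatenated.index.size
    # Take the ordering from the first result that contains every label.
    full_ordered_index = next(
        result.index for result in results if result.index.size == index_size
    )
    return concatenated.reindex(full_ordered_index)


# Hypothetical situation: for column "A" only "max" produced a result,
# while for column "B" both requested functions ("min", then "max") worked.
res_a = pd.Series({"max": 3})
res_b = pd.Series({"min": 10, "max": 30})

out = concat_in_requested_order([res_a, res_b], keys=["A", "B"])
print(out)
```

Like the code in the diff, this relies on at least one result carrying the complete index; otherwise `next` raises `StopIteration`.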

pandas/core/arrays/base.py

+1-2
@@ -493,8 +493,7 @@ def size(self) -> int:
         """
         The number of elements in the array.
         """
-        # error: Incompatible return value type (got "number", expected "int")
-        return np.prod(self.shape)  # type: ignore[return-value]
+        return np.prod(self.shape)

     @property
     def ndim(self) -> int:
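
For context on the removed `type: ignore` comment: `np.prod` over a shape tuple returns a NumPy scalar, not a builtin `int`, which is what the old mypy message was about. The sketch below only demonstrates that runtime fact; why the ignore is no longer needed is not stated in the diff, and `size_of` is a hypothetical helper.

```python
import numpy as np


def size_of(shape: tuple):
    """Mirror the body of the property above: multiply the dimensions together."""
    return np.prod(shape)


size = size_of((2, 3))
print(type(size))             # a NumPy integer scalar type
print(isinstance(size, int))  # NumPy integer scalars do not subclass the builtin int
print(int(size))              # an explicit conversion yields a plain int
```

Callers that need a builtin `int` (for example for JSON serialization) may want the explicit `int(...)` conversion shown on the last line.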

pandas/core/arrays/sparse/accessor.py

+1-2
@@ -354,9 +354,8 @@ def density(self) -> float:
         """
         Ratio of non-sparse points to total (dense) data points.
         """
-        # error: Incompatible return value type (got "number", expected "float")
         tmp = np.mean([column.array.density for _, column in self._parent.items()])
-        return tmp  # type: ignore[return-value]
+        return tmp

     @staticmethod
     def _prep_index(data, index, columns):

pandas/core/computation/ops.py

+1-12
@@ -27,7 +27,6 @@
     result_type_many,
 )
 from pandas.core.computation.scope import DEFAULT_GLOBALS
-from pandas.util.version import Version

 from pandas.io.formats.printing import (
     pprint_thing,
@@ -616,18 +615,8 @@ def __repr__(self) -> str:

 class FuncNode:
     def __init__(self, name: str):
-        from pandas.core.computation.check import (
-            NUMEXPR_INSTALLED,
-            NUMEXPR_VERSION,
-        )
-
-        if name not in MATHOPS or (
-            NUMEXPR_INSTALLED
-            and Version(NUMEXPR_VERSION) < Version("2.6.9")
-            and name in ("floor", "ceil")
-        ):
+        if name not in MATHOPS:
             raise ValueError(f'"{name}" is not a supported function')
-
         self.name = name
         self.func = getattr(np, name)

pandas/core/config_init.py

+23
@@ -726,3 +726,26 @@ def register_converter_cb(key):
         validator=is_one_of_factory(["auto", True, False]),
         cb=register_converter_cb,
     )
+
+# ------
+# Styler
+# ------
+
+styler_sparse_index_doc = """
+: bool
+    Whether to sparsify the display of a hierarchical index. Setting to False will
+    display each explicit level element in a hierarchical key for each row.
+"""
+
+styler_sparse_columns_doc = """
+: bool
+    Whether to sparsify the display of hierarchical columns. Setting to False will
+    display each explicit level element in a hierarchical key for each column.
+"""
+
+with cf.config_prefix("styler"):
+    cf.register_option("sparse.index", True, styler_sparse_index_doc, validator=bool)
+
+    cf.register_option(
+        "sparse.columns", True, styler_sparse_columns_doc, validator=bool
+    )
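
A possible review note on `validator=bool`: assuming option validators signal a bad value by raising `ValueError`, passing the builtin `bool` accepts nearly anything, because `bool(...)` succeeds for ordinary values. The sketch below is plain Python and does not touch the pandas config machinery; `strict_bool` and `accepts` are hypothetical helpers written for this illustration.

```python
def strict_bool(value) -> None:
    """Hypothetical stricter validator: reject anything that is not a real bool."""
    if not isinstance(value, bool):
        raise ValueError("Value must be an instance of bool")


def accepts(validator, value) -> bool:
    """Return True if the validator lets the value through without raising."""
    try:
        validator(value)
    except ValueError:
        return False
    return True


print(accepts(bool, "yes"))         # bool("yes") succeeds, so the value passes
print(accepts(strict_bool, "yes"))  # the strict check rejects a string
print(accepts(strict_bool, False))  # real booleans still pass
```

Whether stricter validation is wanted for these two options is a design decision for the authors; the sketch only shows the difference in behavior.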
