Commit f26a7fc

Merge branch 'main' into selectn_series_perf_impact
2 parents d10aa50 + 78acf94 commit f26a7fc

18 files changed, +99 -212 lines changed

doc/source/development/contributing_codebase.rst

+1 -1
@@ -537,7 +537,7 @@ Preferred ``pytest`` idioms
 test and does not check if the test will fail. If this is the behavior you desire, use ``pytest.skip`` instead.
 
 If a test is known to fail but the manner in which it fails
-is not meant to be captured, use ``pytest.mark.xfail`` It is common to use this method for a test that
+is not meant to be captured, use ``pytest.mark.xfail``. It is common to use this method for a test that
 exhibits buggy behavior or a non-implemented feature. If
 the failing test has flaky behavior, use the argument ``strict=False``. This
 will make it so pytest does not fail if the test happens to pass. Using ``strict=False`` is highly undesirable, please use it only as a last resort.

doc/source/user_guide/basics.rst

+6 -6
@@ -36,7 +36,7 @@ of elements to display is five, but you may pass a custom number.
 Attributes and underlying data
 ------------------------------
 
-pandas objects have a number of attributes enabling you to access the metadata
+pandas objects have a number of attributes enabling you to access the metadata.
 
 * **shape**: gives the axis dimensions of the object, consistent with ndarray
 * Axis labels
@@ -59,7 +59,7 @@ NumPy's type system to add support for custom arrays
 (see :ref:`basics.dtypes`).
 
 To get the actual data inside a :class:`Index` or :class:`Series`, use
-the ``.array`` property
+the ``.array`` property.
 
 .. ipython:: python
 
@@ -88,18 +88,18 @@ NumPy doesn't have a dtype to represent timezone-aware datetimes, so there
 are two possibly useful representations:
 
 1. An object-dtype :class:`numpy.ndarray` with :class:`Timestamp` objects, each
-   with the correct ``tz``
+   with the correct ``tz``.
 2. A ``datetime64[ns]`` -dtype :class:`numpy.ndarray`, where the values have
-   been converted to UTC and the timezone discarded
+   been converted to UTC and the timezone discarded.
 
-Timezones may be preserved with ``dtype=object``
+Timezones may be preserved with ``dtype=object``:
 
 .. ipython:: python
 
    ser = pd.Series(pd.date_range("2000", periods=2, tz="CET"))
    ser.to_numpy(dtype=object)
 
-Or thrown away with ``dtype='datetime64[ns]'``
+Or thrown away with ``dtype='datetime64[ns]'``:
 
 .. ipython:: python
 

doc/source/whatsnew/v3.0.0.rst

+2
@@ -775,6 +775,7 @@ Groupby/resample/rolling
 - Bug in :meth:`.DataFrameGroupBy.quantile` when ``interpolation="nearest"`` is inconsistent with :meth:`DataFrame.quantile` (:issue:`47942`)
 - Bug in :meth:`.Resampler.interpolate` on a :class:`DataFrame` with non-uniform sampling and/or indices not aligning with the resulting resampled index would result in wrong interpolation (:issue:`21351`)
 - Bug in :meth:`DataFrame.ewm` and :meth:`Series.ewm` when passed ``times`` and aggregation functions other than mean (:issue:`51695`)
+- Bug in :meth:`DataFrame.resample` changing index type to :class:`MultiIndex` when the dataframe is empty and using an upsample method (:issue:`55572`)
 - Bug in :meth:`DataFrameGroupBy.agg` that raises ``AttributeError`` when there is dictionary input and duplicated columns, instead of returning a DataFrame with the aggregation of all duplicate columns. (:issue:`55041`)
 - Bug in :meth:`DataFrameGroupBy.apply` and :meth:`SeriesGroupBy.apply` for empty data frame with ``group_keys=False`` still creating output index using group keys. (:issue:`60471`)
 - Bug in :meth:`DataFrameGroupBy.apply` that was returning a completely empty DataFrame when all return values of ``func`` were ``None`` instead of returning an empty DataFrame with the original columns and dtypes. (:issue:`57775`)
@@ -841,6 +842,7 @@ Other
 - Bug in :meth:`DataFrame.where` where using a non-bool type array in the function would return a ``ValueError`` instead of a ``TypeError`` (:issue:`56330`)
 - Bug in :meth:`Index.sort_values` when passing a key function that turns values into tuples, e.g. ``key=natsort.natsort_key``, would raise ``TypeError`` (:issue:`56081`)
 - Bug in :meth:`MultiIndex.fillna` error message was referring to ``isna`` instead of ``fillna`` (:issue:`60974`)
+- Bug in :meth:`Series.describe` where median percentile was always included when the ``percentiles`` argument was passed (:issue:`60550`).
 - Bug in :meth:`Series.diff` allowing non-integer values for the ``periods`` argument. (:issue:`56607`)
 - Bug in :meth:`Series.dt` methods in :class:`ArrowDtype` that were returning incorrect values. (:issue:`57355`)
 - Bug in :meth:`Series.isin` raising ``TypeError`` when series is large (>10**6) and ``values`` contains NA (:issue:`60678`)

pandas/_libs/tslibs/period.pyx

+2 -5
@@ -1752,9 +1752,6 @@ cdef class _Period(PeriodMixin):
     def __cinit__(self, int64_t ordinal, BaseOffset freq):
         self.ordinal = ordinal
         self.freq = freq
-        # Note: this is more performant than PeriodDtype.from_date_offset(freq)
-        # because from_date_offset cannot be made a cdef method (until cython
-        # supported cdef classmethods)
         self._dtype = PeriodDtypeBase(freq._period_dtype_code, freq.n)
 
     @classmethod
@@ -1913,7 +1910,7 @@ cdef class _Period(PeriodMixin):
 
         Parameters
         ----------
-        freq : str, BaseOffset
+        freq : str, DateOffset
             The target frequency to convert the Period object to.
             If a string is provided,
             it must be a valid :ref:`period alias <timeseries.period_aliases>`.
@@ -2599,7 +2596,7 @@ cdef class _Period(PeriodMixin):
 
         Parameters
         ----------
-        freq : str, BaseOffset
+        freq : str, DateOffset
             Frequency to use for the returned period.
 
         See Also

pandas/core/generic.py

+2 -3
@@ -10818,9 +10818,8 @@ def describe(
         ----------
         percentiles : list-like of numbers, optional
             The percentiles to include in the output. All should
-            fall between 0 and 1. The default is
-            ``[.25, .5, .75]``, which returns the 25th, 50th, and
-            75th percentiles.
+            fall between 0 and 1. The default, ``None``, will automatically
+            return the 25th, 50th, and 75th percentiles.
         include : 'all', list-like of dtypes or None (default), optional
             A white list of data types to include in the result. Ignored
             for ``Series``. Here are the options:

pandas/core/methods/describe.py

+6 -5
@@ -229,10 +229,15 @@ def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
 
     formatted_percentiles = format_percentiles(percentiles)
 
+    if len(percentiles) == 0:
+        quantiles = []
+    else:
+        quantiles = series.quantile(percentiles).tolist()
+
     stat_index = ["count", "mean", "std", "min"] + formatted_percentiles + ["max"]
     d = (
         [series.count(), series.mean(), series.std(), series.min()]
-        + series.quantile(percentiles).tolist()
+        + quantiles
         + [series.max()]
     )
     # GH#48340 - always return float on non-complex numeric data
@@ -354,10 +359,6 @@ def _refine_percentiles(
     # get them all to be in [0, 1]
     validate_percentile(percentiles)
 
-    # median should always be included
-    if 0.5 not in percentiles:
-        percentiles.append(0.5)
-
     percentiles = np.asarray(percentiles)
 
     # sort and check for duplicates
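
The net effect of the two hunks above can be sketched in plain Python, without pandas. This is a simplified stand-in, not the library's code: `refine_percentiles` and `stat_index` are hypothetical names, and the label formatting uses `f"{p:.0%}"` (as the new test in this commit does) rather than pandas' own `format_percentiles`.

```python
from typing import List, Optional, Sequence


def refine_percentiles(percentiles: Optional[Sequence[float]]) -> List[float]:
    """Simplified, hypothetical stand-in for the refinement step."""
    if percentiles is None:
        # Default when the caller passes nothing.
        return [0.25, 0.5, 0.75]
    values = list(percentiles)
    for p in values:
        if not 0 <= p <= 1:
            raise ValueError("percentiles should all be in the interval [0, 1]")
    if len(set(values)) != len(values):
        raise ValueError("percentiles cannot contain duplicates")
    # The median is no longer appended: the caller's list is used as given.
    return sorted(values)


def stat_index(percentiles: Optional[Sequence[float]]) -> List[str]:
    """Build the row labels of a numeric describe() result (sketch only)."""
    labels = [f"{p:.0%}" for p in refine_percentiles(percentiles)]
    return ["count", "mean", "std", "min"] + labels + ["max"]


print(stat_index(None))   # default percentiles
print(stat_index([]))     # no percentile rows at all
print(stat_index([0.2]))  # only the requested percentile
```

With an empty list, the label list collapses to `count`, `mean`, `std`, `min`, `max`, which is why the real code needs the `len(percentiles) == 0` guard before calling `series.quantile`.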

pandas/core/resample.py

+12 -11
@@ -507,22 +507,12 @@ def _wrap_result(self, result):
         """
         Potentially wrap any results.
         """
-        # GH 47705
-        obj = self.obj
-        if (
-            isinstance(result, ABCDataFrame)
-            and len(result) == 0
-            and not isinstance(result.index, PeriodIndex)
-        ):
-            result = result.set_index(
-                _asfreq_compat(obj.index[:0], freq=self.freq), append=True
-            )
-
         if isinstance(result, ABCSeries) and self._selection is not None:
             result.name = self._selection
 
         if isinstance(result, ABCSeries) and result.empty:
             # When index is all NaT, result is empty but index is not
+            obj = self.obj
             result.index = _asfreq_compat(obj.index[:0], freq=self.freq)
             result.name = getattr(obj, "name", None)
 
@@ -1756,6 +1746,17 @@ def func(x):
             return x.apply(f, *args, **kwargs)
 
         result = self._groupby.apply(func)
+
+        # GH 47705
+        if (
+            isinstance(result, ABCDataFrame)
+            and len(result) == 0
+            and not isinstance(result.index, PeriodIndex)
+        ):
+            result = result.set_index(
+                _asfreq_compat(self.obj.index[:0], freq=self.freq), append=True
+            )
+
         return self._wrap_result(result)
 
     _upsample = _apply

pandas/io/excel/_base.py

+1 -1
@@ -197,7 +197,7 @@
     False otherwise. An example of a valid callable argument would be ``lambda
     x: x in [0, 2]``.
 nrows : int, default None
-    Number of rows to parse.
+    Number of rows to parse. Does not include header rows.
 na_values : scalar, str, list-like, or dict, default None
     Additional strings to recognize as NA/NaN. If dict passed, specific
     per-column NA values. By default the following values are interpreted

pandas/io/formats/format.py

+3
@@ -1565,6 +1565,9 @@ def format_percentiles(
     >>> format_percentiles([0, 0.5, 0.02001, 0.5, 0.666666, 0.9999])
     ['0%', '50%', '2.0%', '50%', '66.67%', '99.99%']
     """
+    if len(percentiles) == 0:
+        return []
+
     percentiles = np.asarray(percentiles)
 
     # It checks for np.nan as well

pandas/tests/frame/methods/test_describe.py

+41
@@ -413,3 +413,44 @@ def test_describe_exclude_pa_dtype(self):
             dtype=pd.ArrowDtype(pa.float64()),
         )
         tm.assert_frame_equal(result, expected)
+
+    @pytest.mark.parametrize("percentiles", [None, [], [0.2]])
+    def test_refine_percentiles(self, percentiles):
+        """
+        Test that the percentiles are returned correctly depending on the `percentiles`
+        argument.
+        - The default behavior is to return the 25th, 50th, and 75 percentiles
+        - If `percentiles` is an empty list, no percentiles are returned
+        - If `percentiles` is a non-empty list, only those percentiles are returned
+        """
+        # GH#60550
+        df = DataFrame({"a": np.arange(0, 10, 1)})
+
+        result = df.describe(percentiles=percentiles)
+
+        if percentiles is None:
+            percentiles = [0.25, 0.5, 0.75]
+
+        expected = DataFrame(
+            [
+                len(df.a),
+                df.a.mean(),
+                df.a.std(),
+                df.a.min(),
+                *[df.a.quantile(p) for p in percentiles],
+                df.a.max(),
+            ],
+            index=pd.Index(
+                [
+                    "count",
+                    "mean",
+                    "std",
+                    "min",
+                    *[f"{p:.0%}" for p in percentiles],
+                    "max",
+                ]
+            ),
+            columns=["a"],
+        )
+
+        tm.assert_frame_equal(result, expected)

pandas/tests/groupby/methods/test_describe.py

+3 -3
@@ -202,15 +202,15 @@ def test_describe_duplicate_columns():
     gb = df.groupby(df[1])
     result = gb.describe(percentiles=[])
 
-    columns = ["count", "mean", "std", "min", "50%", "max"]
+    columns = ["count", "mean", "std", "min", "max"]
     frames = [
-        DataFrame([[1.0, val, np.nan, val, val, val]], index=[1], columns=columns)
+        DataFrame([[1.0, val, np.nan, val, val]], index=[1], columns=columns)
         for val in (0.0, 2.0, 3.0)
     ]
     expected = pd.concat(frames, axis=1)
     expected.columns = MultiIndex(
         levels=[[0, 2], columns],
-        codes=[6 * [0] + 6 * [1] + 6 * [0], 3 * list(range(6))],
+        codes=[5 * [0] + 5 * [1] + 5 * [0], 3 * list(range(5))],
     )
     expected.index.names = [1]
     tm.assert_frame_equal(result, expected)
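
The change from `6` to `5` in the `codes` argument follows directly from dropping the `"50%"` column. A minimal sketch in plain Python (no pandas) of how those code lists expand; the names `outer`, `inner`, and `labels` are illustrative only and do not appear in the test.

```python
columns = ["count", "mean", "std", "min", "max"]
outer_levels = [0, 2]
n = len(columns)  # 5 after this commit; 6 while "50%" was still included

# First level: one block of n codes per described frame (labels 0, 2, 0).
outer = n * [0] + n * [1] + n * [0]
# Second level: positions within `columns`, repeated once per frame.
inner = 3 * list(range(n))

# Decode the codes back into (outer label, statistic) pairs.
labels = [(outer_levels[o], columns[i]) for o, i in zip(outer, inner)]

print(len(labels))
print(labels[0], labels[n], labels[-1])
```

Both code lists must have the same length, three frames times `n` statistics, so every multiplier has to change together when a statistic is removed.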

pandas/tests/resample/test_base.py

+18
@@ -438,6 +438,24 @@ def test_resample_size_empty_dataframe(freq, index):
     tm.assert_series_equal(result, expected)
 
 
+@pytest.mark.parametrize("index", [DatetimeIndex([]), TimedeltaIndex([])])
+@pytest.mark.parametrize("freq", ["D", "h"])
+@pytest.mark.parametrize(
+    "method", ["ffill", "bfill", "nearest", "asfreq", "interpolate", "mean"]
+)
+def test_resample_apply_empty_dataframe(index, freq, method):
+    # GH#55572
+    empty_frame_dti = DataFrame(index=index)
+
+    rs = empty_frame_dti.resample(freq)
+    result = rs.apply(getattr(rs, method))
+
+    expected_index = _asfreq_compat(empty_frame_dti.index, freq)
+    expected = DataFrame([], index=expected_index)
+
+    tm.assert_frame_equal(result, expected)
+
+
 @pytest.mark.parametrize(
     "index",
     [

web/pandas/config.yml

-20
@@ -146,16 +146,6 @@ sponsors:
       url: https://numfocus.org/
       logo: static/img/partners/numfocus.svg
       kind: numfocus
-    - name: "Two Sigma"
-      url: https://www.twosigma.com/
-      logo: static/img/partners/two_sigma.svg
-      kind: partner
-      description: "Jeff Reback"
-    - name: "Voltron Data"
-      url: https://voltrondata.com/
-      logo: static/img/partners/voltron_data.svg
-      kind: partner
-      description: "Joris Van den Bossche"
     - name: "Coiled"
       url: https://www.coiled.io
       logo: static/img/partners/coiled.svg
@@ -171,21 +161,11 @@ sponsors:
       logo: static/img/partners/nvidia.svg
       kind: partner
       description: "Matthew Roeschke"
-    - name: "Intel"
-      url: https://www.intel.com/
-      logo: /static/img/partners/intel.svg
-      kind: partner
-      description: "Brock Mendel"
     - name: "Tidelift"
       url: https://tidelift.com
       logo: static/img/partners/tidelift.svg
       kind: regular
       description: "<i>pandas</i> is part of the <a href=\"https://tidelift.com/subscription/pkg/pypi-pandas?utm_source=pypi-pandas&utm_medium=referral&utm_campaign=readme\">Tidelift subscription</a>. You can support pandas by becoming a Tidelift subscriber."
-    - name: "Chan Zuckerberg Initiative"
-      url: https://chanzuckerberg.com/
-      logo: static/img/partners/czi.svg
-      kind: regular
-      description: "<i>pandas</i> is funded by the Essential Open Source Software for Science program of the Chan Zuckerberg Initiative. The funding is used for general maintenance, improve extension types, and a efficient string type."
     - name: "Bodo"
       url: https://www.bodo.ai/
       logo: static/img/partners/bodo.svg

web/pandas/index.html

+2 -2
@@ -46,10 +46,10 @@ <h5>With the support of:</h5>
 {% for row in sponsors.active | batch(6, "") %}
   <div class="row mx-auto h-100">
     {% for company in row %}
-      <div class="col-6 col-md-2">
+      <div class="col-6 col-md-2 d-flex align-items-center justify-content-center">
         {% if company %}
           <a href="{{ company.url }}" target="_blank">
-            <img class="img-fluid img-thumbnail py-5 mx-auto" alt="{{ company.name }}" src="{{ base_url }}{{ company.logo }}"/>
+            <img class="img-fluid w-100" alt="{{ company.name }}" src="{{ base_url }}{{ company.logo }}"/>
           </a>
         {% endif %}
       </div>
