
Commit 80ce02e

Merge branch 'master' into 42916

2 parents 90af576 + 00e10a5


53 files changed: +971 / -468 lines

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 - [ ] closes #xxxx
 - [ ] tests added / passed
-- [ ] Ensure all linting tests pass, see [here](https://pandas.pydata.org/pandas-docs/dev/development/contributing.html#code-standards) for how to run them
+- [ ] Ensure all linting tests pass, see [here](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit) for how to run them
 - [ ] whatsnew entry

doc/source/development/contributing.rst

Lines changed: 6 additions & 1 deletion
@@ -331,7 +331,12 @@ can comment::

     @github-actions pre-commit

-on that pull request. This will trigger a workflow which will autofix formatting errors.
+on that pull request. This will trigger a workflow which will autofix formatting
+errors.
+
+To automatically fix formatting errors on each commit you make, you can
+set up pre-commit yourself. First, create a Python :ref:`environment
+<contributing_environment>` and then set up :ref:`pre-commit <contributing.pre-commit>`.

 Delete your merged branch (optional)
 ------------------------------------

doc/source/development/contributing_environment.rst

Lines changed: 0 additions & 1 deletion
@@ -133,7 +133,6 @@ compiler installation instructions.

 Let us know if you have any difficulties by opening an issue or reaching out on `Gitter <https://gitter.im/pydata/pandas/>`_.

-
 Creating a Python environment
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/ecosystem.rst

Lines changed: 14 additions & 0 deletions
@@ -575,3 +575,17 @@ Library Accessor Classes Description
 .. _composeml: https://github.com/alteryx/compose
 .. _datatest: https://datatest.readthedocs.io/
 .. _woodwork: https://github.com/alteryx/woodwork
+
+Development tools
+----------------------------
+
+`pandas-stubs <https://github.com/VirtusLab/pandas-stubs>`__
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While pandas repository is partially typed, the package itself doesn't expose this information for external use.
+Install pandas-stubs to enable basic type coverage of pandas API.
+
+Learn more by reading through these issues `14468 <https://github.com/pandas-dev/pandas/issues/14468>`_,
+`26766 <https://github.com/pandas-dev/pandas/issues/26766>`_, `28142 <https://github.com/pandas-dev/pandas/issues/28142>`_.
+
+See installation and usage instructions on the `github page <https://github.com/VirtusLab/pandas-stubs>`__.
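As a rough illustration (not part of the commit): with pandas-stubs installed, a PEP 561-aware checker such as mypy can type-check ordinary pandas code. The snippet below is a hypothetical example, not taken from the pandas docs.

    # pip install pandas-stubs   # then run: mypy this_file.py
    import pandas as pd

    def mean_price(frame: pd.DataFrame) -> float:
        # With the stubs installed, mypy knows frame["price"] is a Series
        # and that .mean() returns a scalar we can pass to float().
        return float(frame["price"].mean())

    df = pd.DataFrame({"price": [1.0, 2.5, 3.0]})
    print(mean_price(df))  # 2.1666...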

doc/source/user_guide/options.rst

Lines changed: 19 additions & 15 deletions
@@ -38,11 +38,11 @@ and so passing in a substring will work - as long as it is unambiguous:

 .. ipython:: python

-   pd.get_option("display.max_rows")
-   pd.set_option("display.max_rows", 101)
-   pd.get_option("display.max_rows")
-   pd.set_option("max_r", 102)
-   pd.get_option("display.max_rows")
+   pd.get_option("display.chop_threshold")
+   pd.set_option("display.chop_threshold", 2)
+   pd.get_option("display.chop_threshold")
+   pd.set_option("chop", 4)
+   pd.get_option("display.chop_threshold")


 The following will **not work** because it matches multiple option names, e.g.

@@ -52,7 +52,7 @@ The following will **not work** because it matches multiple option names, e.g.
    :okexcept:

    try:
-       pd.get_option("column")
+       pd.get_option("max")
    except KeyError as e:
        print(e)

@@ -153,27 +153,27 @@ lines are replaced by an ellipsis.
 .. ipython:: python

    df = pd.DataFrame(np.random.randn(7, 2))
-   pd.set_option("max_rows", 7)
+   pd.set_option("display.max_rows", 7)
    df
-   pd.set_option("max_rows", 5)
+   pd.set_option("display.max_rows", 5)
    df
-   pd.reset_option("max_rows")
+   pd.reset_option("display.max_rows")

 Once the ``display.max_rows`` is exceeded, the ``display.min_rows`` options
 determines how many rows are shown in the truncated repr.

 .. ipython:: python

-   pd.set_option("max_rows", 8)
-   pd.set_option("min_rows", 4)
+   pd.set_option("display.max_rows", 8)
+   pd.set_option("display.min_rows", 4)
    # below max_rows -> all rows shown
    df = pd.DataFrame(np.random.randn(7, 2))
    df
    # above max_rows -> only min_rows (4) rows shown
    df = pd.DataFrame(np.random.randn(9, 2))
    df
-   pd.reset_option("max_rows")
-   pd.reset_option("min_rows")
+   pd.reset_option("display.max_rows")
+   pd.reset_option("display.min_rows")

 ``display.expand_frame_repr`` allows for the representation of
 dataframes to stretch across pages, wrapped over the full column vs row-wise.

@@ -193,13 +193,13 @@ dataframes to stretch across pages, wrapped over the full column vs row-wise.
 .. ipython:: python

    df = pd.DataFrame(np.random.randn(10, 10))
-   pd.set_option("max_rows", 5)
+   pd.set_option("display.max_rows", 5)
    pd.set_option("large_repr", "truncate")
    df
    pd.set_option("large_repr", "info")
    df
    pd.reset_option("large_repr")
-   pd.reset_option("max_rows")
+   pd.reset_option("display.max_rows")

 ``display.max_colwidth`` sets the maximum width of columns. Cells
 of this length or longer will be truncated with an ellipsis.

@@ -491,6 +491,10 @@ styler.render.repr html Standard output format for
                                             Should be one of "html" or "latex".
 styler.render.max_elements  262144          Maximum number of datapoints that Styler will render
                                             trimming either rows, columns or both to fit.
+styler.render.max_rows      None            Maximum number of rows that Styler will render. By default
+                                            this is dynamic based on ``max_elements``.
+styler.render.max_columns   None            Maximum number of columns that Styler will render. By default
+                                            this is dynamic based on ``max_elements``.
 styler.render.encoding      utf-8           Default encoding for output HTML or LaTeX files.
 styler.format.formatter     None            Object to specify formatting functions to ``Styler.format``.
 styler.format.na_rep        None            String representation for missing data.
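A minimal sketch of how the two Styler options added above might be used; the option names come straight from the diff, while the surrounding code is illustrative only and assumes a pandas version that ships these options.

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(1000, 50))

    # Cap what Styler renders, independently of display.max_rows
    pd.set_option("styler.render.max_rows", 100)
    pd.set_option("styler.render.max_columns", 10)
    html = df.style.to_html()  # output trimmed to at most 100 rows x 10 columns

    pd.reset_option("styler.render.max_rows")
    pd.reset_option("styler.render.max_columns")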

doc/source/whatsnew/v1.3.3.rst

Lines changed: 5 additions & 0 deletions
@@ -24,8 +24,13 @@ Fixed regressions
 - Fixed regression in :meth:`read_parquet` where the ``fastparquet`` engine would not work properly with fastparquet 0.7.0 (:issue:`43075`)
 - Fixed regression in :meth:`DataFrame.loc.__setitem__` raising ``ValueError`` when setting array as cell value (:issue:`43422`)
 - Fixed regression in :func:`is_list_like` where objects with ``__iter__`` set to ``None`` would be identified as iterable (:issue:`43373`)
+- Fixed regression in :meth:`DataFrame.__getitem__` raising an error for a slice of :class:`DatetimeIndex` when the index is non-monotonic (:issue:`43223`)
 - Fixed regression in :meth:`.Resampler.aggregate` when used after column selection would raise if ``func`` is a list of aggregation functions (:issue:`42905`)
 - Fixed regression in :meth:`DataFrame.corr` where Kendall correlation would produce incorrect results for columns with repeated values (:issue:`43401`)
+- Fixed regression in :meth:`DataFrame.groupby` where aggregation on columns with object types dropped results on those columns (:issue:`42395`, :issue:`43108`)
+- Fixed regression in :meth:`Series.fillna` raising ``TypeError`` when filling a ``float`` ``Series`` with a list-like fill value whose dtype could not be cast losslessly (like ``float32`` filled with ``float64``) (:issue:`43424`)
+- Fixed regression in :func:`read_csv` throwing an ``AttributeError`` when the file handle is a ``tempfile.SpooledTemporaryFile`` object (:issue:`43439`)
+-

 .. ---------------------------------------------------------------------------

doc/source/whatsnew/v1.4.0.rst

Lines changed: 4 additions & 1 deletion
@@ -75,7 +75,7 @@ Styler
 - Styling of indexing has been added, with :meth:`.Styler.apply_index` and :meth:`.Styler.applymap_index`. These mirror the signature of the methods already used to style data values, and work with both HTML and LaTeX format (:issue:`41893`).
 - :meth:`.Styler.bar` introduces additional arguments to control alignment and display (:issue:`26070`, :issue:`36419`), and it also validates the input arguments ``width`` and ``height`` (:issue:`42511`).
 - :meth:`.Styler.to_latex` introduces keyword argument ``environment``, which also allows a specific "longtable" entry through a separate jinja2 template (:issue:`41866`).
-- :meth:`.Styler.to_html` introduces keyword arguments ``sparse_index``, ``sparse_columns``, ``bold_headers``, ``caption`` (:issue:`41946`, :issue:`43149`).
+- :meth:`.Styler.to_html` introduces keyword arguments ``sparse_index``, ``sparse_columns``, ``bold_headers``, ``caption``, ``max_rows`` and ``max_columns`` (:issue:`41946`, :issue:`43149`, :issue:`42972`).
 - Keyword arguments ``level`` and ``names`` added to :meth:`.Styler.hide_index` and :meth:`.Styler.hide_columns` for additional control of visibility of MultiIndexes and index names (:issue:`25475`, :issue:`43404`, :issue:`43346`)
 - Global options have been extended to configure default ``Styler`` properties including formatting and encoding and mathjax options and LaTeX (:issue:`41395`)
 - Naive sparsification is now possible for LaTeX without the multirow package (:issue:`43369`)

@@ -294,6 +294,8 @@ Performance improvements
 - Performance improvement in :meth:`to_datetime` with ``uint`` dtypes (:issue:`42606`)
 - Performance improvement in :meth:`Series.sparse.to_coo` (:issue:`42880`)
 - Performance improvement in indexing with a :class:`MultiIndex` indexer on another :class:`MultiIndex` (:issue:43370`)
+- Performance improvement in :meth:`GroupBy.quantile` (:issue:`43469`)
+-

 .. ---------------------------------------------------------------------------

@@ -440,6 +442,7 @@ Styler
 - Bug in :meth:`Styler.apply` where functions which returned Series objects were not correctly handled in terms of aligning their index labels (:issue:`13657`, :issue:`42014`)
 - Bug when rendering an empty DataFrame with a named index (:issue:`43305`).
 - Bug when rendering a single level MultiIndex (:issue:`43383`).
+- Bug when combining non-sparse rendering and :meth:`.Styler.hide_columns` or :meth:`.Styler.hide_index` (:issue:`43464`)

 Other
 ^^^^^
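The max_rows / max_columns keywords mentioned in the Styler.to_html entry above can also be passed per call rather than via the global styler options; a hedged sketch, assuming pandas 1.4 with these keywords available:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(np.random.randn(500, 30))

    # Keyword arguments listed in the whatsnew entry; only a 20x5 preview is rendered
    html = df.style.to_html(max_rows=20, max_columns=5, bold_headers=True, caption="preview")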

pandas/_libs/groupby.pyi

Lines changed: 2 additions & 2 deletions
@@ -84,11 +84,11 @@ def group_ohlc(
     min_count: int = ...,
 ) -> None: ...
 def group_quantile(
-    out: np.ndarray,  # ndarray[float64_t]
+    out: np.ndarray,  # ndarray[float64_t, ndim=2]
     values: np.ndarray,  # ndarray[numeric, ndim=1]
     labels: np.ndarray,  # ndarray[int64_t]
     mask: np.ndarray,  # ndarray[uint8_t]
-    q: float,  # float64_t
+    qs: np.ndarray,  # const float64_t[:]
     interpolation: Literal["linear", "lower", "higher", "nearest", "midpoint"],
 ) -> None: ...
 def group_last(
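The stub now mirrors the Cython change below: group_quantile receives an array of quantiles (qs) and fills a 2-D out of shape (ngroups, len(qs)). At the user level the visible effect is on list-valued calls such as the following hypothetical example, which (per the GroupBy.quantile performance entry above) should now be handled in a single pass rather than once per quantile:

    import pandas as pd

    df = pd.DataFrame({"key": ["a", "a", "b", "b", "b"],
                       "val": [1.0, 3.0, 2.0, 4.0, 10.0]})

    # One call now fills an (ngroups x nquantiles) block internally
    print(df.groupby("key")["val"].quantile([0.25, 0.5, 0.75]))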

pandas/_libs/groupby.pyx

Lines changed: 46 additions & 38 deletions
@@ -770,25 +770,25 @@ def group_ohlc(floating[:, ::1] out,

 @cython.boundscheck(False)
 @cython.wraparound(False)
-def group_quantile(ndarray[float64_t] out,
+def group_quantile(ndarray[float64_t, ndim=2] out,
                    ndarray[numeric, ndim=1] values,
                    ndarray[intp_t] labels,
                    ndarray[uint8_t] mask,
-                   float64_t q,
+                   const float64_t[:] qs,
                    str interpolation) -> None:
     """
     Calculate the quantile per group.

     Parameters
     ----------
-    out : np.ndarray[np.float64]
+    out : np.ndarray[np.float64, ndim=2]
         Array of aggregated values that will be written to.
     values : np.ndarray
         Array containing the values to apply the function against.
     labels : ndarray[np.intp]
         Array containing the unique group labels.
-    q : float
-        The quantile value to search for.
+    qs : ndarray[float64_t]
+        The quantile values to search for.
     interpolation : {'linear', 'lower', 'highest', 'nearest', 'midpoint'}

     Notes

@@ -797,17 +797,20 @@ def group_quantile(ndarray[float64_t] out,
     provided `out` parameter.
     """
     cdef:
-        Py_ssize_t i, N=len(labels), ngroups, grp_sz, non_na_sz
+        Py_ssize_t i, N=len(labels), ngroups, grp_sz, non_na_sz, k, nqs
         Py_ssize_t grp_start=0, idx=0
         intp_t lab
         uint8_t interp
-        float64_t q_idx, frac, val, next_val
+        float64_t q_val, q_idx, frac, val, next_val
         ndarray[int64_t] counts, non_na_counts, sort_arr

     assert values.shape[0] == N

-    if not (0 <= q <= 1):
-        raise ValueError(f"'q' must be between 0 and 1. Got '{q}' instead")
+    if any(not (0 <= q <= 1) for q in qs):
+        wrong = [x for x in qs if not (0 <= x <= 1)][0]
+        raise ValueError(
+            f"Each 'q' must be between 0 and 1. Got '{wrong}' instead"
+        )

     inter_methods = {
         'linear': INTERPOLATION_LINEAR,

@@ -818,9 +821,10 @@ def group_quantile(ndarray[float64_t] out,
     }
     interp = inter_methods[interpolation]

-    counts = np.zeros_like(out, dtype=np.int64)
-    non_na_counts = np.zeros_like(out, dtype=np.int64)
-    ngroups = len(counts)
+    nqs = len(qs)
+    ngroups = len(out)
+    counts = np.zeros(ngroups, dtype=np.int64)
+    non_na_counts = np.zeros(ngroups, dtype=np.int64)

     # First figure out the size of every group
     with nogil:

@@ -850,33 +854,37 @@ def group_quantile(ndarray[float64_t] out,
             non_na_sz = non_na_counts[i]

             if non_na_sz == 0:
-                out[i] = NaN
+                for k in range(nqs):
+                    out[i, k] = NaN
             else:
-                # Calculate where to retrieve the desired value
-                # Casting to int will intentionally truncate result
-                idx = grp_start + <int64_t>(q * <float64_t>(non_na_sz - 1))
-
-                val = values[sort_arr[idx]]
-                # If requested quantile falls evenly on a particular index
-                # then write that index's value out. Otherwise interpolate
-                q_idx = q * (non_na_sz - 1)
-                frac = q_idx % 1
-
-                if frac == 0.0 or interp == INTERPOLATION_LOWER:
-                    out[i] = val
-                else:
-                    next_val = values[sort_arr[idx + 1]]
-                    if interp == INTERPOLATION_LINEAR:
-                        out[i] = val + (next_val - val) * frac
-                    elif interp == INTERPOLATION_HIGHER:
-                        out[i] = next_val
-                    elif interp == INTERPOLATION_MIDPOINT:
-                        out[i] = (val + next_val) / 2.0
-                    elif interp == INTERPOLATION_NEAREST:
-                        if frac > .5 or (frac == .5 and q > .5):  # Always OK?
-                            out[i] = next_val
-                        else:
-                            out[i] = val
+                for k in range(nqs):
+                    q_val = qs[k]
+
+                    # Calculate where to retrieve the desired value
+                    # Casting to int will intentionally truncate result
+                    idx = grp_start + <int64_t>(q_val * <float64_t>(non_na_sz - 1))
+
+                    val = values[sort_arr[idx]]
+                    # If requested quantile falls evenly on a particular index
+                    # then write that index's value out. Otherwise interpolate
+                    q_idx = q_val * (non_na_sz - 1)
+                    frac = q_idx % 1
+
+                    if frac == 0.0 or interp == INTERPOLATION_LOWER:
+                        out[i, k] = val
+                    else:
+                        next_val = values[sort_arr[idx + 1]]
+                        if interp == INTERPOLATION_LINEAR:
+                            out[i, k] = val + (next_val - val) * frac
+                        elif interp == INTERPOLATION_HIGHER:
+                            out[i, k] = next_val
+                        elif interp == INTERPOLATION_MIDPOINT:
+                            out[i, k] = (val + next_val) / 2.0
+                        elif interp == INTERPOLATION_NEAREST:
+                            if frac > .5 or (frac == .5 and q_val > .5):  # Always OK?
+                                out[i, k] = next_val
+                            else:
+                                out[i, k] = val

         # Increment the index reference in sorted_arr for the next group
         grp_start += grp_sz
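For readers who don't follow Cython easily, here is a hedged pure-NumPy sketch of what the restructured loop computes. It covers only the 'linear' branch, ignores the mask argument and the pre-computed sort_arr, and is illustrative rather than a drop-in equivalent of group_quantile:

    import numpy as np

    def group_quantile_sketch(values, labels, qs, ngroups):
        # Fill a 2-D (ngroups x len(qs)) result, one row per group and
        # one column per requested quantile -- mirroring out[i, k] above.
        out = np.full((ngroups, len(qs)), np.nan)
        for i in range(ngroups):
            grp = np.sort(values[labels == i])   # group i, sorted (stands in for sort_arr)
            non_na_sz = len(grp)
            if non_na_sz == 0:
                continue                         # row stays NaN, like out[i, k] = NaN
            for k, q_val in enumerate(qs):
                q_idx = q_val * (non_na_sz - 1)  # fractional position of the quantile
                idx = int(q_idx)                 # truncation mirrors the <int64_t> cast
                frac = q_idx % 1
                val = grp[idx]
                if frac == 0.0:
                    out[i, k] = val              # falls exactly on an order statistic
                else:
                    next_val = grp[idx + 1]
                    out[i, k] = val + (next_val - val) * frac  # 'linear' interpolation
        return out

    # Example: two groups (0 and 1), three quantiles
    vals = np.array([1.0, 3.0, 2.0, 4.0, 10.0])
    labs = np.array([0, 0, 1, 1, 1])
    print(group_quantile_sketch(vals, labs, [0.25, 0.5, 0.75], ngroups=2))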
