
Commit d2a673f (merge of 2 parents: 9042122 + 641427e)

Commit message: resolve conflict in whatsnew/v2.1.0.rst

File tree: 88 files changed, +1953 -465 lines


.github/workflows/code-checks.yml (+1)

@@ -77,6 +77,7 @@ jobs:

       - name: Install pandas in editable mode
         id: build-editable
+        if: ${{ steps.build.outcome == 'success' && always() }}
         uses: ./.github/actions/build_pandas
         with:
           editable: true
.github/workflows/deprecation-tracking-bot.yml (+47 -13)

@@ -1,11 +1,13 @@
+# This bot updates the issue with number DEPRECATION_TRACKER_ISSUE
+# with the PR number that issued the deprecation.
+
+# It runs on commits to main, and will trigger if the PR linked to a merged commit has the "Deprecate" label
 name: Deprecations Bot

 on:
-  pull_request:
+  push:
     branches:
       - main
-    types:
-      [closed]


 permissions:
@@ -15,17 +17,49 @@ jobs:
   deprecation_update:
     permissions:
       issues: write
-    if: >-
-      contains(github.event.pull_request.labels.*.name, 'Deprecate') && github.event.pull_request.merged == true
     runs-on: ubuntu-22.04
     env:
       DEPRECATION_TRACKER_ISSUE: 50578
     steps:
-      - name: Checkout
-        run: |
-          echo "Adding deprecation PR number to deprecation tracking issue"
-          export PR=${{ github.event.pull_request.number }}
-          BODY=$(curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" https://api.github.com/repos/${{ github.repository }}/issues/${DEPRECATION_TRACKER_ISSUE} |
-            python3 -c "import sys, json, os; x = {'body': json.load(sys.stdin)['body']}; pr = os.environ['PR']; x['body'] += f'\n- [ ] #{pr}'; print(json.dumps(x))")
-          echo ${BODY}
-          curl -H "Authorization: token ${{ secrets.GITHUB_TOKEN }}" -X PATCH -d "${BODY}" https://api.github.com/repos/${{ github.repository }}/issues/${DEPRECATION_TRACKER_ISSUE}
+      - uses: actions/github-script@v6
+        id: update-deprecation-issue
+        with:
+          script: |
+            body = await github.rest.issues.get({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              issue_number: ${{ env.DEPRECATION_TRACKER_ISSUE }},
+            })
+            body = body["data"]["body"];
+            linkedPRs = await github.rest.repos.listPullRequestsAssociatedWithCommit({
+              owner: context.repo.owner,
+              repo: context.repo.repo,
+              commit_sha: '${{ github.sha }}'
+            })
+            linkedPRs = linkedPRs["data"];
+            console.log(linkedPRs);
+            if (linkedPRs.length > 0) {
+              console.log("Found linked PR");
+              linkedPR = linkedPRs[0]
+              isDeprecation = false
+              for (label of linkedPR["labels"]) {
+                if (label["name"] == "Deprecate") {
+                  isDeprecation = true;
+                  break;
+                }
+              }
+
+              PR_NUMBER = linkedPR["number"];
+
+              body += ("\n- [ ] #" + PR_NUMBER);
+              if (isDeprecation) {
+                console.log("PR is a deprecation PR. Printing new body of issue");
+                console.log(body);
+                github.rest.issues.update({
+                  owner: context.repo.owner,
+                  repo: context.repo.repo,
+                  issue_number: ${{ env.DEPRECATION_TRACKER_ISSUE }},
+                  body: body
+                })
+              }
+            }
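For context, the rewritten step is just three GitHub REST calls: fetch the tracker issue, list the PRs associated with the pushed commit, and patch the issue body. A minimal standalone sketch of the same flow in Python (using the requests package; TOKEN, REPO, and SHA are placeholder values, not taken from this commit):

import requests

TOKEN = "ghp_..."         # placeholder personal access token
REPO = "pandas-dev/pandas"
SHA = "<commit on main>"  # placeholder commit SHA
ISSUE = 50578             # DEPRECATION_TRACKER_ISSUE

api = f"https://api.github.com/repos/{REPO}"
headers = {"Authorization": f"token {TOKEN}"}

# PRs associated with the pushed commit (the same endpoint that
# listPullRequestsAssociatedWithCommit hits in the github-script step)
prs = requests.get(f"{api}/commits/{SHA}/pulls", headers=headers).json()
if prs:
    pr = prs[0]
    if any(label["name"] == "Deprecate" for label in pr["labels"]):
        # fetch the tracker issue, append a checklist entry, write it back
        body = requests.get(f"{api}/issues/{ISSUE}", headers=headers).json()["body"]
        body += f"\n- [ ] #{pr['number']}"
        requests.patch(f"{api}/issues/{ISSUE}", headers=headers, json={"body": body})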

.github/workflows/wheels.yml (+6 -3)

@@ -110,9 +110,12 @@ jobs:
           path: ./dist

       - name: Build wheels
-        uses: pypa/cibuildwheel@<version obscured in this capture>
-        with:
-          package-dir: ./dist/${{ needs.build_sdist.outputs.sdist_file }}
+        uses: pypa/cibuildwheel@<new version obscured in this capture>
+        # TODO: Build wheels from sdist again
+        # There's some sort of weird race condition?
+        # within Github that makes the sdist be missing files
+        #with:
+        #  package-dir: ./dist/${{ needs.build_sdist.outputs.sdist_file }}
         env:
           CIBW_BUILD: ${{ matrix.python[0] }}-${{ matrix.buildplat[1] }}
asv_bench/benchmarks/groupby.py (+88 -9)

@@ -57,6 +57,38 @@
     },
 }

+# These aggregations don't have a kernel implemented for them yet
+_numba_unsupported_methods = [
+    "all",
+    "any",
+    "bfill",
+    "count",
+    "cumcount",
+    "cummax",
+    "cummin",
+    "cumprod",
+    "cumsum",
+    "describe",
+    "diff",
+    "ffill",
+    "first",
+    "head",
+    "last",
+    "median",
+    "nunique",
+    "pct_change",
+    "prod",
+    "quantile",
+    "rank",
+    "sem",
+    "shift",
+    "size",
+    "skew",
+    "tail",
+    "unique",
+    "value_counts",
+]
+

 class ApplyDictReturn:
     def setup(self):
@@ -453,9 +485,10 @@ class GroupByMethods:
         ],
         ["direct", "transformation"],
         [1, 5],
+        ["cython", "numba"],
     ]

-    def setup(self, dtype, method, application, ncols):
+    def setup(self, dtype, method, application, ncols, engine):
         if method in method_blocklist.get(dtype, {}):
             raise NotImplementedError  # skip benchmark

@@ -474,6 +507,19 @@ def setup(self, dtype, method, application, ncols):
             # DataFrameGroupBy doesn't have these methods
             raise NotImplementedError

+        # Numba currently doesn't support
+        # multiple transform functions or strs for transform,
+        # grouping on multiple columns
+        # and we lack kernels for a bunch of methods
+        if (
+            engine == "numba"
+            and method in _numba_unsupported_methods
+            or ncols > 1
+            or application == "transformation"
+            or dtype == "datetime"
+        ):
+            raise NotImplementedError
+
         if method == "describe":
             ngroups = 20
         elif method == "skew":
@@ -505,17 +551,30 @@ def setup(self, dtype, method, application, ncols):
         if len(cols) == 1:
             cols = cols[0]

+        # Not everything supports the engine keyword yet
+        kwargs = {}
+        if engine == "numba":
+            kwargs["engine"] = engine
+
         if application == "transformation":
-            self.as_group_method = lambda: df.groupby("key")[cols].transform(method)
-            self.as_field_method = lambda: df.groupby(cols)["key"].transform(method)
+            self.as_group_method = lambda: df.groupby("key")[cols].transform(
+                method, **kwargs
+            )
+            self.as_field_method = lambda: df.groupby(cols)["key"].transform(
+                method, **kwargs
+            )
         else:
-            self.as_group_method = getattr(df.groupby("key")[cols], method)
-            self.as_field_method = getattr(df.groupby(cols)["key"], method)
+            self.as_group_method = partial(
+                getattr(df.groupby("key")[cols], method), **kwargs
+            )
+            self.as_field_method = partial(
+                getattr(df.groupby(cols)["key"], method), **kwargs
+            )

-    def time_dtype_as_group(self, dtype, method, application, ncols):
+    def time_dtype_as_group(self, dtype, method, application, ncols, engine):
         self.as_group_method()

-    def time_dtype_as_field(self, dtype, method, application, ncols):
+    def time_dtype_as_field(self, dtype, method, application, ncols, engine):
         self.as_field_method()


@@ -532,8 +591,12 @@ class GroupByCythonAgg:
     [
         "sum",
         "prod",
-        "min",
-        "max",
+        # TODO: uncomment min/max
+        # Currently, min/max implemented very inefficiently
+        # because it re-uses the Window min/max kernel
+        # so it will time out ASVs
+        # "min",
+        # "max",
         "mean",
         "median",
         "var",
@@ -554,6 +617,22 @@ def time_frame_agg(self, dtype, method):
         self.df.groupby("key").agg(method)


+class GroupByNumbaAgg(GroupByCythonAgg):
+    """
+    Benchmarks specifically targeting our numba aggregation algorithms
+    (using a big enough dataframe with simple key, so a large part of the
+    time is actually spent in the grouped aggregation).
+    """
+
+    def setup(self, dtype, method):
+        if method in _numba_unsupported_methods:
+            raise NotImplementedError
+        super().setup(dtype, method)
+
+    def time_frame_agg(self, dtype, method):
+        self.df.groupby("key").agg(method, engine="numba")
+
+
 class GroupByCythonAggEaDtypes:
     """
     Benchmarks specifically targeting our cython aggregation algorithms
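The new engine parameter mirrors the keyword already exposed on the public groupby API, which is what the benchmark now exercises. A rough illustration of the two timed paths (the frame and column names are invented for this example, and the numba path requires numba to be installed):

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "key": np.random.randint(0, 100, size=1_000_000),
        "values": np.random.randn(1_000_000),
    }
)

df.groupby("key")["values"].mean()                # default Cython path
df.groupby("key")["values"].mean(engine="numba")  # JIT-compiled Numba path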

ci/code_checks.sh (-13)

@@ -105,8 +105,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.errors.UnsupportedFunctionCall \
         pandas.test \
         pandas.NaT \
-        pandas.Timestamp.as_unit \
-        pandas.Timestamp.ctime \
         pandas.Timestamp.date \
         pandas.Timestamp.dst \
         pandas.Timestamp.isocalendar \
@@ -121,16 +119,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.Timestamp.utcoffset \
         pandas.Timestamp.utctimetuple \
         pandas.Timestamp.weekday \
-        pandas.arrays.DatetimeArray \
-        pandas.Timedelta.view \
-        pandas.Timedelta.as_unit \
-        pandas.Timedelta.ceil \
-        pandas.Timedelta.floor \
-        pandas.Timedelta.round \
-        pandas.Timedelta.to_pytimedelta \
-        pandas.Timedelta.to_timedelta64 \
-        pandas.Timedelta.to_numpy \
-        pandas.Timedelta.total_seconds \
         pandas.arrays.TimedeltaArray \
         pandas.Period.asfreq \
         pandas.Period.now \
@@ -263,7 +251,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas.core.window.ewm.ExponentialMovingWindow.cov \
         pandas.api.indexers.BaseIndexer \
         pandas.api.indexers.VariableOffsetWindowIndexer \
-        pandas.core.groupby.SeriesGroupBy.fillna \
         pandas.io.formats.style.Styler \
         pandas.io.formats.style.Styler.from_custom_template \
         pandas.io.formats.style.Styler.set_caption \
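The names removed above are entries in the docstring-validation ignore list; delisting them means their docstrings now pass the checks. As a quick illustration (my example, not part of the commit), one of the delisted methods:

import pandas as pd

ts = pd.Timestamp("2023-06-15 12:00:00")
ts.as_unit("s")  # same moment, stored at second resolution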

doc/source/development/contributing.rst (+16)

@@ -119,6 +119,22 @@ Some great resources for learning Git:
 * the `NumPy documentation <https://numpy.org/doc/stable/dev/index.html>`_.
 * Matthew Brett's `Pydagogue <https://matthew-brett.github.io/pydagogue/>`_.

+Also, the project follows a forking workflow further described on this page whereby
+contributors fork the repository, make changes and then create a pull request.
+So please be sure to read and follow all the instructions in this guide.
+
+If you are new to contributing to projects through forking on GitHub,
+take a look at the `GitHub documentation for contributing to projects <https://docs.github.com/en/get-started/quickstart/contributing-to-projects>`_.
+GitHub provides a quick tutorial using a test repository that may help you become more familiar
+with forking a repository, cloning a fork, creating a feature branch, pushing changes and
+making pull requests.
+
+Below are some useful resources for learning more about forking and pull requests on GitHub:
+
+* the `GitHub documentation for forking a repo <https://docs.github.com/en/get-started/quickstart/fork-a-repo>`_.
+* the `GitHub documentation for collaborating with pull requests <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests>`_.
+* the `GitHub documentation for working with forks <https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks>`_.
+
 Getting started with Git
 ------------------------

doc/source/user_guide/basics.rst (+1 -3)

@@ -675,7 +675,7 @@ matching index:
 Value counts (histogramming) / mode
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-The :meth:`~Series.value_counts` Series method and top-level function computes a histogram
+The :meth:`~Series.value_counts` Series method computes a histogram
 of a 1D array of values. It can also be used as a function on regular arrays:

 .. ipython:: python
@@ -684,7 +684,6 @@ of a 1D array of values. It can also be used as a function on regular arrays:
    data
    s = pd.Series(data)
    s.value_counts()
-   pd.value_counts(data)

 The :meth:`~DataFrame.value_counts` method can be used to count combinations across multiple columns.
 By default all columns are used but a subset can be selected using the ``subset`` argument.
@@ -733,7 +732,6 @@ normally distributed data into equal-size quartiles like so:
    arr = np.random.randn(30)
    factor = pd.qcut(arr, [0, 0.25, 0.5, 0.75, 1])
    factor
-   pd.value_counts(factor)

 We can also pass infinite values to define the bins:
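These edits track the move away from the top-level pd.value_counts toward the Series method. A minimal before/after sketch, assuming pandas 2.x:

import numpy as np
import pandas as pd

data = np.random.randint(0, 7, size=50)
pd.Series(data).value_counts()  # preferred spelling
# pd.value_counts(data)         # top-level form being removed from the docs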

doc/source/user_guide/missing_data.rst (-7)

@@ -551,13 +551,6 @@ For a DataFrame, you can specify individual values by column:

    df.replace({"a": 0, "b": 5}, 100)

-Instead of replacing with specified values, you can treat all given values as
-missing and interpolate over them:
-
-.. ipython:: python
-
-   ser.replace([1, 2, 3], method="pad")
-
 .. _missing_data.replace_expression:

 String/regular expression replacement
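The deleted example relied on replace(..., method="pad"), which is deprecated. A roughly equivalent idiom, offered here as a sketch rather than something from the commit, is to mark the matched values as missing and then pad-fill:

import numpy as np
import pandas as pd

ser = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0])
# same effect as the removed ser.replace([1, 2, 3], method="pad"):
# matched values become NaN, then are filled from the previous entry
ser.replace([1, 2, 3], np.nan).ffill()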

doc/source/whatsnew/v2.0.3.rst (+2)

@@ -21,8 +21,10 @@ Fixed regressions

 Bug fixes
 ~~~~~~~~~
+- Bug in :func:`RangeIndex.union` when using ``sort=True`` with another :class:`RangeIndex` (:issue:`53490`)
 - Bug in :func:`read_csv` when defining ``dtype`` with ``bool[pyarrow]`` for the ``"c"`` and ``"python"`` engines (:issue:`53390`)
 - Bug in :meth:`Series.str.split` and :meth:`Series.str.rsplit` with ``expand=True`` for :class:`ArrowDtype` with ``pyarrow.string`` (:issue:`53532`)
+- Bug in indexing methods (e.g. :meth:`DataFrame.__getitem__`) where taking the entire :class:`DataFrame`/:class:`Series` would raise an ``OverflowError`` when Copy on Write was enabled and the length of the array was over the maximum size a 32-bit integer can hold (:issue:`53616`)
 -

 .. ---------------------------------------------------------------------------
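One of the listed fixes can be exercised as follows. This is a hedged reproduction sketch (it assumes pyarrow is installed), not an example taken from the commit:

import io
import pandas as pd

buf = io.StringIO("flag\nTrue\nFalse\n")
# per the note above, this previously failed for the "c" and "python"
# engines when an Arrow-backed bool dtype was requested
df = pd.read_csv(buf, dtype={"flag": "bool[pyarrow]"}, engine="python")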
