Commit ba1edd6

Merge branch 'main' into dev/depr/literal-str-read_xml

2 parents e08f4e0 + 4576909

120 files changed: +1487 −758 lines

Lines changed: 29 additions & 0 deletions (new workflow file)

```yaml
name: Purge caches once a week
on:
  schedule:
    # 4:10 UTC on Sunday
    - cron: "10 4 * * 0"

jobs:
  cleanup:
    runs-on: ubuntu-latest
    steps:
      - name: Clean Cache
        run: |
          gh extension install actions/gh-actions-cache

          REPO=${{ github.repository }}

          echo "Fetching list of cache key"
          allCaches=$(gh actions-cache list -L 100 -R $REPO | cut -f 1 )

          ## Setting this to not fail the workflow while deleting cache keys.
          set +e
          echo "Deleting caches..."
          for cacheKey in $allCaches
          do
            gh actions-cache delete $cacheKey -R $REPO --confirm
          done
          echo "Done"
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
```

.github/workflows/unit-tests.yml

Lines changed: 9 additions & 1 deletion

```diff
@@ -57,7 +57,15 @@ jobs:
           # Also install zh_CN (its encoding is gb2312) but do not activate it.
           # It will be temporarily activated during tests with locale.setlocale
           extra_loc: "zh_CN"
-        - name: "Copy-on-Write"
+        - name: "Copy-on-Write 3.9"
+          env_file: actions-39.yaml
+          pattern: "not slow and not network and not single_cpu"
+          pandas_copy_on_write: "1"
+        - name: "Copy-on-Write 3.10"
+          env_file: actions-310.yaml
+          pattern: "not slow and not network and not single_cpu"
+          pandas_copy_on_write: "1"
+        - name: "Copy-on-Write 3.11"
           env_file: actions-311.yaml
           pattern: "not slow and not network and not single_cpu"
           pandas_copy_on_write: "1"
```

.pre-commit-config.yaml

Lines changed: 5 additions & 17 deletions

```diff
@@ -15,18 +15,11 @@ default_stages: [
 ci:
     autofix_prs: false
 repos:
--   repo: local
+-   repo: https://github.com/hauntsaninja/black-pre-commit-mirror
+    # black compiled with mypyc
+    rev: 23.3.0
     hooks:
-    # NOTE: we make `black` a local hook because if it's installed from
-    # PyPI (rather than from source) then it'll run twice as fast thanks to mypyc
-    -   id: black
-        name: black
-        description: "Black: The uncompromising Python code formatter"
-        entry: black
-        language: python
-        require_serial: true
-        types_or: [python, pyi]
-        additional_dependencies: [black==23.3.0]
+    -   id: black
 -   repo: https://github.com/charliermarsh/ruff-pre-commit
     rev: v0.0.270
     hooks:
@@ -74,7 +67,7 @@ repos:
         --linelength=88,
         '--filter=-readability/casting,-runtime/int,-build/include_subdir,-readability/fn_size'
         ]
--   repo: https://github.com/pycqa/pylint
+-   repo: https://github.com/pylint-dev/pylint
     rev: v3.0.0a6
     hooks:
     -   id: pylint
@@ -93,11 +86,6 @@ repos:
         |^pandas/conftest\.py # keep excluded
         args: [--disable=all, --enable=redefined-outer-name]
         stages: [manual]
-    -   id: pylint
-        alias: unspecified-encoding
-        name: Using open without explicitly specifying an encoding
-        args: [--disable=all, --enable=unspecified-encoding]
-        stages: [manual]
 -   repo: https://github.com/PyCQA/isort
     rev: 5.12.0
     hooks:
```

ci/code_checks.sh

Lines changed: 0 additions & 25 deletions

```diff
@@ -110,31 +110,6 @@ if [[ -z "$CHECK" || "$CHECK" == "docstrings" ]]; then
         pandas_object \
         pandas.api.interchange.from_dataframe \
         pandas.DatetimeIndex.snap \
-        pandas.core.window.rolling.Rolling.max \
-        pandas.core.window.rolling.Rolling.cov \
-        pandas.core.window.rolling.Rolling.skew \
-        pandas.core.window.rolling.Rolling.apply \
-        pandas.core.window.rolling.Window.mean \
-        pandas.core.window.rolling.Window.sum \
-        pandas.core.window.rolling.Window.var \
-        pandas.core.window.rolling.Window.std \
-        pandas.core.window.expanding.Expanding.count \
-        pandas.core.window.expanding.Expanding.sum \
-        pandas.core.window.expanding.Expanding.mean \
-        pandas.core.window.expanding.Expanding.median \
-        pandas.core.window.expanding.Expanding.min \
-        pandas.core.window.expanding.Expanding.max \
-        pandas.core.window.expanding.Expanding.corr \
-        pandas.core.window.expanding.Expanding.cov \
-        pandas.core.window.expanding.Expanding.skew \
-        pandas.core.window.expanding.Expanding.apply \
-        pandas.core.window.expanding.Expanding.quantile \
-        pandas.core.window.ewm.ExponentialMovingWindow.mean \
-        pandas.core.window.ewm.ExponentialMovingWindow.sum \
-        pandas.core.window.ewm.ExponentialMovingWindow.std \
-        pandas.core.window.ewm.ExponentialMovingWindow.var \
-        pandas.core.window.ewm.ExponentialMovingWindow.corr \
-        pandas.core.window.ewm.ExponentialMovingWindow.cov \
         pandas.api.indexers.BaseIndexer \
         pandas.api.indexers.VariableOffsetWindowIndexer \
         pandas.io.formats.style.Styler \
```

doc/source/conf.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -240,7 +240,7 @@
     "footer_start": ["pandas_footer", "sphinx-version"],
     "github_url": "https://github.com/pandas-dev/pandas",
     "twitter_url": "https://twitter.com/pandas_dev",
-    "analytics": {"google_analytics_id": "UA-27880019-2"},
+    "analytics": {"google_analytics_id": "G-5RE31C1RNW"},
     "logo": {"image_dark": "https://pandas.pydata.org/static/img/pandas_white.svg"},
     "navbar_end": ["version-switcher", "theme-switcher", "navbar-icon-links"],
     "switcher": {
```

doc/source/development/contributing_codebase.rst

Lines changed: 1 addition & 1 deletion

```diff
@@ -764,7 +764,7 @@ install pandas) by typing::
 your installation is probably fine and you can start contributing!

 Often it is worth running only a subset of tests first around your changes before running the
-entire suite (tip: you can use the [pandas-coverage app](https://pandas-coverage.herokuapp.com/))
+entire suite (tip: you can use the [pandas-coverage app](https://pandas-coverage-12d2130077bc.herokuapp.com/))
 to find out which tests hit the lines of code you've modified, and then run only those).

 The easiest way to do this is with::
```

doc/source/getting_started/comparison/comparison_with_r.rst

Lines changed: 3 additions & 3 deletions

```diff
@@ -246,7 +246,7 @@ In pandas we may use :meth:`~pandas.pivot_table` method to handle this:
         }
     )

-   baseball.pivot_table(values="batting avg", columns="team", aggfunc=np.max)
+   baseball.pivot_table(values="batting avg", columns="team", aggfunc="max")

 For more details and examples see :ref:`the reshaping documentation
 <reshaping.pivot>`.
@@ -359,7 +359,7 @@ In pandas the equivalent expression, using the
     )

    grouped = df.groupby(["month", "week"])
-   grouped["x"].agg([np.mean, np.std])
+   grouped["x"].agg(["mean", "std"])


 For more details and examples see :ref:`the groupby documentation
@@ -482,7 +482,7 @@ In Python the best way is to make use of :meth:`~pandas.pivot_table`:
         values="value",
         index=["variable", "week"],
         columns=["month"],
-        aggfunc=np.mean,
+        aggfunc="mean",
     )

 Similarly for ``dcast`` which uses a data.frame called ``df`` in R to
```

doc/source/getting_started/comparison/comparison_with_sql.rst

Lines changed: 2 additions & 2 deletions

```diff
@@ -198,7 +198,7 @@ to your grouped DataFrame, indicating which functions to apply to specific colum

 .. ipython:: python

-   tips.groupby("day").agg({"tip": np.mean, "day": np.size})
+   tips.groupby("day").agg({"tip": "mean", "day": "size"})

 Grouping by more than one column is done by passing a list of columns to the
 :meth:`~pandas.DataFrame.groupby` method.
@@ -222,7 +222,7 @@ Grouping by more than one column is done by passing a list of columns to the

 .. ipython:: python

-   tips.groupby(["smoker", "day"]).agg({"tip": [np.size, np.mean]})
+   tips.groupby(["smoker", "day"]).agg({"tip": ["size", "mean"]})

 .. _compare_with_sql.join:
```
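The change above swaps NumPy callables for pandas' string aliases in ``.agg``. A minimal sketch of what the updated call computes, using a hypothetical miniature stand-in for the docs' ``tips`` dataset:

```python
import pandas as pd

# Hypothetical tiny version of the tips dataset used in the docs
tips = pd.DataFrame(
    {
        "smoker": ["No", "No", "Yes"],
        "day": ["Sun", "Sun", "Sun"],
        "tip": [1.0, 3.0, 5.0],
    }
)

# "size" and "mean" are string aliases for the built-in reductions,
# equivalent to the np.size / np.mean callables they replace here
out = tips.groupby(["smoker", "day"]).agg({"tip": ["size", "mean"]})
```

The result carries a hierarchical column index (``("tip", "size")``, ``("tip", "mean")``), just as with the callable form.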

doc/source/user_guide/10min.rst

Lines changed: 10 additions & 0 deletions

```diff
@@ -16,6 +16,16 @@ Customarily, we import as follows:
    import numpy as np
    import pandas as pd

+Basic data structures in pandas
+-------------------------------
+
+Pandas provides two types of classes for handling data:
+
+1. :class:`Series`: a one-dimensional labeled array holding data of any type
+   such as integers, strings, Python objects etc.
+2. :class:`DataFrame`: a two-dimensional data structure that holds data like
+   a two-dimension array or a table with rows and columns.
+
 Object creation
 ---------------
```
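The new section describes the two core classes; a quick sketch of the distinction (names and values here are illustrative only):

```python
import pandas as pd

# Series: a one-dimensional labeled array holding data of any type
s = pd.Series([1, 3, 5], index=["a", "b", "c"])

# DataFrame: a two-dimensional table with rows and columns
df = pd.DataFrame({"x": [1, 2], "y": [3.0, 4.0]})
```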

doc/source/user_guide/basics.rst

Lines changed: 3 additions & 3 deletions

```diff
@@ -881,8 +881,8 @@ statistics methods, takes an optional ``axis`` argument:

 .. ipython:: python

-   df.apply(np.mean)
-   df.apply(np.mean, axis=1)
+   df.apply(lambda x: np.mean(x))
+   df.apply(lambda x: np.mean(x), axis=1)
    df.apply(lambda x: x.max() - x.min())
    df.apply(np.cumsum)
    df.apply(np.exp)
@@ -986,7 +986,7 @@ output:

 .. ipython:: python

-   tsdf.agg(np.sum)
+   tsdf.agg(lambda x: np.sum(x))

    tsdf.agg("sum")
```
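The doc change wraps ``np.mean`` in a lambda so it is passed as a plain callable rather than being recognized as an alias. A small sketch of the two ``apply`` directions, with a made-up frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1.0, 3.0], "b": [2.0, 4.0]})

# Default: the callable sees each column as a Series
col_means = df.apply(lambda x: np.mean(x))

# axis=1: the callable sees each row instead
row_means = df.apply(lambda x: np.mean(x), axis=1)
```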

doc/source/user_guide/cookbook.rst

Lines changed: 3 additions & 3 deletions

```diff
@@ -530,7 +530,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to

    code_groups = df.groupby("code")

-   agg_n_sort_order = code_groups[["data"]].transform(sum).sort_values(by="data")
+   agg_n_sort_order = code_groups[["data"]].transform("sum").sort_values(by="data")

    sorted_df = df.loc[agg_n_sort_order.index]
@@ -549,7 +549,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
        return x.iloc[1] * 1.234
        return pd.NaT

-   mhc = {"Mean": np.mean, "Max": np.max, "Custom": MyCust}
+   mhc = {"Mean": "mean", "Max": "max", "Custom": MyCust}
    ts.resample("5min").apply(mhc)
    ts
@@ -685,7 +685,7 @@ The :ref:`Pivot <reshaping.pivot>` docs.
        values=["Sales"],
        index=["Province"],
        columns=["City"],
-       aggfunc=np.sum,
+       aggfunc="sum",
        margins=True,
    )
    table.stack("City")
```

doc/source/user_guide/groupby.rst

Lines changed: 3 additions & 3 deletions

```diff
@@ -878,7 +878,7 @@ will be broadcast across the group.
    grouped.transform("sum")

 In addition to string aliases, the :meth:`~.DataFrameGroupBy.transform` method can
-also except User-Defined functions (UDFs). The UDF must:
+also accept User-Defined Functions (UDFs). The UDF must:

 * Return a result that is either the same size as the group chunk or
   broadcastable to the size of the group chunk (e.g., a scalar,
@@ -1363,7 +1363,7 @@ implementation headache).
 Grouping with ordered factors
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Categorical variables represented as instance of pandas's ``Categorical`` class
+Categorical variables represented as instances of pandas's ``Categorical`` class
 can be used as group keys. If so, the order of the levels will be preserved:

 .. ipython:: python
@@ -1496,7 +1496,7 @@ You can also select multiple rows from each group by specifying multiple nth val
    # get the first, 4th, and last date index for each month
    df.groupby([df.index.year, df.index.month]).nth([0, 3, -1])

-You may also use a slices or lists of slices.
+You may also use slices or lists of slices.

 .. ipython:: python
```
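The corrected sentence is about ``transform`` accepting UDFs that return either a group-sized result or a broadcastable scalar. A minimal sketch of both cases, with invented data:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "x": [1.0, 2.0, 4.0]})

# UDF returning a result the same size as each group chunk:
# the output aligns row-for-row with the original frame
demeaned = df.groupby("key")["x"].transform(lambda s: s - s.mean())

# A scalar per group (here via the "sum" alias) is broadcast
# back to the size of the group chunk
group_sum = df.groupby("key")["x"].transform("sum")
```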

doc/source/user_guide/io.rst

Lines changed: 7 additions & 8 deletions

```diff
@@ -1568,8 +1568,7 @@ class of the csv module. For this, you have to specify ``sep=None``.
 .. ipython:: python

    df = pd.DataFrame(np.random.randn(10, 4))
-   df.to_csv("tmp.csv", sep="|")
-   df.to_csv("tmp2.csv", sep=":")
+   df.to_csv("tmp2.csv", sep=":", index=False)
    pd.read_csv("tmp2.csv", sep=None, engine="python")

 .. ipython:: python
@@ -1597,8 +1596,8 @@ rather than reading the entire file into memory, such as the following:
 .. ipython:: python

    df = pd.DataFrame(np.random.randn(10, 4))
-   df.to_csv("tmp.csv", sep="|")
-   table = pd.read_csv("tmp.csv", sep="|")
+   df.to_csv("tmp.csv", index=False)
+   table = pd.read_csv("tmp.csv")
    table

@@ -1607,8 +1606,8 @@ value will be an iterable object of type ``TextFileReader``:

 .. ipython:: python

-   with pd.read_csv("tmp.csv", sep="|", chunksize=4) as reader:
-       reader
+   with pd.read_csv("tmp.csv", chunksize=4) as reader:
+       print(reader)
        for chunk in reader:
            print(chunk)

@@ -1620,8 +1619,8 @@ Specifying ``iterator=True`` will also return the ``TextFileReader`` object:

 .. ipython:: python

-   with pd.read_csv("tmp.csv", sep="|", iterator=True) as reader:
-       reader.get_chunk(5)
+   with pd.read_csv("tmp.csv", iterator=True) as reader:
+       print(reader.get_chunk(5))

 .. ipython:: python
    :suppress:
```
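The hunks above exercise chunked reading: ``chunksize`` makes ``read_csv`` return a ``TextFileReader`` that works as a context manager and yields DataFrames of at most ``chunksize`` rows. A self-contained sketch using an in-memory buffer instead of the docs' temp file:

```python
from io import StringIO

import pandas as pd

csv_data = "a,b\n1,2\n3,4\n5,6\n7,8\n9,10\n"

# Iterating over the reader yields DataFrame chunks; 5 data rows
# with chunksize=2 gives chunks of 2, 2, and 1 rows
with pd.read_csv(StringIO(csv_data), chunksize=2) as reader:
    sizes = [len(chunk) for chunk in reader]
```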

doc/source/user_guide/reshaping.rst

Lines changed: 5 additions & 5 deletions

```diff
@@ -402,12 +402,12 @@ We can produce pivot tables from this data very easily:
 .. ipython:: python

    pd.pivot_table(df, values="D", index=["A", "B"], columns=["C"])
-   pd.pivot_table(df, values="D", index=["B"], columns=["A", "C"], aggfunc=np.sum)
+   pd.pivot_table(df, values="D", index=["B"], columns=["A", "C"], aggfunc="sum")
    pd.pivot_table(
        df, values=["D", "E"],
        index=["B"],
        columns=["A", "C"],
-       aggfunc=np.sum,
+       aggfunc="sum",
    )

 The result object is a :class:`DataFrame` having potentially hierarchical indexes on the
@@ -451,7 +451,7 @@ rows and columns:
        columns="C",
        values=["D", "E"],
        margins=True,
-       aggfunc=np.std
+       aggfunc="std"
    )
    table
@@ -552,7 +552,7 @@ each group defined by the first two :class:`Series`:

 .. ipython:: python

-   pd.crosstab(df["A"], df["B"], values=df["C"], aggfunc=np.sum)
+   pd.crosstab(df["A"], df["B"], values=df["C"], aggfunc="sum")

 Adding margins
 ~~~~~~~~~~~~~~
@@ -562,7 +562,7 @@ Finally, one can also add margins or normalize this output.
 .. ipython:: python

    pd.crosstab(
-       df["A"], df["B"], values=df["C"], aggfunc=np.sum, normalize=True, margins=True
+       df["A"], df["B"], values=df["C"], aggfunc="sum", normalize=True, margins=True
    )

 .. _reshaping.tile:
```
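These hunks replace ``aggfunc=np.sum`` with the string alias ``aggfunc="sum"`` in ``pivot_table`` and ``crosstab``. A minimal sketch of the alias form on an invented frame:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "A": ["x", "x", "y"],
        "C": ["c1", "c2", "c1"],
        "D": [1, 2, 3],
    }
)

# aggfunc="sum" is the string-alias spelling the docs now use
table = pd.pivot_table(df, values="D", index=["A"], columns=["C"], aggfunc="sum")
```

Cells with no matching (index, column) pair come back as ``NaN``, as with the callable form.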

doc/source/user_guide/timeseries.rst

Lines changed: 3 additions & 3 deletions

```diff
@@ -1801,22 +1801,22 @@ You can pass a list or dict of functions to do aggregation with, outputting a ``

 .. ipython:: python

-   r["A"].agg([np.sum, np.mean, np.std])
+   r["A"].agg(["sum", "mean", "std"])

 On a resampled ``DataFrame``, you can pass a list of functions to apply to each
 column, which produces an aggregated result with a hierarchical index:

 .. ipython:: python

-   r.agg([np.sum, np.mean])
+   r.agg(["sum", "mean"])

 By passing a dict to ``aggregate`` you can apply a different aggregation to the
 columns of a ``DataFrame``:

 .. ipython:: python
    :okexcept:

-   r.agg({"A": np.sum, "B": lambda x: np.std(x, ddof=1)})
+   r.agg({"A": "sum", "B": lambda x: np.std(x, ddof=1)})

 The function names can also be strings. In order for a string to be valid it
 must be implemented on the resampled object:
```
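As the last hunk shows, string aliases and plain callables can be mixed in the dict form of resampled ``agg``. A self-contained sketch with an invented hourly series (the ``r`` object here is a stand-in for the docs' resampler):

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2023-01-01", periods=4, freq="h")
df = pd.DataFrame(
    {"A": [1.0, 2.0, 3.0, 4.0], "B": [1.0, 1.0, 2.0, 2.0]}, index=idx
)
r = df.resample("2h")

# "sum" is resolved as an alias; the lambda stays a user callable
out = r.agg({"A": "sum", "B": lambda x: np.std(x, ddof=1)})
```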

doc/source/user_guide/window.rst

Lines changed: 1 addition & 1 deletion

```diff
@@ -140,7 +140,7 @@ of multiple aggregations applied to a window.
 .. ipython:: python

    df = pd.DataFrame({"A": range(5), "B": range(10, 15)})
-   df.expanding().agg([np.sum, np.mean, np.std])
+   df.expanding().agg(["sum", "mean", "std"])

 .. _window.generic:
```
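The same alias-list form works on expanding windows: each alias becomes one output column of cumulative results. A minimal sketch on a made-up Series:

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0], name="A")

# Each string alias yields one column; row i aggregates s[:i+1]
out = s.expanding().agg(["sum", "mean"])
```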
