Commit 0a87cbe

Merge remote-tracking branch 'upstream/main' into multi-bug
2 parents caa3f99 + 50c119d commit 0a87cbe

118 files changed, +1678 -489 lines changed

.circleci/setup_env.sh (+12 -54)
@@ -1,47 +1,16 @@
 #!/bin/bash -e
 
-# edit the locale file if needed
-if [[ "$(uname)" == "Linux" && -n "$LC_ALL" ]]; then
-    echo "Adding locale to the first line of pandas/__init__.py"
-    rm -f pandas/__init__.pyc
-    SEDC="3iimport locale\nlocale.setlocale(locale.LC_ALL, '$LC_ALL')\n"
-    sed -i "$SEDC" pandas/__init__.py
-
-    echo "[head -4 pandas/__init__.py]"
-    head -4 pandas/__init__.py
-    echo
-fi
+echo "Install Mambaforge"
+MAMBA_URL="https://github.com/conda-forge/miniforge/releases/download/4.14.0-0/Mambaforge-4.14.0-0-Linux-aarch64.sh"
+echo "Downloading $MAMBA_URL"
+wget -q $MAMBA_URL -O minimamba.sh
+chmod +x minimamba.sh
 
+MAMBA_DIR="$HOME/miniconda3"
+rm -rf $MAMBA_DIR
+./minimamba.sh -b -p $MAMBA_DIR
 
-MINICONDA_DIR=/usr/local/miniconda
-if [ -e $MINICONDA_DIR ] && [ "$BITS32" != yes ]; then
-    echo "Found Miniconda installation at $MINICONDA_DIR"
-else
-    echo "Install Miniconda"
-    DEFAULT_CONDA_URL="https://repo.continuum.io/miniconda/Miniconda3-latest"
-    if [[ "$(uname -m)" == 'aarch64' ]]; then
-        CONDA_URL="https://github.com/conda-forge/miniforge/releases/download/4.10.1-4/Miniforge3-4.10.1-4-Linux-aarch64.sh"
-    elif [[ "$(uname)" == 'Linux' ]]; then
-        if [[ "$BITS32" == "yes" ]]; then
-            CONDA_URL="$DEFAULT_CONDA_URL-Linux-x86.sh"
-        else
-            CONDA_URL="$DEFAULT_CONDA_URL-Linux-x86_64.sh"
-        fi
-    elif [[ "$(uname)" == 'Darwin' ]]; then
-        CONDA_URL="$DEFAULT_CONDA_URL-MacOSX-x86_64.sh"
-    else
-        echo "OS $(uname) not supported"
-        exit 1
-    fi
-    echo "Downloading $CONDA_URL"
-    wget -q $CONDA_URL -O miniconda.sh
-    chmod +x miniconda.sh
-
-    MINICONDA_DIR="$HOME/miniconda3"
-    rm -rf $MINICONDA_DIR
-    ./miniconda.sh -b -p $MINICONDA_DIR
-fi
-export PATH=$MINICONDA_DIR/bin:$PATH
+export PATH=$MAMBA_DIR/bin:$PATH
 
 echo
 echo "which conda"
@@ -51,7 +20,7 @@ echo
 echo "update conda"
 conda config --set ssl_verify false
 conda config --set quiet true --set always_yes true --set changeps1 false
-conda install -y -c conda-forge -n base 'mamba>=0.21.2' pip setuptools
+mamba install -y -c conda-forge -n base pip setuptools
 
 echo "conda info -a"
 conda info -a
@@ -70,11 +39,6 @@ time mamba env update -n pandas-dev --file="${ENV_FILE}"
 echo "conda list -n pandas-dev"
 conda list -n pandas-dev
 
-if [[ "$BITS32" == "yes" ]]; then
-    # activate 32-bit compiler
-    export CONDA_BUILD=1
-fi
-
 echo "activate pandas-dev"
 source activate pandas-dev
 
@@ -90,15 +54,9 @@ if pip list | grep -q ^pandas; then
     pip uninstall -y pandas || true
 fi
 
-if [ "$(conda list -f qt --json)" != [] ]; then
-    echo
-    echo "remove qt"
-    echo "causes problems with the clipboard, we use xsel for that"
-    conda remove qt -y --force || true
-fi
-
 echo "Build extensions"
-python setup.py build_ext -q -j3
+# GH 47305: Parallel build can causes flaky ImportError from pandas/_libs/tslibs
+python setup.py build_ext -q -j1
 
 echo "Install pandas"
 python -m pip install --no-build-isolation --no-use-pep517 -e .

.github/workflows/docbuild-and-upload.yml (+1 -1)
@@ -67,7 +67,7 @@ jobs:
         echo "${{ secrets.server_ssh_key }}" > ~/.ssh/id_rsa
         chmod 600 ~/.ssh/id_rsa
         echo "${{ secrets.server_ip }} ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBE1Kkopomm7FHG5enATf7SgnpICZ4W2bw+Ho+afqin+w7sMcrsa0je7sbztFAV8YchDkiBKnWTG4cRT+KZgZCaY=" > ~/.ssh/known_hosts
-      if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+      if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/'))
 
     - name: Copy cheatsheets into site directory
       run: cp doc/cheatsheet/Pandas_Cheat_Sheet* web/build/
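
Review note (not part of the commit): the revised `if:` expression in this workflow hunk runs the step on pushes to `main` and, newly, on pushes of any tag. The Python sketch below mirrors only the boolean logic of that expression. The helper name `should_run` and the sample tag `v1.5.0` are hypothetical, and the sketch compares strings case-sensitively, whereas GitHub's expression comparisons ignore case.

```python
def should_run(event_name: str, ref: str) -> bool:
    """Hypothetical mirror of the workflow condition, for illustration only."""
    return event_name == "push" and (
        ref == "refs/heads/main" or ref.startswith("refs/tags/")
    )


# Pushes to main and pushes of tags pass; other events do not.
print(should_run("push", "refs/heads/main"))          # True
print(should_run("push", "refs/tags/v1.5.0"))         # True (sample tag is made up)
print(should_run("push", "refs/heads/feature"))       # False
print(should_run("pull_request", "refs/heads/main"))  # False
```

The parentheses in the YAML expression matter: without them, `&&` would bind tighter than `||`, so the tag branch would no longer be gated on the `push` event.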

.pre-commit-config.yaml (+1 -1)
@@ -22,7 +22,7 @@ repos:
   hooks:
   - id: black
 - repo: https://github.com/codespell-project/codespell
-  rev: v2.1.0
+  rev: v2.2.1
   hooks:
   - id: codespell
     types_or: [python, rst, markdown]

Dockerfile (+1 -1)
@@ -1,4 +1,4 @@
-FROM quay.io/condaforge/mambaforge
+FROM quay.io/condaforge/mambaforge:4.13.0-1
 
 # if you forked pandas, you can pass in your own GitHub username to use your fork
 # i.e. gh_username=myname

asv_bench/benchmarks/groupby.py (+41)
@@ -5,6 +5,7 @@
 import numpy as np
 
 from pandas import (
+    NA,
     Categorical,
     DataFrame,
     Index,
@@ -560,6 +561,46 @@ def time_frame_agg(self, dtype, method):
         self.df.groupby("key").agg(method)
 
 
+class GroupByCythonAggEaDtypes:
+    """
+    Benchmarks specifically targeting our cython aggregation algorithms
+    (using a big enough dataframe with simple key, so a large part of the
+    time is actually spent in the grouped aggregation).
+    """
+
+    param_names = ["dtype", "method"]
+    params = [
+        ["Float64", "Int64", "Int32"],
+        [
+            "sum",
+            "prod",
+            "min",
+            "max",
+            "mean",
+            "median",
+            "var",
+            "first",
+            "last",
+            "any",
+            "all",
+        ],
+    ]
+
+    def setup(self, dtype, method):
+        N = 1_000_000
+        df = DataFrame(
+            np.random.randint(0, high=100, size=(N, 10)),
+            columns=list("abcdefghij"),
+            dtype=dtype,
+        )
+        df.loc[list(range(1, N, 5)), list("abcdefghij")] = NA
+        df["key"] = np.random.randint(0, 100, size=N)
+        self.df = df
+
+    def time_frame_agg(self, dtype, method):
+        self.df.groupby("key").agg(method)
+
+
 class Cumulative:
     param_names = ["dtype", "method"]
     params = [

asv_bench/benchmarks/multiindex_object.py (+29)
@@ -3,9 +3,11 @@
 import numpy as np
 
 from pandas import (
+    NA,
     DataFrame,
     MultiIndex,
     RangeIndex,
+    Series,
     date_range,
 )
 
@@ -255,4 +257,31 @@ def time_operation(self, index_structure, dtype, method):
         getattr(self.left, method)(self.right)
 
 
+class Unique:
+    params = [
+        (("Int64", NA), ("int64", 0)),
+    ]
+    param_names = ["dtype_val"]
+
+    def setup(self, dtype_val):
+        level = Series(
+            [1, 2, dtype_val[1], dtype_val[1]] + list(range(1_000_000)),
+            dtype=dtype_val[0],
+        )
+        self.midx = MultiIndex.from_arrays([level, level])
+
+        level_dups = Series(
+            [1, 2, dtype_val[1], dtype_val[1]] + list(range(500_000)) * 2,
+            dtype=dtype_val[0],
+        )
+
+        self.midx_dups = MultiIndex.from_arrays([level_dups, level_dups])
+
+    def time_unique(self, dtype_val):
+        self.midx.unique()
+
+    def time_unique_dups(self, dtype_val):
+        self.midx_dups.unique()
+
+
 from .pandas_vb_common import setup # noqa: F401 isort:skip

ci/deps/actions-310.yaml (+1 -1)
@@ -43,7 +43,7 @@ dependencies:
   - pyreadstat
   - python-snappy
   - pyxlsb
-  - s3fs
+  - s3fs>=2021.08.0
   - scipy
   - sqlalchemy
   - tabulate

ci/deps/actions-38-downstream_compat.yaml (+1 -1)
@@ -44,7 +44,7 @@ dependencies:
   - pytables
   - python-snappy
   - pyxlsb
-  - s3fs
+  - s3fs>=2021.08.0
   - scipy
   - sqlalchemy
   - tabulate

ci/deps/actions-38-minimum_versions.yaml (+3 -3)
@@ -26,10 +26,10 @@ dependencies:
   - bottleneck=1.3.2
   - brotlipy=0.7.0
   - fastparquet=0.4.0
-  - fsspec=2021.05.0
+  - fsspec=2021.07.0
   - html5lib=1.1
   - hypothesis=6.13.0
-  - gcsfs=2021.05.0
+  - gcsfs=2021.07.0
   - jinja2=3.0.0
   - lxml=4.6.3
   - matplotlib=3.3.2
@@ -45,7 +45,7 @@ dependencies:
   - pytables=3.6.1
   - python-snappy=0.6.0
   - pyxlsb=1.0.8
-  - s3fs=2021.05.0
+  - s3fs=2021.08.0
   - scipy=1.7.1
   - sqlalchemy=1.4.16
   - tabulate=0.8.9

ci/deps/actions-38.yaml (+1 -1)
@@ -43,7 +43,7 @@ dependencies:
   - pytables
   - python-snappy
   - pyxlsb
-  - s3fs
+  - s3fs>=2021.08.0
   - scipy
   - sqlalchemy
   - tabulate

ci/deps/actions-39.yaml (+1 -1)
@@ -43,7 +43,7 @@ dependencies:
   - pytables
   - python-snappy
   - pyxlsb
-  - s3fs
+  - s3fs>=2021.08.0
   - scipy
   - sqlalchemy
   - tabulate

ci/deps/circle-38-arm64.yaml (+1 -1)
@@ -44,7 +44,7 @@ dependencies:
   - pytables
   - python-snappy
   - pyxlsb
-  - s3fs
+  - s3fs>=2021.08.0
   - scipy
   - sqlalchemy
   - tabulate

doc/source/development/contributing.rst (+19 -27)
@@ -194,30 +194,10 @@ Doing 'git status' again should give something like::
     # modified: /relative/path/to/file-you-added.py
     #
 
-Finally, commit your changes to your local repository with an explanatory message. pandas
-uses a convention for commit message prefixes and layout. Here are
-some common prefixes along with general guidelines for when to use them:
+Finally, commit your changes to your local repository with an explanatory commit
+message::
 
-* ENH: Enhancement, new functionality
-* BUG: Bug fix
-* DOC: Additions/updates to documentation
-* TST: Additions/updates to tests
-* BLD: Updates to the build process/scripts
-* PERF: Performance improvement
-* TYP: Type annotations
-* CLN: Code cleanup
-
-The following defines how a commit message should be structured. Please reference the
-relevant GitHub issues in your commit message using GH1234 or #1234. Either style
-is fine, but the former is generally preferred:
-
-* a subject line with ``< 80`` chars.
-* One blank line.
-* Optionally, a commit message body.
-
-Now you can commit your changes in your local repository::
-
-    git commit -m
+    git commit -m "your commit message goes here"
 
 .. _contributing.push-code:
 
@@ -262,16 +242,28 @@ double check your branch changes against the branch it was based on:
 Finally, make the pull request
 ------------------------------
 
-If everything looks good, you are ready to make a pull request. A pull request is how
+If everything looks good, you are ready to make a pull request. A pull request is how
 code from a local repository becomes available to the GitHub community and can be looked
-at and eventually merged into the main version. This pull request and its associated
+at and eventually merged into the main version. This pull request and its associated
 changes will eventually be committed to the main branch and available in the next
-release. To submit a pull request:
+release. To submit a pull request:
 
 #. Navigate to your repository on GitHub
-#. Click on the ``Pull Request`` button
+#. Click on the ``Compare & pull request`` button
 #. You can then click on ``Commits`` and ``Files Changed`` to make sure everything looks
    okay one last time
+#. Write a descriptive title that includes prefixes. pandas uses a convention for title
+   prefixes. Here are some common ones along with general guidelines for when to use them:
+
+    * ENH: Enhancement, new functionality
+    * BUG: Bug fix
+    * DOC: Additions/updates to documentation
+    * TST: Additions/updates to tests
+    * BLD: Updates to the build process/scripts
+    * PERF: Performance improvement
+    * TYP: Type annotations
+    * CLN: Code cleanup
+
 #. Write a description of your changes in the ``Preview Discussion`` tab
 #. Click ``Send Pull Request``.
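
Review note (not part of the commit): this hunk moves the prefix convention from commit messages to pull request titles. As a rough illustration of the convention, a title could be checked against the listed prefixes as sketched below. The helper `has_known_prefix` and the sample titles are hypothetical; the guide calls these "common" prefixes, so the list is not necessarily exhaustive, and nothing here implies pandas runs such a check.

```python
# Prefixes listed in the updated contributing guide ("some common ones").
PREFIXES = ("ENH", "BUG", "DOC", "TST", "BLD", "PERF", "TYP", "CLN")


def has_known_prefix(title: str) -> bool:
    """Return True if the title is '<PREFIX>: <non-empty text>' with a listed prefix."""
    prefix, sep, rest = title.partition(":")
    return bool(sep) and prefix in PREFIXES and bool(rest.strip())


# Sample titles are made up for illustration.
print(has_known_prefix("BUG: fix groupby sum with masked dtypes"))  # True
print(has_known_prefix("fix groupby sum"))                          # False
print(has_known_prefix("DOC:"))                                     # False
```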
277269

doc/source/development/contributing_codebase.rst (+5)
@@ -75,6 +75,11 @@ If you want to run checks on all recently committed files on upstream/main you c
 
 without needing to have done ``pre-commit install`` beforehand.
 
+.. note::
+
+    You may want to periodically run ``pre-commit gc``, to clean up repos
+    which are no longer used.
+
 .. note::
 
     If you have conflicting installations of ``virtualenv``, then you may get an
