Commit dace93d

Merge branch 'master' into issue4889
2 parents 77fec12 + dca6901 commit dace93d

193 files changed (+3273 -1921 lines)

.github/actions/build_pandas/action.yml (+1 -1)

@@ -13,5 +13,5 @@ runs:
     - name: Build Pandas
       run: |
         python setup.py build_ext -j 2
-        python -m pip install -e . --no-build-isolation --no-use-pep517
+        python -m pip install -e . --no-build-isolation --no-use-pep517 --no-index
       shell: bash -l {0}

.github/workflows/python-dev.yml (+18 -7)

@@ -21,12 +21,20 @@ env:
 
 jobs:
   build:
-    runs-on: ubuntu-latest
+    runs-on: ${{ matrix.os }}
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, macOS-latest, windows-latest]
+
     name: actions-310-dev
     timeout-minutes: 60
 
+    env:
+      NUMPY_WHEELS_AVAILABLE: ${{ matrix.os == 'ubuntu-latest' }}
+
     concurrency:
-      group: ${{ github.ref }}-dev
+      group: ${{ github.ref }}-${{ matrix.os }}-dev
       cancel-in-progress: ${{github.event_name == 'pull_request'}}
 
     steps:
@@ -40,12 +48,16 @@ jobs:
         python-version: '3.10-dev'
 
     - name: Install dependencies
+      shell: bash
       run: |
         python -m pip install --upgrade pip setuptools wheel
-        pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
-        pip install git+https://github.com/pytest-dev/pytest.git
+        if [[ "$NUMPY_WHEELS_AVAILABLE" == "true" ]]; then
+          pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
+        else
+          pip install git+https://github.com/numpy/numpy.git
+        fi
         pip install git+https://github.com/nedbat/coveragepy.git
-        pip install cython python-dateutil pytz hypothesis pytest-xdist pytest-cov
+        pip install cython python-dateutil pytz hypothesis pytest>=6.2.5 pytest-xdist pytest-cov
         pip list
 
     - name: Build Pandas
@@ -58,10 +70,9 @@ jobs:
         python -c "import pandas; pandas.show_versions();"
 
     - name: Test with pytest
+      shell: bash
       run: |
         ci/run_tests.sh
-      # GH 41935
-      continue-on-error: true
 
     - name: Publish test results
       uses: actions/upload-artifact@master

.github/workflows/sdist.yml (+14 -3)

@@ -23,7 +23,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.8", "3.9"]
+        python-version: ["3.8", "3.9", "3.10"]
     concurrency:
       group: ${{github.ref}}-${{matrix.python-version}}-sdist
       cancel-in-progress: ${{github.event_name == 'pull_request'}}
@@ -53,13 +53,24 @@ jobs:
     - uses: conda-incubator/setup-miniconda@v2
       with:
         activate-environment: pandas-sdist
-        python-version: ${{ matrix.python-version }}
+        python-version: '${{ matrix.python-version }}'
 
     - name: Install pandas from sdist
       run: |
-        conda list
+        pip list
         python -m pip install dist/*.gz
 
+    - name: Force oldest supported NumPy
+      run: |
+        case "${{matrix.python-version}}" in
+          3.8)
+            pip install numpy==1.18.5 ;;
+          3.9)
+            pip install numpy==1.19.3 ;;
+          3.10)
+            pip install numpy==1.21.2 ;;
+        esac
+
     - name: Import pandas
       run: |
         cd ..

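The `case` statement in the new "Force oldest supported NumPy" step is a lookup from Python version to a NumPy pin. Below is a minimal Python sketch of the same table. The `OLDEST_NUMPY` name and the `pin_for` helper are hypothetical, and only the version pairs come from the diff. The sketch also shows why version numbers are safer as strings, which is presumably why the matrix entries are quoted: as a float literal, `3.10` is the same number as `3.1`.

```python
# Hypothetical helper: the version pairs are copied from the workflow's case statement.
OLDEST_NUMPY = {"3.8": "1.18.5", "3.9": "1.19.3", "3.10": "1.21.2"}


def pin_for(python_version: str) -> str:
    """Return the pip requirement the workflow step would install for this Python."""
    return f"numpy=={OLDEST_NUMPY[python_version]}"


for version in OLDEST_NUMPY:
    print(version, "->", pin_for(version))

# Keys must stay strings: as a float, 3.10 collapses to 3.1.
print(str(3.10))  # prints 3.1
```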
.pre-commit-config.yaml (+6)

@@ -135,6 +135,12 @@ repos:
         entry: 'np\.random\.seed'
         files: ^asv_bench/benchmarks
         exclude: ^asv_bench/benchmarks/pandas_vb_common\.py
+    -   id: np-testing-array-equal
+        name: Check for usage of numpy testing or array_equal
+        language: pygrep
+        entry: '(numpy|np)(\.testing|\.array_equal)'
+        files: ^pandas/tests/
+        types: [python]
     -   id: invalid-ea-testing
         name: Check for invalid EA testing
         language: pygrep

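The new `np-testing-array-equal` hook uses the `pygrep` language, which reports lines matched by a Python regular expression. The sketch below checks which lines the hook's `entry` pattern would flag. It assumes pygrep's default behaviour of a line-by-line `re.search`. The `flagged` helper and the sample lines are illustrative only. The pattern does not fire on pandas' own `tm.assert_numpy_array_equal`, because `numpy` is followed there by `_array_equal` rather than `.array_equal`.

```python
import re

# Pattern copied verbatim from the hook's `entry` field.
PATTERN = re.compile(r"(numpy|np)(\.testing|\.array_equal)")


def flagged(line: str) -> bool:
    """Return True if the hook's regex matches anywhere in `line` (assumed re.search semantics)."""
    return PATTERN.search(line) is not None


samples = [
    "np.testing.assert_array_equal(result, expected)",
    "assert np.array_equal(result, expected)",
    "from numpy.testing import assert_allclose",
    "tm.assert_numpy_array_equal(result, expected)",
    "import numpy as np",
]
for line in samples:
    print(flagged(line), line)
```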
asv_bench/benchmarks/indexing_engines.py (+35 -11)

@@ -35,25 +35,49 @@ class NumericEngineIndexing:
     params = [
         _get_numeric_engines(),
         ["monotonic_incr", "monotonic_decr", "non_monotonic"],
+        [True, False],
+        [10 ** 5, 2 * 10 ** 6], # 2e6 is above SIZE_CUTOFF
     ]
-    param_names = ["engine_and_dtype", "index_type"]
+    param_names = ["engine_and_dtype", "index_type", "unique", "N"]
 
-    def setup(self, engine_and_dtype, index_type):
+    def setup(self, engine_and_dtype, index_type, unique, N):
         engine, dtype = engine_and_dtype
-        N = 10 ** 5
-        values = list([1] * N + [2] * N + [3] * N)
-        arr = {
-            "monotonic_incr": np.array(values, dtype=dtype),
-            "monotonic_decr": np.array(list(reversed(values)), dtype=dtype),
-            "non_monotonic": np.array([1, 2, 3] * N, dtype=dtype),
-        }[index_type]
+
+        if index_type == "monotonic_incr":
+            if unique:
+                arr = np.arange(N * 3, dtype=dtype)
+            else:
+                values = list([1] * N + [2] * N + [3] * N)
+                arr = np.array(values, dtype=dtype)
+        elif index_type == "monotonic_decr":
+            if unique:
+                arr = np.arange(N * 3, dtype=dtype)[::-1]
+            else:
+                values = list([1] * N + [2] * N + [3] * N)
+                arr = np.array(values, dtype=dtype)[::-1]
+        else:
+            assert index_type == "non_monotonic"
+            if unique:
+                arr = np.empty(N * 3, dtype=dtype)
+                arr[:N] = np.arange(N * 2, N * 3, dtype=dtype)
+                arr[N:] = np.arange(N * 2, dtype=dtype)
+            else:
+                arr = np.array([1, 2, 3] * N, dtype=dtype)
 
         self.data = engine(arr)
         # code belows avoids populating the mapping etc. while timing.
         self.data.get_loc(2)
 
-    def time_get_loc(self, engine_and_dtype, index_type):
-        self.data.get_loc(2)
+        self.key_middle = arr[len(arr) // 2]
+        self.key_early = arr[2]
+
+    def time_get_loc(self, engine_and_dtype, index_type, unique, N):
+        self.data.get_loc(self.key_early)
+
+    def time_get_loc_near_middle(self, engine_and_dtype, index_type, unique, N):
+        # searchsorted performance may be different near the middle of a range
+        # vs near an endpoint
+        self.data.get_loc(self.key_middle)
 
 
 class ObjectEngineIndexing:

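This stdlib sketch shows what the new `unique=True` inputs look like. It rebuilds the three layouts with plain lists instead of NumPy arrays and picks `key_early` and `key_middle` the same way `setup()` does. `build_unique` and `is_strictly_increasing` are hypothetical helpers and are not part of asv_bench. With a small `n`, you can see that the unique non-monotonic case is a rotation of `0..3n-1`, so every key still occurs exactly once.

```python
from typing import List


def build_unique(index_type: str, n: int) -> List[int]:
    """Plain-list version of the unique=True branches of NumericEngineIndexing.setup."""
    if index_type == "monotonic_incr":
        return list(range(n * 3))
    if index_type == "monotonic_decr":
        return list(range(n * 3))[::-1]
    assert index_type == "non_monotonic"
    # First n slots hold 2n..3n-1, the remaining 2n slots hold 0..2n-1.
    return list(range(n * 2, n * 3)) + list(range(n * 2))


def is_strictly_increasing(xs: List[int]) -> bool:
    return all(a < b for a, b in zip(xs, xs[1:]))


for kind in ["monotonic_incr", "monotonic_decr", "non_monotonic"]:
    arr = build_unique(kind, n=4)
    # Same key choices as setup(): third element and the element at len // 2.
    key_early, key_middle = arr[2], arr[len(arr) // 2]
    print(kind, arr, "key_early =", key_early, "key_middle =", key_middle)
```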
asv_bench/benchmarks/io/csv.py (+29)

@@ -10,6 +10,7 @@
 from pandas import (
     Categorical,
     DataFrame,
+    concat,
     date_range,
     read_csv,
     to_datetime,
@@ -459,6 +460,34 @@ def time_read_special_date(self, value, engine):
         )
 
 
+class ReadCSVMemMapUTF8:
+
+    fname = "__test__.csv"
+    number = 5
+
+    def setup(self):
+        lines = []
+        line_length = 128
+        start_char = " "
+        end_char = "\U00010080"
+        # This for loop creates a list of 128-char strings
+        # consisting of consecutive Unicode chars
+        for lnum in range(ord(start_char), ord(end_char), line_length):
+            line = "".join([chr(c) for c in range(lnum, lnum + 0x80)]) + "\n"
+            try:
+                line.encode("utf-8")
+            except UnicodeEncodeError:
+                # Some 16-bit words are not valid Unicode chars and must be skipped
+                continue
+            lines.append(line)
+        df = DataFrame(lines)
+        df = concat([df for n in range(100)], ignore_index=True)
+        df.to_csv(self.fname, index=False, header=False, encoding="utf-8")
+
+    def time_read_memmapped_utf8(self):
+        read_csv(self.fname, header=None, memory_map=True, encoding="utf-8", engine="c")
+
+
 class ParseDateComparison(StringIORewind):
     params = ([False, True],)
     param_names = ["cache_dates"]

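The text-generation loop in `ReadCSVMemMapUTF8.setup()` can be run on its own without pandas. The sketch below mirrors that loop in a hypothetical `build_lines` function. It uses `line_length` where the original hard-codes `0x80`, which is the same value. The lines that get skipped are the ones overlapping the surrogate range U+D800..U+DFFF. Surrogates are code points reserved for UTF-16 pairs and cannot be encoded as UTF-8 on their own. The surviving text contains 1-, 2-, 3- and 4-byte UTF-8 sequences, which is presumably what makes it a useful input for a memory-mapped UTF-8 reader.

```python
def build_lines(start_char=" ", end_char="\U00010080", line_length=128):
    """Mirror of the benchmark's loop: consecutive code points, line_length per line."""
    lines = []
    for lnum in range(ord(start_char), ord(end_char), line_length):
        line = "".join(chr(c) for c in range(lnum, lnum + line_length)) + "\n"
        try:
            line.encode("utf-8")
        except UnicodeEncodeError:
            # Lone surrogates (U+D800..U+DFFF) have no UTF-8 encoding.
            continue
        lines.append(line)
    return lines


lines = build_lines()
candidates = len(range(ord(" "), ord("\U00010080"), 128))
widths = {len(ch.encode("utf-8")) for line in lines for ch in line}
print(candidates, len(lines), sorted(widths))
```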
ci/deps/actions-38-db-min.yaml (+1 -1)

@@ -5,7 +5,7 @@ dependencies:
   - python=3.8
 
   # tools
-  - cython>=0.29.21
+  - cython>=0.29.24
   - pytest>=6.0
   - pytest-cov
   - pytest-xdist>=1.31

ci/deps/actions-38-db.yaml (+1 -1)

@@ -5,7 +5,7 @@ dependencies:
   - python=3.8
 
   # tools
-  - cython>=0.29.21
+  - cython>=0.29.24
   - pytest>=6.0
   - pytest-xdist>=1.31
   - hypothesis>=5.5.3

ci/deps/actions-38-locale.yaml (+1 -1)

@@ -5,7 +5,7 @@ dependencies:
   - python=3.8
 
   # tools
-  - cython>=0.29.21
+  - cython>=0.29.24
   - pytest>=6.0
   - pytest-cov
   - pytest-xdist>=1.31

ci/deps/actions-38-locale_slow.yaml (+1 -1)

@@ -6,7 +6,7 @@ dependencies:
   - python=3.8
 
   # tools
-  - cython>=0.29.21
+  - cython>=0.29.24
   - pytest>=6.0
   - pytest-cov
   - pytest-xdist>=1.31

ci/deps/actions-38-minimum_versions.yaml (+1 -1)

@@ -5,7 +5,7 @@ dependencies:
   - python=3.8.0
 
   # tools
-  - cython=0.29.21
+  - cython=0.29.24
   - pytest>=6.0
   - pytest-cov
   - pytest-xdist>=1.31

ci/deps/actions-38-slow.yaml (+1 -1)

@@ -5,7 +5,7 @@ dependencies:
   - python=3.8
 
   # tools
-  - cython>=0.29.21
+  - cython>=0.29.24
   - pytest>=6.0
   - pytest-cov
   - pytest-xdist>=1.31

ci/deps/actions-38.yaml (+1 -1)

@@ -6,7 +6,7 @@ dependencies:
   - python=3.8
 
   # tools
-  - cython>=0.29.21
+  - cython>=0.29.24
   - pytest>=6.0
   - pytest-cov
   - pytest-xdist>=1.31

ci/deps/actions-39-numpydev.yaml (+1 -1)

@@ -15,7 +15,7 @@ dependencies:
   - pytz
   - pip
   - pip:
-    - cython==0.29.21 # GH#34014
+    - cython==0.29.24 # GH#34014
     - "--extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple"
     - "--pre"
     - "numpy"

ci/deps/actions-39-slow.yaml (+1 -1)

@@ -6,7 +6,7 @@ dependencies:
   - python=3.9
 
   # tools
-  - cython>=0.29.21
+  - cython>=0.29.24
   - pytest>=6.0
   - pytest-cov
   - pytest-xdist>=1.31

ci/deps/actions-39.yaml (+1 -1)

@@ -5,7 +5,7 @@ dependencies:
   - python=3.9
 
   # tools
-  - cython>=0.29.21
+  - cython>=0.29.24
   - pytest>=6.0
   - pytest-cov
   - pytest-xdist>=1.31

ci/deps/azure-macos-38.yaml (+1 -1)

@@ -32,6 +32,6 @@ dependencies:
   - xlwt
   - pip
   - pip:
-    - cython>=0.29.21
+    - cython>=0.29.24
     - pyreadstat
     - pyxlsb

ci/deps/azure-windows-38.yaml (+1 -1)

@@ -6,7 +6,7 @@ dependencies:
   - python=3.8
 
   # tools
-  - cython>=0.29.21
+  - cython>=0.29.24
   - pytest>=6.0
   - pytest-xdist>=1.31
   - hypothesis>=5.5.3

ci/deps/azure-windows-39.yaml (+1 -1)

@@ -6,7 +6,7 @@ dependencies:
   - python=3.9
 
   # tools
-  - cython>=0.29.21
+  - cython>=0.29.24
   - pytest>=6.0
   - pytest-xdist>=1.31
   - hypothesis>=5.5.3

ci/deps/circle-38-arm64.yaml (+1 -1)

@@ -5,7 +5,7 @@ dependencies:
   - python=3.8
 
   # tools
-  - cython>=0.29.21
+  - cython>=0.29.24
   - pytest>=6.0
   - pytest-xdist>=1.31
   - hypothesis>=5.5.3

doc/source/reference/extensions.rst (+1)

@@ -60,6 +60,7 @@ objects.
    api.extensions.ExtensionArray.nbytes
    api.extensions.ExtensionArray.ndim
    api.extensions.ExtensionArray.shape
+   api.extensions.ExtensionArray.tolist
 
 Additionally, we have some utility methods for ensuring your object
 behaves correctly.

doc/source/reference/window.rst (+1)

@@ -88,6 +88,7 @@ Exponentially-weighted window functions
    :toctree: api/
 
    ExponentialMovingWindow.mean
+   ExponentialMovingWindow.sum
    ExponentialMovingWindow.std
    ExponentialMovingWindow.var
    ExponentialMovingWindow.corr

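The reference page now lists `ExponentialMovingWindow.sum`. Here is a rough mental model, which is my assumption and has not been taken from this diff or checked against pandas' implementation. The sum applies the weight (1 - alpha)**i to the i-th most recent observation and does not divide by the total weight. Dividing by the running sum of those weights gives back an adjust=True style weighted mean. The `ewm_sum` helper below is hypothetical, and it ignores NaN handling, `min_periods` and `adjust=False`.

```python
def ewm_sum(values, alpha):
    """Unnormalized exponentially weighted sum: s_t = x_t + (1 - alpha) * s_{t-1}.

    Equivalent to sum((1 - alpha) ** i * x[t - i] for i in range(t + 1)).
    Sketch only: no NaN handling, min_periods, or adjust=False variant.
    """
    out = []
    running = 0.0
    for x in values:
        running = x + (1 - alpha) * running
        out.append(running)
    return out


data = [1.0, 2.0, 3.0]
sums = ewm_sum(data, alpha=0.5)
# Running total of the weights themselves (the same recurrence applied to ones).
weights = ewm_sum([1.0] * len(data), alpha=0.5)
means = [s / w for s, w in zip(sums, weights)]
print(sums)  # [1.0, 2.5, 4.25]
print(means)
```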
doc/source/user_guide/indexing.rst (+9)

@@ -997,6 +997,15 @@ a list of items you want to check for.
 
    df.isin(values)
 
+To return the DataFrame of booleans where the values are *not* in the original DataFrame,
+use the ``~`` operator:
+
+.. ipython:: python
+
+   values = {'ids': ['a', 'b'], 'vals': [1, 3]}
+
+   ~df.isin(values)
+
 Combine DataFrame's ``isin`` with the ``any()`` and ``all()`` methods to
 quickly select subsets of your data that meet a given criteria.
 To select a row where each column meets its own criterion:

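The added ipython snippet relies on a `df` that is defined earlier in the user guide, outside this hunk. The standalone version below assumes a frame with `vals`, `ids` and `ids2` columns, like the one used earlier on that page. It shows what the negation does. `~` inverts the boolean frame returned by `isin`, so it marks cells whose value is not listed in `values` for that column. A column that is absent from the dict (`ids2` here) is all False in `df.isin(values)`, so it comes back all True after negation.

```python
import pandas as pd

# Assumed sample data, modelled on the frame used earlier in the user guide.
df = pd.DataFrame(
    {"vals": [1, 2, 3, 4], "ids": ["a", "b", "f", "n"], "ids2": ["a", "n", "c", "n"]}
)
values = {"ids": ["a", "b"], "vals": [1, 3]}

mask = df.isin(values)  # True where the cell's value is listed for its column
inverted = ~mask        # element-wise NOT of the boolean frame
print(inverted)

# Rows in which every cell is "not in values", combining ~ with all()
print(df[inverted.all(axis=1)])
```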