Commit bb313e7

Merge remote-tracking branch 'upstream/main' into bisect
2 parents 5743851 + f99ec8b commit bb313e7

156 files changed, +3432 -1238 lines changed


.circleci/config.yml

+1-1
@@ -8,7 +8,7 @@ jobs:
     environment:
       ENV_FILE: ci/deps/circle-38-arm64.yaml
       PYTEST_WORKERS: auto
-      PATTERN: "not slow and not network and not clipboard and not arm_slow"
+      PATTERN: "not single_cpu and not slow and not network and not clipboard and not arm_slow and not db"
       PYTEST_TARGET: "pandas"
       PANDAS_CI: "1"
     steps:
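
Reading aid for the PATTERN change above: assuming the value is passed to pytest as a `-m` marker expression, the new pattern additionally deselects tests marked `single_cpu` or `db`. The sketch below only mimics the boolean semantics with `eval()`; pytest uses its own expression parser, and the helper names `_Markers` and `is_selected` are hypothetical.

```python
# Sketch only: mimics the boolean meaning of a pytest -m expression.
# pytest itself parses these expressions with its own parser, not eval().
class _Markers(dict):
    """Marker names that are not attached to the test evaluate to False."""

    def __missing__(self, name):
        return False


def is_selected(pattern, markers):
    """Return True if a test carrying `markers` would match `pattern`."""
    names = _Markers((name, True) for name in markers)
    return bool(eval(pattern, {"__builtins__": {}}, names))


OLD = "not slow and not network and not clipboard and not arm_slow"
NEW = (
    "not single_cpu and not slow and not network and not clipboard "
    "and not arm_slow and not db"
)

for markers in [set(), {"db"}, {"single_cpu"}, {"slow"}]:
    print(sorted(markers), is_selected(OLD, markers), is_selected(NEW, markers))
```

Under these assumptions, a test marked only `db` or only `single_cpu` was selected by the old pattern and is deselected by the new one; unmarked tests are selected by both.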

.github/workflows/code-checks.yml

+1-1
@@ -74,7 +74,7 @@ jobs:
 
     - name: Install pyright
       # note: keep version in sync with .pre-commit-config.yaml
-      run: npm install -g pyright@1.1.212
+      run: npm install -g pyright@1.1.230
 
     - name: Build Pandas
       id: build

.github/workflows/posix.yml

+1-1
@@ -162,7 +162,7 @@ jobs:
       shell: bash
       run: |
         # TODO: re-enable cov, its slowing the tests down though
-        pip install Cython numpy python-dateutil pytz pytest>=6.0 pytest-xdist>=1.31.0 hypothesis>=5.5.3
+        pip install Cython numpy python-dateutil pytz pytest>=6.0 pytest-xdist>=1.31.0 pytest-asyncio hypothesis>=5.5.3
       if: ${{ env.IS_PYPY == 'true' }}
 
     - name: Build Pandas

.pre-commit-config.yaml

+1-1
@@ -85,7 +85,7 @@ repos:
     types: [python]
     stages: [manual]
     # note: keep version in sync with .github/workflows/code-checks.yml
-    additional_dependencies: ['pyright@1.1.212']
+    additional_dependencies: ['pyright@1.1.230']
 - repo: local
   hooks:
   - id: flake8-rst

asv_bench/benchmarks/array.py

+31
@@ -2,6 +2,8 @@
 
 import pandas as pd
 
+from .pandas_vb_common import tm
+
 
 class BooleanArray:
     def setup(self):
@@ -39,3 +41,32 @@ def time_constructor(self):
 
     def time_from_integer_array(self):
         pd.array(self.values_integer, dtype="Int64")
+
+
+class ArrowStringArray:
+
+    params = [False, True]
+    param_names = ["multiple_chunks"]
+
+    def setup(self, multiple_chunks):
+        try:
+            import pyarrow as pa
+        except ImportError:
+            raise NotImplementedError
+        strings = tm.rands_array(3, 10_000)
+        if multiple_chunks:
+            chunks = [strings[i : i + 100] for i in range(0, len(strings), 100)]
+            self.array = pd.arrays.ArrowStringArray(pa.chunked_array(chunks))
+        else:
+            self.array = pd.arrays.ArrowStringArray(pa.array(strings))
+
+    def time_setitem(self, multiple_chunks):
+        for i in range(200):
+            self.array[i] = "foo"
+
+    def time_setitem_list(self, multiple_chunks):
+        indexer = list(range(0, 50)) + list(range(-50, 0))
+        self.array[indexer] = ["foo"] * len(indexer)
+
+    def time_setitem_slice(self, multiple_chunks):
+        self.array[::10] = "foo"
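
Reading aid for the new `ArrowStringArray` benchmark above: a rough, stdlib-only sketch of the sizes it works with. The random-string generation is only a stand-in for `tm.rands_array(3, 10_000)` (an assumption about what that helper produces), and no pandas or pyarrow objects are built here.

```python
import random
import string

random.seed(0)

# Stand-in for tm.rands_array(3, 10_000): 10,000 random 3-character strings.
alphabet = string.ascii_letters + string.digits
strings = ["".join(random.choices(alphabet, k=3)) for _ in range(10_000)]

# Same slicing as the multiple_chunks=True branch of setup().
chunks = [strings[i : i + 100] for i in range(0, len(strings), 100)]
print(len(chunks), len(chunks[0]))

# time_setitem_list: the first 50 and the last 50 positions.
indexer = list(range(0, 50)) + list(range(-50, 0))
print(len(indexer))

# time_setitem_slice: every 10th position.
print(len(strings[::10]))
```

So the chunked case splits the data into 100 chunks of 100 strings each, the list indexer touches 100 positions, and the slice touches 1,000.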

asv_bench/benchmarks/categoricals.py

+1-1
@@ -187,7 +187,7 @@ def time_remove_categories(self):
 class Rank:
     def setup(self):
         N = 10**5
-        ncats = 100
+        ncats = 15
 
         self.s_str = pd.Series(tm.makeCategoricalIndex(N, ncats)).astype(str)
         self.s_str_cat = pd.Series(self.s_str, dtype="category")

ci/deps/circle-38-arm64.yaml

+41-8
@@ -4,19 +4,52 @@ channels:
 dependencies:
   - python=3.8
 
-  # tools
-  - cython>=0.29.24
+  # test dependencies
+  - cython=0.29.24
   - pytest>=6.0
+  - pytest-cov
   - pytest-xdist>=1.31
   - hypothesis>=5.5.3
+  - psutil
   - pytest-asyncio
+  - boto3
 
-  # pandas dependencies
-  - botocore>=1.11
-  - flask
-  - moto
-  - numpy
+  # required dependencies
   - python-dateutil
+  - numpy
   - pytz
+
+  # optional dependencies
+  - beautifulsoup4
+  - blosc
+  - bottleneck
+  - brotlipy
+  - fastparquet
+  - fsspec
+  - html5lib
+  - gcsfs
+  - jinja2
+  - lxml
+  - matplotlib
+  - numba
+  - numexpr
+  - openpyxl
+  - odfpy
+  - pandas-gbq
+  - psycopg2
+  - pyarrow
+  - pymysql
+  # Not provided on ARM
+  #- pyreadstat
+  - pytables
+  - python-snappy
+  - pyxlsb
+  - s3fs
+  - scipy
+  - sqlalchemy
+  - tabulate
+  - xarray
+  - xlrd
+  - xlsxwriter
+  - xlwt
   - zstandard
-  - pip

doc/source/getting_started/install.rst

+1-1
@@ -276,7 +276,7 @@ Computation
 ========================= ================== =============================================================
 Dependency                Minimum Version    Notes
 ========================= ================== =============================================================
-SciPy                     1.14.1             Miscellaneous statistical functions
+SciPy                     1.4.1              Miscellaneous statistical functions
 numba                     0.50.1             Alternative execution engine for rolling operations
                                              (see :ref:`Enhancing Performance <enhancingperf.numba>`)
 xarray                    0.15.1             pandas-like API for N-dimensional data
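
Reading aid for the SciPy row above: the documented minimum changes from 1.14.1 to 1.4.1, which are different releases despite looking alike. A minimal sketch, assuming plain dotted-integer version strings; `parse_version` is a hypothetical helper, not a pandas or SciPy function.

```python
def parse_version(version):
    """Split a dotted version string into a tuple of integers."""
    return tuple(int(part) for part in version.split("."))


old_doc, new_doc = "1.14.1", "1.4.1"

# Compared component by component, 1.14.1 is the later release.
print(parse_version(old_doc) > parse_version(new_doc))

# Compared as plain strings the order flips, because "1" sorts before "4".
print(old_doc > new_doc)
```

The first comparison prints `True` and the second prints `False`, which is why version strings are usually parsed before being compared.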

doc/source/getting_started/intro_tutorials/03_subset_data.rst

+2-2
@@ -358,9 +358,9 @@ See the user guide section on :ref:`different choices for indexing <indexing.cho
   of column/row labels, a slice of labels, a conditional expression or
   a colon.
 - Select specific rows and/or columns using ``loc`` when using the row
-  and column names
+  and column names.
 - Select specific rows and/or columns using ``iloc`` when using the
-  positions in the table
+  positions in the table.
 - You can assign new values to a selection based on ``loc``/``iloc``.
 
 .. raw:: html

doc/source/getting_started/intro_tutorials/04_plotting.rst

+8-8
@@ -88,7 +88,7 @@ method. Hence, the :meth:`~DataFrame.plot` method works on both ``Series`` and
     <ul class="task-bullet">
         <li>
 
-I want to visually compare the :math:`N0_2` values measured in London versus Paris.
+I want to visually compare the :math:`NO_2` values measured in London versus Paris.
 
 .. ipython:: python
 
@@ -197,26 +197,26 @@ I want to further customize, extend or save the resulting plot.
     </ul>
 
 Each of the plot objects created by pandas is a
-`matplotlib <https://matplotlib.org/>`__ object. As Matplotlib provides
+`Matplotlib <https://matplotlib.org/>`__ object. As Matplotlib provides
 plenty of options to customize plots, making the link between pandas and
-Matplotlib explicit enables all the power of matplotlib to the plot.
+Matplotlib explicit enables all the power of Matplotlib to the plot.
 This strategy is applied in the previous example:
 
 ::
 
-    fig, axs = plt.subplots(figsize=(12, 4))  # Create an empty matplotlib Figure and Axes
+    fig, axs = plt.subplots(figsize=(12, 4))  # Create an empty Matplotlib Figure and Axes
     air_quality.plot.area(ax=axs)  # Use pandas to put the area plot on the prepared Figure/Axes
-    axs.set_ylabel("NO$_2$ concentration")  # Do any matplotlib customization you like
-    fig.savefig("no2_concentrations.png")  # Save the Figure/Axes using the existing matplotlib method.
+    axs.set_ylabel("NO$_2$ concentration")  # Do any Matplotlib customization you like
+    fig.savefig("no2_concentrations.png")  # Save the Figure/Axes using the existing Matplotlib method.
 
 .. raw:: html
 
     <div class="shadow gs-callout gs-callout-remember">
         <h4>REMEMBER</h4>
 
-- The ``.plot.*`` methods are applicable on both Series and DataFrames
+- The ``.plot.*`` methods are applicable on both Series and DataFrames.
 - By default, each of the columns is plotted as a different element
-  (line, boxplot,…)
+  (line, boxplot,…).
 - Any plot created by pandas is a Matplotlib object.
 
 .. raw:: html

doc/source/getting_started/intro_tutorials/05_add_columns.rst

+6-6
@@ -41,7 +41,7 @@ How to create new columns derived from existing columns?
     <ul class="task-bullet">
         <li>
 
-I want to express the :math:`NO_2` concentration of the station in London in mg/m\ :math:`^3`
+I want to express the :math:`NO_2` concentration of the station in London in mg/m\ :math:`^3`.
 
 (*If we assume temperature of 25 degrees Celsius and pressure of 1013
 hPa, the conversion factor is 1.882*)
@@ -60,7 +60,7 @@ at the left side of the assignment.
     </ul>
 
 .. note::
-    The calculation of the values is done **element_wise**. This
+    The calculation of the values is done **element-wise**. This
     means all values in the given column are multiplied by the value 1.882
     at once. You do not need to use a loop to iterate each of the rows!
 
@@ -72,7 +72,7 @@ at the left side of the assignment.
     <ul class="task-bullet">
         <li>
 
-I want to check the ratio of the values in Paris versus Antwerp and save the result in a new column
+I want to check the ratio of the values in Paris versus Antwerp and save the result in a new column.
 
 .. ipython:: python
 
@@ -89,8 +89,8 @@ values in each row*.
         </li>
     </ul>
 
-Also other mathematical operators (``+``, ``-``, ``\*``, ``/``) or
-logical operators (``<``, ``>``, ``=``,…) work element wise. The latter was already
+Also other mathematical operators (``+``, ``-``, ``*``, ``/``,…) or
+logical operators (``<``, ``>``, ``==``,…) work element-wise. The latter was already
 used in the :ref:`subset data tutorial <10min_tut_03_subset>` to filter
 rows of a table using a conditional expression.
 
@@ -101,7 +101,7 @@ If you need more advanced logic, you can use arbitrary Python code via :meth:`~D
     <ul class="task-bullet">
         <li>
 
-I want to rename the data columns to the corresponding station identifiers used by openAQ
+I want to rename the data columns to the corresponding station identifiers used by `OpenAQ <https://openaq.org/>`__.
 
 .. ipython:: python
 

doc/source/getting_started/intro_tutorials/06_calculate_statistics.rst

+7-7
@@ -74,15 +74,15 @@ What is the median age and ticket fare price of the Titanic passengers?
     titanic[["Age", "Fare"]].median()
 
 The statistic applied to multiple columns of a ``DataFrame`` (the selection of two columns
-return a ``DataFrame``, see the :ref:`subset data tutorial <10min_tut_03_subset>`) is calculated for each numeric column.
+returns a ``DataFrame``, see the :ref:`subset data tutorial <10min_tut_03_subset>`) is calculated for each numeric column.
 
 .. raw:: html
 
         </li>
     </ul>
 
 The aggregating statistic can be calculated for multiple columns at the
-same time. Remember the ``describe`` function from :ref:`first tutorial <10min_tut_01_tableoriented>`?
+same time. Remember the ``describe`` function from the :ref:`first tutorial <10min_tut_01_tableoriented>`?
 
 .. ipython:: python
 
@@ -161,7 +161,7 @@ columns:
     titanic.groupby("Sex").mean()
 
 It does not make much sense to get the average value of the ``Pclass``.
-if we are only interested in the average age for each gender, the
+If we are only interested in the average age for each gender, the
 selection of columns (rectangular brackets ``[]`` as usual) is supported
 on the grouped data as well:
 
@@ -254,7 +254,7 @@ within each group:
     <div class="d-flex flex-row gs-torefguide">
         <span class="badge badge-info">To user guide</span>
 
-The user guide has a dedicated section on ``value_counts`` , see page on :ref:`discretization <basics.discretization>`.
+The user guide has a dedicated section on ``value_counts`` , see the page on :ref:`discretization <basics.discretization>`.
 
 .. raw:: html
 
@@ -265,10 +265,10 @@ The user guide has a dedicated section on ``value_counts`` , see page on :ref:`d
     <div class="shadow gs-callout gs-callout-remember">
         <h4>REMEMBER</h4>
 
-- Aggregation statistics can be calculated on entire columns or rows
-- ``groupby`` provides the power of the *split-apply-combine* pattern
+- Aggregation statistics can be calculated on entire columns or rows.
+- ``groupby`` provides the power of the *split-apply-combine* pattern.
 - ``value_counts`` is a convenient shortcut to count the number of
-  entries in each category of a variable
+  entries in each category of a variable.
 
 .. raw:: html
 