Commit 6ff583c

Merge branch 'pandas-dev:main' into read-json-engine-argument
2 parents a20e415 + 8b503a8 commit 6ff583c

263 files changed

+3032
-3794
lines changed

.github/ISSUE_TEMPLATE/feature_request.yaml

+1-1
@@ -68,5 +68,5 @@ body:
     attributes:
       label: Additional Context
       description: >
-        Please provide any relevant Github issues, code examples or references that help describe and support
+        Please provide any relevant GitHub issues, code examples or references that help describe and support
         the feature request.

.github/PULL_REQUEST_TEMPLATE.md

+1-1
@@ -1,4 +1,4 @@
-- [ ] closes #xxxx (Replace xxxx with the Github issue number)
+- [ ] closes #xxxx (Replace xxxx with the GitHub issue number)
 - [ ] [Tests added and passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#writing-tests) if fixing a bug or adding a new feature
 - [ ] All [code checks passed](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#pre-commit).
 - [ ] Added [type annotations](https://pandas.pydata.org/pandas-docs/dev/development/contributing_codebase.html#type-hints) to new arguments/methods/functions.

.github/workflows/macos-windows.yml

+1-1
@@ -1,4 +1,4 @@
-name: Windows-MacOS
+name: Windows-macOS

 on:
   push:

.github/workflows/python-dev.yml

+1-1
@@ -3,7 +3,7 @@
 #
 # In general, this file will remain frozen(present, but not running) until:
 # - The next unreleased Python version has released beta 1
-# - This version should be available on Github Actions.
+# - This version should be available on GitHub Actions.
 # - Our required build/runtime dependencies(numpy, pytz, Cython, python-dateutil)
 # support that unreleased Python version.
 # To unfreeze, comment out the ``if: false`` condition, and make sure you update

.github/workflows/ubuntu.yml

+1-5
@@ -31,9 +31,7 @@ jobs:
       matrix:
         env_file: [actions-38.yaml, actions-39.yaml, actions-310.yaml]
         pattern: ["not single_cpu", "single_cpu"]
-        # Don't test pyarrow v2/3: Causes timeouts in read_csv engine
-        # even if tests are skipped/xfailed
-        pyarrow_version: ["5", "6", "7"]
+        pyarrow_version: ["7", "8", "9"]
         include:
           - name: "Downstream Compat"
             env_file: actions-38-downstream_compat.yaml
@@ -77,7 +75,6 @@ jobs:
           - name: "Numpy Dev"
             env_file: actions-310-numpydev.yaml
             pattern: "not slow and not network and not single_cpu"
-            pandas_testing_mode: "deprecate"
             test_args: "-W error::DeprecationWarning:numpy -W error::FutureWarning:numpy"
         exclude:
           - env_file: actions-39.yaml
@@ -96,7 +93,6 @@ jobs:
       EXTRA_APT: ${{ matrix.extra_apt || '' }}
       LANG: ${{ matrix.lang || '' }}
       LC_ALL: ${{ matrix.lc_all || '' }}
-      PANDAS_TESTING_MODE: ${{ matrix.pandas_testing_mode || '' }}
       PANDAS_DATA_MANAGER: ${{ matrix.pandas_data_manager || 'block' }}
       PANDAS_COPY_ON_WRITE: ${{ matrix.pandas_copy_on_write || '0' }}
       TEST_ARGS: ${{ matrix.test_args || '' }}

.github/workflows/wheels.yml

+1-1
@@ -44,7 +44,7 @@ jobs:
       # Ensure that a wheel builder finishes even if another fails
       fail-fast: false
       matrix:
-        # Github Actions doesn't support pairing matrix values together, let's improvise
+        # GitHub Actions doesn't support pairing matrix values together, let's improvise
         # https://github.com/github/feedback/discussions/7835#discussioncomment-1769026
         buildplat:
           - [ubuntu-20.04, manylinux_x86_64]

.pre-commit-config.yaml

+1-1
@@ -251,7 +251,7 @@ repos:
     files: ^(ci/deps/actions-.*-minimum_versions\.yaml|pandas/compat/_optional\.py)$
   - id: validate-errors-locations
     name: Validate errors locations
-    description: Validate errors are in approriate locations.
+    description: Validate errors are in appropriate locations.
     entry: python scripts/validate_exception_location.py
     language: python
     files: ^pandas/

README.md

+2-2
@@ -13,7 +13,7 @@
 [![Coverage](https://codecov.io/github/pandas-dev/pandas/coverage.svg?branch=main)](https://codecov.io/gh/pandas-dev/pandas)
 [![OpenSSF Scorecard](https://api.securityscorecards.dev/projects/github.com/pandas-dev/pandas/badge)](https://api.securityscorecards.dev/projects/github.com/pandas-dev/pandas)
 [![Downloads](https://static.pepy.tech/personalized-badge/pandas?period=month&units=international_system&left_color=black&right_color=orange&left_text=PyPI%20downloads%20per%20month)](https://pepy.tech/project/pandas)
-[![Gitter](https://badges.gitter.im/Join%20Chat.svg)](https://gitter.im/pydata/pandas)
+[![Slack](https://img.shields.io/badge/join_Slack-information-brightgreen.svg?logo=slack)](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack)
 [![Powered by NumFOCUS](https://img.shields.io/badge/powered%20by-NumFOCUS-orange.svg?style=flat&colorA=E1523D&colorB=007D8A)](https://numfocus.org)
 [![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)
 [![Imports: isort](https://img.shields.io/badge/%20imports-isort-%231674b1?style=flat&labelColor=ef8336)](https://pycqa.github.io/isort/)
@@ -152,7 +152,7 @@ For usage questions, the best place to go to is [StackOverflow](https://stackove
 Further, general questions and discussions can also take place on the [pydata mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata).

 ## Discussion and Development
-Most development discussions take place on GitHub in this repo. Further, the [pandas-dev mailing list](https://mail.python.org/mailman/listinfo/pandas-dev) can also be used for specialized discussions or design issues, and a [Gitter channel](https://gitter.im/pydata/pandas) is available for quick development related questions.
+Most development discussions take place on GitHub in this repo. Further, the [pandas-dev mailing list](https://mail.python.org/mailman/listinfo/pandas-dev) can also be used for specialized discussions or design issues, and a [Slack channel](https://pandas.pydata.org/docs/dev/development/community.html?highlight=slack#community-slack) is available for quick development related questions.

 ## Contributing to pandas [![Open Source Helpers](https://www.codetriage.com/pandas-dev/pandas/badges/users.svg)](https://www.codetriage.com/pandas-dev/pandas)

asv_bench/benchmarks/array.py

+21
@@ -44,6 +44,24 @@ def time_from_integer_array(self):
         pd.array(self.values_integer, dtype="Int64")


+class StringArray:
+    def setup(self):
+        N = 100_000
+        values = tm.rands_array(3, N)
+        self.values_obj = np.array(values, dtype="object")
+        self.values_str = np.array(values, dtype="U")
+        self.values_list = values.tolist()
+
+    def time_from_np_object_array(self):
+        pd.array(self.values_obj, dtype="string")
+
+    def time_from_np_str_array(self):
+        pd.array(self.values_str, dtype="string")
+
+    def time_from_list(self):
+        pd.array(self.values_list, dtype="string")
+
+
 class ArrowStringArray:

     params = [False, True]
@@ -71,3 +89,6 @@ def time_setitem_list(self, multiple_chunks):

     def time_setitem_slice(self, multiple_chunks):
         self.array[::10] = "foo"
+
+    def time_tolist(self, multiple_chunks):
+        self.array.tolist()
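
The `StringArray` benchmark added above times `pd.array(..., dtype="string")` for three input containers. Below is a minimal standalone sketch of the same comparison outside asv. It uses `timeit` rather than the asv harness, and the helper names `make_inputs` and `time_string_array_construction` are hypothetical, not part of the commit. Random strings come from the standard library instead of `tm.rands_array`, which is a pandas testing helper.

```python
import random
import string
import timeit

import numpy as np
import pandas as pd


def make_inputs(n=10_000, length=3, seed=0):
    """Build the same strings as an object ndarray, a unicode ndarray and a list."""
    rng = random.Random(seed)
    values = [
        "".join(rng.choices(string.ascii_letters, k=length)) for _ in range(n)
    ]
    return {
        "np_object": np.array(values, dtype="object"),
        "np_str": np.array(values, dtype="U"),
        "list": values,
    }


def time_string_array_construction(inputs, repeat=3):
    """Return the best-of-`repeat` wall time (seconds) per input container."""
    timings = {}
    for name, data in inputs.items():
        runs = timeit.repeat(
            lambda d=data: pd.array(d, dtype="string"), number=1, repeat=repeat
        )
        timings[name] = min(runs)
    return timings


if __name__ == "__main__":
    for name, seconds in time_string_array_construction(make_inputs()).items():
        print(f"{name:>10}: {seconds:.6f} s")
```

Absolute timings depend on the machine and the pandas version, so the sketch prints them rather than asserting any ordering.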

asv_bench/benchmarks/join_merge.py

+34
@@ -4,6 +4,7 @@

 from pandas import (
     DataFrame,
+    Index,
     MultiIndex,
     Series,
     array,
@@ -92,6 +93,39 @@ def time_f_ordered(self, axis, ignore_index):
         concat(self.frame_f, axis=axis, ignore_index=ignore_index)


+class ConcatIndexDtype:
+
+    params = (
+        ["datetime64[ns]", "int64", "Int64", "string[python]", "string[pyarrow]"],
+        [0, 1],
+        [True, False],
+        [True, False],
+    )
+    param_names = ["dtype", "axis", "sort", "is_monotonic"]
+
+    def setup(self, dtype, axis, sort, is_monotonic):
+        N = 10_000
+        if dtype == "datetime64[ns]":
+            vals = date_range("1970-01-01", periods=N)
+        elif dtype in ("int64", "Int64"):
+            vals = np.arange(N, dtype=np.int64)
+        elif dtype in ("string[python]", "string[pyarrow]"):
+            vals = tm.makeStringIndex(N)
+        else:
+            raise NotImplementedError
+
+        idx = Index(vals, dtype=dtype)
+        if is_monotonic:
+            idx = idx.sort_values()
+        else:
+            idx = idx[::-1]
+
+        self.series = [Series(i, idx[i:]) for i in range(5)]
+
+    def time_concat_series(self, dtype, axis, sort, is_monotonic):
+        concat(self.series, axis=axis, sort=sort)
+
+
 class Join:

     params = [True, False]
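
The `ConcatIndexDtype` benchmark above concatenates Series whose indexes are successively shorter slices of one shared index. Here is a small sketch of what that setup produces for the `int64` case with `axis=1`. The helper names `make_series` and `concat_columns` are hypothetical, and the sizes are shrunk for readability.

```python
import numpy as np
import pandas as pd


def make_series(n=10, k=3, monotonic=True):
    """Series i holds the scalar i on idx[i:], mirroring the benchmark setup."""
    idx = pd.Index(np.arange(n, dtype=np.int64))
    if not monotonic:
        idx = idx[::-1]  # reversed index, as in the is_monotonic=False case
    return [pd.Series(i, index=idx[i:]) for i in range(k)]


def concat_columns(series, sort):
    """Align the Series on the union of their indexes, one column per Series."""
    return pd.concat(series, axis=1, sort=sort)


if __name__ == "__main__":
    frame = concat_columns(make_series(monotonic=False), sort=True)
    print(frame)
```

Because Series `i` is missing the first `i` labels of the shared index, column `i` of the result contains `i` missing values. With `sort=True` the row index comes back sorted even when the inputs were reversed.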

asv_bench/benchmarks/multiindex_object.py

+5-4
@@ -239,10 +239,11 @@ class SetOperations:
         ("monotonic", "non_monotonic"),
         ("datetime", "int", "string", "ea_int"),
         ("intersection", "union", "symmetric_difference"),
+        (False, None),
     ]
-    param_names = ["index_structure", "dtype", "method"]
+    param_names = ["index_structure", "dtype", "method", "sort"]

-    def setup(self, index_structure, dtype, method):
+    def setup(self, index_structure, dtype, method, sort):
         N = 10**5
         level1 = range(1000)

@@ -272,8 +273,8 @@ def setup(self, index_structure, dtype, method):
         self.left = data[dtype]["left"]
         self.right = data[dtype]["right"]

-    def time_operation(self, index_structure, dtype, method):
-        getattr(self.left, method)(self.right)
+    def time_operation(self, index_structure, dtype, method, sort):
+        getattr(self.left, method)(self.right, sort=sort)


 class Difference:
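
The `SetOperations` change above adds a `sort` parameter, taking `False` or `None`, to the timed set operation. As a rough illustration of what that parameter controls, the sketch below dispatches by method name the same way the benchmark does. The helper `set_operation` and the two small example indexes are hypothetical.

```python
import pandas as pd


def set_operation(left, right, method, sort):
    """Call a set method by name, as the benchmark's time_operation does."""
    return getattr(left, method)(right, sort=sort)


# Two small, deliberately unsorted MultiIndex objects that partly overlap.
left = pd.MultiIndex.from_tuples([(2, "b"), (1, "a"), (3, "c")])
right = pd.MultiIndex.from_tuples([(3, "c"), (0, "z"), (1, "a")])

if __name__ == "__main__":
    for method in ("intersection", "union", "symmetric_difference"):
        for sort in (False, None):
            result = set_operation(left, right, method, sort)
            print(method, sort, list(result))
```

Both settings return the same set of tuples. Only the ordering of the result can differ, which is why the benchmark times the two settings separately.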

ci/deps/actions-38-minimum_versions.yaml

+2-2
@@ -16,7 +16,7 @@ dependencies:
   - boto3

   # required dependencies
-  - python-dateutil=2.8.1
+  - python-dateutil=2.8.2
   - numpy=1.20.3
   - pytz=2020.1

@@ -39,7 +39,7 @@ dependencies:
   - openpyxl=3.0.7
   - pandas-gbq=0.15.0
   - psycopg2=2.8.6
-  - pyarrow=1.0.1
+  - pyarrow=6.0.0
   - pymysql=1.0.2
   - pyreadstat=1.1.2
   - pytables=3.6.1

doc/_templates/pandas_footer.html

+1-1
@@ -1,3 +1,3 @@
 <p class="copyright">
-    &copy {{copyright}} pandas via <a href="https://numfocus.org">NumFOCUS, Inc.</a> Hosted by <a href="https://www.ovhcloud.com">OVH Cloud</a>.
+    &copy {{copyright}} pandas via <a href="https://numfocus.org">NumFOCUS, Inc.</a> Hosted by <a href="https://www.ovhcloud.com">OVHcloud</a>.
 </p>

doc/redirects.csv

+1
@@ -45,6 +45,7 @@ contributing_docstring,development/contributing_docstring
 developer,development/developer
 extending,development/extending
 internals,development/internals
+development/meeting,community

 # api moved function
 reference/api/pandas.io.json.json_normalize,pandas.json_normalize

doc/source/development/community.rst

+117
@@ -0,0 +1,117 @@
+.. _community:
+
+=====================
+Contributor community
+=====================
+
+pandas is a community-driven open source project developed by a large group
+of `contributors <https://github.com/pandas-dev/pandas/graphs/contributors>`_
+and a smaller group of `maintainers <https://pandas.pydata.org/about/team.html>`_.
+The pandas leadership has made a strong commitment to creating an open,
+inclusive, and positive community. Please read the pandas `Code of Conduct
+<https://pandas.pydata.org/community/coc.html>`_ for guidance on how to
+interact with others in a way that makes the community thrive.
+
+We offer several meetings and communication channels to share knowledge and
+connect with others within the pandas community.
+
+Community meeting
+-----------------
+
+The pandas Community Meeting is a regular sync meeting for the project's
+maintainers which is open to the community. Everyone is welcome to attend and
+contribute to conversations.
+
+The meetings take place on the second Wednesday of each month at 18:00 UTC.
+
+The minutes of past meetings are available in `this Google Document <https://docs.google.com/document/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit?usp=sharing>`__.
+
+
+New contributor meeting
+-----------------------
+
+On the third Wednesday of the month, we hold meetings to welcome and support
+new contributors in our community.
+
+| 👋 you all are invited
+| 💬 everyone can present (add yourself to the hackMD agenda)
+| 👀 anyone can sit in and listen
+
+Attendees are new and experienced contributors, as well as a few maintainers.
+We aim to answer questions about getting started, or help with work in
+progress when possible, as well as get to know each other and share our
+learnings and experiences.
+
+The agenda for the next meeting and minutes of past meetings are available in
+`this HackMD <https://hackmd.io/@pandas-dev/HJgQt1Tei>`__.
+
+Calendar
+--------
+
+This calendar shows all the community meetings. Our community meetings are
+ideal for anyone wanting to contribute to pandas, or just curious to know how
+current development is going.
+
+.. raw:: html
+
+    <iframe src="https://calendar.google.com/calendar/embed?src=pgbn14p6poja8a1cf2dv2jhrmg%40group.calendar.google.com" style="border: 0" width="800" height="600" frameborder="0" scrolling="no"></iframe>
+
+You can subscribe to this calendar with the following links:
+
+* `iCal <https://calendar.google.com/calendar/ical/pgbn14p6poja8a1cf2dv2jhrmg%40group.calendar.google.com/public/basic.ics>`__
+* `Google calendar <https://calendar.google.com/calendar/[email protected]>`__
+
+Additionally, we'll sometimes have one-off meetings on specific topics.
+These will be published on the same calendar.
+
+`GitHub issue tracker <https://github.com/pandas-dev/pandas/issues>`_
+----------------------------------------------------------------------
+
+The pandas contributor community conducts conversations mainly via this channel.
+Any community member can open issues to:
+
+- Report bugs, e.g. "I noticed the behavior of a certain function is
+  incorrect"
+- Request features, e.g. "I would like this error message to be more readable"
+- Request documentation improvements, e.g. "I found this section unclear"
+- Ask questions, e.g. "I noticed the behavior of a certain function
+  changed between versions. Is this expected?".
+
+Ideally your questions should be related to how pandas work rather
+than how you use pandas. `StackOverflow <https://stackoverflow.com/>`_ is
+better suited for answering usage questions, and we ask that all usage
+questions are first asked on StackOverflow. Thank you for respecting are
+time and wishes. 🙇
+
+Maintainers and frequent contributors might also open issues to discuss the
+ongoing development of the project. For example:
+
+- Report issues with the CI, GitHub Actions, or the performance of pandas
+- Open issues relating to the internals
+- Start roadmap discussion aligning on proposals what to do in future
+  releases or changes to the API.
+- Open issues relating to the project's website, logo, or governance
+
+The developer mailing list
+--------------------------
+
+The pandas mailing list `pandas-dev@python.org <mailto://pandas-dev@python
+.org>`_ is used for long form
+conversations and to engages people in the wider community who might not
+be active on the issue tracker but we would like to include in discussions.
+
+Community slack
+---------------
+
+We have a chat platform for contributors, maintainers and potential
+contributors. This is not a space for user questions, rather for questions about
+contributing to pandas. The slack is a private space, specifically meant for
+people who are hesitant to bring up their questions or ideas on a large public
+mailing list or GitHub.
+
+If this sounds like the right place for you, you are welcome to join! Email us
+at `[email protected] <mailto://[email protected]>`_ and let us
+know that you read and agree to our `Code of Conduct <https://pandas.pydata.org/community/coc.html>`_
+😉 to get an invite. And please remember the slack is not meant to replace the
+mailing list or issue tracker - all important announcements and conversations
+should still happen there.
