Commit a2856e1

author
MarcoGorelli
committed
Merge remote-tracking branch 'upstream/main' into implementation-pdep-4
2 parents bc0cd67 + 90b4add commit a2856e1

174 files changed

+1427
-1638
lines changed

asv_bench/benchmarks/multiindex_object.py

+5-4
@@ -239,10 +239,11 @@ class SetOperations:
239239
("monotonic", "non_monotonic"),
240240
("datetime", "int", "string", "ea_int"),
241241
("intersection", "union", "symmetric_difference"),
242+
(False, None),
242243
]
243-
param_names = ["index_structure", "dtype", "method"]
244+
param_names = ["index_structure", "dtype", "method", "sort"]
244245

245-
def setup(self, index_structure, dtype, method):
246+
def setup(self, index_structure, dtype, method, sort):
246247
N = 10**5
247248
level1 = range(1000)
248249

@@ -272,8 +273,8 @@ def setup(self, index_structure, dtype, method):
272273
self.left = data[dtype]["left"]
273274
self.right = data[dtype]["right"]
274275

275-
def time_operation(self, index_structure, dtype, method):
276-
getattr(self.left, method)(self.right)
276+
def time_operation(self, index_structure, dtype, method, sort):
277+
getattr(self.left, method)(self.right, sort=sort)
277278

278279

279280
class Difference:
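
The effect of the new ``sort`` entry on the benchmark matrix can be sketched with the standard library alone. This is only an illustration, assuming the benchmark runner times every combination of the listed parameter values; the `combinations` variable is a name introduced here and is not part of the benchmark file.

```python
from itertools import product

# Parameter grid as it appears in the updated SetOperations benchmark.
params = [
    ("monotonic", "non_monotonic"),
    ("datetime", "int", "string", "ea_int"),
    ("intersection", "union", "symmetric_difference"),
    (False, None),
]
param_names = ["index_structure", "dtype", "method", "sort"]

# Each combination corresponds to one call of
# setup(...) and time_operation(...) with these four arguments.
combinations = [dict(zip(param_names, combo)) for combo in product(*params)]

print(len(combinations))
print(combinations[0])
```

Adding the two ``sort`` values doubles the grid compared with the previous three-parameter version.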

doc/redirects.csv

+1
@@ -45,6 +45,7 @@ contributing_docstring,development/contributing_docstring
4545
developer,development/developer
4646
extending,development/extending
4747
internals,development/internals
48+
development/meeting,community
4849

4950
# api moved function
5051
reference/api/pandas.io.json.json_normalize,pandas.json_normalize

doc/source/development/community.rst

+117
@@ -0,0 +1,117 @@
1+
.. _community:
2+
3+
=====================
4+
Contributor community
5+
=====================
6+
7+
pandas is a community-driven open source project developed by a large group
8+
of `contributors <https://github.com/pandas-dev/pandas/graphs/contributors>`_
9+
and a smaller group of `maintainers <https://pandas.pydata.org/about/team.html>`_.
10+
The pandas leadership has made a strong commitment to creating an open,
11+
inclusive, and positive community. Please read the pandas `Code of Conduct
12+
<https://pandas.pydata.org/community/coc.html>`_ for guidance on how to
13+
interact with others in a way that makes the community thrive.
14+
15+
We offer several meetings and communication channels to share knowledge and
16+
connect with others within the pandas community.
17+
18+
Community meeting
19+
-----------------
20+
21+
The pandas Community Meeting is a regular sync meeting for the project's
22+
maintainers which is open to the community. Everyone is welcome to attend and
23+
contribute to conversations.
24+
25+
The meetings take place on the second Wednesday of each month at 18:00 UTC.
26+
27+
The minutes of past meetings are available in `this Google Document <https://docs.google.com/document/d/1tGbTiYORHiSPgVMXawiweGJlBw5dOkVJLY-licoBmBU/edit?usp=sharing>`__.
28+
29+
30+
New contributor meeting
31+
-----------------------
32+
33+
On the third Wednesday of the month, we hold meetings to welcome and support
34+
new contributors in our community.
35+
36+
| 👋 you all are invited
37+
| 💬 everyone can present (add yourself to the hackMD agenda)
38+
| 👀 anyone can sit in and listen
39+
40+
Attendees are new and experienced contributors, as well as a few maintainers.
41+
We aim to answer questions about getting started, or help with work in
42+
progress when possible, as well as get to know each other and share our
43+
learnings and experiences.
44+
45+
The agenda for the next meeting and minutes of past meetings are available in
46+
`this HackMD <https://hackmd.io/@pandas-dev/HJgQt1Tei>`__.
47+
48+
Calendar
49+
--------
50+
51+
This calendar shows all the community meetings. Our community meetings are
52+
ideal for anyone wanting to contribute to pandas, or just curious to know how
53+
current development is going.
54+
55+
.. raw:: html
56+
57+
<iframe src="https://calendar.google.com/calendar/embed?src=pgbn14p6poja8a1cf2dv2jhrmg%40group.calendar.google.com" style="border: 0" width="800" height="600" frameborder="0" scrolling="no"></iframe>
58+
59+
You can subscribe to this calendar with the following links:
60+
61+
* `iCal <https://calendar.google.com/calendar/ical/pgbn14p6poja8a1cf2dv2jhrmg%40group.calendar.google.com/public/basic.ics>`__
62+
* `Google calendar <https://calendar.google.com/calendar/[email protected]>`__
63+
64+
Additionally, we'll sometimes have one-off meetings on specific topics.
65+
These will be published on the same calendar.
66+
67+
`GitHub issue tracker <https://github.com/pandas-dev/pandas/issues>`_
68+
----------------------------------------------------------------------
69+
70+
The pandas contributor community conducts conversations mainly via this channel.
71+
Any community member can open issues to:
72+
73+
- Report bugs, e.g. "I noticed the behavior of a certain function is
74+
incorrect"
75+
- Request features, e.g. "I would like this error message to be more readable"
76+
- Request documentation improvements, e.g. "I found this section unclear"
77+
- Ask questions, e.g. "I noticed the behavior of a certain function
78+
changed between versions. Is this expected?".
79+
80+
Ideally your questions should be related to how pandas works rather
81+
than how you use pandas. `StackOverflow <https://stackoverflow.com/>`_ is
82+
better suited for answering usage questions, and we ask that all usage
83+
questions are first asked on StackOverflow. Thank you for respecting our
84+
time and wishes. 🙇
85+
86+
Maintainers and frequent contributors might also open issues to discuss the
87+
ongoing development of the project. For example:
88+
89+
- Report issues with the CI, GitHub Actions, or the performance of pandas
90+
- Open issues relating to the internals
91+
- Start roadmap discussions to align on proposals for what to do in future
92+
releases or changes to the API.
93+
- Open issues relating to the project's website, logo, or governance
94+
95+
The developer mailing list
96+
--------------------------
97+
98+
The pandas mailing list `pandas-dev@python.org <mailto://pandas-dev@python
99+
.org>`_ is used for long form
100+
conversations and to engage people in the wider community who might not
101+
be active on the issue tracker but we would like to include in discussions.
102+
103+
Community slack
104+
---------------
105+
106+
We have a chat platform for contributors, maintainers and potential
107+
contributors. This is not a space for user questions, rather for questions about
108+
contributing to pandas. The slack is a private space, specifically meant for
109+
people who are hesitant to bring up their questions or ideas on a large public
110+
mailing list or GitHub.
111+
112+
If this sounds like the right place for you, you are welcome to join! Email us
113+
at `[email protected] <mailto://[email protected]>`_ and let us
114+
know that you read and agree to our `Code of Conduct <https://pandas.pydata.org/community/coc.html>`_
115+
😉 to get an invite. And please remember the slack is not meant to replace the
116+
mailing list or issue tracker - all important announcements and conversations
117+
should still happen there.

doc/source/development/contributing_codebase.rst

+57-11
@@ -139,7 +139,7 @@ Otherwise, you need to do it manually:
139139
warnings.warn(
140140
'Use new_func instead.',
141141
FutureWarning,
142-
stacklevel=find_stack_level(inspect.currentframe()),
142+
stacklevel=find_stack_level(),
143143
)
144144
new_func()
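
The manual deprecation pattern in this hunk can be sketched with the standard library alone. This is a minimal illustration that assumes a fixed ``stacklevel=2`` as a stand-in for the pandas helper ``find_stack_level()``, which the updated snippet calls without arguments; ``old_func`` is a hypothetical name for the deprecated wrapper.

```python
import warnings


def new_func():
    return "result"


def old_func():
    # Assumption: stacklevel=2 stands in for find_stack_level();
    # it attributes the warning to the caller of old_func.
    warnings.warn(
        "Use new_func instead.",
        FutureWarning,
        stacklevel=2,
    )
    return new_func()


# Record the warning instead of printing it, so it can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    value = old_func()

print(caught[0].category.__name__, caught[0].message)
```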
145145
@@ -790,14 +790,11 @@ Or with one of the following constructs::
790790
pytest pandas/tests/[test-module].py::[TestClass]
791791
pytest pandas/tests/[test-module].py::[TestClass]::[test_method]
792792

793-
Using `pytest-xdist <https://pypi.org/project/pytest-xdist>`_, one can
794-
speed up local testing on multicore machines. To use this feature, you will
795-
need to install ``pytest-xdist`` via::
796-
797-
pip install pytest-xdist
798-
799-
The ``-n`` flag then can be specified when running ``pytest`` to parallelize a test run
800-
across the number of specified cores or ``auto`` to utilize all the available cores on your machine.
793+
Using `pytest-xdist <https://pypi.org/project/pytest-xdist>`_, which is
794+
included in our 'pandas-dev' environment, one can speed up local testing on
795+
multicore machines. The ``-n`` number flag then can be specified when running
796+
``pytest`` to parallelize a test run across the number of specified cores or ``auto`` to
797+
utilize all the available cores on your machine.
801798

802799
.. code-block:: bash
803800
@@ -807,8 +804,57 @@ across the number of specified cores or ``auto`` to utilize all the available co
807804
# Utilizes all available cores
808805
pytest -n auto pandas
809806
810-
This can significantly reduce the time it takes to locally run tests before
811-
submitting a pull request.
807+
If you'd like to speed things along further, a more advanced use of this
808+
command would look like this:
809+
810+
.. code-block:: bash
811+
812+
pytest pandas -n 4 -m "not slow and not network and not db and not single_cpu" -r sxX
813+
814+
In addition to the speedup from running in parallel, this improves test
815+
speed by skipping some tests using the ``-m`` mark flag:
816+
817+
- slow: any test taking long (think seconds rather than milliseconds)
818+
- network: tests requiring network connectivity
819+
- db: tests requiring a database (mysql or postgres)
820+
- single_cpu: tests that should run on a single cpu only
821+
822+
You might want to enable the following option if it's relevant for you:
823+
824+
- arm_slow: any test taking long on arm64 architecture
825+
826+
These markers are defined `in this toml file <https://github.com/pandas-dev/pandas/blob/main/pyproject.toml>`_,
827+
under ``[tool.pytest.ini_options]`` in a list called ``markers``, in case
828+
you want to check if new ones have been created which are of interest to you.
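
The ``-m`` expression above is just the skipped markers joined with ``and``, so it can be assembled from a list. The helper below is a hypothetical sketch (``build_pytest_command`` is not part of pandas or pytest); it only builds the argument list and does not run anything.

```python
import shlex


def build_pytest_command(skip_markers, workers=4, report="sxX"):
    """Build the pytest argument list; hypothetical helper for illustration."""
    # "not slow and not network ..." deselects every listed marker.
    marker_expr = " and ".join(f"not {marker}" for marker in skip_markers)
    return ["pytest", "pandas", "-n", str(workers), "-m", marker_expr, "-r", report]


cmd = build_pytest_command(["slow", "network", "db", "single_cpu"])

# shlex.join quotes the marker expression so it can be pasted into a shell.
print(shlex.join(cmd))
```

Appending ``arm_slow`` to the list would extend the expression in the same way.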
829+
830+
The ``-r`` report flag will display a short summary info (see `pytest
831+
documentation <https://docs.pytest.org/en/4.6.x/usage.html#detailed-summary-report>`_).
832+
Here we are displaying the number of:
833+
834+
- s: skipped tests
835+
- x: xfailed tests
836+
- X: xpassed tests
837+
838+
The summary is optional and can be removed if you don't need the added
839+
information. Using the parallelization option can significantly reduce the
840+
time it takes to locally run tests before submitting a pull request.
841+
842+
If you require assistance with the results,
843+
which has happened in the past, please set a seed before running the command
844+
and opening a bug report; that way we can reproduce it. Here's an example
845+
for setting a seed on Windows:
846+
847+
.. code-block:: bash
848+
849+
set PYTHONHASHSEED=314159265
850+
pytest pandas -n 4 -m "not slow and not network and not db and not single_cpu" -r sxX
851+
852+
On Unix, use:
853+
854+
.. code-block:: bash
855+
856+
export PYTHONHASHSEED=314159265
857+
pytest pandas -n 4 -m "not slow and not network and not db and not single_cpu" -r sxX
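
Why a fixed ``PYTHONHASHSEED`` helps with reproduction can be sketched as follows: string hashes are randomized per interpreter process by default, and pinning the seed makes them identical across runs. The function name ``string_hash_with_seed`` is an assumption made for this illustration.

```python
import os
import subprocess
import sys


def string_hash_with_seed(seed):
    """Return hash('pandas') as computed by a fresh interpreter with the given seed."""
    # Copy the current environment and pin the hash seed for the child process.
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", "print(hash('pandas'))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


first = string_hash_with_seed(314159265)
second = string_hash_with_seed(314159265)

# Both child interpreters used the same seed, so the hashes agree.
print(first == second)
```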
812858
813859
For more, see the `pytest <https://docs.pytest.org/en/latest/>`_ documentation.
814860

doc/source/development/index.rst

+1-1
@@ -23,4 +23,4 @@ Development
2323
developer
2424
policies
2525
roadmap
26-
meeting
26+
community

doc/source/development/meeting.rst

-31
This file was deleted.

doc/source/getting_started/comparison/comparison_with_r.rst

-4
@@ -21,10 +21,6 @@ libraries, we care about the following things:
2121
This page is also here to offer a bit of a translation guide for users of these
2222
R packages.
2323

24-
For transfer of ``DataFrame`` objects from pandas to R, one option is to
25-
use HDF5 files, see :ref:`io.external_compatibility` for an
26-
example.
27-
2824

2925
Quick reference
3026
---------------

doc/source/user_guide/io.rst

-93
@@ -5245,99 +5245,6 @@ You could inadvertently turn an actual ``nan`` value into a missing value.
52455245
store.append("dfss2", dfss, nan_rep="_nan_")
52465246
store.select("dfss2")
52475247
5248-
.. _io.external_compatibility:
5249-
5250-
External compatibility
5251-
''''''''''''''''''''''
5252-
5253-
``HDFStore`` writes ``table`` format objects in specific formats suitable for
5254-
producing loss-less round trips to pandas objects. For external
5255-
compatibility, ``HDFStore`` can read native ``PyTables`` format
5256-
tables.
5257-
5258-
It is possible to write an ``HDFStore`` object that can easily be imported into ``R`` using the
5259-
``rhdf5`` library (`Package website`_). Create a table format store like this:
5260-
5261-
.. _package website: https://www.bioconductor.org/packages/release/bioc/html/rhdf5.html
5262-
5263-
.. ipython:: python
5264-
5265-
df_for_r = pd.DataFrame(
5266-
{
5267-
"first": np.random.rand(100),
5268-
"second": np.random.rand(100),
5269-
"class": np.random.randint(0, 2, (100,)),
5270-
},
5271-
index=range(100),
5272-
)
5273-
df_for_r.head()
5274-
5275-
store_export = pd.HDFStore("export.h5")
5276-
store_export.append("df_for_r", df_for_r, data_columns=df_dc.columns)
5277-
store_export
5278-
5279-
.. ipython:: python
5280-
:suppress:
5281-
5282-
store_export.close()
5283-
os.remove("export.h5")
5284-
5285-
In R this file can be read into a ``data.frame`` object using the ``rhdf5``
5286-
library. The following example function reads the corresponding column names
5287-
and data values from the values and assembles them into a ``data.frame``:
5288-
5289-
.. code-block:: R
5290-
5291-
# Load values and column names for all datasets from corresponding nodes and
5292-
# insert them into one data.frame object.
5293-
5294-
library(rhdf5)
5295-
5296-
loadhdf5data <- function(h5File) {
5297-
5298-
listing <- h5ls(h5File)
5299-
# Find all data nodes, values are stored in *_values and corresponding column
5300-
# titles in *_items
5301-
data_nodes <- grep("_values", listing$name)
5302-
name_nodes <- grep("_items", listing$name)
5303-
data_paths = paste(listing$group[data_nodes], listing$name[data_nodes], sep = "/")
5304-
name_paths = paste(listing$group[name_nodes], listing$name[name_nodes], sep = "/")
5305-
columns = list()
5306-
for (idx in seq(data_paths)) {
5307-
# NOTE: matrices returned by h5read have to be transposed to obtain
5308-
# required Fortran order!
5309-
data <- data.frame(t(h5read(h5File, data_paths[idx])))
5310-
names <- t(h5read(h5File, name_paths[idx]))
5311-
entry <- data.frame(data)
5312-
colnames(entry) <- names
5313-
columns <- append(columns, entry)
5314-
}
5315-
5316-
data <- data.frame(columns)
5317-
5318-
return(data)
5319-
}
5320-
5321-
Now you can import the ``DataFrame`` into R:
5322-
5323-
.. code-block:: R
5324-
5325-
> data = loadhdf5data("transfer.hdf5")
5326-
> head(data)
5327-
first second class
5328-
1 0.4170220047 0.3266449 0
5329-
2 0.7203244934 0.5270581 0
5330-
3 0.0001143748 0.8859421 1
5331-
4 0.3023325726 0.3572698 1
5332-
5 0.1467558908 0.9085352 1
5333-
6 0.0923385948 0.6233601 1
5334-
5335-
.. note::
5336-
The R function lists the entire HDF5 file's contents and assembles the
5337-
``data.frame`` object from all matching nodes, so use this only as a
5338-
starting point if you have stored multiple ``DataFrame`` objects to a
5339-
single HDF5 file.
5340-
53415248
53425249
Performance
53435250
'''''''''''
