Commit 04d7186

Merge remote-tracking branch 'upstream/master' into pandas-devgh-16454
2 parents 6205d1a + f937843 commit 04d7186

79 files changed, +1458 -691 lines changed

.pre-commit-config.yaml

+1-1
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,7 @@ repos:
1111
language: python_venv
1212
additional_dependencies: [flake8-comprehensions>=3.1.0]
1313
- repo: https://github.com/pre-commit/mirrors-isort
14-
rev: v4.3.20
14+
rev: v4.3.21
1515
hooks:
1616
- id: isort
1717
language: python_venv

asv_bench/benchmarks/indexing.py

+4
Original file line numberDiff line numberDiff line change
@@ -131,6 +131,7 @@ def setup(self):
131131
self.col_scalar = columns[10]
132132
self.bool_indexer = self.df[self.col_scalar] > 0
133133
self.bool_obj_indexer = self.bool_indexer.astype(object)
134+
self.boolean_indexer = (self.df[self.col_scalar] > 0).astype("boolean")
134135

135136
def time_loc(self):
136137
self.df.loc[self.idx_scalar, self.col_scalar]
@@ -144,6 +145,9 @@ def time_boolean_rows(self):
144145
def time_boolean_rows_object(self):
145146
self.df[self.bool_obj_indexer]
146147

148+
def time_boolean_rows_boolean(self):
149+
self.df[self.boolean_indexer]
150+
147151

148152
class DataFrameNumericIndexing:
149153
def setup(self):

ci/deps/travis-36-cov.yaml

-2
Original file line numberDiff line numberDiff line change
@@ -30,8 +30,6 @@ dependencies:
3030
- openpyxl<=3.0.1
3131
# https://github.com/pandas-dev/pandas/pull/30009 openpyxl 3.0.2 broke
3232
- pandas-gbq
33-
# https://github.com/pydata/pandas-gbq/issues/271
34-
- google-cloud-bigquery<=1.11
3533
- psycopg2
3634
- pyarrow>=0.12.0
3735
- pymysql

doc/source/development/code_style.rst

+129
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,129 @@
1+
.. _code_style:
2+
3+
{{ header }}
4+
5+
=======================
6+
pandas code style guide
7+
=======================
8+
9+
.. contents:: Table of contents:
10+
:local:
11+
12+
Patterns
13+
========
14+
15+
foo.__class__
16+
-------------
17+
18+
*pandas* uses 'type(foo)' instead of 'foo.__class__' as it makes the code more
19+
readable.
20+
21+
For example:
22+
23+
**Good:**
24+
25+
.. code-block:: python
26+
27+
foo = "bar"
28+
type(foo)
29+
30+
**Bad:**
31+
32+
.. code-block:: python
33+
34+
foo = "bar"
35+
foo.__class__
36+
37+
38+
String formatting
39+
=================
40+
41+
Concatenated strings
42+
--------------------
43+
44+
f-strings
45+
~~~~~~~~~
46+
47+
*pandas* uses f-string formatting instead of the '%' and '.format()' string formatters.
48+
49+
The convention for using f-strings on a string that is concatenated over several lines
50+
is to prefix only the lines containing values that need to be interpreted.
51+
52+
For example:
53+
54+
**Good:**
55+
56+
.. code-block:: python
57+
58+
foo = "old_function"
59+
bar = "new_function"
60+
61+
my_warning_message = (
62+
f"Warning, {foo} is deprecated, "
63+
"please use the new and way better "
64+
f"{bar}"
65+
)
66+
67+
**Bad:**
68+
69+
.. code-block:: python
70+
71+
foo = "old_function"
72+
bar = "new_function"
73+
74+
my_warning_message = (
75+
f"Warning, {foo} is deprecated, "
76+
f"please use the new and way better "
77+
f"{bar}"
78+
)
79+
80+
White spaces
81+
~~~~~~~~~~~~
82+
83+
Put the white space only at the end of the previous line, so
84+
there is no whitespace at the beginning of the concatenated string.
85+
86+
For example:
87+
88+
**Good:**
89+
90+
.. code-block:: python
91+
92+
example_string = (
93+
"Some long concatenated string, "
94+
"with good placement of the "
95+
"whitespaces"
96+
)
97+
98+
**Bad:**
99+
100+
.. code-block:: python
101+
102+
example_string = (
103+
"Some long concatenated string,"
104+
" with bad placement of the"
105+
" whitespaces"
106+
)
107+
108+
Representation function (aka 'repr()')
109+
--------------------------------------
110+
111+
*pandas* uses 'repr()' instead of '%r' and '!r'.
112+
113+
'repr()' should only be used when the value is not an obvious string.
114+
115+
For example:
116+
117+
**Good:**
118+
119+
.. code-block:: python
120+
121+
value = str
122+
f"Unknown received value, got: {repr(value)}"
123+
124+
**Good:**
125+
126+
.. code-block:: python
127+
128+
value = str
129+
f"Unknown received type, got: '{type(value).__name__}'"
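
The conventions in this new style guide can be tried together in plain Python. The following is a minimal sketch, not part of the commit: the variable names are hypothetical and chosen only for illustration.

```python
# Hypothetical names, used only to illustrate the style guide above.
old_name = "old_function"
new_name = "new_function"

# Only the fragments that interpolate a value carry the f prefix, and each
# fragment ends with its separating space instead of starting with one.
message = (
    f"Warning, {old_name} is deprecated, "
    "please use the new and way better "
    f"{new_name}"
)

value = 3.5
# repr() is called explicitly rather than through the !r conversion or %r.
value_error = f"Unknown received value, got: {repr(value)}"
# type(value) is preferred over value.__class__.
type_error = f"Unknown received type, got: '{type(value).__name__}'"

print(message)
print(value_error)
print(type_error)
```

Adjacent string literals are joined at compile time, so the unprefixed middle fragment costs nothing and signals that it contains no placeholders.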

doc/source/development/contributing.rst

+1-2
Original file line numberDiff line numberDiff line change
@@ -569,8 +569,7 @@ do not make sudden changes to the code that could have the potential to break
569569
a lot of user code as a result, that is, we need it to be as *backwards compatible*
570570
as possible to avoid mass breakages.
571571

572-
Additional standards are outlined on the `code style wiki
573-
page <https://github.com/pandas-dev/pandas/wiki/Code-Style-and-Conventions>`_.
572+
Additional standards are outlined in the :ref:`pandas code style guide <code_style>`.
574573

575574
Optional dependencies
576575
---------------------

doc/source/development/index.rst

+1
Original file line numberDiff line numberDiff line change
@@ -13,6 +13,7 @@ Development
1313
:maxdepth: 2
1414

1515
contributing
16+
code_style
1617
maintaining
1718
internals
1819
extending

doc/source/ecosystem.rst

+15-1
Original file line numberDiff line numberDiff line change
@@ -327,6 +327,21 @@ PyTables, h5py, and pymongo to move data between non pandas formats. Its graph
327327
based approach is also extensible by end users for custom formats that may be
328328
too specific for the core of odo.
329329

330+
`Pandarallel <https://github.com/nalepae/pandarallel>`__
331+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
332+
333+
Pandarallel provides a simple way to parallelize your pandas operations on all your CPUs by changing only one line of code.
334+
It also displays progress bars.
335+
336+
.. code:: python
337+
338+
from pandarallel import pandarallel
339+
340+
pandarallel.initialize(progress_bar=True)
341+
342+
# df.apply(func)
343+
df.parallel_apply(func)
344+
330345
`Ray <https://ray.readthedocs.io/en/latest/pandas_on_ray.html>`__
331346
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
332347

@@ -380,4 +395,3 @@ Library Accessor Classes
380395

381396
.. _cyberpandas: https://cyberpandas.readthedocs.io/en/latest
382397
.. _pdvega: https://altair-viz.github.io/pdvega/
383-

doc/source/getting_started/basics.rst

+28-13
Original file line numberDiff line numberDiff line change
@@ -1937,21 +1937,36 @@ See :ref:`extending.extension-types` for how to write your own extension that
19371937
works with pandas. See :ref:`ecosystem.extensions` for a list of third-party
19381938
libraries that have implemented an extension.
19391939

1940-
The following table lists all of pandas extension types. See the respective
1940+
The following table lists all of pandas extension types. For methods requiring ``dtype``
1941+
arguments, strings can be specified as indicated. See the respective
19411942
documentation sections for more on each type.
19421943

1943-
=================== ========================= ================== ============================= =============================
1944-
Kind of Data Data Type Scalar Array Documentation
1945-
=================== ========================= ================== ============================= =============================
1946-
tz-aware datetime :class:`DatetimeTZDtype` :class:`Timestamp` :class:`arrays.DatetimeArray` :ref:`timeseries.timezone`
1947-
Categorical :class:`CategoricalDtype` (none) :class:`Categorical` :ref:`categorical`
1948-
period (time spans) :class:`PeriodDtype` :class:`Period` :class:`arrays.PeriodArray` :ref:`timeseries.periods`
1949-
sparse :class:`SparseDtype` (none) :class:`arrays.SparseArray` :ref:`sparse`
1950-
intervals :class:`IntervalDtype` :class:`Interval` :class:`arrays.IntervalArray` :ref:`advanced.intervalindex`
1951-
nullable integer :class:`Int64Dtype`, ... (none) :class:`arrays.IntegerArray` :ref:`integer_na`
1952-
Strings :class:`StringDtype` :class:`str` :class:`arrays.StringArray` :ref:`text`
1953-
Boolean (with NA) :class:`BooleanDtype` :class:`bool` :class:`arrays.BooleanArray` :ref:`api.arrays.bool`
1954-
=================== ========================= ================== ============================= =============================
1944+
+-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
1945+
| Kind of Data | Data Type | Scalar | Array | String Aliases | Documentation |
1946+
+===================+===========================+====================+===============================+=========================================+===============================+
1947+
| tz-aware datetime | :class:`DatetimeTZDtype` | :class:`Timestamp` | :class:`arrays.DatetimeArray` | ``'datetime64[ns, <tz>]'`` | :ref:`timeseries.timezone` |
1948+
+-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
1949+
| Categorical | :class:`CategoricalDtype` | (none) | :class:`Categorical` | ``'category'`` | :ref:`categorical` |
1950+
+-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
1951+
| period | :class:`PeriodDtype` | :class:`Period` | :class:`arrays.PeriodArray` | ``'period[<freq>]'``, | :ref:`timeseries.periods` |
1952+
| (time spans) | | | | ``'Period[<freq>]'`` | |
1953+
+-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
1954+
| sparse | :class:`SparseDtype` | (none) | :class:`SparseArray` | ``'Sparse'``, ``'Sparse[int]'``, | :ref:`sparse` |
1955+
| | | | | ``'Sparse[float]'`` | |
1956+
+-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
1957+
| intervals | :class:`IntervalDtype` | :class:`Interval` | :class:`arrays.IntervalArray` | ``'interval'``, ``'Interval'``, | :ref:`advanced.intervalindex` |
1958+
| | | | | ``'Interval[<numpy_dtype>]'``, | |
1959+
| | | | | ``'Interval[datetime64[ns, <tz>]]'``, | |
1960+
| | | | | ``'Interval[timedelta64[<freq>]]'`` | |
1961+
+-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
1962+
| nullable integer | :class:`Int64Dtype`, ... | (none) | :class:`arrays.IntegerArray` | ``'Int8'``, ``'Int16'``, ``'Int32'``, | :ref:`integer_na` |
1963+
| | | | | ``'Int64'``, ``'UInt8'``, ``'UInt16'``, | |
1964+
| | | | | ``'UInt32'``, ``'UInt64'`` | |
1965+
+-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
1966+
| Strings | :class:`StringDtype` | :class:`str` | :class:`arrays.StringArray` | ``'string'`` | :ref:`text` |
1967+
+-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
1968+
| Boolean (with NA) | :class:`BooleanDtype` | :class:`bool` | :class:`arrays.BooleanArray` | ``'boolean'`` | :ref:`api.arrays.bool` |
1969+
+-------------------+---------------------------+--------------------+-------------------------------+-----------------------------------------+-------------------------------+
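
As a rough illustration of the new "String Aliases" column, the sketch below passes a few of the listed aliases as ``dtype`` arguments. It assumes a pandas version that ships the nullable integer, string and boolean dtypes (1.0 or later); the variable names are hypothetical.

```python
import pandas as pd

# Each string alias selects the corresponding extension dtype.
ints = pd.Series([1, 2, None], dtype="Int64")            # nullable integer
flags = pd.array([True, False, None], dtype="boolean")   # Boolean (with NA)
labels = pd.Series(["a", "b", "a"], dtype="category")    # Categorical
names = pd.Series(["x", "y"], dtype="string")            # Strings

for obj in (ints, flags, labels, names):
    print(type(obj.dtype).__name__)
```

Note that the capitalised ``'Int64'`` alias is distinct from NumPy's lowercase ``'int64'``, which cannot hold missing values.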
19551970

19561971
Pandas has two ways to store strings.
19571972

doc/source/getting_started/comparison/comparison_with_sas.rst

+1-1
Original file line numberDiff line numberDiff line change
@@ -629,7 +629,7 @@ for more details and examples.
629629

630630
.. ipython:: python
631631
632-
tips_summed = tips.groupby(['sex', 'smoker'])['total_bill', 'tip'].sum()
632+
tips_summed = tips.groupby(['sex', 'smoker'])[['total_bill', 'tip']].sum()
633633
tips_summed.head()
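
To see the corrected selection syntax outside the documentation build, here is a small sketch. The data frame is a hypothetical stand-in for the ``tips`` dataset, with made-up values.

```python
import pandas as pd

# Hypothetical stand-in for the tips dataset; the values are invented.
tips = pd.DataFrame(
    {
        "sex": ["Female", "Male", "Male", "Female"],
        "smoker": ["No", "No", "Yes", "No"],
        "total_bill": [10.0, 20.0, 30.0, 5.0],
        "tip": [1.0, 3.0, 4.5, 0.5],
    }
)

# A list inside the brackets selects several columns from the grouped frame.
# This is the spelling the diff switches to, replacing the bare
# 'total_bill', 'tip' pair that Python would pass as a tuple.
tips_summed = tips.groupby(["sex", "smoker"])[["total_bill", "tip"]].sum()
print(tips_summed)
```

The result is indexed by the ``(sex, smoker)`` pairs that occur in the data and keeps only the two selected columns.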
634634
635635

doc/source/getting_started/comparison/comparison_with_stata.rst

+1-1
Original file line numberDiff line numberDiff line change
@@ -617,7 +617,7 @@ for more details and examples.
617617

618618
.. ipython:: python
619619
620-
tips_summed = tips.groupby(['sex', 'smoker'])['total_bill', 'tip'].sum()
620+
tips_summed = tips.groupby(['sex', 'smoker'])[['total_bill', 'tip']].sum()
621621
tips_summed.head()
622622
623623

doc/source/index.rst.template

+1
Original file line numberDiff line numberDiff line change
@@ -109,6 +109,7 @@ See the :ref:`overview` for more detail about what's in the library.
109109
* :doc:`development/index`
110110

111111
* :doc:`development/contributing`
112+
* :doc:`development/code_style`
112113
* :doc:`development/internals`
113114
* :doc:`development/extending`
114115
* :doc:`development/developer`

doc/source/reference/arrays.rst

+2-1
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,8 @@ For most data types, pandas uses NumPy arrays as the concrete
1212
objects contained with a :class:`Index`, :class:`Series`, or
1313
:class:`DataFrame`.
1414

15-
For some data types, pandas extends NumPy's type system.
15+
For some data types, pandas extends NumPy's type system. String aliases for these types
16+
can be found at :ref:`basics.dtypes`.
1617

1718
=================== ========================= ================== =============================
1819
Kind of Data Pandas Data Type Scalar Array

doc/source/reference/extensions.rst

+8
Original file line numberDiff line numberDiff line change
@@ -59,3 +59,11 @@ objects.
5959
api.extensions.ExtensionArray.nbytes
6060
api.extensions.ExtensionArray.ndim
6161
api.extensions.ExtensionArray.shape
62+
63+
Additionally, we have some utility methods for ensuring your object
64+
behaves correctly.
65+
66+
.. autosummary::
67+
:toctree: api/
68+
69+
api.indexers.check_bool_array_indexer

doc/source/user_guide/advanced.rst

+5-9
Original file line numberDiff line numberDiff line change
@@ -565,19 +565,15 @@ When working with an ``Index`` object directly, rather than via a ``DataFrame``,
565565
mi2 = mi.rename("new name", level=0)
566566
mi2
567567
568-
.. warning::
569568
570-
Prior to pandas 1.0.0, you could also set the names of a ``MultiIndex``
571-
by updating the name of a level.
569+
You cannot set the names of the MultiIndex via a level.
572570

573-
.. code-block:: none
571+
.. ipython:: python
572+
:okexcept:
574573
575-
>>> mi.levels[0].name = 'name via level'
576-
>>> mi.names[0] # only works for older pandas
577-
'name via level'
574+
mi.levels[0].name = "name via level"
578575
579-
As of pandas 1.0, this will *silently* fail to update the names
580-
of the MultiIndex. Use :meth:`Index.set_names` instead.
576+
Use :meth:`Index.set_names` instead.
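
A minimal sketch of the recommended alternative follows. The index contents and the level names ``x`` and ``y`` are hypothetical; only ``set_names`` comes from the text above.

```python
import pandas as pd

# Hypothetical two-level index with illustrative level names.
mi = pd.MultiIndex.from_product([["a", "b"], [1, 2]], names=["x", "y"])

# set_names returns a new index and leaves the original unchanged.
renamed = mi.set_names("new name", level=0)

print(list(renamed.names))
print(list(mi.names))
```

Because a new object is returned, assign the result back (for example to ``df.index``) if the rename should persist.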
581577

582578
Sorting a ``MultiIndex``
583579
------------------------

doc/source/user_guide/boolean.rst

+23
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,29 @@ Nullable Boolean Data Type
1414

1515
.. versionadded:: 1.0.0
1616

17+
18+
.. _boolean.indexing:
19+
20+
Indexing with NA values
21+
-----------------------
22+
23+
pandas does not allow indexing with NA values. Attempting to do so
24+
will raise a ``ValueError``.
25+
26+
.. ipython:: python
27+
:okexcept:
28+
29+
s = pd.Series([1, 2, 3])
30+
mask = pd.array([True, False, pd.NA], dtype="boolean")
31+
s[mask]
32+
33+
The missing values will need to be explicitly filled with True or False prior
34+
to using the array as a mask.
35+
36+
.. ipython:: python
37+
38+
s[mask.fillna(False)]
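
The choice of fill value decides what a missing entry means for the selection. The sketch below shows both options; it deliberately avoids indexing with the unfilled mask, since that behaviour may differ between pandas versions. The result variable names are hypothetical.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
mask = pd.array([True, False, pd.NA], dtype="boolean")

# Treat a missing entry as "do not select".
only_known_true = s[mask.fillna(False)]

# Treat a missing entry as "select".
all_but_known_false = s[mask.fillna(True)]

print(only_known_true.tolist())
print(all_but_known_false.tolist())
```

Filling with ``False`` keeps only rows known to match, while filling with ``True`` keeps every row not known to be excluded.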
39+
1740
.. _boolean.kleene:
1841

1942
Kleene Logical Operations
