Commit 382e780

Merge branch 'master' of git://github.com/pandas-dev/pandas into fix/pipeline_performance
2 parents 8d37554 + 9f93d57 commit 382e780

100 files changed, with 1691 additions and 1401 deletions

.github/FUNDING.yml

+1
@@ -1 +1,2 @@
 custom: https://pandas.pydata.org/donate.html
+tidelift: pypi/pandas

.github/SECURITY.md

+1
@@ -0,0 +1 @@
+To report a security vulnerability to pandas, please go to https://tidelift.com/security and see the instructions there.

README.md

+2
@@ -233,3 +233,5 @@ You can also triage issues which may include reproducing bug reports, or asking
 Or maybe through using pandas you have an idea of your own or are looking for something in the documentation and thinking ‘this can be improved’...you can do something about it!

 Feel free to ask questions on the [mailing list](https://groups.google.com/forum/?fromgroups#!forum/pydata) or on [Gitter](https://gitter.im/pydata/pandas).
+
+As contributors and maintainers to this project, you are expected to abide by pandas' code of conduct. More information can be found at: [Contributor Code of Conduct](https://github.com/pandas-dev/pandas/blob/master/.github/CODE_OF_CONDUCT.md)

asv_bench/benchmarks/index_object.py

+18
@@ -1,3 +1,4 @@
+import gc
 import numpy as np
 import pandas.util.testing as tm
 from pandas import (
@@ -225,4 +226,21 @@ def time_intersection_both_duplicate(self, N):
         self.intv.intersection(self.intv2)


+class GC:
+    params = [1, 2, 5]
+
+    def create_use_drop(self):
+        idx = Index(list(range(1000 * 1000)))
+        idx._engine
+
+    def peakmem_gc_instances(self, N):
+        try:
+            gc.disable()
+
+            for _ in range(N):
+                self.create_use_drop()
+        finally:
+            gc.enable()
+
+
 from .pandas_vb_common import setup # noqa: F401
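The `GC` benchmark above disables the garbage collector and then builds large `Index` objects whose `_engine` has been accessed. Peak memory stays flat only if reference counting alone frees each index. The stdlib-only toy below illustrates why a reference cycle defeats that. `CyclicIndex` and `AcyclicIndex` are hypothetical classes, not pandas' actual `Index` or engine code, and the result assumes CPython's reference counting.

```python
import gc
import weakref


class CyclicIndex:
    """Toy index whose engine callback closes over ``self``."""

    def __init__(self, values):
        self.values = values
        self.engine = None

    def build_engine(self):
        # The lambda captures ``self`` and is stored on ``self``,
        # so the instance and its engine reference each other (a cycle).
        self.engine = lambda: self.values


class AcyclicIndex:
    """Toy index whose engine callback captures only the data."""

    def __init__(self, values):
        self.values = values
        self.engine = None

    def build_engine(self):
        # Capture the values, not the owner: no reference back to the index.
        values = self.values
        self.engine = lambda: values


def freed_without_gc(cls):
    """Return True if an instance of ``cls`` is reclaimed by refcounting alone."""
    gc.disable()
    try:
        obj = cls(list(range(1000)))
        obj.build_engine()
        ref = weakref.ref(obj)
        del obj
        return ref() is None
    finally:
        gc.enable()


print(freed_without_gc(AcyclicIndex))  # True on CPython
print(freed_without_gc(CyclicIndex))   # False on CPython: only the GC can free it
```

The cyclic instance is still reclaimed eventually, but only when the cyclic garbage collector runs. While the GC is disabled, as in `peakmem_gc_instances`, such objects accumulate.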

ci/deps/azure-36-locale.yaml

+2-2
@@ -20,8 +20,8 @@ dependencies:
 - xlsxwriter=0.9.8
 - xlwt=1.2.0
 # universal
-- pytest>=4.0.2,<5.0.0
-- pytest-xdist
+- pytest>=5.0.0
+- pytest-xdist>=1.29.0
 - pytest-mock
 - pytest-azurepipelines
 - hypothesis>=3.58.0

ci/deps/azure-37-locale.yaml

+2-2
@@ -26,8 +26,8 @@ dependencies:
 - xlsxwriter
 - xlwt
 # universal
-- pytest>=4.0.2
-- pytest-xdist
+- pytest>=5.0.1
+- pytest-xdist>=1.29.0
 - pytest-mock
 - pytest-azurepipelines
 - pip

ci/deps/azure-37-numpydev.yaml

+2-1
@@ -6,7 +6,8 @@ dependencies:
 - pytz
 - Cython>=0.28.2
 # universal
-- pytest>=4.0.2
+# pytest < 5 until defaults has pytest-xdist>=1.29.0
+- pytest>=4.0.2,<5.0
 - pytest-xdist
 - pytest-mock
 - hypothesis>=3.58.0

ci/deps/azure-macos-35.yaml

+2-2
@@ -25,8 +25,8 @@ dependencies:
 - pip:
 - pyreadstat
 # universal
-- pytest==4.5.0
-- pytest-xdist
+- pytest>=5.0.1
+- pytest-xdist>=1.29.0
 - pytest-mock
 - hypothesis>=3.58.0
 # https://github.com/pandas-dev/pandas/issues/27421

ci/deps/azure-windows-36.yaml

+2-2
@@ -23,8 +23,8 @@ dependencies:
 - xlwt
 # universal
 - cython>=0.28.2
-- pytest>=4.0.2
-- pytest-xdist
+- pytest>=5.0.1
+- pytest-xdist>=1.29.0
 - pytest-mock
 - pytest-azurepipelines
 - hypothesis>=3.58.0

ci/deps/azure-windows-37.yaml

+2-2
@@ -26,8 +26,8 @@ dependencies:
 - xlwt
 # universal
 - cython>=0.28.2
-- pytest>=4.0.2
-- pytest-xdist
+- pytest>=5.0.0
+- pytest-xdist>=1.29.0
 - pytest-mock
 - pytest-azurepipelines
 - hypothesis>=3.58.0

ci/deps/travis-36-cov.yaml

+2-2
@@ -39,8 +39,8 @@ dependencies:
 - xlsxwriter
 - xlwt
 # universal
-- pytest
-- pytest-xdist
+- pytest>=5.0.1
+- pytest-xdist>=1.29.0
 - pytest-cov
 - pytest-mock
 - hypothesis>=3.58.0

ci/deps/travis-36-slow.yaml

+2-2
@@ -25,8 +25,8 @@ dependencies:
 - xlsxwriter
 - xlwt
 # universal
-- pytest>=4.0.2,<5.0.0
-- pytest-xdist
+- pytest>=5.0.0
+- pytest-xdist>=1.29.0
 - pytest-mock
 - moto
 - hypothesis>=3.58.0

ci/deps/travis-37.yaml

+2-2
@@ -13,8 +13,8 @@ dependencies:
 - pyarrow
 - pytz
 # universal
-- pytest>=4.0.2
-- pytest-xdist
+- pytest>=5.0.0
+- pytest-xdist>=1.29.0
 - pytest-mock
 - hypothesis>=3.58.0
 - s3fs

doc/source/development/developer.rst

+41-19
@@ -37,12 +37,19 @@ So that a ``pandas.DataFrame`` can be faithfully reconstructed, we store a

 .. code-block:: text

-{'index_columns': ['__index_level_0__', '__index_level_1__', ...],
+{'index_columns': [<descr0>, <descr1>, ...],
 'column_indexes': [<ci0>, <ci1>, ..., <ciN>],
 'columns': [<c0>, <c1>, ...],
-'pandas_version': $VERSION}
+'pandas_version': $VERSION,
+'creator': {
+'library': $LIBRARY,
+'version': $LIBRARY_VERSION
+}}

-Here, ``<c0>``/``<ci0>`` and so forth are dictionaries containing the metadata
+The "descriptor" values ``<descr0>`` in the ``'index_columns'`` field are
+strings (referring to a column) or dictionaries with values as described below.
+
+The ``<c0>``/``<ci0>`` and so forth are dictionaries containing the metadata
 for each column, *including the index columns*. This has JSON form:

 .. code-block:: text
@@ -53,26 +60,37 @@ for each column, *including the index columns*. This has JSON form:
 'numpy_type': numpy_type,
 'metadata': metadata}

-.. note::
+See below for the detailed specification for these.
+
+Index Metadata Descriptors
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``RangeIndex`` can be stored as metadata only, not requiring serialization. The
+descriptor format for these as is follows:

-Every index column is stored with a name matching the pattern
-``__index_level_\d+__`` and its corresponding column information is can be
-found with the following code snippet.
+.. code-block:: python

-Following this naming convention isn't strictly necessary, but strongly
-suggested for compatibility with Arrow.
+index = pd.RangeIndex(0, 10, 2)
+{'kind': 'range',
+'name': index.name,
+'start': index.start,
+'stop': index.stop,
+'step': index.step}

-Here's an example of how the index metadata is structured in pyarrow:
+Other index types must be serialized as data columns along with the other
+DataFrame columns. The metadata for these is a string indicating the name of
+the field in the data columns, for example ``'__index_level_0__'``.

-.. code-block:: python
+If an index has a non-None ``name`` attribute, and there is no other column
+with a name matching that value, then the ``index.name`` value can be used as
+the descriptor. Otherwise (for unnamed indexes and ones with names colliding
+with other column names) a disambiguating name with pattern matching
+``__index_level_\d+__`` should be used. In cases of named indexes as data
+columns, ``name`` attribute is always stored in the column descriptors as
+above.

-# assuming there's at least 3 levels in the index
-index_columns = metadata['index_columns'] # noqa: F821
-columns = metadata['columns'] # noqa: F821
-ith_index = 2
-assert index_columns[ith_index] == '__index_level_2__'
-ith_index_info = columns[-len(index_columns):][ith_index]
-ith_index_level_name = ith_index_info['name']
+Column Metadata
+~~~~~~~~~~~~~~~

 ``pandas_type`` is the logical type of the column, and is one of:

@@ -161,4 +179,8 @@ As an example of fully-formed metadata:
 'numpy_type': 'int64',
 'metadata': None}
 ],
-'pandas_version': '0.20.0'}
+'pandas_version': '0.20.0',
+'creator': {
+'library': 'pyarrow',
+'version': '0.13.0'
+}}
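The descriptor rules added to `developer.rst` above can be sketched as a small function. This is an illustration under stated assumptions, not pandas' or pyarrow's implementation:

- `index_descriptor` is a hypothetical helper.
- Index names are assumed to be strings.
- A `RangeIndex` is represented by a plain `(start, stop, step)` tuple, so the example runs without pandas.

```python
def index_descriptor(level, name, column_names, range_params=None):
    """Build one entry of an 'index_columns' list (hypothetical helper).

    level        -- position of this index level (0, 1, ...)
    name         -- the level's ``name`` (None if unnamed); assumed to be a str
    column_names -- names of the DataFrame's data columns
    range_params -- (start, stop, step) if the index is a RangeIndex
    """
    if range_params is not None:
        # RangeIndex: described entirely by metadata, no data column needed.
        start, stop, step = range_params
        return {'kind': 'range', 'name': name,
                'start': start, 'stop': stop, 'step': step}
    if name is not None and name not in column_names:
        # Named and not colliding with a data column: reuse the name.
        return name
    # Unnamed, or the name collides: use the disambiguating pattern.
    return '__index_level_{}__'.format(level)


data_columns = ['a', 'b']

# A RangeIndex(0, 10, 2) becomes a metadata-only dictionary.
range_meta = [index_descriptor(0, None, data_columns, range_params=(0, 10, 2))]

# A two-level index named ('key', 'a'): 'a' collides with a data column.
multi_meta = [index_descriptor(i, n, data_columns)
              for i, n in enumerate(['key', 'a'])]

print(range_meta)
# [{'kind': 'range', 'name': None, 'start': 0, 'stop': 10, 'step': 2}]
print(multi_meta)
# ['key', '__index_level_1__']
```

Only the RangeIndex case yields a dictionary. Every other case yields a string naming a stored data column, matching the "strings or dictionaries" description of `<descr0>`.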

doc/source/reference/extensions.rst

+1
@@ -44,6 +44,7 @@ objects.
 api.extensions.ExtensionArray.argsort
 api.extensions.ExtensionArray.astype
 api.extensions.ExtensionArray.copy
+api.extensions.ExtensionArray.view
 api.extensions.ExtensionArray.dropna
 api.extensions.ExtensionArray.factorize
 api.extensions.ExtensionArray.fillna

doc/source/user_guide/io.rst

+6-6
@@ -3572,7 +3572,7 @@ Closing a Store and using a context manager:
 Read/write API
 ''''''''''''''

-``HDFStore`` supports an top-level API using ``read_hdf`` for reading and ``to_hdf`` for writing,
+``HDFStore`` supports a top-level API using ``read_hdf`` for reading and ``to_hdf`` for writing,
 similar to how ``read_csv`` and ``to_csv`` work.

 .. ipython:: python
@@ -3687,7 +3687,7 @@ Hierarchical keys
 Keys to a store can be specified as a string. These can be in a
 hierarchical path-name like format (e.g. ``foo/bar/bah``), which will
 generate a hierarchy of sub-stores (or ``Groups`` in PyTables
-parlance). Keys can be specified with out the leading '/' and are **always**
+parlance). Keys can be specified without the leading '/' and are **always**
 absolute (e.g. 'foo' refers to '/foo'). Removal operations can remove
 everything in the sub-store and **below**, so be *careful*.

@@ -3825,7 +3825,7 @@ data.

 A query is specified using the ``Term`` class under the hood, as a boolean expression.

-* ``index`` and ``columns`` are supported indexers of a ``DataFrames``.
+* ``index`` and ``columns`` are supported indexers of ``DataFrames``.
 * if ``data_columns`` are specified, these can be used as additional indexers.

 Valid comparison operators are:
@@ -3917,7 +3917,7 @@ Use boolean expressions, with in-line function evaluation.

 store.select('dfq', "index>pd.Timestamp('20130104') & columns=['A', 'B']")

-Use and inline column reference
+Use inline column reference.

 .. ipython:: python

@@ -4593,8 +4593,8 @@ Performance
 write chunksize (default is 50000). This will significantly lower
 your memory usage on writing.
 * You can pass ``expectedrows=<int>`` to the first ``append``,
-to set the TOTAL number of expected rows that ``PyTables`` will
-expected. This will optimize read/write performance.
+to set the TOTAL number of rows that ``PyTables`` will expect.
+This will optimize read/write performance.
 * Duplicate rows can be written to tables, but are filtered out in
 selection (with the last items being selected; thus a table is
 unique on major, minor pairs)
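The hierarchical-key rules in the `io.rst` hunk above can be modeled with plain strings:

- keys without a leading `/` are still absolute;
- removing a group also removes everything below it.

The sketch below is a hypothetical toy built on `posixpath`. It is not how `HDFStore` or PyTables handle keys, and `normalize_key` and `remove_subtree` are illustrative names only.

```python
import posixpath


def normalize_key(key):
    """Make a store key absolute: 'foo' and '/foo' name the same node."""
    return posixpath.normpath('/' + key.lstrip('/'))


def remove_subtree(keys, key):
    """Return the keys that survive removing ``key`` and everything below it."""
    target = normalize_key(key)
    prefix = target.rstrip('/') + '/'
    return sorted(k for k in map(normalize_key, keys)
                  if k != target and not k.startswith(prefix))


keys = ['foo', '/foo/bar', 'foo/bar/bah', '/food', 'baz']
print(normalize_key('foo/bar'))
# /foo/bar
print(remove_subtree(keys, 'foo'))
# ['/baz', '/food']
```

Matching on the `'/foo/'` prefix rather than `'/foo'` is what keeps the sibling key `/food` from being treated as part of the `/foo` sub-store.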

doc/source/whatsnew/v0.25.1.rst

+10-10
@@ -25,8 +25,7 @@ Bug fixes
 Categorical
 ^^^^^^^^^^^

--
--
+- Bug in :meth:`Categorical.fillna` would replace all values, not just those that are ``NaN`` (:issue:`26215`)
 -

 Datetimelike
@@ -83,7 +82,8 @@ Indexing
 ^^^^^^^^

 - Bug in partial-string indexing returning a NumPy array rather than a ``Series`` when indexing with a scalar like ``.loc['2015']`` (:issue:`27516`)
--
+- Break reference cycle involving :class:`Index` and other index classes to allow garbage collection of index objects without running the GC. (:issue:`27585`, :issue:`27840`)
+- Fix regression in assigning values to a single column of a DataFrame with a ``MultiIndex`` columns (:issue:`27841`).
 -

 Missing
@@ -103,36 +103,36 @@ MultiIndex
 I/O
 ^^^

--
--
+- Avoid calling ``S3File.s3`` when reading parquet, as this was removed in s3fs version 0.3.0 (:issue:`27756`)
+- Better error message when a negative header is passed in :func:`pandas.read_csv` (:issue:`27779`)
 -

 Plotting
 ^^^^^^^^

 - Added a pandas_plotting_backends entrypoint group for registering plot backends. See :ref:`extending.plotting-backends` for more (:issue:`26747`).
--
+- Fix compatibility issue with matplotlib when passing a pandas ``Index`` to a plot call (:issue:`27775`).
 -

 Groupby/resample/rolling
 ^^^^^^^^^^^^^^^^^^^^^^^^

 - Bug in :meth:`pandas.core.groupby.DataFrameGroupBy.transform` where applying a timezone conversion lambda function would drop timezone information (:issue:`27496`)
 - Bug in windowing over read-only arrays (:issue:`27766`)
--
+- Fixed segfault in `pandas.core.groupby.DataFrameGroupBy.quantile` when an invalid quantile was passed (:issue:`27470`)
 -

 Reshaping
 ^^^^^^^^^

 - A ``KeyError`` is now raised if ``.unstack()`` is called on a :class:`Series` or :class:`DataFrame` with a flat :class:`Index` passing a name which is not the correct one (:issue:`18303`)
-- Bug in :meth:`DataFrame.crosstab` when ``margins`` set to ``True`` and ``normalize`` is not ``False``, an error is raised. (:issue:`27500`)
+- Bug in :meth:`DataFrame.crosstab` when ``margins`` set to ``True`` and ``normalize`` is not ``False``, an error is raised. (:issue:`27500`)
 - :meth:`DataFrame.join` now suppresses the ``FutureWarning`` when the sort parameter is specified (:issue:`21952`)
--
+- Bug in :meth:`DataFrame.join` raising with readonly arrays (:issue:`27943`)

 Sparse
 ^^^^^^
-
+- Bug in reductions for :class:`Series` with Sparse dtypes (:issue:`27080`)
 -
 -
 -
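The `Categorical.fillna` entry above concerns filling only the missing values. A categorical stores integer codes into its categories, with `-1` marking a missing value. The sketch below shows the intended behavior on a plain list of codes. It is a simplified, hypothetical model: `fillna_codes` is not a pandas function, and this is not the actual fix.

```python
def fillna_codes(codes, categories, value):
    """Fill missing entries of categorical codes (hypothetical helper).

    codes      -- integer codes into ``categories``; -1 marks a missing value
    categories -- list of category labels
    value      -- label to fill missing entries with
    """
    if value not in categories:
        # This toy only accepts fill values that are existing categories.
        raise ValueError('fill value must be one of the categories')
    fill_code = categories.index(value)
    # Only positions whose code is -1 change; every other code is kept as-is.
    return [fill_code if code == -1 else code for code in codes]


codes = [0, -1, 1, -1]  # represents ['a', NaN, 'b', NaN]
print(fillna_codes(codes, ['a', 'b'], 'b'))
# [0, 1, 1, 1], i.e. ['a', 'b', 'b', 'b']
```

The buggy behavior described in the entry would correspond to overwriting every position with `fill_code`, instead of only the positions where the code is `-1`.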
