Commit ec55f9b

Merge remote-tracking branch 'upstream/master' into ci-envs
2 parents f70a6dc + 0ae7e90 commit ec55f9b

34 files changed (+6272 -5914 lines)

doc/source/api.rst (+2)
@@ -2106,6 +2106,7 @@ Standard moving window functions
21062106
Rolling.skew
21072107
Rolling.kurt
21082108
Rolling.apply
2109+
Rolling.aggregate
21092110
Rolling.quantile
21102111
Window.mean
21112112
Window.sum
@@ -2133,6 +2134,7 @@ Standard expanding window functions
21332134
Expanding.skew
21342135
Expanding.kurt
21352136
Expanding.apply
2137+
Expanding.aggregate
21362138
Expanding.quantile
21372139

21382140
Exponentially-weighted moving window functions

doc/source/extending.rst (+25)
@@ -57,6 +57,13 @@ If you write a custom accessor, make a pull request adding it to our
5757
Extension Types
5858
---------------
5959

60+
.. versionadded:: 0.23.0
61+
62+
.. warning::
63+
64+
The ``ExtensionDtype`` and ``ExtensionArray`` APIs are new and
65+
experimental. They may change between versions without warning.
66+
6067
Pandas defines an interface for implementing data types and arrays that *extend*
6168
NumPy's type system. Pandas itself uses the extension system for some types
6269
that aren't built into NumPy (categorical, period, interval, datetime with
@@ -106,6 +113,24 @@ by some other storage type, like Python lists.
106113
See the `extension array source`_ for the interface definition. The docstrings
107114
and comments contain guidance for properly implementing the interface.
108115

116+
We provide a test suite for ensuring that your extension arrays satisfy the expected
117+
behavior. To use the test suite, you must provide several pytest fixtures and inherit
118+
from the base test class. The required fixtures are found in
119+
https://github.com/pandas-dev/pandas/blob/master/pandas/tests/extension/conftest.py.
120+
121+
To use a test, subclass it:
122+
123+
.. code-block:: python
124+
125+
from pandas.tests.extension import base
126+
127+
class TestConstructors(base.BaseConstructorsTests):
128+
pass
129+
130+
131+
See https://github.com/pandas-dev/pandas/blob/master/pandas/tests/extension/base/__init__.py
132+
for a list of all the tests available.
133+
109134
.. _extension dtype source: https://github.com/pandas-dev/pandas/blob/master/pandas/core/dtypes/base.py
110135
.. _extension array source: https://github.com/pandas-dev/pandas/blob/master/pandas/core/arrays/base.py
111136

doc/source/install.rst (+1 -1)
@@ -15,7 +15,7 @@ Instructions for installing from source,
1515
`PyPI <http://pypi.python.org/pypi/pandas>`__, `ActivePython <https://www.activestate.com/activepython/downloads>`__, various Linux distributions, or a
1616
`development version <http://github.com/pandas-dev/pandas>`__ are also provided.
1717

18-
.. _install.dropping_27:
18+
.. _install.dropping-27:
1919

2020
Plan for dropping Python 2.7
2121
----------------------------

doc/source/whatsnew/v0.23.0.txt (+12 -6)
@@ -11,7 +11,7 @@ version.
1111
.. warning::
1212

1313
Starting January 1, 2019, pandas feature releases will support Python 3 only.
14-
See :ref:`here <install.dropping_27>` for more.
14+
See :ref:`install.dropping-27` for more.
1515

1616
.. _whatsnew_0230.enhancements:
1717

@@ -221,6 +221,12 @@ Current Behavior:
221221

222222
s.rank(na_option='top')
223223

224+
These bugs were squashed:
225+
226+
- Bug in :meth:`DataFrame.rank` and :meth:`Series.rank` when ``method='dense'`` and ``pct=True`` in which percentile ranks were not being used with the number of distinct observations (:issue:`15630`)
227+
- Bug in :meth:`Series.rank` and :meth:`DataFrame.rank` when ``ascending='False'`` failed to return correct ranks for infinity if ``NaN`` were present (:issue:`19538`)
228+
- Bug in :func:`DataFrameGroupBy.rank` where ranks were incorrect when both infinity and ``NaN`` were present (:issue:`20561`)
229+
224230
.. _whatsnew_0230.enhancements.round-trippable_json:
225231

226232
JSON read/write round-trippable with ``orient='table'``
@@ -335,8 +341,8 @@ Supplying a ``CategoricalDtype`` will make the categories in each column consist
335341

336342
.. _whatsnew_023.enhancements.extension:
337343

338-
Extending Pandas with Custom Types
339-
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
344+
Extending Pandas with Custom Types (Experimental)
345+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
340346

341347
Pandas now supports storing array-like objects that aren't necessarily 1-D NumPy
342348
arrays as columns in a DataFrame or values in a Series. This allows third-party
@@ -438,6 +444,7 @@ Other Enhancements
438444
``SQLAlchemy`` dialects supporting multivalue inserts include: ``mysql``, ``postgresql``, ``sqlite`` and any dialect with ``supports_multivalues_insert``. (:issue:`14315`, :issue:`8953`)
439445
- :func:`read_html` now accepts a ``displayed_only`` keyword argument to controls whether or not hidden elements are parsed (``True`` by default) (:issue:`20027`)
440446
- zip compression is supported via ``compression=zip`` in :func:`DataFrame.to_pickle`, :func:`Series.to_pickle`, :func:`DataFrame.to_csv`, :func:`Series.to_csv`, :func:`DataFrame.to_json`, :func:`Series.to_json`. (:issue:`17778`)
447+
- :class:`WeekOfMonth` constructor now supports ``n=0`` (:issue:`20517`).
441448
- :class:`DataFrame` and :class:`Series` now support matrix multiplication (```@```) operator (:issue:`10259`) for Python>=3.5
442449
- Updated ``to_gbq`` and ``read_gbq`` signature and documentation to reflect changes from
443450
the Pandas-GBQ library version 0.4.0. Adds intersphinx mapping to Pandas-GBQ
@@ -847,7 +854,7 @@ Other API Changes
847854
- :func:`DatetimeIndex.strftime` and :func:`PeriodIndex.strftime` now return an ``Index`` instead of a numpy array to be consistent with similar accessors (:issue:`20127`)
848855
- Constructing a Series from a list of length 1 no longer broadcasts this list when a longer index is specified (:issue:`19714`, :issue:`20391`).
849856
- :func:`DataFrame.to_dict` with ``orient='index'`` no longer casts int columns to float for a DataFrame with only int and float columns (:issue:`18580`)
850-
- A user-defined-function that is passed to :func:`Series.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, :func:`DataFrame.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, or its expanding cousins, will now *always* be passed a ``Series``, rather than an ``np.array``; ``.apply()`` only has the ``raw`` keyword, see :ref:`here <whatsnew_0230.enhancements.window_raw>`. This is consistent with the signatures of ``.aggregate()`` across pandas (:issue:`20584`)
857+
- A user-defined-function that is passed to :func:`Series.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, :func:`DataFrame.rolling().aggregate() <pandas.core.window.Rolling.aggregate>`, or its expanding cousins, will now *always* be passed a ``Series``, rather than a ``np.array``; ``.apply()`` only has the ``raw`` keyword, see :ref:`here <whatsnew_0230.enhancements.window_raw>`. This is consistent with the signatures of ``.aggregate()`` across pandas (:issue:`20584`)
851858

852859
.. _whatsnew_0230.deprecations:
853860

@@ -1081,14 +1088,12 @@ Offsets
10811088

10821089
Numeric
10831090
^^^^^^^
1084-
- Bug in :meth:`DataFrame.rank` and :meth:`Series.rank` when ``method='dense'`` and ``pct=True`` in which percentile ranks were not being used with the number of distinct observations (:issue:`15630`)
10851091
- Bug in :class:`Series` constructor with an int or float list where specifying ``dtype=str``, ``dtype='str'`` or ``dtype='U'`` failed to convert the data elements to strings (:issue:`16605`)
10861092
- Bug in :class:`Index` multiplication and division methods where operating with a ``Series`` would return an ``Index`` object instead of a ``Series`` object (:issue:`19042`)
10871093
- Bug in the :class:`DataFrame` constructor in which data containing very large positive or very large negative numbers was causing ``OverflowError`` (:issue:`18584`)
10881094
- Bug in :class:`Index` constructor with ``dtype='uint64'`` where int-like floats were not coerced to :class:`UInt64Index` (:issue:`18400`)
10891095
- Bug in :class:`DataFrame` flex arithmetic (e.g. ``df.add(other, fill_value=foo)``) with a ``fill_value`` other than ``None`` failed to raise ``NotImplementedError`` in corner cases where either the frame or ``other`` has length zero (:issue:`19522`)
10901096
- Multiplication and division of numeric-dtyped :class:`Index` objects with timedelta-like scalars returns ``TimedeltaIndex`` instead of raising ``TypeError`` (:issue:`19333`)
1091-
- Bug in :meth:`Series.rank` and :meth:`DataFrame.rank` when ``ascending='False'`` failed to return correct ranks for infinity if ``NaN`` were present (:issue:`19538`)
10921097
- Bug where ``NaN`` was returned instead of 0 by :func:`Series.pct_change` and :func:`DataFrame.pct_change` when ``fill_method`` is not ``None`` (:issue:`19873`)
10931098

10941099

@@ -1113,6 +1118,7 @@ Indexing
11131118
- Bug in :meth:`DataFrame.first_valid_index` and :meth:`DataFrame.last_valid_index` in presence of entire rows of NaNs in the middle of values (:issue:`20499`).
11141119
- Bug in :class:`IntervalIndex` where some indexing operations were not supported for overlapping or non-monotonic ``uint64`` data (:issue:`20636`)
11151120
- Bug in ``Series.is_unique`` where extraneous output in stderr is shown if Series contains objects with ``__ne__`` defined (:issue:`20661`)
1121+
- Bug in ``.loc`` assignment with a single-element list-like incorrectly assigns as a list (:issue:`19474`)
11161122
- Bug in partial string indexing on a ``Series/DataFrame`` with a monotonic decreasing ``DatetimeIndex`` (:issue:`19362`)
11171123

11181124
MultiIndex
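
Note on the rank entries moved in this file: the ``method='dense'`` with ``pct=True`` entry (:issue:`15630`) says percentile ranks should be based on the number of distinct observations. A minimal pure-Python sketch of that reading follows. It is not the pandas implementation; ``dense_pct_rank`` is a hypothetical helper and missing values are ignored for brevity.

```python
def dense_pct_rank(values):
    """Dense rank of each value divided by the number of distinct values.

    Hypothetical helper for illustration only; assumes no missing values.
    """
    distinct = sorted(set(values))
    # Dense rank: consecutive integers starting at 1, ties share a rank.
    dense = {v: r for r, v in enumerate(distinct, start=1)}
    return [dense[v] / len(distinct) for v in values]


# Three distinct values, so the largest value gets a percentile rank of 1.0
# and the two tied values share the same rank.
ranks = dense_pct_rank([10, 20, 20, 30])
```

Dividing by the total number of observations instead (4 here) would cap the largest dense rank at 3/4, which appears to be the kind of result the entry describes as a bug.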

pandas/_libs/groupby_helper.pxi.in (+20 -11)
@@ -417,25 +417,33 @@ def group_rank_{{name}}(ndarray[float64_t, ndim=2] out,
417417
ndarray[int64_t] labels,
418418
bint is_datetimelike, object ties_method,
419419
bint ascending, bint pct, object na_option):
420-
"""Provides the rank of values within each group
420+
"""
421+
Provides the rank of values within each group.
421422

422423
Parameters
423424
----------
424425
out : array of float64_t values which this method will write its results to
425426
values : array of {{c_type}} values to be ranked
426427
labels : array containing unique label for each group, with its ordering
427428
matching up to the corresponding record in `values`
428-
is_datetimelike : bool
429+
is_datetimelike : bool, default False
429430
unused in this method but provided for call compatibility with other
430431
Cython transformations
431-
ties_method : {'keep', 'top', 'bottom'}
432+
ties_method : {'average', 'min', 'max', 'first', 'dense'}, default 'average'
433+
* average: average rank of group
434+
* min: lowest rank in group
435+
* max: highest rank in group
436+
* first: ranks assigned in order they appear in the array
437+
* dense: like 'min', but rank always increases by 1 between groups
438+
ascending : boolean, default True
439+
False for ranks by high (1) to low (N)
440+
na_option : {'keep', 'top', 'bottom'}, default 'keep'
441+
pct : boolean, default False
442+
Compute percentage rank of data within each group
443+
na_option : {'keep', 'top', 'bottom'}, default 'keep'
432444
* keep: leave NA values where they are
433445
* top: smallest rank if ascending
434446
* bottom: smallest rank if descending
435-
ascending : boolean
436-
False for ranks by high (1) to low (N)
437-
pct : boolean
438-
Compute percentage rank of data within each group
439447

440448
Notes
441449
-----
@@ -508,7 +516,8 @@ def group_rank_{{name}}(ndarray[float64_t, ndim=2] out,
508516

509517
# if keep_na, check for missing values and assign back
510518
# to the result where appropriate
511-
if keep_na and masked_vals[_as[i]] == nan_fill_val:
519+
520+
if keep_na and mask[_as[i]]:
512521
grp_na_count += 1
513522
out[_as[i], 0] = nan
514523
else:
@@ -548,9 +557,9 @@ def group_rank_{{name}}(ndarray[float64_t, ndim=2] out,
548557
# reset the dups and sum_ranks, knowing that a new value is coming
549558
# up. the conditional also needs to handle nan equality and the
550559
# end of iteration
551-
if (i == N - 1 or (
552-
(masked_vals[_as[i]] != masked_vals[_as[i+1]]) and not
553-
(mask[_as[i]] and mask[_as[i+1]]))):
560+
if (i == N - 1 or
561+
(masked_vals[_as[i]] != masked_vals[_as[i+1]]) or
562+
(mask[_as[i]] ^ mask[_as[i+1]])):
554563
dups = sum_ranks = 0
555564
val_start = i
556565
grp_vals_seen += 1
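
Note on the two condition changes above: both stop a real infinity from being confused with a filled-in ``NaN``. The sketch below restates the old and new conditions in plain Python on a three-element example. It assumes that ``masked_vals`` holds the values with ``NaN`` replaced by ``nan_fill_val`` and that the fill value is ``+inf`` in this configuration; the function names and the sample data are hypothetical, and only the boolean expressions are taken from the diff.

```python
import math


def ends_tie_run_old(masked_vals, mask, order, i):
    # Condition removed by this commit.
    if i == len(order) - 1:
        return True
    a, b = order[i], order[i + 1]
    return masked_vals[a] != masked_vals[b] and not (mask[a] and mask[b])


def ends_tie_run_new(masked_vals, mask, order, i):
    # Condition added by this commit: a missing/non-missing transition
    # always ends the run, even when the filled values compare equal.
    if i == len(order) - 1:
        return True
    a, b = order[i], order[i + 1]
    return masked_vals[a] != masked_vals[b] or (mask[a] ^ mask[b])


values = [1.0, math.inf, math.nan]
mask = [math.isnan(v) for v in values]
nan_fill_val = math.inf  # assumed fill value for this example
masked_vals = [nan_fill_val if m else v for v, m in zip(values, mask)]
order = [0, 1, 2]  # the sample is already sorted

old_ends = [ends_tie_run_old(masked_vals, mask, order, i) for i in range(3)]
new_ends = [ends_tie_run_new(masked_vals, mask, order, i) for i in range(3)]

# The keep_na check: comparing against the fill value also flags the real inf.
old_na_flags = [mv == nan_fill_val for mv in masked_vals]
new_na_flags = list(mask)
```

Under these assumptions the old condition treats ``inf`` and the filled ``NaN`` as one tie run and the old ``keep_na`` check marks the real ``inf`` as missing; the new versions separate them.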

pandas/core/arrays/base.py (+21 -6)
@@ -1,4 +1,10 @@
1-
"""An interface for extending pandas with custom arrays."""
1+
"""An interface for extending pandas with custom arrays.
2+
3+
.. warning::
4+
5+
This is an experimental API and subject to breaking changes
6+
without warning.
7+
"""
28
import numpy as np
39

410
from pandas.errors import AbstractMethodError
@@ -14,12 +20,15 @@ class ExtensionArray(object):
1420
with a custom type and will not attempt to coerce them to objects. They
1521
may be stored directly inside a :class:`DataFrame` or :class:`Series`.
1622
23+
.. versionadded:: 0.23.0
24+
1725
Notes
1826
-----
1927
The interface includes the following abstract methods that must be
2028
implemented by subclasses:
2129
2230
* _constructor_from_sequence
31+
* _from_factorized
2332
* __getitem__
2433
* __len__
2534
* dtype
@@ -30,11 +39,21 @@ class ExtensionArray(object):
3039
* _concat_same_type
3140
3241
Some additional methods are available to satisfy pandas' internal, private
33-
block API.
42+
block API:
3443
3544
* _can_hold_na
3645
* _formatting_values
3746
47+
Some methods require casting the ExtensionArray to an ndarray of Python
48+
objects with ``self.astype(object)``, which may be expensive. When
49+
performance is a concern, we highly recommend overriding the following
50+
methods:
51+
52+
* fillna
53+
* unique
54+
* factorize / _values_for_factorize
55+
* argsort / _values_for_argsort
56+
3857
This class does not inherit from 'abc.ABCMeta' for performance reasons.
3958
Methods and properties required by the interface raise
4059
``pandas.errors.AbstractMethodError`` and no ``register`` method is
@@ -50,10 +69,6 @@ class ExtensionArray(object):
5069
by some other storage type, like Python lists. Pandas makes no
5170
assumptions on how the data are stored, just that it can be converted
5271
to a NumPy array.
53-
54-
Extension arrays should be able to be constructed with instances of
55-
the class, i.e. ``ExtensionArray(extension_array)`` should return
56-
an instance, not error.
5772
"""
5873
# '_typ' is for pandas.core.dtypes.generic.ABCExtensionArray.
5974
# Don't override this.

pandas/core/dtypes/common.py (+1 -1)
@@ -1807,7 +1807,7 @@ def _get_dtype(arr_or_dtype):
18071807
return arr_or_dtype
18081808
elif isinstance(arr_or_dtype, type):
18091809
return np.dtype(arr_or_dtype)
1810-
elif isinstance(arr_or_dtype, CategoricalDtype):
1810+
elif isinstance(arr_or_dtype, ExtensionDtype):
18111811
return arr_or_dtype
18121812
elif isinstance(arr_or_dtype, DatetimeTZDtype):
18131813
return arr_or_dtype
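
Note on the ``isinstance`` change above: checking against the base class lets any extension dtype pass through ``_get_dtype`` unchanged, not only ``CategoricalDtype``. The sketch below illustrates the effect with stand-in classes. It assumes ``CategoricalDtype`` is a subclass of ``ExtensionDtype``; ``MyDtype`` is a hypothetical third-party dtype, and raising ``TypeError`` for unmatched input is a simplification, not what pandas does in its later branches.

```python
class ExtensionDtype:
    """Stand-in for the pandas base class (not the real one)."""


class CategoricalDtype(ExtensionDtype):
    """Stand-in; assumed here to derive from ExtensionDtype."""


class MyDtype(ExtensionDtype):
    """Hypothetical third-party extension dtype."""


def get_dtype_old(arr_or_dtype):
    # Only the one built-in extension dtype is recognised.
    if isinstance(arr_or_dtype, CategoricalDtype):
        return arr_or_dtype
    raise TypeError("unrecognised dtype")  # simplification for this sketch


def get_dtype_new(arr_or_dtype):
    # Any subclass of the base class is returned as-is.
    if isinstance(arr_or_dtype, ExtensionDtype):
        return arr_or_dtype
    raise TypeError("unrecognised dtype")  # simplification for this sketch


custom = MyDtype()
categorical = CategoricalDtype()

new_result = get_dtype_new(custom)
new_categorical = get_dtype_new(categorical)
try:
    get_dtype_old(custom)
    old_accepts_custom = True
except TypeError:
    old_accepts_custom = False
```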
