Commit 53797b7

Merge branch 'master' into bug_groupby_quantile_arraylike_fails

2 parents: 64e9176 + cde73af

81 files changed: +1051 −1025 lines

ci/code_checks.sh (+9)

@@ -122,13 +122,18 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
     # Check for imports from collections.abc instead of `from collections import abc`
     MSG='Check for non-standard imports' ; echo $MSG
     invgrep -R --include="*.py*" -E "from pandas.core.common import" pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
     invgrep -R --include="*.py*" -E "from pandas.core import common" pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
     invgrep -R --include="*.py*" -E "from collections.abc import" pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
     invgrep -R --include="*.py*" -E "from numpy import nan" pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"

     # Checks for test suite
     # Check for imports from pandas.util.testing instead of `import pandas.util.testing as tm`
     invgrep -R --include="*.py*" -E "from pandas.util.testing import" pandas/tests
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
     invgrep -R --include="*.py*" -E "from pandas.util import testing as tm" pandas/tests
     RET=$(($RET + $?)) ; echo $MSG "DONE"

@@ -195,6 +200,10 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
     invgrep -R --include="*.py" --include="*.pyx" -E 'class.*:\n\n( )+"""' .
     RET=$(($RET + $?)) ; echo $MSG "DONE"

+    MSG='Check for use of {foo!r} instead of {repr(foo)}' ; echo $MSG
+    invgrep -R --include=*.{py,pyx} '!r}' pandas
+    RET=$(($RET + $?)) ; echo $MSG "DONE"
+
     MSG='Check for use of comment-based annotation syntax' ; echo $MSG
     invgrep -R --include="*.py" -P '# type: (?!ignore)' pandas
     RET=$(($RET + $?)) ; echo $MSG "DONE"
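The checks above rely on `invgrep`, a small helper in pandas' CI that inverts grep's exit status: the check fails precisely when the forbidden pattern *is* found. A minimal Python sketch of the idea (the function and its signature here are illustrative, not pandas' actual shell implementation):

```python
import re
from pathlib import Path

def invgrep(pattern: str, paths) -> int:
    """Return 1 if `pattern` matches anywhere in the files (check fails), else 0."""
    regex = re.compile(pattern)
    hits = 0
    for path in paths:
        for lineno, line in enumerate(Path(path).read_text().splitlines(), 1):
            if regex.search(line):
                # Report each offending occurrence, like grep -n would.
                print(f"{path}:{lineno}: {line.strip()}")
                hits += 1
    return 1 if hits else 0
```

Accumulating `RET=$(($RET + $?))` after each call, as the diff adds, is what lets one missing status line silently swallow a failed check; the commit plugs exactly those gaps.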

doc/source/user_guide/advanced.rst (+1 −1)

@@ -573,7 +573,7 @@ When working with an ``Index`` object directly, rather than via a ``DataFrame``,
 .. code-block:: none

    >>> mi.levels[0].name = 'name via level'
-   >>> mi.names[0]  # only works for older panads
+   >>> mi.names[0]  # only works for older pandas
    'name via level'

 As of pandas 1.0, this will *silently* fail to update the names

doc/source/user_guide/missing_data.rst (+1 −1)

@@ -791,7 +791,7 @@ the nullable :doc:`integer <integer_na>`, boolean and
 :ref:`dedicated string <text.types>` data types as the missing value indicator.

 The goal of ``pd.NA`` is provide a "missing" indicator that can be used
-consistently accross data types (instead of ``np.nan``, ``None`` or ``pd.NaT``
+consistently across data types (instead of ``np.nan``, ``None`` or ``pd.NaT``
 depending on the data type).

 For example, when having missing values in a Series with the nullable integer
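The corrected paragraph describes ``pd.NA`` as a dtype-agnostic missing-value indicator. A short illustration with the nullable integer dtype (assumes pandas >= 1.0 is installed):

```python
import pandas as pd

# A nullable-integer Series uses pd.NA as its missing-value indicator,
# instead of coercing the whole column to float with np.nan.
s = pd.Series([1, 2, None], dtype="Int64")
print(s.dtype)        # Int64
print(pd.isna(s[2]))  # True
```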

doc/source/user_guide/text.rst (+2 −2)

@@ -101,10 +101,10 @@ 1. For ``StringDtype``, :ref:`string accessor methods<api.series.str>`
 2. Some string methods, like :meth:`Series.str.decode` are not available
    on ``StringArray`` because ``StringArray`` only holds strings, not
    bytes.
-3. In comparision operations, :class:`arrays.StringArray` and ``Series`` backed
+3. In comparison operations, :class:`arrays.StringArray` and ``Series`` backed
    by a ``StringArray`` will return an object with :class:`BooleanDtype`,
    rather than a ``bool`` dtype object. Missing values in a ``StringArray``
-   will propagate in comparision operations, rather than always comparing
+   will propagate in comparison operations, rather than always comparing
    unequal like :attr:`numpy.nan`.

 Everything else that follows in the rest of this document applies equally to
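The corrected list item can be demonstrated directly; a small example, assuming pandas >= 1.0 with the ``string`` dtype available:

```python
import pandas as pd

# Comparisons on StringDtype data return the nullable boolean dtype,
# and missing values propagate through the comparison.
s = pd.Series(["a", None, "c"], dtype="string")
result = s == "a"
print(result.dtype)  # boolean
print(result[1])     # <NA>, not False
```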

doc/source/whatsnew/v1.0.0.rst (+5 −2)

@@ -111,7 +111,7 @@ A new ``pd.NA`` value (singleton) is introduced to represent scalar missing
 values. Up to now, ``np.nan`` is used for this for float data, ``np.nan`` or
 ``None`` for object-dtype data and ``pd.NaT`` for datetime-like data. The
 goal of ``pd.NA`` is provide a "missing" indicator that can be used
-consistently accross data types. For now, the nullable integer and boolean
+consistently across data types. For now, the nullable integer and boolean
 data types and the new string data type make use of ``pd.NA`` (:issue:`28095`).

 .. warning::

@@ -571,6 +571,7 @@ or ``matplotlib.Axes.plot``. See :ref:`plotting.formatters` for more.
 - A tuple passed to :meth:`DataFrame.groupby` is now exclusively treated as a single key (:issue:`18314`)
 - Removed the previously deprecated :meth:`Index.contains`, use ``key in index`` instead (:issue:`30103`)
 - Addition and subtraction of ``int`` or integer-arrays is no longer allowed in :class:`Timestamp`, :class:`DatetimeIndex`, :class:`TimedeltaIndex`, use ``obj + n * obj.freq`` instead of ``obj + n`` (:issue:`22535`)
+- Removed :meth:`Series.ptp` (:issue:`21614`)
 - Removed :meth:`Series.from_array` (:issue:`18258`)
 - Removed :meth:`DataFrame.from_items` (:issue:`18458`)
 - Removed :meth:`DataFrame.as_matrix`, :meth:`Series.as_matrix` (:issue:`18458`)

@@ -716,8 +717,10 @@ Datetimelike
 - Bug in :func:`pandas.to_datetime` failing for `deques` when using ``cache=True`` (the default) (:issue:`29403`)
 - Bug in :meth:`Series.item` with ``datetime64`` or ``timedelta64`` dtype, :meth:`DatetimeIndex.item`, and :meth:`TimedeltaIndex.item` returning an integer instead of a :class:`Timestamp` or :class:`Timedelta` (:issue:`30175`)
 - Bug in :class:`DatetimeIndex` addition when adding a non-optimized :class:`DateOffset` incorrectly dropping timezone information (:issue:`30336`)
+- Bug in :meth:`DataFrame.drop` where attempting to drop non-existent values from a DatetimeIndex would yield a confusing error message (:issue:`30399`)
 - Bug in :meth:`DataFrame.append` would remove the timezone-awareness of new data (:issue:`30238`)

+
 Timedelta
 ^^^^^^^^^
 - Bug in subtracting a :class:`TimedeltaIndex` or :class:`TimedeltaArray` from a ``np.datetime64`` object (:issue:`29558`)

@@ -825,7 +828,7 @@ Plotting
 - Bug where :meth:`DataFrame.boxplot` would not accept a `color` parameter like `DataFrame.plot.box` (:issue:`26214`)
 - Bug in the ``xticks`` argument being ignored for :meth:`DataFrame.plot.bar` (:issue:`14119`)
 - :func:`set_option` now validates that the plot backend provided to ``'plotting.backend'`` implements the backend when the option is set, rather than when a plot is created (:issue:`28163`)
-- :meth:`DataFrame.plot` now allow a ``backend`` keyword arugment to allow changing between backends in one session (:issue:`28619`).
+- :meth:`DataFrame.plot` now allow a ``backend`` keyword argument to allow changing between backends in one session (:issue:`28619`).
 - Bug in color validation incorrectly raising for non-color styles (:issue:`29122`).

 Groupby/resample/rolling
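One removal listed in the whatsnew — integer addition/subtraction on datetime-like objects — can be illustrated with a short sketch (assumes pandas >= 1.0 is installed):

```python
import pandas as pd

# Adding a plain int to a DatetimeIndex now raises TypeError;
# the whatsnew recommends the `obj + n * obj.freq` idiom instead.
dti = pd.date_range("2020-01-01", periods=3, freq="D")
try:
    dti + 1
except TypeError:
    print("int addition raises, as documented")

shifted = dti + 2 * dti.freq  # explicit about the unit being added
print(shifted[0])
```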

environment.yml (+19 −10)

@@ -33,7 +33,8 @@ dependencies:
   - nbconvert>=5.4.1
   - nbsphinx
   - pandoc
-  # Dask and its dependencies
+
+  # Dask and its dependencies (that dont install with dask)
   - dask-core
   - toolz>=0.7.3
   - fsspec>=0.5.1

@@ -54,6 +55,8 @@ dependencies:
   - pytest>=5.0.1
   - pytest-cov
   - pytest-xdist>=1.21
+
+  # downstream tests
   - seaborn
   - statsmodels

@@ -74,22 +77,28 @@ dependencies:
   - scipy>=1.1

   # optional for io
-  - beautifulsoup4>=4.6.0  # pandas.read_html
+  # ---------------
+  # pd.read_html
+  - beautifulsoup4>=4.6.0
+  - html5lib
+  - lxml
+
+  # pd.read_excel, DataFrame.to_excel, pd.ExcelWriter, pd.ExcelFile
+  - openpyxl<=3.0.1
+  - xlrd
+  - xlsxwriter
+  - xlwt
+  - odfpy
+
   - fastparquet>=0.3.2  # pandas.read_parquet, DataFrame.to_parquet
-  - html5lib  # pandas.read_html
-  - lxml  # pandas.read_html
-  - openpyxl<=3.0.1  # pandas.read_excel, DataFrame.to_excel, pandas.ExcelWriter, pandas.ExcelFile
   - pyarrow>=0.13.1  # pandas.read_parquet, DataFrame.to_parquet, pandas.read_feather, DataFrame.to_feather
+  - python-snappy  # required by pyarrow
+
   - pyqt>=5.9.2  # pandas.read_clipboard
   - pytables>=3.4.2  # pandas.read_hdf, DataFrame.to_hdf
-  - python-snappy  # required by pyarrow
   - s3fs  # pandas.read_csv... when using 's3://...' path
   - sqlalchemy  # pandas.read_sql, DataFrame.to_sql
   - xarray  # DataFrame.to_xarray
-  - xlrd  # pandas.read_excel, DataFrame.to_excel, pandas.ExcelWriter, pandas.ExcelFile
-  - xlsxwriter  # pandas.read_excel, DataFrame.to_excel, pandas.ExcelWriter, pandas.ExcelFile
-  - xlwt  # pandas.read_excel, DataFrame.to_excel, pandas.ExcelWriter, pandas.ExcelFile
-  - odfpy  # pandas.read_excel
   - pyreadstat  # pandas.read_spss
   - pip:
     - git+https://github.com/pandas-dev/pandas-sphinx-theme.git@master

pandas/_libs/intervaltree.pxi.in (−37)

@@ -114,43 +114,6 @@ cdef class IntervalTree(IntervalMixin):
         sort_order = np.lexsort(values)
         return is_monotonic(sort_order, False)[0]

-    def get_loc(self, scalar_t key):
-        """Return all positions corresponding to intervals that overlap with
-        the given scalar key
-        """
-        result = Int64Vector()
-        self.root.query(result, key)
-        if not result.data.n:
-            raise KeyError(key)
-        return result.to_array().astype('intp')
-
-    def _get_partial_overlap(self, key_left, key_right, side):
-        """Return all positions corresponding to intervals with the given side
-        falling between the left and right bounds of an interval query
-        """
-        if side == 'left':
-            values = self.left
-            sorter = self.left_sorter
-        else:
-            values = self.right
-            sorter = self.right_sorter
-        key = [key_left, key_right]
-        i, j = values.searchsorted(key, sorter=sorter)
-        return sorter[i:j]
-
-    def get_loc_interval(self, key_left, key_right):
-        """Lookup the intervals enclosed in the given interval bounds
-
-        The given interval is presumed to have closed bounds.
-        """
-        import pandas as pd
-        left_overlap = self._get_partial_overlap(key_left, key_right, 'left')
-        right_overlap = self._get_partial_overlap(key_left, key_right, 'right')
-        enclosing = self.get_loc(0.5 * (key_left + key_right))
-        combined = np.concatenate([left_overlap, right_overlap, enclosing])
-        uniques = pd.unique(combined)
-        return uniques.astype('intp')
-
     def get_indexer(self, scalar_t[:] target):
         """Return the positions corresponding to unique intervals that overlap
         with the given array of scalar targets.
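The removed ``_get_partial_overlap`` used ``searchsorted`` over pre-sorted interval endpoints to find intervals whose chosen endpoint falls inside the query bounds. A stdlib sketch of that idea with toy data (illustrative only, not the Cython implementation being deleted):

```python
import bisect

def partial_overlap(endpoints, lo, hi):
    """Indexes (in sorted order) of endpoints falling in the closed range [lo, hi]."""
    i = bisect.bisect_left(endpoints, lo)   # first endpoint >= lo
    j = bisect.bisect_right(endpoints, hi)  # one past the last endpoint <= hi
    return list(range(i, j))

# Sorted left endpoints of a set of intervals (toy data).
lefts = [0, 1, 2, 5, 7]
print(partial_overlap(lefts, 1, 5))  # [1, 2, 3]
```

The deleted ``get_loc_interval`` combined two such partial-overlap queries (on left and right endpoints) with an enclosing-interval lookup at the midpoint, then deduplicated the result.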

pandas/core/arrays/datetimelike.py (+4 −6)

@@ -915,10 +915,8 @@ def _is_unique(self):
     __rdivmod__ = make_invalid_op("__rdivmod__")

     def _add_datetimelike_scalar(self, other):
-        # Overriden by TimedeltaArray
-        raise TypeError(
-            f"cannot add {type(self).__name__} and " f"{type(other).__name__}"
-        )
+        # Overridden by TimedeltaArray
+        raise TypeError(f"cannot add {type(self).__name__} and {type(other).__name__}")

     _add_datetime_arraylike = _add_datetimelike_scalar

@@ -930,7 +928,7 @@ def _sub_datetimelike_scalar(self, other):
     _sub_datetime_arraylike = _sub_datetimelike_scalar

     def _sub_period(self, other):
-        # Overriden by PeriodArray
+        # Overridden by PeriodArray
         raise TypeError(f"cannot subtract Period from a {type(self).__name__}")

     def _add_offset(self, offset):

@@ -1087,7 +1085,7 @@ def _addsub_int_array(self, other, op):
         -------
         result : same class as self
         """
-        # _addsub_int_array is overriden by PeriodArray
+        # _addsub_int_array is overridden by PeriodArray
         assert not is_period_dtype(self)
         assert op in [operator.add, operator.sub]
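The collapsed ``raise TypeError(...)`` above works because Python joins adjacent string literals at compile time, so the split ``f"..." f"..."`` form was equivalent but noisier:

```python
# Stand-in names for the class names the real message interpolates.
self_name, other_name = "DatetimeArray", "Timestamp"

# Adjacent literals are concatenated at compile time, so both forms
# produce the identical message.
split = f"cannot add {self_name} and " f"{other_name}"
joined = f"cannot add {self_name} and {other_name}"
assert split == joined
```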

pandas/core/base.py (+33 −38)

@@ -2,7 +2,6 @@
 Base and utility classes for pandas objects.
 """
 import builtins
-from collections import OrderedDict
 import textwrap
 from typing import Dict, FrozenSet, List, Optional

@@ -141,39 +140,35 @@ class SelectionMixin:
     _internal_names = ["_cache", "__setstate__"]
     _internal_names_set = set(_internal_names)

-    _builtin_table = OrderedDict(
-        ((builtins.sum, np.sum), (builtins.max, np.max), (builtins.min, np.min))
-    )
-
-    _cython_table = OrderedDict(
-        (
-            (builtins.sum, "sum"),
-            (builtins.max, "max"),
-            (builtins.min, "min"),
-            (np.all, "all"),
-            (np.any, "any"),
-            (np.sum, "sum"),
-            (np.nansum, "sum"),
-            (np.mean, "mean"),
-            (np.nanmean, "mean"),
-            (np.prod, "prod"),
-            (np.nanprod, "prod"),
-            (np.std, "std"),
-            (np.nanstd, "std"),
-            (np.var, "var"),
-            (np.nanvar, "var"),
-            (np.median, "median"),
-            (np.nanmedian, "median"),
-            (np.max, "max"),
-            (np.nanmax, "max"),
-            (np.min, "min"),
-            (np.nanmin, "min"),
-            (np.cumprod, "cumprod"),
-            (np.nancumprod, "cumprod"),
-            (np.cumsum, "cumsum"),
-            (np.nancumsum, "cumsum"),
-        )
-    )
+    _builtin_table = {builtins.sum: np.sum, builtins.max: np.max, builtins.min: np.min}
+
+    _cython_table = {
+        builtins.sum: "sum",
+        builtins.max: "max",
+        builtins.min: "min",
+        np.all: "all",
+        np.any: "any",
+        np.sum: "sum",
+        np.nansum: "sum",
+        np.mean: "mean",
+        np.nanmean: "mean",
+        np.prod: "prod",
+        np.nanprod: "prod",
+        np.std: "std",
+        np.nanstd: "std",
+        np.var: "var",
+        np.nanvar: "var",
+        np.median: "median",
+        np.nanmedian: "median",
+        np.max: "max",
+        np.nanmax: "max",
+        np.min: "min",
+        np.nanmin: "min",
+        np.cumprod: "cumprod",
+        np.nancumprod: "cumprod",
+        np.cumsum: "cumsum",
+        np.nancumsum: "cumsum",
+    }

     @property
     def _selection_name(self):

@@ -328,7 +323,7 @@ def _aggregate(self, arg, *args, **kwargs):
         # eg. {'A' : ['mean']}, normalize all to
         # be list-likes
         if any(is_aggregator(x) for x in arg.values()):
-            new_arg = OrderedDict()
+            new_arg = {}
             for k, v in arg.items():
                 if not isinstance(v, (tuple, list, dict)):
                     new_arg[k] = [v]

@@ -386,16 +381,16 @@ def _agg_2dim(name, how):
         def _agg(arg, func):
             """
             run the aggregations over the arg with func
-            return an OrderedDict
+            return a dict
             """
-            result = OrderedDict()
+            result = {}
             for fname, agg_how in arg.items():
                 result[fname] = func(fname, agg_how)
             return result

         # set the final keys
         keys = list(arg.keys())
-        result = OrderedDict()
+        result = {}

         if self._selection is not None:
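The ``OrderedDict``-to-``dict`` swap above is safe because plain dicts preserve insertion order as a language guarantee from Python 3.7 onward, which is what the aggregation tables rely on. A quick sanity check:

```python
from collections import OrderedDict

# Since Python 3.7, plain dicts keep insertion order, so the two
# containers behave identically for ordered lookup tables like these.
plain = {"sum": 1, "max": 2, "min": 3}
ordered = OrderedDict([("sum", 1), ("max", 2), ("min", 3)])
assert list(plain) == list(ordered) == ["sum", "max", "min"]
```

One subtle difference not exercised here: ``OrderedDict`` equality is order-sensitive only when comparing two ``OrderedDict`` instances, so ``plain == ordered`` still holds.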

pandas/core/dtypes/cast.py (+2 −6)

@@ -820,9 +820,7 @@ def astype_nansafe(arr, dtype, copy: bool = True, skipna: bool = False):
         if dtype.kind == "M":
             return arr.astype(dtype)

-        raise TypeError(
-            f"cannot astype a datetimelike from [{arr.dtype}] " f"to [{dtype}]"
-        )
+        raise TypeError(f"cannot astype a datetimelike from [{arr.dtype}] to [{dtype}]")

     elif is_timedelta64_dtype(arr):
         if is_object_dtype(dtype):

@@ -842,9 +840,7 @@ def astype_nansafe(arr, dtype, copy: bool = True, skipna: bool = False):
         elif dtype == _TD_DTYPE:
            return arr.astype(_TD_DTYPE, copy=copy)

-        raise TypeError(
-            f"cannot astype a timedelta from [{arr.dtype}] " f"to [{dtype}]"
-        )
+        raise TypeError(f"cannot astype a timedelta from [{arr.dtype}] to [{dtype}]")

     elif np.issubdtype(arr.dtype, np.floating) and np.issubdtype(dtype, np.integer):
