Commit dd9efd7

Merge branch 'master' into cleanup/matplotlib-style
2 parents 4479e37 + de5349a commit dd9efd7

45 files changed, with 1311 additions and 1092 deletions

.pre-commit-config.yaml

+5
Original file line numberDiff line numberDiff line change
@@ -62,6 +62,11 @@ repos:
6262
|math|module|note|raw|seealso|toctree|versionadded
6363
|versionchanged|warning):[^:]
6464
files: \.(py|pyx|rst)$
65+
- id: incorrect-code-directives
66+
name: Check for incorrect code block or IPython directives
67+
language: pygrep
68+
entry: (\.\. code-block ::|\.\. ipython ::)
69+
files: \.(py|pyx|rst)$
6570
- repo: https://github.com/asottile/yesqa
6671
rev: v1.2.2
6772
hooks:
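
Note on the hunk above: the new `incorrect-code-directives` hook replaces the two `invgrep` checks removed from ci/code_checks.sh below. As far as pre-commit's `pygrep` language goes, `entry` is treated as a Python regular expression and the hook fails when any line of a matching file contains it. The sketch below only illustrates what that pattern catches; the helper `find_bad_directives` and the sample text are hypothetical and not part of the commit.

```python
import re

# Pattern copied verbatim from the hook's `entry` field in the diff above.
PATTERN = re.compile(r"(\.\. code-block ::|\.\. ipython ::)")


def find_bad_directives(text):
    """Return (line_number, line) pairs for lines containing a bad directive.

    Hypothetical helper for illustration; pre-commit does its own scanning.
    """
    return [
        (lineno, line)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if PATTERN.search(line)
    ]


# Two correct directives and two with a stray space before the colons.
sample = "\n".join(
    [
        ".. code-block:: python",
        ".. code-block :: python",
        ".. ipython:: python",
        ".. ipython :: python",
    ]
)

for lineno, line in find_bad_directives(sample):
    print(f"line {lineno}: {line}")
```

Only the lines with a space between the directive name and `::` are reported, which matches the intent stated in the removed shell messages.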

ci/code_checks.sh

-18
Original file line numberDiff line numberDiff line change
@@ -207,18 +207,6 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
207207
invgrep -r -E --include '*.py' '(unittest(\.| import )mock|mock\.Mock\(\)|mock\.patch)' pandas/tests/
208208
RET=$(($RET + $?)) ; echo $MSG "DONE"
209209

210-
MSG='Check for wrong space after code-block directive and before colon (".. code-block ::" instead of ".. code-block::")' ; echo $MSG
211-
invgrep -R --include="*.rst" ".. code-block ::" doc/source
212-
RET=$(($RET + $?)) ; echo $MSG "DONE"
213-
214-
MSG='Check for wrong space after ipython directive and before colon (".. ipython ::" instead of ".. ipython::")' ; echo $MSG
215-
invgrep -R --include="*.rst" ".. ipython ::" doc/source
216-
RET=$(($RET + $?)) ; echo $MSG "DONE"
217-
218-
MSG='Check for extra blank lines after the class definition' ; echo $MSG
219-
invgrep -R --include="*.py" --include="*.pyx" -E 'class.*:\n\n( )+"""' .
220-
RET=$(($RET + $?)) ; echo $MSG "DONE"
221-
222210
MSG='Check for use of {foo!r} instead of {repr(foo)}' ; echo $MSG
223211
invgrep -R --include=*.{py,pyx} '!r}' pandas
224212
RET=$(($RET + $?)) ; echo $MSG "DONE"
@@ -243,12 +231,6 @@ if [[ -z "$CHECK" || "$CHECK" == "patterns" ]]; then
243231
invgrep -R --include=*.{py,pyx} '\.__class__' pandas
244232
RET=$(($RET + $?)) ; echo $MSG "DONE"
245233

246-
MSG='Check that no file in the repo contains trailing whitespaces' ; echo $MSG
247-
INVGREP_APPEND=" <- trailing whitespaces found"
248-
invgrep -RI --exclude=\*.{svg,c,cpp,html,js} --exclude-dir=env "\s$" *
249-
RET=$(($RET + $?)) ; echo $MSG "DONE"
250-
unset INVGREP_APPEND
251-
252234
MSG='Check code for instances of os.remove' ; echo $MSG
253235
invgrep -R --include="*.py*" --exclude "common.py" --exclude "test_writers.py" --exclude "test_store.py" -E "os\.remove" pandas/tests/
254236
RET=$(($RET + $?)) ; echo $MSG "DONE"

doc/source/ecosystem.rst

+18-11
Original file line numberDiff line numberDiff line change
@@ -230,7 +230,7 @@ allows users to view, manipulate and edit pandas ``Index``, ``Series``,
230230
and ``DataFrame`` objects like a "spreadsheet", including copying and modifying
231231
values, sorting, displaying a "heatmap", converting data types and more.
232232
pandas objects can also be renamed, duplicated, new columns added,
233-
copyed/pasted to/from the clipboard (as TSV), and saved/loaded to/from a file.
233+
copied/pasted to/from the clipboard (as TSV), and saved/loaded to/from a file.
234234
Spyder can also import data from a variety of plain text and binary files
235235
or the clipboard into a new pandas DataFrame via a sophisticated import wizard.
236236

@@ -376,6 +376,23 @@ Dask-ML enables parallel and distributed machine learning using Dask alongside e
376376

377377
Koalas provides a familiar pandas DataFrame interface on top of Apache Spark. It enables users to leverage multi-cores on one machine or a cluster of machines to speed up or scale their DataFrame code.
378378

379+
`Modin <https://github.com/modin-project/modin>`__
380+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
381+
382+
The ``modin.pandas`` DataFrame is a parallel and distributed drop-in replacement
383+
for pandas. This means that you can use Modin with existing pandas code or write
384+
new code with the existing pandas API. Modin can leverage your entire machine or
385+
cluster to speed up and scale your pandas workloads, including traditionally
386+
time-consuming tasks like ingesting data (``read_csv``, ``read_excel``,
387+
``read_parquet``, etc.).
388+
389+
.. code:: python
390+
391+
# import pandas as pd
392+
import modin.pandas as pd
393+
394+
df = pd.read_csv("big.csv") # use all your cores!
395+
379396
`Odo <http://odo.pydata.org>`__
380397
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
381398

@@ -400,16 +417,6 @@ If also displays progress bars.
400417
# df.apply(func)
401418
df.parallel_apply(func)
402419
403-
`Ray <https://ray.readthedocs.io/en/latest/pandas_on_ray.html>`__
404-
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
405-
406-
pandas on Ray is an early stage DataFrame library that wraps pandas and transparently distributes the data and computation. The user does not need to know how many cores their system has, nor do they need to specify how to distribute the data. In fact, users can continue using their previous pandas notebooks while experiencing a considerable speedup from pandas on Ray, even on a single machine. Only a modification of the import statement is needed, as we demonstrate below. Once you’ve changed your import statement, you’re ready to use pandas on Ray just like you would pandas.
407-
408-
.. code:: python
409-
410-
# import pandas as pd
411-
import ray.dataframe as pd
412-
413420
414421
`Vaex <https://docs.vaex.io/>`__
415422
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/source/whatsnew/v0.16.2.rst

+1-1
Original file line numberDiff line numberDiff line change
@@ -147,7 +147,7 @@ Bug fixes
147147
- Bug in ``setitem`` where type promotion is applied to the entire block (:issue:`10280`)
148148
- Bug in ``Series`` arithmetic methods may incorrectly hold names (:issue:`10068`)
149149
- Bug in ``GroupBy.get_group`` when grouping on multiple keys, one of which is categorical. (:issue:`10132`)
150-
- Bug in ``DatetimeIndex`` and ``TimedeltaIndex`` names are lost after timedelta arithmetics ( :issue:`9926`)
150+
- Bug in ``DatetimeIndex`` and ``TimedeltaIndex`` names are lost after timedelta arithmetic ( :issue:`9926`)
151151
- Bug in ``DataFrame`` construction from nested ``dict`` with ``datetime64`` (:issue:`10160`)
152152
- Bug in ``Series`` construction from ``dict`` with ``datetime64`` keys (:issue:`9456`)
153153
- Bug in ``Series.plot(label="LABEL")`` not correctly setting the label (:issue:`10119`)

doc/source/whatsnew/v0.24.1.rst

+2-2
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
.. _whatsnew_0241:
22

3-
Whats new in 0.24.1 (February 3, 2019)
4-
--------------------------------------
3+
What's new in 0.24.1 (February 3, 2019)
4+
---------------------------------------
55

66
.. warning::
77

doc/source/whatsnew/v0.24.2.rst

+2-2
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
.. _whatsnew_0242:
22

3-
Whats new in 0.24.2 (March 12, 2019)
4-
------------------------------------
3+
What's new in 0.24.2 (March 12, 2019)
4+
-------------------------------------
55

66
.. warning::
77

doc/source/whatsnew/v1.1.4.rst

+2
Original file line numberDiff line numberDiff line change
@@ -20,6 +20,7 @@ Fixed regressions
2020
- Fixed regression in :class:`RollingGroupby` with ``sort=False`` not being respected (:issue:`36889`)
2121
- Fixed regression in :meth:`Series.astype` converting ``None`` to ``"nan"`` when casting to string (:issue:`36904`)
2222
- Fixed regression in :class:`RollingGroupby` causing a segmentation fault with Index of dtype object (:issue:`36727`)
23+
- Fixed regression in :meth:`DataFrame.resample(...).apply(...)` raised ``AttributeError`` when input was a :class:`DataFrame` and only a :class:`Series` was evaluated (:issue:`36951`)
2324

2425
.. ---------------------------------------------------------------------------
2526
@@ -30,6 +31,7 @@ Bug fixes
3031
- Bug causing ``groupby(...).sum()`` and similar to not preserve metadata (:issue:`29442`)
3132
- Bug in :meth:`Series.isin` and :meth:`DataFrame.isin` raising a ``ValueError`` when the target was read-only (:issue:`37174`)
3233
- Bug in :meth:`GroupBy.fillna` that introduced a performance regression after 1.0.5 (:issue:`36757`)
34+
- Bug in :meth:`DataFrame.info` was raising a ``KeyError`` when the DataFrame has integer column names (:issue:`37245`)
3335

3436
.. ---------------------------------------------------------------------------
3537

doc/source/whatsnew/v1.2.0.rst

+2-1
Original file line numberDiff line numberDiff line change
@@ -180,7 +180,7 @@ Alternatively, you can also use the dtype object:
180180
.. warning::
181181

182182
Experimental: the new floating data types are currently experimental, and its
183-
behaviour or API may still change without warning. Expecially the behaviour
183+
behaviour or API may still change without warning. Especially the behaviour
184184
regarding NaN (distinct from NA missing values) is subject to change.
185185

186186
.. _whatsnew_120.index_name_preservation:
@@ -523,6 +523,7 @@ Other
523523

524524
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` incorrectly raising ``AssertionError`` instead of ``ValueError`` when invalid parameter combinations are passed (:issue:`36045`)
525525
- Bug in :meth:`DataFrame.replace` and :meth:`Series.replace` with numeric values and string ``to_replace`` (:issue:`34789`)
526+
- Fixed bug in metadata propagation incorrectly copying DataFrame columns as metadata when the column name overlaps with the metadata name (:issue:`37037`)
526527
- Fixed metadata propagation in the :class:`Series.dt` and :class:`Series.str` accessors (:issue:`28283`)
527528
- Bug in :meth:`Index.union` behaving differently depending on whether operand is a :class:`Index` or other list-like (:issue:`36384`)
528529
- Passing an array with 2 or more dimensions to the :class:`Series` constructor now raises the more specific ``ValueError``, from a bare ``Exception`` previously (:issue:`35744`)

pandas/conftest.py

+13
Original file line numberDiff line numberDiff line change
@@ -361,6 +361,19 @@ def multiindex_year_month_day_dataframe_random_data():
361361
return ymd
362362

363363

364+
@pytest.fixture
365+
def multiindex_dataframe_random_data():
366+
"""DataFrame with 2 level MultiIndex with random data"""
367+
index = MultiIndex(
368+
levels=[["foo", "bar", "baz", "qux"], ["one", "two", "three"]],
369+
codes=[[0, 0, 0, 1, 1, 2, 2, 3, 3, 3], [0, 1, 2, 0, 1, 1, 2, 0, 1, 2]],
370+
names=["first", "second"],
371+
)
372+
return DataFrame(
373+
np.random.randn(10, 3), index=index, columns=Index(["A", "B", "C"], name="exp")
374+
)
375+
376+
364377
def _create_multiindex():
365378
"""
366379
MultiIndex used to test the general functionality of this object

pandas/core/arrays/base.py

+14-2
Original file line numberDiff line numberDiff line change
@@ -507,7 +507,12 @@ def _values_for_argsort(self) -> np.ndarray:
507507
return np.array(self)
508508

509509
def argsort(
510-
self, ascending: bool = True, kind: str = "quicksort", *args, **kwargs
510+
self,
511+
ascending: bool = True,
512+
kind: str = "quicksort",
513+
na_position: str = "last",
514+
*args,
515+
**kwargs,
511516
) -> np.ndarray:
512517
"""
513518
Return the indices that would sort this array.
@@ -538,7 +543,14 @@ def argsort(
538543
# 2. argsort : total control over sorting.
539544
ascending = nv.validate_argsort_with_ascending(ascending, args, kwargs)
540545

541-
result = nargsort(self, kind=kind, ascending=ascending, na_position="last")
546+
values = self._values_for_argsort()
547+
result = nargsort(
548+
values,
549+
kind=kind,
550+
ascending=ascending,
551+
na_position=na_position,
552+
mask=np.asarray(self.isna()),
553+
)
542554
return result
543555

544556
def argmin(self):
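
Note on the hunk above: `argsort` now accepts `na_position` and forwards it, together with an explicit missing-value mask, to `nargsort`. The following is a minimal pure-Python sketch of what "sort the valid values, then place the missing ones first or last" can mean. It is not the pandas implementation; `argsort_with_na` is a hypothetical name and the list-based inputs are an assumption made to keep the example dependency-free.

```python
def argsort_with_na(values, mask, ascending=True, na_position="last"):
    """Return positions that would sort `values`, with masked entries grouped.

    Hypothetical illustration only: `mask[i]` is True where `values[i]`
    is missing. Missing positions keep their original relative order.
    """
    if na_position not in ("first", "last"):
        raise ValueError(f"invalid na_position: {na_position!r}")

    valid = [i for i, is_na in enumerate(mask) if not is_na]
    missing = [i for i, is_na in enumerate(mask) if is_na]

    # sorted() is stable, including when reverse=True.
    order = sorted(valid, key=lambda i: values[i], reverse=not ascending)

    return order + missing if na_position == "last" else missing + order


values = [3.0, None, 1.0, 2.0]
mask = [v is None for v in values]

print(argsort_with_na(values, mask))                       # [2, 3, 0, 1]
print(argsort_with_na(values, mask, na_position="first"))  # [1, 2, 3, 0]
print(argsort_with_na(values, mask, ascending=False))      # [0, 3, 2, 1]
```

Passing the mask separately, as the diff does with `mask=np.asarray(self.isna())`, means the sort key never has to compare a missing value against a real one.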

pandas/core/generic.py

+1-1
Original file line numberDiff line numberDiff line change
@@ -5364,7 +5364,7 @@ def __finalize__(
53645364

53655365
self.flags.allows_duplicate_labels = other.flags.allows_duplicate_labels
53665366
# For subclasses using _metadata.
5367-
for name in self._metadata:
5367+
for name in set(self._metadata) & set(other._metadata):
53685368
assert isinstance(name, str)
53695369
object.__setattr__(self, name, getattr(other, name, None))
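
Note on the hunk above: the loop now iterates over the intersection of both objects' `_metadata` instead of `self._metadata` alone. The v1.2.0 whatsnew entry in this commit (:issue:`37037`) describes the bug as columns being copied as metadata when a column name overlaps a metadata name. The sketch below reproduces the idea with plain classes; `WithMeta`, `WithoutMeta` and `propagate_metadata` are hypothetical stand-ins, not pandas objects.

```python
class WithMeta:
    """Hypothetical stand-in for a subclass that declares a metadata field."""

    _metadata = ["label"]

    def __init__(self, label=None):
        self.label = label


class WithoutMeta:
    """Hypothetical object with a `label` attribute that is NOT metadata."""

    _metadata = []
    label = "unrelated attribute, e.g. a column named 'label'"


def propagate_metadata(target, other):
    # Same iteration as the new line in the diff: only names that both
    # objects declare as metadata are copied.
    for name in set(target._metadata) & set(other._metadata):
        assert isinstance(name, str)
        object.__setattr__(target, name, getattr(other, name, None))
    return target


a = propagate_metadata(WithMeta(), WithMeta(label="sensor-1"))
b = propagate_metadata(WithMeta(label="kept"), WithoutMeta())

print(a.label)  # sensor-1
print(b.label)  # kept
```

With the old loop over `target._metadata` only, the second call would have overwritten `label` with the unrelated attribute found on `other`.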
53705370

pandas/core/indexes/base.py

+1-1
Original file line numberDiff line numberDiff line change
@@ -4994,7 +4994,7 @@ def isin(self, values, level=None):
49944994
self._validate_index_level(level)
49954995
return algos.isin(self, values)
49964996

4997-
def _get_string_slice(self, key: str_t, use_lhs: bool = True, use_rhs: bool = True):
4997+
def _get_string_slice(self, key: str_t):
49984998
# this is for partial string indexing,
49994999
# overridden in DatetimeIndex, TimedeltaIndex and PeriodIndex
50005000
raise NotImplementedError

pandas/core/indexes/datetimelike.py

+5-10
Original file line numberDiff line numberDiff line change
@@ -398,16 +398,12 @@ def _partial_date_slice(
398398
self,
399399
reso: Resolution,
400400
parsed: datetime,
401-
use_lhs: bool = True,
402-
use_rhs: bool = True,
403401
):
404402
"""
405403
Parameters
406404
----------
407405
reso : Resolution
408406
parsed : datetime
409-
use_lhs : bool, default True
410-
use_rhs : bool, default True
411407
412408
Returns
413409
-------
@@ -422,8 +418,7 @@ def _partial_date_slice(
422418
if self.is_monotonic:
423419

424420
if len(self) and (
425-
(use_lhs and t1 < self[0] and t2 < self[0])
426-
or (use_rhs and t1 > self[-1] and t2 > self[-1])
421+
(t1 < self[0] and t2 < self[0]) or (t1 > self[-1] and t2 > self[-1])
427422
):
428423
# we are out of range
429424
raise KeyError
@@ -432,13 +427,13 @@ def _partial_date_slice(
432427

433428
# a monotonic (sorted) series can be sliced
434429
# Use asi8.searchsorted to avoid re-validating Periods/Timestamps
435-
left = i8vals.searchsorted(unbox(t1), side="left") if use_lhs else None
436-
right = i8vals.searchsorted(unbox(t2), side="right") if use_rhs else None
430+
left = i8vals.searchsorted(unbox(t1), side="left")
431+
right = i8vals.searchsorted(unbox(t2), side="right")
437432
return slice(left, right)
438433

439434
else:
440-
lhs_mask = (i8vals >= unbox(t1)) if use_lhs else True
441-
rhs_mask = (i8vals <= unbox(t2)) if use_rhs else True
435+
lhs_mask = i8vals >= unbox(t1)
436+
rhs_mask = i8vals <= unbox(t2)
442437

443438
# try to find the dates
444439
return (lhs_mask & rhs_mask).nonzero()[0]
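
Note on the hunk above: with `use_lhs` and `use_rhs` removed, both bounds are always applied. For a monotonic index the code uses two `searchsorted` calls (`side="left"` for the lower bound, `side="right"` for the upper) and returns a slice; otherwise it combines two boolean masks. The sketch below mirrors that logic with the standard-library `bisect` module on plain integers. The function names and sample data are hypothetical, and real indexes work on unboxed i8 values rather than Python lists.

```python
from bisect import bisect_left, bisect_right


def bounds_to_slice(sorted_vals, t1, t2):
    """Sorted case: return slice(left, right) covering t1 <= v <= t2."""
    if sorted_vals and (
        (t1 < sorted_vals[0] and t2 < sorted_vals[0])
        or (t1 > sorted_vals[-1] and t2 > sorted_vals[-1])
    ):
        # Both bounds fall on the same side of the data: out of range.
        raise KeyError((t1, t2))
    left = bisect_left(sorted_vals, t1)    # analogous to side="left"
    right = bisect_right(sorted_vals, t2)  # analogous to side="right"
    return slice(left, right)


def bounds_to_positions(vals, t1, t2):
    """Unsorted case: positions where both bound checks hold."""
    return [i for i, v in enumerate(vals) if t1 <= v <= t2]


sorted_vals = [10, 20, 20, 30, 40]
unsorted_vals = [30, 10, 20, 40, 20]

print(bounds_to_slice(sorted_vals, 20, 30))        # slice(1, 4, None)
print(bounds_to_positions(unsorted_vals, 20, 30))  # [0, 2, 4]
```

Using the left side for the lower bound and the right side for the upper bound keeps duplicates of either endpoint inside the result.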

pandas/core/indexes/datetimes.py

+2-2
Original file line numberDiff line numberDiff line change
@@ -729,11 +729,11 @@ def _maybe_cast_slice_bound(self, label, side: str, kind):
729729
self._deprecate_mismatched_indexing(label)
730730
return self._maybe_cast_for_get_loc(label)
731731

732-
def _get_string_slice(self, key: str, use_lhs: bool = True, use_rhs: bool = True):
732+
def _get_string_slice(self, key: str):
733733
freq = getattr(self, "freqstr", getattr(self, "inferred_freq", None))
734734
parsed, reso = parsing.parse_time_string(key, freq)
735735
reso = Resolution.from_attrname(reso)
736-
loc = self._partial_date_slice(reso, parsed, use_lhs=use_lhs, use_rhs=use_rhs)
736+
loc = self._partial_date_slice(reso, parsed)
737737
return loc
738738

739739
def slice_indexer(self, start=None, end=None, step=None, kind=None):

pandas/core/indexes/period.py

+2-3
Original file line numberDiff line numberDiff line change
@@ -622,12 +622,11 @@ def _validate_partial_date_slice(self, reso: Resolution):
622622
# why is that check not needed?
623623
raise ValueError
624624

625-
def _get_string_slice(self, key: str, use_lhs: bool = True, use_rhs: bool = True):
626-
# TODO: Check for non-True use_lhs/use_rhs
625+
def _get_string_slice(self, key: str):
627626
parsed, reso = parse_time_string(key, self.freq)
628627
reso = Resolution.from_attrname(reso)
629628
try:
630-
return self._partial_date_slice(reso, parsed, use_lhs, use_rhs)
629+
return self._partial_date_slice(reso, parsed)
631630
except KeyError as err:
632631
raise KeyError(key) from err
633632

pandas/core/internals/concat.py

+1-3
Original file line numberDiff line numberDiff line change
@@ -217,9 +217,7 @@ def is_na(self) -> bool:
217217
# a block is NOT null, chunks should help in such cases. 1000 value
218218
# was chosen rather arbitrarily.
219219
values = self.block.values
220-
if self.block.is_categorical:
221-
values_flat = values.categories
222-
elif is_sparse(self.block.values.dtype):
220+
if is_sparse(self.block.values.dtype):
223221
return False
224222
elif self.block.is_extension:
225223
# TODO(EA2D): no need for special case with 2D EAs

pandas/core/ops/__init__.py

+5-63
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
from pandas.util._decorators import Appender
1515

1616
from pandas.core.dtypes.common import is_array_like, is_list_like
17-
from pandas.core.dtypes.generic import ABCDataFrame, ABCIndexClass, ABCSeries
17+
from pandas.core.dtypes.generic import ABCDataFrame, ABCSeries
1818
from pandas.core.dtypes.missing import isna
1919

2020
from pandas.core import algorithms
@@ -25,7 +25,10 @@
2525
get_array_op,
2626
logical_op,
2727
)
28-
from pandas.core.ops.common import unpack_zerodim_and_defer # noqa:F401
28+
from pandas.core.ops.common import ( # noqa:F401
29+
get_op_result_name,
30+
unpack_zerodim_and_defer,
31+
)
2932
from pandas.core.ops.docstrings import (
3033
_flex_comp_doc_FRAME,
3134
_op_descriptions,
@@ -76,67 +79,6 @@
7679

7780
COMPARISON_BINOPS: Set[str] = {"eq", "ne", "lt", "gt", "le", "ge"}
7881

79-
# -----------------------------------------------------------------------------
80-
# Ops Wrapping Utilities
81-
82-
83-
def get_op_result_name(left, right):
84-
"""
85-
Find the appropriate name to pin to an operation result. This result
86-
should always be either an Index or a Series.
87-
88-
Parameters
89-
----------
90-
left : {Series, Index}
91-
right : object
92-
93-
Returns
94-
-------
95-
name : object
96-
Usually a string
97-
"""
98-
# `left` is always a Series when called from within ops
99-
if isinstance(right, (ABCSeries, ABCIndexClass)):
100-
name = _maybe_match_name(left, right)
101-
else:
102-
name = left.name
103-
return name
104-
105-
106-
def _maybe_match_name(a, b):
107-
"""
108-
Try to find a name to attach to the result of an operation between
109-
a and b. If only one of these has a `name` attribute, return that
110-
name. Otherwise return a consensus name if they match of None if
111-
they have different names.
112-
113-
Parameters
114-
----------
115-
a : object
116-
b : object
117-
118-
Returns
119-
-------
120-
name : str or None
121-
122-
See Also
123-
--------
124-
pandas.core.common.consensus_name_attr
125-
"""
126-
a_has = hasattr(a, "name")
127-
b_has = hasattr(b, "name")
128-
if a_has and b_has:
129-
if a.name == b.name:
130-
return a.name
131-
else:
132-
# TODO: what if they both have np.nan for their names?
133-
return None
134-
elif a_has:
135-
return a.name
136-
elif b_has:
137-
return b.name
138-
return None
139-
14082

14183
# -----------------------------------------------------------------------------
14284
# Masking NA values and fallbacks for operations numpy does not support
