API: Make Series.searchsorted return a scalar, when supplied a scalar #23801

topper-123 · 2018-11-20T00:30:53Z

xref API: Make Categorical.searchsorted returns a scalar when supplied a scalar #23466 & PERF/REF: improve performance of Series.searchsorted, PandasArray.searchsorted, collect functionality #22034
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

Series.searchsorted returns the wrong shape for scalar input. Numpy arrays and all other array types return a scalar if the input is a scalar, but Series does not.

For example:

>>> import numpy as np
>>> np.array([1, 2, 3]).searchsorted(1)
0
>>> np.array([1, 2, 3]).searchsorted([1])
array([0])
>>> import pandas as pd
>>> d = pd.date_range('2018', periods=4)
>>> d.searchsorted(d[0])
0
>>> d.searchsorted(d[:1])
array([0])

>>> n = 100_000
>>> s = pd.Series(([0]*n + [1]*n + + [2]*n))
>>> s.searchsorted(1)
array([100000], dtype=int32)  # master
100000  # this PR. Scalar input should lead to scalar output
>>> s.searchsorted([1])
array([100000], dtype=int32)  # master and this PR

pep8speaks · 2018-11-20T00:31:01Z

Hello @topper-123! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/base.py !
There are no PEP8 issues in the file pandas/core/series.py !
There are no PEP8 issues in the file pandas/tests/arrays/categorical/test_analytics.py !
There are no PEP8 issues in the file pandas/tests/series/test_analytics.py !
There are no PEP8 issues in the file pandas/util/testing.py !

Comment last updated on November 20, 2018 at 00:41 Hours UTC

jreback · 2018-11-20T01:39:15Z

pandas/util/testing.py

@@ -27,7 +27,8 @@
    is_bool, is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype,
    is_datetimelike_v_numeric, is_datetimelike_v_object,
    is_extension_array_dtype, is_interval_dtype, is_list_like, is_number,
-    is_period_dtype, is_sequence, is_timedelta64_dtype, needs_i8_conversion)
+    is_period_dtype, is_scalar, is_sequence, is_timedelta64_dtype,
+    needs_i8_conversion)  # noqa


why did you change this?

´´is_scalar`` wasn't in testing, I need it in my tests and think it's helpful to have in this common testing module.

I need to to for scalar because assert np.array([2]) == 2 passes without an AssertionError...

Though I do get an flake8 error for unused import. Trying to resolve that...

jreback · 2018-11-20T01:39:30Z

doc/source/whatsnew/v0.24.0.rst

@@ -1008,6 +1008,7 @@ Other API Changes
 - Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)
 - :class:`DateOffset` attribute `_cacheable` and method `_should_cache` have been removed (:issue:`23118`)
 - Comparing :class:`Timedelta` to be less or greater than unknown types now raises a ``TypeError`` instead of returning ``False`` (:issue:`20829`)
+- :meth:`Series.searchsorted`, when supplied a scalar value to search for, now returns a scalar instead of an array (:issue:`xxxxx`).


use this PR number

jschendel · 2018-11-20T02:38:24Z

pandas/core/base.py

+            same shape as `value`.
+
+            .. versionchanged :: 0.24.0
+                Ìf `value`is a scalar, an int is now always returned.


"Ìf" -- > "If"

missing a space between "`value`" and "is"

topper-123 · 2018-11-20T11:45:33Z

pandas/tests/arrays/categorical/test_analytics.py

@@ -86,9 +86,11 @@ def test_searchsorted(self):
        # Searching for single item argument, side='left' (default)
        res_cat = c1.searchsorted('apple')
        assert res_cat == 2
+        assert tm.is_scalar(res_cat)


tm.is_scalar guards against res_cat being an array, as assert np.array(...) == 2 would always pass the assertion test, as long as the array is non-empty.

Another way would be to to do assert (res_cat == 2) is True.

topper-123 · 2018-11-20T12:35:47Z

TheTravis failure is unrelated (hypothesis related)

>   @given(index=indices(max_length=5), num_columns=integers(0, 5))
>   def test_frequency_is_original(self, index, num_columns):
pandas/tests/frame/test_apply.py:1160: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../miniconda3/envs/pandas/lib/python2.7/site-packages/hypothesis/core.py:638: in execute
    ) % (test.__name__, text_repr[0],))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <hypothesis.core.StateForActualGivenExecution object at 0x7f2c035ae050>
message = "Hypothesis test_frequency_is_original(self=<pandas.tests.frame.test_apply.TestDataFrameAggregate instance at 0x7f2c3f...', freq=None), num_columns=3) produces unreliable results: Falsified on the first call but did not on a subsequent one"
    def __flaky(self, message):
        if len(self.falsifying_examples) <= 1:
>           raise Flaky(message)
E           Flaky: Hypothesis test_frequency_is_original(self=<pandas.tests.frame.test_apply.TestDataFrameAggregate instance at 0x7f2c3f1955a8>, index=DatetimeIndex(['2237-05-12', '2237-05-13', '2237-05-14', '2237-05-15'], dtype='datetime64[ns]', freq=None), num_columns=3) produces unreliable results: Falsified on the first call but did not on a subsequent one
../../../miniconda3/envs/pandas/lib/python2.7/site-packages/hypothesis/core.py:884: Flaky

jreback · 2018-11-20T13:04:22Z

pandas/tests/indexes/multi/test_monotonic.py

@@ -182,22 +183,28 @@ def test_searchsorted_monotonic(indices):
    # test searchsorted only for increasing
    if indices.is_monotonic_increasing:


pls don't import is_scalar from testing, import it from the canonical location

pandas.api.types.is_scalar.

jreback · 2018-11-20T13:04:41Z

pandas/util/testing.py

@@ -28,6 +28,7 @@
    is_datetimelike_v_numeric, is_datetimelike_v_object,
    is_extension_array_dtype, is_interval_dtype, is_list_like, is_number,
    is_period_dtype, is_sequence, is_timedelta64_dtype, needs_i8_conversion)
+from pandas.core.dtypes.common import is_scalar  # noqa: F401


don't add this

these are internally used imports. this is not exporting these functions.

jreback · 2018-12-09T20:12:56Z

@topper-123 happy to have this if it can be done in a reasonbale way. pls update or close.

codecov · 2018-12-09T23:56:26Z

Codecov Report

Merging #23801 into master will increase coverage by 0.07%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23801      +/-   ##
==========================================
+ Coverage   92.21%   92.28%   +0.07%     
==========================================
  Files         162      161       -1     
  Lines       51723    51459     -264     
==========================================
- Hits        47695    47491     -204     
+ Misses       4028     3968      -60

Flag	Coverage Δ
#multiple	`90.68% <100%> (+0.07%)`	⬆️
#single	`42.31% <33.33%> (-0.73%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/base.py	`97.61% <ø> (-0.03%)`	⬇️
pandas/core/series.py	`93.68% <100%> (-0.02%)`	⬇️
pandas/util/testing.py	`86.05% <100%> (-1.36%)`	⬇️
pandas/core/sparse/scipy_sparse.py	`97.05% <0%> (-2.95%)`	⬇️
pandas/io/common.py	`70.54% <0%> (-1.56%)`	⬇️
pandas/io/html.py	`91.2% <0%> (-1.47%)`	⬇️
pandas/core/dtypes/common.py	`94.37% <0%> (-1.26%)`	⬇️
pandas/core/indexes/timedeltas.py	`89.26% <0%> (-1.16%)`	⬇️
pandas/io/formats/html.py	`97.68% <0%> (-0.93%)`	⬇️
... and 89 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77278ba...9d74e8c. Read the comment docs.

codecov · 2018-12-09T23:56:27Z

Codecov Report

Merging #23801 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #23801      +/-   ##
==========================================
+ Coverage   92.29%   92.29%   +<.01%     
==========================================
  Files         162      162              
  Lines       51845    51846       +1     
==========================================
+ Hits        47852    47853       +1     
  Misses       3993     3993

Flag	Coverage Δ
#multiple	`90.7% <100%> (ø)`	⬆️
#single	`42.97% <16.66%> (-0.01%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/series.py	`93.71% <100%> (ø)`	⬆️
pandas/core/base.py	`97.66% <100%> (ø)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04a0eac...c09a411. Read the comment docs.

topper-123 · 2018-12-10T08:39:07Z

I've updated the PR.

topper-123 · 2018-12-13T22:38:57Z

Is thia ok? This would make the searchsorted method for all of the pandas objects return scalars for scalar inputs and make the API consistent.

jreback · 2018-12-15T22:07:00Z

this PR looked ok, merge master.

topper-123 · 2018-12-16T13:18:36Z

Rebased.

jreback · 2018-12-17T13:05:43Z

pandas/core/base.py

-            Array of insertion points with the same shape as `value`.
+        int or array of ints
+            A scalar or array of insertion points with the
+            same shape as `value`.


@datapythonista is this the correct format?

looks good, but can you use int or array of int (I'd like to parse at some point the types from these, so I prefer the type name int over ints)

datapythonista · 2018-12-18T00:17:58Z

pandas/core/base.py

-            Array of insertion points with the same shape as `value`.
+        int or array of ints
+            A scalar or array of insertion points with the
+            same shape as `value`.


looks good, but can you use int or array of int (I'd like to parse at some point the types from these, so I prefer the type name int over ints)

datapythonista · 2018-12-18T00:18:58Z

pandas/core/base.py

-        value : array_like
-            Values to insert into `self`.
+        value : scalar or array_like
+            Value(s) to insert into `self`.t


Suggested change

Value(s) to insert into `self`.t

Value(s) to insert into ``self.t``.

I think render the whole self.t as code makes more sense (also, we end parameter descriptions with a period).

I think this is actually better: Value(s) to search for.

As we're nor actually inserting, but searching?

…pandas-dev#23801)

topper-123 force-pushed the Series.searchsorted_api branch from 9323f92 to ef5c259 Compare November 20, 2018 00:40

jreback requested changes Nov 20, 2018

View reviewed changes

jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions labels Nov 20, 2018

jschendel reviewed Nov 20, 2018

View reviewed changes

topper-123 force-pushed the Series.searchsorted_api branch 2 times, most recently from c9b3c91 to e998d7c Compare November 20, 2018 11:39

topper-123 commented Nov 20, 2018

View reviewed changes

jreback requested changes Nov 20, 2018

View reviewed changes

topper-123 force-pushed the Series.searchsorted_api branch from e998d7c to 9d74e8c Compare December 9, 2018 23:56

topper-123 force-pushed the Series.searchsorted_api branch from 9d74e8c to 49bcca5 Compare December 10, 2018 01:04

topper-123 force-pushed the Series.searchsorted_api branch from 49bcca5 to 4b31c78 Compare December 16, 2018 09:50

jreback requested changes Dec 17, 2018

View reviewed changes

jreback added this to the 0.24.0 milestone Dec 17, 2018

datapythonista reviewed Dec 18, 2018

View reviewed changes

topper-123 force-pushed the Series.searchsorted_api branch from b7e502e to 27af08e Compare December 18, 2018 14:57

topper-123 added 3 commits December 21, 2018 15:47

API: Make Series.searchsorted return a scalar, when supplied a scalar

bf8ec16

Adjust according to comments

e8aef0b

adjust for comments

b59e640

topper-123 force-pushed the Series.searchsorted_api branch 2 times, most recently from 2545f4a to 8f9c917 Compare December 21, 2018 16:00

adjust for comments

c09a411

topper-123 force-pushed the Series.searchsorted_api branch from 8f9c917 to c09a411 Compare December 21, 2018 16:05

jreback approved these changes Dec 21, 2018

View reviewed changes

jreback merged commit 8c58817 into pandas-dev:master Dec 21, 2018

topper-123 deleted the Series.searchsorted_api branch December 21, 2018 18:37

jorisvandenbossche mentioned this pull request Jan 8, 2019

COMPAT: use numpy searchsorted geopandas/contextily#52

Merged

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

API: Make Series.searchsorted return a scalar, when supplied a scalar (…

80a8b67

…pandas-dev#23801)

Pingviinituutti pushed a commit to Pingviinituutti/pandas that referenced this pull request Feb 28, 2019

API: Make Series.searchsorted return a scalar, when supplied a scalar (…

f8df4b7

…pandas-dev#23801)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

API: Make Series.searchsorted return a scalar, when supplied a scalar #23801

API: Make Series.searchsorted return a scalar, when supplied a scalar #23801

topper-123 commented Nov 20, 2018

pep8speaks commented Nov 20, 2018 •

edited

Loading

jreback Nov 20, 2018

topper-123 Nov 20, 2018

topper-123 Nov 20, 2018

jreback Nov 20, 2018

jschendel Nov 20, 2018

topper-123 Nov 20, 2018

topper-123 commented Nov 20, 2018

jreback Nov 20, 2018

jreback Nov 20, 2018

jreback Nov 20, 2018

jreback commented Dec 9, 2018

codecov bot commented Dec 9, 2018

codecov bot commented Dec 9, 2018 •

edited

Loading

topper-123 commented Dec 10, 2018

topper-123 commented Dec 13, 2018

jreback commented Dec 15, 2018

topper-123 commented Dec 16, 2018

jreback Dec 17, 2018

datapythonista Dec 18, 2018

datapythonista Dec 18, 2018

datapythonista Dec 18, 2018

topper-123 Dec 18, 2018 •

edited

Loading

		@@ -182,22 +183,28 @@ def test_searchsorted_monotonic(indices):
		# test searchsorted only for increasing
		if indices.is_monotonic_increasing:

	Value(s) to insert into `self`.t
	Value(s) to insert into ``self.t``.

API: Make Series.searchsorted return a scalar, when supplied a scalar #23801

API: Make Series.searchsorted return a scalar, when supplied a scalar #23801

Conversation

topper-123 commented Nov 20, 2018

pep8speaks commented Nov 20, 2018 • edited Loading

Comment last updated on November 20, 2018 at 00:41 Hours UTC

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

topper-123 commented Nov 20, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jreback commented Dec 9, 2018

codecov bot commented Dec 9, 2018

Codecov Report

codecov bot commented Dec 9, 2018 • edited Loading

Codecov Report

topper-123 commented Dec 10, 2018

topper-123 commented Dec 13, 2018

jreback commented Dec 15, 2018

topper-123 commented Dec 16, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

topper-123 Dec 18, 2018 • edited Loading

Choose a reason for hiding this comment

pep8speaks commented Nov 20, 2018 •

edited

Loading

codecov bot commented Dec 9, 2018 •

edited

Loading

topper-123 Dec 18, 2018 •

edited

Loading