Skip to content

API: Make Series.searchsorted return a scalar, when supplied a scalar #23801

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Dec 21, 2018

Conversation

topper-123
Copy link
Contributor

Series.searchsorted returns the wrong shape for scalar input. Numpy arrays and all other array types return a scalar if the input is a scalar, but Series does not.

For example:

>>> import numpy as np
>>> np.array([1, 2, 3]).searchsorted(1)
0
>>> np.array([1, 2, 3]).searchsorted([1])
array([0])
>>> import pandas as pd
>>> d = pd.date_range('2018', periods=4)
>>> d.searchsorted(d[0])
0
>>> d.searchsorted(d[:1])
array([0])

>>> n = 100_000
>>> s = pd.Series(([0]*n + [1]*n + + [2]*n))
>>> s.searchsorted(1)
array([100000], dtype=int32)  # master
100000  # this PR. Scalar input should lead to scalar output
>>> s.searchsorted([1])
array([100000], dtype=int32)  # master and this PR

@pep8speaks
Copy link

pep8speaks commented Nov 20, 2018

Hello @topper-123! Thanks for updating the PR.

Comment last updated on November 20, 2018 at 00:41 Hours UTC

@topper-123 topper-123 force-pushed the Series.searchsorted_api branch from 9323f92 to ef5c259 Compare November 20, 2018 00:40
@@ -27,7 +27,8 @@
is_bool, is_categorical_dtype, is_datetime64_dtype, is_datetime64tz_dtype,
is_datetimelike_v_numeric, is_datetimelike_v_object,
is_extension_array_dtype, is_interval_dtype, is_list_like, is_number,
is_period_dtype, is_sequence, is_timedelta64_dtype, needs_i8_conversion)
is_period_dtype, is_scalar, is_sequence, is_timedelta64_dtype,
needs_i8_conversion) # noqa
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you change this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

´´is_scalar`` wasn't in testing, I need it in my tests and think it's helpful to have in this common testing module.

I need to to for scalar because assert np.array([2]) == 2 passes without an AssertionError...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I do get an flake8 error for unused import. Trying to resolve that...

@@ -1008,6 +1008,7 @@ Other API Changes
- Slicing a single row of a DataFrame with multiple ExtensionArrays of the same type now preserves the dtype, rather than coercing to object (:issue:`22784`)
- :class:`DateOffset` attribute `_cacheable` and method `_should_cache` have been removed (:issue:`23118`)
- Comparing :class:`Timedelta` to be less or greater than unknown types now raises a ``TypeError`` instead of returning ``False`` (:issue:`20829`)
- :meth:`Series.searchsorted`, when supplied a scalar value to search for, now returns a scalar instead of an array (:issue:`xxxxx`).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use this PR number

@jreback jreback added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Dtype Conversions Unexpected or buggy dtype conversions labels Nov 20, 2018
same shape as `value`.

.. versionchanged :: 0.24.0
Ìf `value`is a scalar, an int is now always returned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • "Ìf" -- > "If"
  • missing a space between "`value`" and "is"

@topper-123 topper-123 force-pushed the Series.searchsorted_api branch 2 times, most recently from c9b3c91 to e998d7c Compare November 20, 2018 11:39
@@ -86,9 +86,11 @@ def test_searchsorted(self):
# Searching for single item argument, side='left' (default)
res_cat = c1.searchsorted('apple')
assert res_cat == 2
assert tm.is_scalar(res_cat)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tm.is_scalar guards against res_cat being an array, as assert np.array(...) == 2 would always pass the assertion test, as long as the array is non-empty.

Another way would be to to do assert (res_cat == 2) is True.

@topper-123
Copy link
Contributor Author

TheTravis failure is unrelated (hypothesis related)

>   @given(index=indices(max_length=5), num_columns=integers(0, 5))
>   def test_frequency_is_original(self, index, num_columns):
pandas/tests/frame/test_apply.py:1160: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../../miniconda3/envs/pandas/lib/python2.7/site-packages/hypothesis/core.py:638: in execute
    ) % (test.__name__, text_repr[0],))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <hypothesis.core.StateForActualGivenExecution object at 0x7f2c035ae050>
message = "Hypothesis test_frequency_is_original(self=<pandas.tests.frame.test_apply.TestDataFrameAggregate instance at 0x7f2c3f...', freq=None), num_columns=3) produces unreliable results: Falsified on the first call but did not on a subsequent one"
    def __flaky(self, message):
        if len(self.falsifying_examples) <= 1:
>           raise Flaky(message)
E           Flaky: Hypothesis test_frequency_is_original(self=<pandas.tests.frame.test_apply.TestDataFrameAggregate instance at 0x7f2c3f1955a8>, index=DatetimeIndex(['2237-05-12', '2237-05-13', '2237-05-14', '2237-05-15'], dtype='datetime64[ns]', freq=None), num_columns=3) produces unreliable results: Falsified on the first call but did not on a subsequent one
../../../miniconda3/envs/pandas/lib/python2.7/site-packages/hypothesis/core.py:884: Flaky

@@ -182,22 +183,28 @@ def test_searchsorted_monotonic(indices):
# test searchsorted only for increasing
if indices.is_monotonic_increasing:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls don't import is_scalar from testing, import it from the canonical location

pandas.api.types.is_scalar.

@@ -28,6 +28,7 @@
is_datetimelike_v_numeric, is_datetimelike_v_object,
is_extension_array_dtype, is_interval_dtype, is_list_like, is_number,
is_period_dtype, is_sequence, is_timedelta64_dtype, needs_i8_conversion)
from pandas.core.dtypes.common import is_scalar # noqa: F401
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't add this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these are internally used imports. this is not exporting these functions.

@jreback
Copy link
Contributor

jreback commented Dec 9, 2018

@topper-123 happy to have this if it can be done in a reasonbale way. pls update or close.

@topper-123 topper-123 force-pushed the Series.searchsorted_api branch from e998d7c to 9d74e8c Compare December 9, 2018 23:56
@codecov
Copy link

codecov bot commented Dec 9, 2018

Codecov Report

Merging #23801 into master will increase coverage by 0.07%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23801      +/-   ##
==========================================
+ Coverage   92.21%   92.28%   +0.07%     
==========================================
  Files         162      161       -1     
  Lines       51723    51459     -264     
==========================================
- Hits        47695    47491     -204     
+ Misses       4028     3968      -60
Flag Coverage Δ
#multiple 90.68% <100%> (+0.07%) ⬆️
#single 42.31% <33.33%> (-0.73%) ⬇️
Impacted Files Coverage Δ
pandas/core/base.py 97.61% <ø> (-0.03%) ⬇️
pandas/core/series.py 93.68% <100%> (-0.02%) ⬇️
pandas/util/testing.py 86.05% <100%> (-1.36%) ⬇️
pandas/core/sparse/scipy_sparse.py 97.05% <0%> (-2.95%) ⬇️
pandas/io/common.py 70.54% <0%> (-1.56%) ⬇️
pandas/io/html.py 91.2% <0%> (-1.47%) ⬇️
pandas/core/dtypes/common.py 94.37% <0%> (-1.26%) ⬇️
pandas/core/indexes/timedeltas.py 89.26% <0%> (-1.16%) ⬇️
pandas/io/formats/html.py 97.68% <0%> (-0.93%) ⬇️
... and 89 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 77278ba...9d74e8c. Read the comment docs.

@codecov
Copy link

codecov bot commented Dec 9, 2018

Codecov Report

Merging #23801 into master will increase coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #23801      +/-   ##
==========================================
+ Coverage   92.29%   92.29%   +<.01%     
==========================================
  Files         162      162              
  Lines       51845    51846       +1     
==========================================
+ Hits        47852    47853       +1     
  Misses       3993     3993
Flag Coverage Δ
#multiple 90.7% <100%> (ø) ⬆️
#single 42.97% <16.66%> (-0.01%) ⬇️
Impacted Files Coverage Δ
pandas/core/series.py 93.71% <100%> (ø) ⬆️
pandas/core/base.py 97.66% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 04a0eac...c09a411. Read the comment docs.

@topper-123 topper-123 force-pushed the Series.searchsorted_api branch from 9d74e8c to 49bcca5 Compare December 10, 2018 01:04
@topper-123
Copy link
Contributor Author

I've updated the PR.

@topper-123
Copy link
Contributor Author

Is thia ok? This would make the searchsorted method for all of the pandas objects return scalars for scalar inputs and make the API consistent.

@jreback
Copy link
Contributor

jreback commented Dec 15, 2018

this PR looked ok, merge master.

@topper-123 topper-123 force-pushed the Series.searchsorted_api branch from 49bcca5 to 4b31c78 Compare December 16, 2018 09:50
@topper-123
Copy link
Contributor Author

Rebased.

Array of insertion points with the same shape as `value`.
int or array of ints
A scalar or array of insertion points with the
same shape as `value`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@datapythonista is this the correct format?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but can you use int or array of int (I'd like to parse at some point the types from these, so I prefer the type name int over ints)

@jreback jreback added this to the 0.24.0 milestone Dec 17, 2018
Array of insertion points with the same shape as `value`.
int or array of ints
A scalar or array of insertion points with the
same shape as `value`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, but can you use int or array of int (I'd like to parse at some point the types from these, so I prefer the type name int over ints)

value : array_like
Values to insert into `self`.
value : scalar or array_like
Value(s) to insert into `self`.t
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Value(s) to insert into `self`.t
Value(s) to insert into ``self.t``.

I think render the whole self.t as code makes more sense (also, we end parameter descriptions with a period).

Copy link
Contributor Author

@topper-123 topper-123 Dec 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is actually better: Value(s) to search for.

As we're nor actually inserting, but searching?

@topper-123 topper-123 force-pushed the Series.searchsorted_api branch from b7e502e to 27af08e Compare December 18, 2018 14:57
@topper-123 topper-123 force-pushed the Series.searchsorted_api branch 2 times, most recently from 2545f4a to 8f9c917 Compare December 21, 2018 16:00
@topper-123 topper-123 force-pushed the Series.searchsorted_api branch from 8f9c917 to c09a411 Compare December 21, 2018 16:05
@jreback jreback merged commit 8c58817 into pandas-dev:master Dec 21, 2018
@topper-123 topper-123 deleted the Series.searchsorted_api branch December 21, 2018 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Reshaping Concat, Merge/Join, Stack/Unstack, Explode
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants