Skip to content

Dispatch Index ops to Series #27352

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jul 12, 2019
Merged

Dispatch Index ops to Series #27352

merged 6 commits into from
Jul 12, 2019

Conversation

jbrockmendel
Copy link
Member

Getting ever closer to a single implementation.

@TomAugspurger
Copy link
Contributor

TomAugspurger commented Jul 12, 2019

Looks reasonable. What's the ideal final state? For future mico-optimization, I could imagine an implementation that does index-specific things (e.g. deferring for op(Index, Series)) and then just box(op(array1, array2)). It's a bit unfortunate that we have to pay the Series constructor performance tax on each op.

@jbrockmendel
Copy link
Member Author

What's the ideal final state?

  • Contingent on 2D EA happening, the ops are defined directly on PandasArray and either Index._data is a PandasArray or calls PandasArray(self._data) in the op instead of Series(self)
  • Otherwise, it'll take some closure-rearrangement, but I think we can get a version of those ops that operates directly on ndarray.
  • Regardless: a single implementation so we get internal consistency for free (not-yet-conformant: RangeIndex, IntegerArray, SparseArray)

@gfyoung gfyoung added Internals Related to non-user accessible pandas implementation Refactor Internal refactoring of code Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff and removed Internals Related to non-user accessible pandas implementation labels Jul 12, 2019
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, minor comments ping on green. note still want to move things out of pandas/core/ops/init.py to a named module.

elif isinstance(right, (ABCDatetimeArray, pd.DatetimeIndex)):
result = op(left._values, right)
return construct_result(left, result, index=left.index, name=res_name)

lvalues = left.values
rvalues = right
if isinstance(rvalues, ABCSeries):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these cases are the same, yes?

isinstance(rvalues, (ABCSeries, ABCIndexClass))

left : object (Index for non-reversed ops)
right : object (Index fof reversed ops)
left : object (np.ndarray for non-reversed ops)
right : object (np.ndarray fof reversed ops)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fof -> for

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed+green

@jreback jreback added this to the 0.25.0 milestone Jul 12, 2019
@jreback jreback merged commit 84136d5 into pandas-dev:master Jul 12, 2019
@jreback
Copy link
Contributor

jreback commented Jul 12, 2019

thanks @jbrockmendel

@jbrockmendel jbrockmendel deleted the same_ops branch July 12, 2019 16:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Algos Non-arithmetic algos: value_counts, factorize, sorting, isin, clip, shift, diff Refactor Internal refactoring of code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants