Skip to content

Additional tests for ufunc(Series) #26951

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jun 21, 2019

Conversation

TomAugspurger
Copy link
Contributor

@TomAugspurger TomAugspurger commented Jun 19, 2019

This adds a set of tests for ufuncs on Series. The goal is to
establish the correct behavior prior to implementing Series.__array_ufunc__.

There are two kinds of xfails right now

  1. Series[Sparse] fails because Series.__array_ufunc__ doesn't
    yet dispatch to Series.array.__array_ufunc__
  2. ufunc(series, series) when the two series are unaligned. It's
    been determined that these should align, but isn't currently
    implemented.

LMK if there's anything I can do to make these tests clearer. The heavy parameterization is a bit unfortunate, but I think the best option.

xref #23293

This adds a set of tests for ufuncs on Series. The goal is to
establish the correct behavior prior to implementing `Series.__array_ufunc__`.

There are two kinds of xfails right now

1. Series[Sparse] fails because `Series.__array_ufunc__` doesn't
   yet dispatch to `Series.array.__array_ufunc__`
2. `ufunc(series, series)` when the two series are unaligned. It's
   been determined that these should align, but isn't currently
   implemented.
@TomAugspurger TomAugspurger added Numeric Operations Arithmetic, Comparison, and Logical operations Testing pandas testing functions or related to the test suite labels Jun 19, 2019
@TomAugspurger TomAugspurger added this to the 0.25.0 milestone Jun 19, 2019
@jbrockmendel
Copy link
Member

Should DTA[tz] be included along with Sparse?

@codecov
Copy link

codecov bot commented Jun 19, 2019

Codecov Report

Merging #26951 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #26951      +/-   ##
==========================================
+ Coverage   91.98%   91.98%   +<.01%     
==========================================
  Files         180      180              
  Lines       50772    50774       +2     
==========================================
+ Hits        46704    46706       +2     
  Misses       4068     4068
Flag Coverage Δ
#multiple 90.62% <ø> (+0.05%) ⬆️
#single 41.85% <ø> (-0.06%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 88.88% <0%> (-11.12%) ⬇️
pandas/core/frame.py 96.89% <0%> (-0.12%) ⬇️
pandas/io/pytables.py 90.3% <0%> (ø) ⬆️
pandas/core/arrays/sparse.py 94.19% <0%> (+0.46%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2243629...d1788b0. Read the comment docs.

@TomAugspurger
Copy link
Contributor Author

Should DTA[tz] be included along with Sparse?

IMO, the intention of these tests is to verify the correctness of the future Series.__array_ufunc__. I parametrize over Series[int] and Series[Sparse[int]] to verify the correctness of the dispatch to array logic. In theory the dispatch logic will handle all arrays the same. I'll add a small sanity test, separate from the rest, since the set of ufuncs valid for datetime64 is different.

@jbrockmendel
Copy link
Member

OK, many of the DTA[tz] ufuncs are currently broken, so its OK by me if you want to leave them out.

@TomAugspurger
Copy link
Contributor Author

TomAugspurger commented Jun 20, 2019

0b1e745 adds another xfail for ufunc(Index, Series), e.g. np.add(index, series). Right now that returns an Index

In [1]: import pandas as pd; import numpy as np

In [2]: a = pd.Series([1, 2])

In [3]: b = pd.Index([1, 2])

In [4]: np.add(b, a)
Out[4]: Int64Index([2, 4], dtype='int64')

I think that's a bug. It's inconsistent with the binop

In [5]: b + a
Out[5]:
0    2
1    4
dtype: int64

So the rule is that Index.__array_ufunc__ should check the types for Series. If any are found then it should return NotImplemented, so that Series.__array_ufunc__ can take over.

(cherry picked from commit ba5cb55)
@TomAugspurger
Copy link
Contributor Author

775c2ef fixes an issue with one of the tests. The test wasn't correct for ufunc(a, b) when we shuffle b (to force alignment). It was passing earlier, since we xfail shuffle=True for now.

I think this PR should be ready to go. I'd like to merge Joris' array_ufunc branch on top of it to actually start fixing the xfails.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seem reasonble, just some readability comments

def test_binary_ufunc(ufunc, sparse, shuffle, box_other,
flip,
arrays_for_binary_ufunc):
# Check the invariant that
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you give a little expl of what the parametizations do if not obvious, e.g. flip & shuffle actually are not immediately obvious

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may split those out to separate tests. It'll be a bit more code, but much clearer.

# Check the invariant that
# ufunc(Series(a), Series(b)) == Series(ufunc(a, b))
# with alignment.
a1, a2 = arrays_for_binary_ufunc
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you call these: left_array, right_array

I know its a bit longer, but more readable IMHO

a2 = pd.SparseArray(a2, dtype=pd.SparseDtype('int', 0))

name = "name"
s1 = pd.Series(a1, name=name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

left_series and right_series

a2 = a2.take(idx)

a, b = s1, s2
c, d = a1, a2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find this very confusing, can you use the same terms and not use a, b, c, d?

series = pd.Series(array, name="name")

a, b = series, other
c, d = array, other
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above if you can make this more clear

array = pd.SparseArray(array)

series = pd.Series(array, name="name")
result = np.modf(series)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok I c you are doing this here

@jreback
Copy link
Contributor

jreback commented Jun 21, 2019

do we have an issue for this? (or maybe just xref the actual ufunc issue at the top of the PR)

@TomAugspurger
Copy link
Contributor Author

Changes

  • split the binop ufunc test into three: with array, with index, with Series. Simplified things quite a bit I think
  • Cleaned up the names. Hopefully array_args and series_args make things much clearer that a, b, c, d.

@jreback
Copy link
Contributor

jreback commented Jun 21, 2019

lgtm.

@jreback jreback merged commit ba69f95 into pandas-dev:master Jun 21, 2019
@jreback
Copy link
Contributor

jreback commented Jun 21, 2019

thanks @TomAugspurger

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Numeric Operations Arithmetic, Comparison, and Logical operations Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants