Skip to content

BUG/TST: Empty input arrays in cartesian_product and MultiIndex (#12258) #14151

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

pijucha
Copy link
Contributor

@pijucha pijucha commented Sep 5, 2016

This commit:

  1. fixes logic (and division by 0) in cartesian_product when some input arrays are empty
  2. adds tests for MultiIndex empty level construction with .from_arrays and .from_product

@@ -31,14 +31,20 @@ def cartesian_product(X):
array([1, 2, 1, 2, 1, 2])]

"""
if len(X) == 0:
return []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it as it is. Though, maybe this should raise ValueError.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

list like is fine
just should check and raise an error of not

@codecov-io
Copy link

codecov-io commented Sep 5, 2016

Current coverage is 85.26% (diff: 100%)

Merging #14151 into master will increase coverage by <.01%

@@             master     #14151   diff @@
==========================================
  Files           140        140          
  Lines         50579      50593    +14   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43121      43137    +16   
+ Misses         7458       7456     -2   
  Partials          0          0          

Powered by Codecov. Last update 7dedbed...b831516


lenX = np.fromiter((len(x) for x in X), dtype=int)
cumprodX = np.cumproduct(lenX)

a = np.roll(cumprodX, 1)
a[0] = 1

b = cumprodX[-1] / cumprodX
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try/except around this instead is more idomatic

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't raise:

0/np.array([0,1])
Out[83]: array([ nan,   0.])
/opt/eclipse/plugins/org.python.pydev_5.1.2.201606231256/pysrc/pydevconsole.py:1: RuntimeWarning: invalid value encountered in true_divide

Hence this strange behaviour in #12258.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@jreback jreback added MultiIndex Compat pandas objects compatability with Numpy or Python functions labels Sep 5, 2016
@@ -31,14 +31,20 @@ def cartesian_product(X):
array([1, 2, 1, 2, 1, 2])]

"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a Parameters/Returns section

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and a See Also

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

further I think that we require a list-like here (maybe its more strict, in that we require an actual array). can you add some tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code indicates it should be at least a list-like (or rather a list-like of list-likes). And pandas.compat.product (or itertools.product) is somewhat similar.
I'll fix the docstring and add tests for it. Unless you think we need something stricter.

@jreback
Copy link
Contributor

jreback commented Sep 8, 2016

ping when updated / green.

@pijucha
Copy link
Contributor Author

pijucha commented Sep 13, 2016

@jreback Sorry for the delay. I'll try to get to it before the weekend.

I think I'll have to touch also _factorize_from_iterable(s) - they have small issues with empty or incorrect inputs.

invalid_inputs = [1, [1], [1, 2], [[1], 2],
'a', ['a'], ['a', 'b'], [['a'], 'b']]
for i in invalid_inputs:
tm.assertRaises(TypeError, MultiIndex.from_arrays, arrays=i)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not the most descriptive test...
TypeError is raised at different places for different inputs. I couldn't quickly think of anything better and clean enough - but can try to improve it if needed.

@pijucha
Copy link
Contributor Author

pijucha commented Sep 27, 2016

@jreback It took me longer than I expected.
Added some checks and tests for incorrect input.

@jreback jreback added this to the 0.19.0 milestone Sep 27, 2016
@jreback jreback closed this in b81d444 Sep 27, 2016
@pijucha pijucha deleted the cartesian branch October 8, 2016 02:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions MultiIndex
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"ValueError" on specific MultiIndex.from_product initialization(s?)
3 participants