BUG: Add type promotion support for eval() expressions with many properties #6205

xenoscopic · 2014-01-31T14:54:46Z

This commit modifies the call to numpy.result_type to get around the
NPY_MAXARGS limit, which at the moment is 32. Instead of passing a
generator of all types involved in an expression, the type promotion
is done on a pair-wise basis with a call to reduce.

This fixes bugs for code such as the following:

from numpy.random import randn
from pandas import DataFrame

d = DataFrame(randn(10, 2), columns=list('ab'))

# Evaluates fine
print(d.eval('*'.join(['a'] * 32)))

# Fails to evaluate due to NumPy argument limits
print(d.eval('*'.join(['a'] * 33)))

xenoscopic · 2014-01-31T14:58:26Z

If someone could chime in to just double check my understanding of the behavior of numpy.result_type, that would be nice.

I think this is an important fix, otherwise it is essentially impossible to evaluate expressions whose tree representation have more than 32 leaves. Given that this limit seems imposed by NumPy, I assume it was not a conscious decision in the design of Pandas.

cpcloud · 2014-01-31T15:15:48Z

@havoc-io Gimme some time to review

cpcloud · 2014-01-31T15:17:09Z

I wasn't aware of this limit. Thanks for catching it.

cpcloud · 2014-01-31T15:19:06Z

pandas/computation/ops.py

+        # use reduce instead of passing the full generator as *args to
+        # result_type, because for large expressions there are often more than
+        # NPY_MAXARGS (32) types
+        return reduce(np.result_type,


You should move reduce to pandas.compat and import from there since there's no builtin reduce in Python 3

It looks like reduce is actually already in pandas.compat, I can go ahead and change to that. Do you have any qualms about just doing:

from pandas.compat import reduce

and overriding the built-in reduce for Python 2.x? It's actually the same as the built-in for 2.x, but I can rename it something like compat_reduce depending on your coding preferences.

xenoscopic · 2014-01-31T15:19:13Z

No rush or anything on the review. I want to make sure that my understanding of the result_type function (particularly in the context of the eval() infrastructure) is correct so this doesn't start returning some weird types from eval.

xenoscopic · 2014-01-31T15:33:16Z

I've gone ahead and added a reduce import from pandas.compat as well and squashed it into this commit.

jreback · 2014-01-31T15:39:55Z

@havoc-io pls add the above test (which should fail without this change, and pass with it)

cpcloud · 2014-01-31T15:51:04Z

Importing from compat is fine

xenoscopic · 2014-01-31T16:15:54Z

Added a test which fails without the change and succeeds with it. The test will raise an exception under the old code. Do you prefer to catch and then manually trigger failure - or just let the test framework catch the exceptions and mark them as E?

jreback · 2014-01-31T16:27:40Z

pandas/computation/tests/test_eval.py

+        # there is no way to get the value programmatically inside Python, so
+        # we just hardcode a value > 32 here
+        df.eval('*'.join(('a') * 33))
+


no, you need to assert that this works, e.g. compare with a known value and use assert_frame_equal
by definition w/o the change this will fail but with the change will return the correct value (and thus the test will pass)

so contruct the result in way and compare

Also, just use pd.eval here. There's no need to construct a DataFrame. Just something like

def check_many_exprs(parser, engine): a = 1 expr = ' * '.join('a' * 33) expected = 1 res = pd.eval(expr, parser=parser, engine=engine) tm.assert_equal(res, expected) def test_many_exprs(): for parser, engine in PARSERS_ENGINES: yield check_many_exprs, parser, engine

We generally frown upon yield style nose tests but they are useful to avoid repetition of test functions only differing in their arguments.

Does implementing the test in TestEvalNumexprPandas not implemented it for all parser engines? If not, where are you suggesting this test code be put?

It doesn't automatically give pd.eval the parser and engine, it's a class attribute (as opposed to an instance attribute). You have to pass self.parser and self.engine when you call pd.eval. You can probably just copy and paste what I wrote above and put that into test_eval.py. I'm not sure if PARSERS_ENGINES is called that, it might be something else, but it's just itertools.product(PARSERS, ENGINES). PARSERS and ENGINES are tuples or lists (can't remember) of the supported parsers/engines.

xenoscopic · 2014-02-03T13:06:30Z

I've updated the commit. The issue was marginally more extensive that I originally anticipated, so rather than scatter calls to reduce everywhere, I've added a pandas.computation.common._result_type_many utility function as a wrapper for calling numpy.result_type when passing an indefinite and potentially large number of arguments. It also slightly changes the behavior of my previous code to ensure that the call correctly succeeds for argument lists of length 1.

I've left in-place calls to numpy.result_type which have a small, definite number of arguments, since the wrapper function is not necessary.

In addition, I've added the test recommended by @cpcloud (thanks!).

cpcloud · 2014-02-03T13:49:13Z

Another thing to check is the associativity and commutativity of result_type. If it's neither then using reduce isn't going to give the same result as result_type. Intuitively I would expect it to be both, akin to a sort of max operation on dtypes, but maybe there's some edge case that I'm not thinking of.

xenoscopic · 2014-02-03T15:01:55Z

That was my intuition as well, and does appear to be the case. The NumPy result_type call stack (since NumPy 1.6) effectively looks like:

return_type (an alias for the C function array_result_type which effectively just repackages the input array to be C-friendly), which calls
PyArray_ResultType which calls
PyArray_PromoteTypes

The C code is a bit difficult to parse from a conceptional standpoint, but the documentation for the latter two functions gives a more concise explanation - which does indeed portray the functions as an effective "max" operation on dtypes. PyArray_PromoteTypes even explicitly states that it is symmetric and associative.

I don't think it would be possible to write an exhaustive test to confirm this, but I think the algorithm described in the documentation is quite clearly symmetric and associative. Aside from pinging the numpy developers, I'm not really sure how else to verify.

cpcloud · 2014-02-03T19:17:41Z

pandas/computation/common.py

+    try:
+        first = next(arg_iter)
+    except StopIteration:
+        raise ValueError('at least one array or dtype is required')


Just do

if not arrays_and_dtypes: raise ValueError("your message")

Creating an iterator and catching StopIteration is a bit overkill.

I understand why you originally did that, but you don't need to pass the initial argument to reduce, it'll use the first element of the sequence if you don't pass anything.

The problem is that if the argument sequence has a length of only 1, then reduce will never pass it to numpy.result_type, and consequently it won't come back as a dtype, it'll just be whatever the original argument is.

xenoscopic · 2014-02-03T19:37:22Z

Okay, done, have a peek at the revised version.

cpcloud · 2014-02-03T19:42:31Z

looks ok 2 me ... i 'll merge when it passes

jreback · 2014-02-03T20:49:04Z

@havoc-io pls add a release notes (you will have to rebase as I just added the 0.14.0 section); add in bug fixes pls (you can ref this PR)

xenoscopic · 2014-02-03T21:57:12Z

I've added an entry to the 0.14.0 bug fix log.

xenoscopic · 2014-02-06T14:52:49Z

Looks like CI build passed - good to merge?

jreback · 2014-02-06T14:54:51Z

pls rebase on master; the release notes are in conflict

jreback · 2014-02-06T14:55:01Z

as soon as you do I can merge

…erties This commit modifies the call to numpy.result_type to get around the NPY_MAXARGS limit, which at the moment is 32. Instead of passing a generator of all types involved in an expression, the type promotion is done on a pair-wise basis with a call to reduce. This fixes bugs for code such as the following: from numpy.random import randn from pandas import DataFrame d = DataFrame(randn(10, 2), columns=list('ab')) # Evaluates fine print(d.eval('*'.join(['a'] * 32))) # Fails to evaluate due to NumPy argument limits print(d.eval('*'.join(['a'] * 33)))

xenoscopic · 2014-02-06T14:58:14Z

Should be sorted now. Thanks.

BUG: Add type promotion support for eval() expressions with many properties

jreback · 2014-02-06T14:59:29Z

perfect thank you!

…kipping if not installed

jreback · 2014-02-06T15:32:04Z

@havoc-io FYI, I had to move this to a class (rather than the global checker), as it didn't properly disable numexpr when necessary (e.g. when it was not available

b0ba12a

it works, but pls confirm that it is still correctly checking

xenoscopic · 2014-02-06T15:34:42Z

Looks good to me.

jreback · 2014-02-06T15:36:45Z

gr8! this didn't show up till the windows builds started failing...they don't have numexpr installed at all

cpcloud reviewed Jan 31, 2014
View reviewed changes

jreback reviewed Jan 31, 2014
View reviewed changes

cpcloud reviewed Feb 3, 2014
View reviewed changes

jreback added a commit that referenced this pull request Feb 6, 2014

Merge pull request #6205 from havoc-io/master

352db84

BUG: Add type promotion support for eval() expressions with many properties

jreback merged commit 352db84 into pandas-dev:master Feb 6, 2014

jreback added a commit that referenced this pull request Feb 6, 2014

TST: move tests for #6205 to class infrastructure to enable numexpr s…

b0ba12a

…kipping if not installed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Add type promotion support for eval() expressions with many properties #6205

BUG: Add type promotion support for eval() expressions with many properties #6205

xenoscopic commented Jan 31, 2014

xenoscopic commented Jan 31, 2014

cpcloud commented Jan 31, 2014

cpcloud commented Jan 31, 2014

cpcloud Jan 31, 2014

xenoscopic Jan 31, 2014

xenoscopic commented Jan 31, 2014

xenoscopic commented Jan 31, 2014

jreback commented Jan 31, 2014

cpcloud commented Jan 31, 2014

xenoscopic commented Jan 31, 2014

jreback Jan 31, 2014

cpcloud Jan 31, 2014

xenoscopic Jan 31, 2014

cpcloud Jan 31, 2014

xenoscopic commented Feb 3, 2014

cpcloud commented Feb 3, 2014

xenoscopic commented Feb 3, 2014

cpcloud Feb 3, 2014

cpcloud Feb 3, 2014

xenoscopic Feb 3, 2014

xenoscopic commented Feb 3, 2014

cpcloud commented Feb 3, 2014

jreback commented Feb 3, 2014

xenoscopic commented Feb 3, 2014

xenoscopic commented Feb 6, 2014

jreback commented Feb 6, 2014

jreback commented Feb 6, 2014

xenoscopic commented Feb 6, 2014

jreback commented Feb 6, 2014

jreback commented Feb 6, 2014

xenoscopic commented Feb 6, 2014

jreback commented Feb 6, 2014

BUG: Add type promotion support for eval() expressions with many properties #6205

BUG: Add type promotion support for eval() expressions with many properties #6205

Conversation

xenoscopic commented Jan 31, 2014

xenoscopic commented Jan 31, 2014

cpcloud commented Jan 31, 2014

cpcloud commented Jan 31, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xenoscopic commented Jan 31, 2014

xenoscopic commented Jan 31, 2014

jreback commented Jan 31, 2014

cpcloud commented Jan 31, 2014

xenoscopic commented Jan 31, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xenoscopic commented Feb 3, 2014

cpcloud commented Feb 3, 2014

xenoscopic commented Feb 3, 2014

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

xenoscopic commented Feb 3, 2014

cpcloud commented Feb 3, 2014

jreback commented Feb 3, 2014

xenoscopic commented Feb 3, 2014

xenoscopic commented Feb 6, 2014

jreback commented Feb 6, 2014

jreback commented Feb 6, 2014

xenoscopic commented Feb 6, 2014

jreback commented Feb 6, 2014

jreback commented Feb 6, 2014

xenoscopic commented Feb 6, 2014

jreback commented Feb 6, 2014