Skip to content

ENH: fix eval scoping issues #6366

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 20, 2014
Merged

ENH: fix eval scoping issues #6366

merged 6 commits into from
Feb 20, 2014

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Feb 16, 2014

closes #5987
closes #5087

Relevant user-facing changes:

In [9]: df = DataFrame(randn(5, 2), columns=list('ab'))

In [10]: a, b = 1, 2

In [11]: # column names take precedence

In [12]: df.query('a < b')
Out[12]:
          a         b
0 -0.805549 -0.090572
1 -1.782325 -1.594079
2 -0.984364  0.934457
3 -1.963798  1.122112

[4 rows x 2 columns]

In [13]: # we must use @ whenever we want a local variable

In [14]: df.query('@a < b')
Out[14]:
          a         b
3 -1.963798  1.122112

[1 rows x 2 columns]

In [15]: # we cannot use @ in eval calls

In [16]: pd.eval('@a + b')
  File "<string>", line unknown
SyntaxError: The '@' prefix is not allowed in top-level eval calls, please refer to your variables by name without the '@' prefix

In [17]: pd.eval('@a + b', parser='python')
  File "<string>", line unknown
SyntaxError: The '@' prefix is only supported by the pandas parser
  • update query/eval docstrings/indexing/perfenhancing
  • make sure docs build
  • release notes
  • pytables
  • make the repr of Scope objects work or revert to previous version
  • more tests for new local variable scoping API
  • disallow (and provide a useful error message for) locals in expressions like pd.eval('@a + b')
  • Raise when your variables have the same name as the builtin math functions that numexpr supports, since you cannot override them in numexpr.evaluate, even when explicitly passing them. For example
import numexpr as ne
sin = randn(10)
d = {'sin': sin}
result = ne.evaluate('sin > 1', local_dict=d, global_dict=d)
result == array(True)

For reference, after this PR local variables are given lower precedence than column names. For example

a, b = 1, 2
df = DataFrame(randn(10, 2), columns=list('ab'))
res = df.query('a > b')

will no longer raise an exception about overlapping variable names. If you want the local a (as opposed to the column a) you must do

df.query('@a > b')

@cpcloud cpcloud self-assigned this Feb 16, 2014
@jreback
Copy link
Contributor

jreback commented Feb 16, 2014

oh boy! looks like the right way to do this..... 👍

@cpcloud
Copy link
Member Author

cpcloud commented Feb 18, 2014

Last issue is a scoping issue with py3 ... somehow not capturing scope from sys._getlevel.

@jreback
Copy link
Contributor

jreback commented Feb 18, 2014

guess this was not a small change 🔢

@cpcloud
Copy link
Member Author

cpcloud commented Feb 18, 2014

Nope. But scope is less of a hack and I think overall implementation is easier to understand. Actually got up early to try and fix this but all the back and forth of comparing py2 and 3 is huge time sink

@cpcloud
Copy link
Member Author

cpcloud commented Feb 19, 2014

turns out that list comprehensions work differently re the stack in python 3....each iteration generates a new stack frame. std for loops don't work that way so that should do it

@cpcloud
Copy link
Member Author

cpcloud commented Feb 19, 2014

Ok this is passing locally on py27 and py33

@cpcloud
Copy link
Member Author

cpcloud commented Feb 19, 2014

@jreback Whenever you get a chance .... this is passing.

@jreback
Copy link
Contributor

jreback commented Feb 19, 2014

won't pretend I understand! hahah

can you build the docs and just check that everything looks ok their?

I find sometimes they are good tests in and of themselves (aside which they need slight changing anyhow, right?)

@cpcloud
Copy link
Member Author

cpcloud commented Feb 19, 2014

Oh right! Those have to be changed anyway becuase of the new API ... let me add some new check boxes..

@cpcloud
Copy link
Member Author

cpcloud commented Feb 19, 2014

Biggest changes are

  • Use of DeepChainMap (a nice wrapper around dictionaries that lets you store/search/update multiple dictionaries in a particular order). This is the most major change and most important since it obviates a lot of my scope hacks that were there before.
  • local API is slightly different more explicit as we discussed
  • stronger syntax checks for locals and a helpful err msg

@jreback
Copy link
Contributor

jreback commented Feb 19, 2014

maybe post on the top the relevant changes (from an ipython session).....

@cpcloud
Copy link
Member Author

cpcloud commented Feb 19, 2014

sure

@cpcloud
Copy link
Member Author

cpcloud commented Feb 20, 2014

@y-p @jreback @jorisvandenbossche Would be very grateful for any comments y'all might have.

@jreback
Copy link
Contributor

jreback commented Feb 20, 2014

your exceptions should be multi line (looking at the syntax error one for using a @ in eval)

@jreback
Copy link
Contributor

jreback commented Feb 20, 2014

u should put what u put in release notes in v0.14 API change section

@cpcloud
Copy link
Member Author

cpcloud commented Feb 20, 2014

done

@cpcloud
Copy link
Member Author

cpcloud commented Feb 20, 2014

adding a test for #5087. this closes that too!

@jreback
Copy link
Contributor

jreback commented Feb 20, 2014

u create temporaries for the locals to avoid name collisions?

@jreback jreback added this to the 0.14.0 milestone Feb 20, 2014
@jreback
Copy link
Contributor

jreback commented Feb 20, 2014

looks fine to me
go ahead and squash

@cpcloud
Copy link
Member Author

cpcloud commented Feb 20, 2014

Temporaries are created for strings and lists

@cpcloud
Copy link
Member Author

cpcloud commented Feb 20, 2014

For locals I just rename them when the terms are created during parsing.

@jreback
Copy link
Contributor

jreback commented Feb 20, 2014

oh ok

looks fine 2 me otherwise

can merge when u r ready


df.eval('@a + b')
With :func:`~pandas.eval` you cannot use the ``@`` prefix *at all*, because it
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you can leave out the ~ here (so just :func:pandas.eval``, this will show it in the docs as pandas.eval() instead of `eval()`) so it is clear it is not about the same eval as the paragraph above

@@ -1738,6 +1738,8 @@ def _getitem_frame(self, key):
def query(self, expr, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe mention something of @ in the docstring?

@jorisvandenbossche
Copy link
Member

I am not really familiar with this code, but just added some minor doc comments.

@cpcloud
Copy link
Member Author

cpcloud commented Feb 20, 2014

Thanks for the comments guys.

@cpcloud
Copy link
Member Author

cpcloud commented Feb 20, 2014

@jreback Should I squash some more? This is ready for merging if there are no more comments.

@jreback
Copy link
Contributor

jreback commented Feb 20, 2014

maybe a bit more?

@cpcloud
Copy link
Member Author

cpcloud commented Feb 20, 2014

Okay squashed and cleaned up the query docstring. Would like to keep some history so that I remember why I made change X especially some of these python version differences

ChainMap implements a list of mappings that effectively functions as a single
dictionary. This class is very useful for implementing scope. This commit also
adds a DeepChainMap subclass of ChainMap for writing and deleting keys.
Also useful for replacing local variable names with their mangled versions.
@jreback
Copy link
Contributor

jreback commented Feb 20, 2014

cool....go for it

In Python 3.x list comprehension build up stack frames, thus providing the
stack level to search is not the same as it is in Python 2.x.
@cpcloud
Copy link
Member Author

cpcloud commented Feb 20, 2014

bombs away

cpcloud added a commit that referenced this pull request Feb 20, 2014
@cpcloud cpcloud merged commit 91b2b00 into pandas-dev:master Feb 20, 2014
@cpcloud cpcloud deleted the eval-fix-scope branch February 20, 2014 14:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API Design Bug Internals Related to non-user accessible pandas implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH/API: Change query/eval local variable API query fails with local variables in expressions
3 participants