-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
WIP: add top-level eval function #4037
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
return isinstance(obj, pd.Panel) | ||
|
||
|
||
def is_pd_obj(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note to self these should most likely be removed or should replace scattered functions that are similar to these
quick pass I didn't see a doc-string on eval (its prob there...but I didn't delve too deep) also a quick writeup for enhancedperf section? just a couple of examples |
question:
result being that only strings are allowed (which will be converted to going to take much longer to finish this up to support all of those expression forms, but if it needs to be done that's ok too |
for a string expression, that is exactly right (its conceptually simpler, because only scalars can be on the rhs in any expression), however also need to allow something like this:
where unless you think that you can evaluate:
I think that the current |
ok i think i'm starting to see the bigger picture here. in this code in return self.table.table.readWhere(self.condition, start=self.start, stop=self.stop)
is that right? |
and |
that's about right. in
and no alignment necessary, just some conversions on and
|
cool maybe this will be easier than i thought...to the pytables-mobile! |
advertising! might be a good benchmark.....and maybe add this 'example' to the |
nice! thanks. i gave u +1 |
just an update on the essentially: what you want to write if u like python pd.eval('not x and y and z or w') if you want to be terse pd.eval('~(x & y & z | w)') what the engine actually sees (you could write this too, if you're practicing typing parens fast w/o looking!) pd.eval('~((((x) & (y)) & (z)) | (w))') and of course u can mix and match if you like partial clarity + terseness what's nice is that the python precedence is preserved, e.g., for reference: operator precedence if u want i can give more details, but essentially the the ast manip is to "rewrite" the here was the trickiest part def visit_BoolOp(self, node):
op = self.visit(node.op)
def visitor(x, y):
try:
lhs = self.visit(x)
except TypeError:
lhs = x
try:
rhs = self.visit(y)
except TypeError:
rhs = y
return op(lhs, rhs)
operands = node.values
return reduce(visitor, operands) key part is the last two lines which because the python parser treats boolean expressions similar to how lisp does you have to reconstruct the |
I would have an engine option, maybe parsing=strict as the default |
ok. there's not really any automatic way to test that though since |
I would put it on back burner anyhow (i know u did it) |
sounds good, not worth too much time. the pytables changes are quite daunting. i know those are most important for this to be merged, still sifting through how it works...reading tests and trying to understand what the classes do...which ones are important etc... plus the issues with dt and period indexes i think after those two then this might warrant some serious tire kicking |
I think i'm going to allow callable parsing in the |
i think in 0.14 (not a typo) that |
@jreback should arbitrary expressions be allowed with using |
nvm. should allow them. |
whatever is parsable by ne should be allowed |
@jreback i feel a bit silly asking this but how do i pull down ur changes? |
@cpcloud i think you have to add me as a remote, then cherry-pick i pseudo have it working....but maybe you can help with this..... say I haev a call like this:
I can't seem to get the variable scope for any idea? |
can u point me to a commit? |
oh same one |
somewhat old .....will push again in a bit |
new one....still very WIP though! |
figured out the scoping issue.....I AM capturing scope, but then I was throwing it away because I allow a 'loose' sytax.. e.g.
is translated to:
|
wait where does that commit live? it's on master but not a branch but a tree? how are you parsing the slice? |
@jreback i've added u as a remote but how can i cherry-pick that commit? it looks like its on master, but where is being pushed? is it detached or something? |
its on a separate branch of my master try this: https://github.com/jreback/pandas/tree/eval its build on top of your branch (which at some point i will rebase; i am a few behind you) eventually you can just grab some of my commits and add to your tree as you have the PR not 100% sure how to do that, but since we can't push to each other I think only way though once you have a PR to pandas master I think I can push to that (I think this is a psudoe private branch.... |
on #4151 now |
of course going to fail all tests...but stil a WIP |
i like how github pulse double counts my commits here....not :| |
https://github.com/newville/asteval/blob/master/lib/asteval.py nice ref for implementing the ast handlers |
nice. i think we can use slice, subscript, attribute access, would be nice to handle certain pandas functions like |
all pretty straightforward I have a new branch that does most of this could move some into base class |
great! can u post a link to a branch? |
already #4155 up, extension of your branch....all tests pass! take a look. I think some of what I did should actually be in your branch (e.g. see my code clean up point). This can also consolidate the initialization sutff of the binary/math ops as well other thing to note, I added a I didn't really check whether your tests break! more likely is that some 'invalid' ops would slip by I also eliminated the call to |
hmm..this not showing my commits? |
this won't show your commits because i've changed the remote of my local branch here... the remote for this pr is my AFAIK i don't think you can track one way might be: you could push to your |
hmm...I thought that you were creating a master branch that is push/pullable from either of us and sort of independent? maybe close that PR and just open another one from that branch (e.g. merge from eval-3393 to master) rather than from your branch to master? |
so i think we should close our respective PRs, i'll open a new one with |
yes.....had same thoughts (and I would basically copy your top-levevel lists and then I will do the same), for ToDo's and such |
right. a single PR to rule them all. i'll merge our TODO lists and other stuff in the initial text of the PR. |
great.... |
hopefully, this will scale down the bloated pulse report of my commits |
closes #3393.
This is a ways from being mergeable but I thought I would put up the PR for people to play with and to get some other eyes on it. Tire-kick at will. A consolidated list of things to be completed to come.
eval
RoadmapDocumentation
eval
docstringx = 3; y = 2; pd.eval('x // y', engine='python')
is 1000 times slower than the same operation in actual Python.Engines
eval('os.system("sudo rm -rf /")')
) is possible hereShould python be the default engine for small frames?
Functions
isexpr
eval
to top-level pandasError Handling
Alignment
'python'
engine when aSeries
andDataFrame
both haveDatetimeIndexes
(I would rather not do this and just enforce the to-be-deprecated index-aligning-if-both-indices-are-datetimeindex behavior)PeriodIndex
es don't work (well, they won't pass my tests) because of a stack overflow bug when joining aDatetimeIndex
and aPeriodIndex
(that bug is slated for 0.13, but a design decision needs to be made)NDFrame
(is this really necessary?)Scope
Expr
objects to be passed toeval
(Expr
will pull in the local and global variables from the calling stack frame)Miscellaneous
evaluate
functionExpr
inNDFrame.__getitem__
cc @jrebackdf + df2
should return aDataFrame
ifdf
anddf2
are bothDataFrame
objects.Expression
toExpr
truediv
keyword toExpr
, default toTrue
Operators
%
**
Tests
DatetimeIndex
andPeriodIndex
objects are untested because of a few join and alignment issues.//
operator works as it should (it works, but there may be edge cases that I haven't though of yet)//
numexpr
only supports scalar evaluation here: variables are not allowed so this will always eval in python space (half-assed it a bit in the tests here need to fix that)**
associativity issue in testingNear Future
and
,or
andnot
aliases for the operators&
,|
and~
, respectivelydf.A
syntaxsum
andprod
)Far Future