Skip to content

ENH: evaluate datetime ops in python with eval #4924

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Sep 27, 2013
Merged

ENH: evaluate datetime ops in python with eval #4924

merged 6 commits into from
Sep 27, 2013

Conversation

cpcloud
Copy link
Member

@cpcloud cpcloud commented Sep 21, 2013

closes #4897

Also adds DatetimeIndex comparisons in query expressions and a corresponding vbench for a simple Timestamp vs either Series or DatetimeIndex.

@jreback
Copy link
Contributor

jreback commented Sep 21, 2013

maybe add some tests where use is comparing a date column with a non-date
? so should raise

as an aside in future min/max/diff/shift on dates have to be done in python land

@ghost ghost assigned cpcloud Sep 21, 2013
@@ -493,8 +495,14 @@ def _possibly_evaluate_binop(self, op, op_class, lhs, rhs,
maybe_eval_in_python=('==', '!=')):
res = op(lhs, rhs)

# "in"/"not in" ops are always evaluated in python
if (res.op in _cmp_ops_syms and
lhs.kind in _date_kinds or lhs.kind in _date_kinds):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that should be rhs

@cpcloud
Copy link
Member Author

cpcloud commented Sep 24, 2013

@jreback @jtratner comments?

@cpcloud
Copy link
Member Author

cpcloud commented Sep 25, 2013

this should eventually handle timedelta as well

@jreback
Copy link
Contributor

jreback commented Sep 25, 2013

under 1.7! they r like ints so I bet ne can handle them but prob better to handle in python because of NaT

@cpcloud
Copy link
Member Author

cpcloud commented Sep 25, 2013

i ithink i'll do a new pr for that

@cpcloud
Copy link
Member Author

cpcloud commented Sep 25, 2013

💥

In [5]: td = timedelta64(1, 'D')

In [6]: ne.evaluate('td + 1')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-6-f2c950d373ae> in <module>()
----> 1 ne.evaluate('td + 1')

/home/phillip/.virtualenvs/pandas/lib/python2.7/site-packages/numexpr/necompiler.pyc in evaluate(ex, local_dict, global_dict, out, order, casting, **kwargs)
    728
    729     # Create a signature
--> 730     signature = [(name, getType(arg)) for (name, arg) in zip(names, arguments)]
    731
    732     # Look up numexpr if possible.

/home/phillip/.virtualenvs/pandas/lib/python2.7/site-packages/numexpr/necompiler.pyc in getType(a)
    627     if kind == 'S':
    628         return bytes
--> 629     raise ValueError("unkown type %s" % a.dtype.name)
    630
    631

ValueError: unkown type timedelta64[D]

return pd.Timestamp(self._value)
elif kind == 'timestamp':
return self._value.asm8.view('i8')
return np.datetime64(self._value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you setting the kind somewhere? I woldn't do this, instead Timestamp(self._value).value if you really need the i8 value. why would these need to be i8 anyhow? are they evaluated in python space? (for which you are calling functions)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not actually getting the i8 value. That's what I was doing previously, to evaluate datetime ops using numexpr. kind is just a string repr of either the class name or the dtype, it's a computed property

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok....I just wouldn't allow a datetime time at all (instead just wrap Timestamp around it)....makes consistent

@cpcloud
Copy link
Member Author

cpcloud commented Sep 27, 2013

@jreback comments?

@@ -1898,6 +1898,7 @@ def _get_index_resolvers(self, axis):
# index or columns
axis_index = getattr(self, axis)
d = dict()
prefix = axis[0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in the future maybe this function should be in core/generic?

@jreback
Copy link
Contributor

jreback commented Sep 27, 2013

looks fine otherwise

@cpcloud
Copy link
Member Author

cpcloud commented Sep 27, 2013

moved to generic and added a smoke test for failing panel/panel4d with multiiindex

@cpcloud
Copy link
Member Author

cpcloud commented Sep 27, 2013

slight hit when using DatetimeIndex vs a column of Series, need to clear that up b4 merging

@cpcloud
Copy link
Member Author

cpcloud commented Sep 27, 2013

turns out it's because a datetime64 Series with an Int64Index compares faster than a datetime64 Series with a DatetimeIndex. This is the case before 8a9a4f2 (the just merged timestamp compare fix) so that didn't introduce any performance regressions (I also ran the full vbench suite on it and no changes). I ran a vbench on this as well and nothing changed.

@cpcloud
Copy link
Member Author

cpcloud commented Sep 27, 2013

bombs away

cpcloud added a commit that referenced this pull request Sep 27, 2013
ENH: evaluate datetime ops in python with eval
@cpcloud cpcloud merged commit 855d3d7 into pandas-dev:master Sep 27, 2013
@cpcloud cpcloud deleted the eval-datetime-in-python branch September 27, 2013 19:34
@jreback
Copy link
Contributor

jreback commented Mar 27, 2014

@cpcloud do you recall if we fixed this for timedelta ops as well (which should be evaluated in python space and not by numexpr)

@cpcloud
Copy link
Member Author

cpcloud commented Mar 27, 2014

Don't think there's a test for them in top level query/eval, but there are some tests in test_pytables.py.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove datetime comparisons from numexpr in eval (evaluate in Python)
2 participants