String comparison in query() #6155

Closed
michaelbilow opened this issue Jan 29, 2014 · 10 comments · Fixed by #6158
Labels: Bug, Dtype Conversions, Strings

@michaelbilow

Hi, it seems that string ordering comparisons aren't supported in query() yet, so maybe this isn't strictly a bug. Either way, hopefully this behavior will be fixed in a future version of pandas.

import pandas as pd
import numexpr as ne

a = list('abcdef')
b = range(6)
df = pd.DataFrame({'X': pd.Series(a), 'Y': pd.Series(b)})

df_Y = df.query('Y < 3')             # Works fine.
ne_works = ne.evaluate('"a" < "d"')  # ne_works == np.array([True])
df_X = df.query('X < "d"')           # RuntimeError: max recursion depth exceeded
@jreback
Contributor

jreback commented Jan 29, 2014

cc @cpcloud

IIRC this is not implemented ATM

@cpcloud
Member

cpcloud commented Jan 29, 2014

Yep only == and != are tested for strings.
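
For reference, a minimal sketch of the two string comparisons mentioned here, reusing the X/Y frame from the original report. The variable names `eq` and `neq` are illustrative only and do not come from the thread.

```python
import pandas as pd

# Same data as in the original report.
df = pd.DataFrame({"X": list("abcdef"), "Y": range(6)})

# Equality and inequality against a string literal inside query().
eq = df.query('X == "a"')
neq = df.query('X != "a"')

print(eq["X"].tolist())
print(neq["X"].tolist())
```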

@cpcloud
Member

cpcloud commented Jan 29, 2014

Let me take a look at why this happens, and I'll either clean up the error message or implement it.

@cpcloud
Member

cpcloud commented Jan 29, 2014

Interesting ... it only fails on the numexpr engine ... yay for writing tests first!
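
Since the failure is reported only for the numexpr engine, one way to sidestep it is to request the Python engine explicitly. The sketch below assumes the `engine` keyword that `query()` forwards to `pandas.eval`; the frame is the one from the original report and `result` is an illustrative name.

```python
import pandas as pd

# Same data as in the original report.
df = pd.DataFrame({"X": list("abcdef"), "Y": range(6)})

# engine="python" evaluates the whole expression in Python space,
# so numexpr is never asked to compare the string column.
result = df.query('X < "d"', engine="python")

print(result["X"].tolist())
```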

@cpcloud
Member

cpcloud commented Jan 29, 2014

Ok, this was simply a case of making those comparisons evaluate in Python space when object dtypes are being compared....

I think maybe a doc note is needed about the fact that expressions are evaluated on a case-by-case basis, depending on the types of their results ... I could've sworn I wrote something about this, but maybe just in our conversations ....

@jreback
Contributor

jreback commented Jan 29, 2014

hahah...

@cpcloud
Member

cpcloud commented Jan 29, 2014

@chuyelchulo Couple of details that might interest you:

  • Expressions that would result in an object dtype (including simple variable evaluation) have to be evaluated in Python space. The main reason is backwards compatibility with older versions of numpy, which truncate strings longer than 60 characters when you call astype(str) on an array. In addition, we can't pass object arrays to numexpr, so string comparisons are evaluated in Python space.
  • The upshot is that this only applies to strings. So if you have an expression that is, for example, a string comparison and-ed together with a boolean expression from a numeric comparison, the numeric comparison will be evaluated by numexpr. In general, query/eval will "pick out" the subexpressions that are eval-able by numexpr and those that must be evaluated in Python space, transparently to the user.
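
A minimal sketch of the mixed case described above, assuming a pandas version that includes the fix from #6158. The frame is the one from the original report; `mixed` and `expected` are illustrative names. How the subexpressions are split between numexpr and Python is internal to pandas and not observable here, so the sketch only checks that the result matches the plain boolean-mask form.

```python
import pandas as pd

# Same data as in the original report.
df = pd.DataFrame({"X": list("abcdef"), "Y": range(6)})

# A string comparison and-ed with a numeric comparison in one query.
mixed = df.query('X < "d" and Y > 0')

# The same filter written as an ordinary boolean mask, for comparison.
expected = df[(df["X"] < "d") & (df["Y"] > 0)]

print(mixed["X"].tolist())
print(mixed.equals(expected))
```

The user writes a single expression either way; which engine handles each part is decided by query/eval.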

@cpcloud
Member

cpcloud commented Jan 29, 2014

I'm adapting that for the docs, PR coming shortly

@cpcloud
Member

cpcloud commented Jan 29, 2014

@chuyelchulo Thanks for reporting this.

@michaelbilow
Author

Thanks, that was really interesting!
