String dtype: allow string dtype in query/eval with default numexpr engine #59810

jorisvandenbossche · 2024-09-15T18:08:14Z

Currently (on main, with no future options enabled), you can use query on an object-dtype column of strings. For example:

>>> df = pd.DataFrame({"col" : ['a', 'b', 'c']})
>>> df.query("col < 'b'") 
  col
0   a

This works even though the default engine is "numexpr" (when installed) and that engine cannot actually evaluate the above expression. We have checks that essentially fall back to evaluating the expression in Python if one of the operands is object dtype (and the operator is one of the operators allowed for that).
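To illustrate the fallback described above, here is a minimal sketch (not part of the PR's diff) comparing the default engine with an explicitly requested Python engine on an object-dtype column. It only assumes that pandas is installed; passing `dtype=object` is my choice, to force object dtype regardless of any string-inference option.

```python
import pandas as pd

# Force object dtype so the column is not inferred as a string dtype.
df = pd.DataFrame({"col": ["a", "b", "c"]}, dtype=object)

# Default engine: "numexpr" when installed, otherwise "python".
default_result = df.query("col < 'b'")

# Explicitly request evaluation in Python.
python_result = df.query("col < 'b'", engine="python")

# Both calls should select the same single row.
print(default_result.equals(python_result))
```

If the fallback works as described, both calls return the same frame, so the default engine gives no speedup for this kind of expression.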

But with the future default string dtype, we trigger the warning about explicitly falling back to the python engine:

>>> pd.options.future.infer_string = True
>>> df = pd.DataFrame({"col" : ['a', 'b', 'c']})
>>> df.query("col < 'b'")
RuntimeWarning: Engine has switched to 'python' because numexpr does not support extension array dtypes.
Please set your engine to python manually.
  col
0   a

Given that we were already automatically falling back for object-dtype strings, I propose to keep doing that for the future default string dtype as well.
This is a bit misleading (you might expect a speedup from the numexpr engine, which you won't get), but that is already the case right now anyway. Otherwise a lot of users will start to see that warning, and I think many people use query for the convenience of writing the expression, not only for the speed.
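For reference, the warning asks users to select the Python engine explicitly. A hedged sketch of that workaround is below. It uses the opt-in `"string"` extension dtype as a stand-in for the future default string dtype (my assumption, so that the example does not depend on `future.infer_string`), and it turns `RuntimeWarning` into an error only to show that none is raised on this path.

```python
import warnings

import pandas as pd

# Assumption: the opt-in "string" extension dtype stands in for the
# future default string dtype discussed in this PR.
df = pd.DataFrame({"col": ["a", "b", "c"]}, dtype="string")

with warnings.catch_warnings():
    # Escalate RuntimeWarning to an error to confirm none is emitted.
    warnings.simplefilter("error", RuntimeWarning)
    result = df.query("col < 'b'", engine="python")

print(result)
```

With this PR the explicit `engine="python"` should no longer be needed just to avoid the warning for string columns, which is the point of keeping the automatic fallback.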

xref #54792

mroeschke · 2024-09-16T17:26:09Z

Thanks @jorisvandenbossche

String dtype: allow string dtype in query/eval with default mumexpr e…

6b4d89b

…ngine

jorisvandenbossche added the Strings (String extension data type and string data) and expressions (pd.eval, query) labels Sep 15, 2024

mroeschke approved these changes Sep 16, 2024

mroeschke added this to the 2.3 milestone Sep 16, 2024

mroeschke merged commit 013ac67 into pandas-dev:main Sep 16, 2024
45 of 53 checks passed

jorisvandenbossche deleted the string-dtype-query-eval branch September 16, 2024 17:42

jorisvandenbossche added the backported label Oct 10, 2024

jorisvandenbossche added a commit to jorisvandenbossche/pandas that referenced this pull request Oct 10, 2024

String dtype: allow string dtype in query/eval with default numexpr e…

4ff2c68

…ngine (pandas-dev#59810) String dtype: allow string dtype in query/eval with default mumexpr engine

jorisvandenbossche added a commit that referenced this pull request Oct 10, 2024

String dtype: allow string dtype in query/eval with default numexpr e…

a790592

…ngine (#59810) String dtype: allow string dtype in query/eval with default mumexpr engine
