
BUG: Expression xxxx has forbidden control characters - caused by new release of numexpr #54542


Closed
3 tasks done
alexmloveless opened this issue Aug 14, 2023 · 4 comments
Labels
Bug Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@alexmloveless

alexmloveless commented Aug 14, 2023

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

$ pip install numexpr==2.8.5

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 100, size=(100, 4)), columns=list('ABCD'))
a = 8
df.query("A == @a", engine="numexpr")  # raises ValueError with numexpr 2.8.5

Issue Description

Traceback (most recent call last):
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\IPython\core\interactiveshell.py", line 3433, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-34-7f773e498449>", line 1, in <module>
    df.query("A == @a", engine="numexpr")
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\pandas\core\frame.py", line 4060, in query
    res = self.eval(expr, **kwargs)
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\pandas\core\frame.py", line 4191, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\pandas\core\computation\eval.py", line 353, in eval
    ret = eng_inst.evaluate()
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\pandas\core\computation\engines.py", line 80, in evaluate
    res = self._evaluate()
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\pandas\core\computation\engines.py", line 121, in _evaluate
    return ne.evaluate(s, local_dict=scope)
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\numexpr\necompiler.py", line 943, in evaluate
    raise e
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\numexpr\necompiler.py", line 851, in validate
    _names_cache[expr_key] = getExprNames(ex, context)
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\numexpr\necompiler.py", line 714, in getExprNames
    ex = stringToExpression(text, {}, context)
  File "C:\Users\QW664QA\Miniconda3\lib\site-packages\numexpr\necompiler.py", line 274, in stringToExpression
    raise ValueError(f'Expression {s} has forbidden control characters.')
ValueError: Expression (A) == (__pd_eval_local_a) has forbidden control characters.

So this is actually an issue with numexpr release 2.8.5, which went live on Sunday 6th August 2023. From its release notes:

  • As an addendum to the use of NumExpr for parsing user inputs, is that NumExpr
    calls eval on the inputs. A regular expression is now applied to help sanitize
    the input expression string, forbidding '__', ':', and ';'. Attribute access …

Not sure if this qualifies as a bug over there, but it breaks pandas if you have numexpr==2.8.5 installed.
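To see why the pandas-generated expression trips this check, here is a minimal sketch of a filter that rejects the three substrings named in the release note. The pattern and the helper name has_forbidden_chars are assumptions for illustration only; numexpr's actual regular expression (applied in stringToExpression, per the traceback) may differ.

```python
import re

# Hypothetical approximation of the numexpr 2.8.5 filter: reject any
# expression containing '__', ':' or ';' (as described in the release note).
# This is NOT numexpr's actual regex.
_FORBIDDEN = re.compile(r"__|:|;")


def has_forbidden_chars(expr: str) -> bool:
    """Return True if expr contains any of the forbidden substrings."""
    return _FORBIDDEN.search(expr) is not None


# The expression pandas hands to numexpr after rewriting "@a" (from the traceback)
pandas_expr = "(A) == (__pd_eval_local_a)"
# The same comparison with the value written as a literal
literal_expr = "(A) == 8"

print(has_forbidden_chars(pandas_expr))   # True: the '__' prefix is rejected
print(has_forbidden_chars(literal_expr))  # False
```

The literal form passes because it contains no double underscore, which matches the report that df.query("A == 8", engine="numexpr") still works.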

Expected Behavior

df.query("A == 8", engine="numexpr")

correctly queries the df and returns the expected rows. So the problem lies with @ variables in the query: pandas rewrites @a to the dunder-prefixed name __pd_eval_local_a (see the last line of the traceback), which the new '__' filter rejects. I guess it may manifest elsewhere too.
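As a stopgap, a couple of workarounds avoid handing the __pd_eval_local_ name to numexpr at all. The sketch below assumes pandas and NumPy are installed: engine="python" evaluates the query without numexpr, and plain boolean indexing skips the query parser altogether. Pinning numexpr below 2.8.5 is another option, not shown here.

```python
import numpy as np
import pandas as pd

# Same shape of frame as the reproducible example, seeded so results repeat
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))
a = 8

# Workaround 1: the pure-Python engine never calls numexpr, so the
# dunder-prefixed local name is never checked by its sanitizer.
via_python_engine = df.query("A == @a", engine="python")

# Workaround 2: boolean indexing bypasses the query/eval machinery entirely.
via_mask = df[df["A"] == a]

# Both approaches select the same rows.
pd.testing.assert_frame_equal(via_python_engine, via_mask)
```

Boolean indexing is the most version-independent choice, while engine="python" keeps the query string readable at some cost in speed on large frames.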

Installed Versions

INSTALLED VERSIONS
------------------
commit : 66e3805
python : 3.9.12.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19044
machine : AMD64
processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252
pandas : 1.3.5
numpy : 1.24.3
pytz : 2021.3
dateutil : 2.8.2
pip : 21.2.4
setuptools : 61.2.0
Cython : 3.0.0
pytest : None
hypothesis : None
sphinx : 6.1.3
blosc : None
feather : None
xlsxwriter : 3.0.3
lxml.etree : 4.9.1
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.7.0
pandas_datareader: 0.10.0
bs4 : 4.11.1
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.6.2
numexpr : 2.8.5
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 11.0.0
pyxlsb : None
s3fs : None
scipy : 1.9.3
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
xlwt : None
numba : None
@alexmloveless alexmloveless added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Aug 14, 2023
@phofl
Member

phofl commented Aug 14, 2023

Hi, thanks for your report. Did you report this in numexpr as well?

@alexmloveless
Author

Hi, thanks for your report. Did you report this in numexpr as well?

No, but I can if you think it's necessary. Because it's listed as a new feature, I figured it shouldn't be reported as a bug. But it isn't flagged as a breaking change either!

@rebecca-palmer
Contributor

This has already been reported as #54449 and pydata/numexpr#444

@mroeschke
Member

Thanks, but closing as a duplicate of #54449.


4 participants