NotImplementedError: 'AnnAssign' nodes are not implemented but numpy-like syntax works #21525

Closed
simtsc opened this issue Jun 18, 2018 · 11 comments · Fixed by #21548

@simtsc

simtsc commented Jun 18, 2018

My dataframe df has a column called "Unnamed:_6" which is of dtype object.

Running the following command works just fine:

df[df['Unnamed:_6'] == "Foo"]

However, the equivalent line:

df.query('Unnamed:_6 == "Foo"')

raises the following exception:

NotImplementedError: 'AnnAssign' nodes are not implemented
@uds5501
Contributor

uds5501 commented Jun 18, 2018

@simtsc Could you please attach code that reproduces your dataframe?

@WillAyd
Member

WillAyd commented Jun 18, 2018

You'll also need to provide more information per the contributing guide:

https://pandas.pydata.org/pandas-docs/stable/contributing.html#bug-reports-and-enhancement-requests

i.e. the output of pd.show_versions(). A quick Google search on the error makes me think it has something to do with annotations, but you need to provide the extra information before this can be looked at in any detail.

@WillAyd added the Needs Info label (Clarification about behavior needed to assess issue) on Jun 18, 2018
@uds5501
Contributor

uds5501 commented Jun 18, 2018

@WillAyd @simtsc
Okay, I think I may have found the source of this problem. I tried to reproduce the situation described in the issue and got the following error:

>>> x=pd.DataFrame([['a','b','c'],['Foo','e','f']] , columns=['Unnamed:_6','guy','walks'])     
>>> x.query('Unnamed:_6 == "Foo"')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2297, in query
    res = self.eval(expr, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2366, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/computation/eval.py", line 290, in eval
    truediv=truediv)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/computation/expr.py", line 732, in __init__
    self.terms = self.parse()
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/computation/expr.py", line 749, in parse
    return self._visitor.visit(self.expr)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/computation/expr.py", line 310, in visit
    node = ast.fix_missing_locations(ast.parse(clean))
  File "/usr/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    Unnamed :_6 =="Foo"
            ^
SyntaxError: invalid syntax

Then I decided to work on another column name and got this:

>>> x=pd.DataFrame([['a','b','c'],['Foo','e','f']] , columns=['a:b','guy','walks'])     
>>> x
   a:b guy walks
0    a   b     c
1  Foo   e     f
>>> x.query('a:b == "Foo"')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2297, in query
    res = self.eval(expr, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/frame.py", line 2366, in eval
    return _eval(expr, inplace=inplace, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/computation/eval.py", line 290, in eval
    truediv=truediv)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/computation/expr.py", line 732, in __init__
    self.terms = self.parse()
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/computation/expr.py", line 749, in parse
    return self._visitor.visit(self.expr)
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/computation/expr.py", line 310, in visit
    node = ast.fix_missing_locations(ast.parse(clean))
  File "/usr/lib/python2.7/ast.py", line 37, in parse
    return compile(source, filename, mode, PyCF_ONLY_AST)
  File "<unknown>", line 1
    a :b =="Foo"
      ^
SyntaxError: invalid syntax

Apparently, the : in the column name upsets this particular query.
Note: the error type and message are different from the ones in the original report.
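
A side note for readers: the `Unnamed :_6` spelling in the error output suggests that the expression was split into tokens before being compiled. The minimal sketch below uses only the standard library `tokenize` module, not pandas' own code, and `significant_tokens` is a hypothetical helper name. It shows that Python treats the colon as a separate operator token, so the column name never reaches the parser as a single identifier:

```python
import io
import tokenize


def significant_tokens(source):
    """Return the NAME, OP and STRING token strings found in `source`."""
    wanted = {tokenize.NAME, tokenize.OP, tokenize.STRING}
    readline = io.StringIO(source).readline
    return [
        tok.string
        for tok in tokenize.generate_tokens(readline)
        if tok.type in wanted
    ]


# The colon splits the intended column name into two NAME tokens.
print(significant_tokens('Unnamed:_6 == "Foo"'))
# ['Unnamed', ':', '_6', '==', '"Foo"']
```

The tracebacks above come from Python 2.7 (see the file paths), which has no variable annotation syntax, so a name followed by a colon is simply a syntax error there.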

@WillAyd
Member

WillAyd commented Jun 18, 2018

Since the error type and message are different, I doubt it is the same problem. I'd wait until @simtsc can provide more information; otherwise this is a guessing game.

@simtsc
Author

simtsc commented Jun 18, 2018

Hi guys, sorry, I indeed forgot to mention the version information. I am working in a Jupyter notebook, running Jupyter notebook server 5.5.0 with Python 3.6.4 |Anaconda custom (64-bit)| (default, Jan 16 2018, 10:22:32) [MSC v.1900 64 bit (AMD64)]. The input data comes from an Excel file, which I am not able to share here. This is the exact call I am using to read the file:

df_dict = pd.read_excel(os.path.join('data', 'input.xlsm'), 
                        skiprows=14, usecols='G:Y', sheet_name=None) 

Here is what pandas gathers about my environment:
commit: None
python: 3.6.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.1
pytest: 3.3.2
pip: 10.0.1
setuptools: 38.4.0
Cython: 0.27.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 6.2.1
sphinx: 1.6.6
patsy: 0.5.0
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.4
feather: None
matplotlib: 2.1.2
openpyxl: 2.4.10
xlrd: 1.1.0
xlwt: 1.3.0
xlsxwriter: 1.0.2
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.1
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: 0.1.5
fastparquet: None
pandas_gbq: None
pandas_datareader: None

@simtsc
Author

simtsc commented Jun 18, 2018

Here is a piece of code that replicates the issue:

import pandas as pd

df_test = pd.DataFrame([["Hello", "World"], ["Foo", "Bar"]], columns=["Unnamed:_5", "Unnamed:_6"])
df_test[df_test['Unnamed:_6'] == "Bar"]  # succeeds
df_test.eval('Unnamed:_6 == "Bar"')  # raises NotImplementedError

This is the full stacktrace:

---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-17-6f14251164a5> in <module>()
      1 df_test = pd.DataFrame([["Hello", "World"], ["Foo", "Bar"]], columns=["Unnamed:_5", "Unnamed:_6"])
      2 df_test[df_test['Unnamed:_6'] == "Bar"]
----> 3 df_test.eval('Unnamed:_6 == "Bar"')

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\frame.py in eval(self, expr, inplace, **kwargs)
   2960             kwargs['target'] = self
   2961         kwargs['resolvers'] = kwargs.get('resolvers', ()) + tuple(resolvers)
-> 2962         return _eval(expr, inplace=inplace, **kwargs)
   2963 
   2964     def select_dtypes(self, include=None, exclude=None):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\computation\eval.py in eval(expr, parser, engine, truediv, local_dict, global_dict, resolvers, level, target, inplace)
    289 
    290         parsed_expr = Expr(expr, engine=engine, parser=parser, env=env,
--> 291                            truediv=truediv)
    292 
    293         # construct the engine and evaluate the parsed expression

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\computation\expr.py in __init__(self, expr, engine, parser, env, truediv, level)
    737         self.env.scope['truediv'] = truediv
    738         self._visitor = _parsers[parser](self.env, self.engine, self.parser)
--> 739         self.terms = self.parse()
    740 
    741     @property

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\computation\expr.py in parse(self)
    754     def parse(self):
    755         """Parse an expression"""
--> 756         return self._visitor.visit(self.expr)
    757 
    758     @property

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\computation\expr.py in visit(self, node, **kwargs)
    319         method = 'visit_' + node.__class__.__name__
    320         visitor = getattr(self, method)
--> 321         return visitor(node, **kwargs)
    322 
    323     def visit_Module(self, node, **kwargs):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\computation\expr.py in visit_Module(self, node, **kwargs)
    325             raise SyntaxError('only a single expression is allowed')
    326         expr = node.body[0]
--> 327         return self.visit(expr, **kwargs)
    328 
    329     def visit_Expr(self, node, **kwargs):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\computation\expr.py in visit(self, node, **kwargs)
    319         method = 'visit_' + node.__class__.__name__
    320         visitor = getattr(self, method)
--> 321         return visitor(node, **kwargs)
    322 
    323     def visit_Module(self, node, **kwargs):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\pandas\core\computation\expr.py in f(self, *args, **kwargs)
    202     def f(self, *args, **kwargs):
    203         raise NotImplementedError("{name!r} nodes are not "
--> 204                                   "implemented".format(name=node_name))
    205     return f
    206 

NotImplementedError: 'AnnAssign' nodes are not implemented
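
The last frames of the traceback show the pattern at work: the visitor builds a method name from the AST node's class and calls it, and unsupported node types end up in a stub that raises. Below is a toy sketch of that dispatch. It assumes nothing about pandas beyond what the traceback shows; `TinyVisitor` and `try_parse` are hypothetical names, and the real visitor is far more elaborate.

```python
import ast


class TinyVisitor:
    """Toy visitor mimicking the name-based dispatch seen in the traceback."""

    def visit(self, node):
        method = "visit_" + node.__class__.__name__
        # Simplification: fall back to one shared stub for unknown nodes.
        visitor = getattr(self, method, self.not_implemented)
        return visitor(node)

    def not_implemented(self, node):
        name = node.__class__.__name__
        raise NotImplementedError(
            "{name!r} nodes are not implemented".format(name=name)
        )

    def visit_Module(self, node):
        if len(node.body) != 1:
            raise SyntaxError("only a single expression is allowed")
        return self.visit(node.body[0])

    def visit_Expr(self, node):
        # A real visitor would evaluate the expression; we only name it.
        return node.value.__class__.__name__


def try_parse(expr):
    """Return the expression kind, or the error message if unsupported."""
    try:
        return TinyVisitor().visit(ast.parse(expr))
    except NotImplementedError as exc:
        return str(exc)


print(try_parse('Unnamed_6 == "Bar"'))
print(try_parse('Unnamed:_6 == "Bar"'))
```

On Python 3.6 or later the second call reports the same message as the traceback, because the statement is handed to the visitor as an `AnnAssign` node rather than as an expression.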

@WillAyd added the Bug label and removed the Needs Info label (Clarification about behavior needed to assess issue) on Jun 18, 2018
@uds5501
Contributor

uds5501 commented Jun 18, 2018

@simtsc @WillAyd I tried recreating this on master and still got the error I mentioned earlier. Shall I open another issue for that?

@simtsc
Author

simtsc commented Jun 18, 2018

@uds5501 Since the issue you reported is different from mine, I would suggest opening another ticket for it and continuing the discussion there.

@simtsc
Author

simtsc commented Jun 19, 2018

Update: I experimented a little with the column names and found that the characters used in the name determine whether the call succeeds. I was also able to reproduce @uds5501's issue. Here are my findings. I hope this helps to synchronize the behavior of the numpy-like API and the eval/query API.

import pandas as pd

df_test = pd.DataFrame([["Hello", "World"], ["Foo", "Bar"]], columns=['A:_', 'B:_'])
df_test[df_test['B:_'] == "Bar"]  # succeeds
df_test.query('B:_ == "Bar"')  # NotImplementedError: 'AnnAssign' nodes are not implemented

df_test = pd.DataFrame([["Hello", "World"], ["Foo", "Bar"]], columns=['A:', 'B:'])
df_test[df_test['B:'] == "Bar"]  # succeeds
df_test.query('B: == "Bar"')  # SyntaxError: invalid syntax

df_test = pd.DataFrame([["Hello", "World"], ["Foo", "Bar"]], columns=['A_', 'B_'])
df_test[df_test['B_'] == "Bar"]  # succeeds
df_test.query('B_ == "Bar"')  # succeeds

df_test = pd.DataFrame([["Hello", "World"], ["Foo", "Bar"]], columns=list('AB'))
df_test[df_test['B'] == "Bar"]  # succeeds
df_test.query('B == "Bar"')  # succeeds
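
The different outcomes follow from how Python 3.6+ itself parses each string, before pandas looks at any column names. A small sketch using only the standard library `ast` module (`describe` is a hypothetical helper) shows that `B:_ == "Bar"` is read as a variable annotation on `B`, with `_ == "Bar"` as the annotation, while `B: == "Bar"` is not valid Python at all:

```python
import ast


def describe(expr):
    """Summarize how Python parses `expr` as a single statement."""
    try:
        module = ast.parse(expr)
    except SyntaxError:
        return "SyntaxError"
    if not module.body:
        return "empty"
    node = module.body[0]
    if isinstance(node, ast.AnnAssign):
        # `target: annotation` with no assigned value
        return "AnnAssign: target={}, annotation={}".format(
            getattr(node.target, "id", "?"),
            type(node.annotation).__name__,
        )
    return type(node).__name__


for expr in ['B:_ == "Bar"', 'B: == "Bar"', 'B_ == "Bar"', 'B == "Bar"']:
    print(repr(expr), "->", describe(expr))
```

Only the last two strings parse as a plain expression statement (`Expr`), which is the case the query API is able to evaluate.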

@umerkhalifa

Hi
Did you solve this issue? I have a dataframe with similar column names containing :, but there are more than 278 columns with this issue. Can you suggest how to solve this?

@ValentinFFM

ValentinFFM commented May 11, 2022

Hi @umerkhalifa,
it's been a while since your question, but maybe this is interesting for others as well. I solved this issue by using backtick quoting, as suggested in the official pandas documentation.

Pandas Query Documentation

If you wrap the column name (which contains a colon) in backticks, the query function works properly and no longer raises an error.

Code example

df.query('`Column: Name` == "value"')
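
For a frame with many affected columns, two approaches can be scripted. The sketch below uses only the standard library; `backtick` and `to_identifier` are hypothetical helper names. The pandas calls in the trailing comments are assumptions about usage and need a pandas version whose `query` supports backtick quoting of such names; the 0.23.1 release reported earlier in this thread predates that feature, as far as I know.

```python
import re


def backtick(name):
    """Wrap a column name in backticks for use inside a query string."""
    if "`" in name:
        raise ValueError(
            "column name must not contain a backtick: {!r}".format(name)
        )
    return "`{}`".format(name)


def to_identifier(name):
    """Replace every non-word character so the name can be used unquoted."""
    cleaned = re.sub(r"\W", "_", name)
    if not cleaned or cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


print(backtick("Unnamed:_6") + ' == "Foo"')  # query string with quoting
print(to_identifier("Unnamed:_6"))           # name that is safe unquoted

# Assumed usage with a DataFrame named df (not executed here):
#   df.query(backtick("Unnamed:_6") + ' == "Foo"')
#   df.rename(columns=to_identifier).query('Unnamed__6 == "Foo"')
```

Renaming is the lossy option: two different column names can collapse into the same identifier, and Python keywords are not handled, so it is worth checking the renamed columns for duplicates before relying on them.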
