BUG: UndefinedVariableError in DataFrame.query with classname #49248

Closed
1 change: 1 addition & 0 deletions doc/source/whatsnew/v1.5.2.rst
@@ -13,6 +13,7 @@ including other versions of pandas.

 Fixed regressions
 ~~~~~~~~~~~~~~~~~
+- Fixed regression in :meth:`DataFrame.query` raising ``UndefinedVariableError`` when calling it with class as a name (:issue:`48694`)
 - Fixed regression in :meth:`MultiIndex.join` for extension array dtypes (:issue:`49277`)
 - Fixed regression in :meth:`Series.replace` raising ``RecursionError`` with numeric dtype and when specifying ``value=None`` (:issue:`45725`)
 - Fixed regression in arithmetic operations for :class:`DataFrame` with :class:`MultiIndex` columns with different dtypes (:issue:`49769`)
8 changes: 4 additions & 4 deletions pandas/core/computation/ops.py
@@ -29,6 +29,7 @@
     result_type_many,
 )
 from pandas.core.computation.scope import DEFAULT_GLOBALS
+from pandas.core.series import Series

 from pandas.io.formats.printing import (
     pprint_thing,
@@ -102,10 +103,9 @@ def evaluate(self, *args, **kwargs) -> Term:
     def _resolve_name(self):
         local_name = str(self.local_name)
         is_local = self.is_local
-        if local_name in self.env.scope and isinstance(
-            self.env.scope[local_name], type
-        ):
-            is_local = False
+        check = self.env.scope.get("column") == local_name
+        if not isinstance(check, Series) and check:
Member

Shouldn't self.env.scope.get("column") == local_name always produce a bool?

Member Author

Yes, ideally it should.
The tests passed, but it raised the error you mentioned above in #49248 (comment).
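
The question above is whether `==` always yields a bool. In Python it need not: `__eq__` may return any object, and elementwise containers typically return another container whose truth value is ambiguous. The following is a minimal sketch of that situation, assuming a hypothetical stand-in class (`ElementwiseValues` and `is_column_match` are illustrative names, not pandas APIs) so that it runs without pandas:

```python
class ElementwiseValues:
    """Hypothetical stand-in that mimics elementwise `==`; not a pandas class."""

    def __init__(self, values):
        self.values = list(values)

    def __eq__(self, other):
        # Elementwise comparison: the result is a container, not a bool.
        return ElementwiseValues(v == other for v in self.values)

    def __bool__(self):
        # Mimic the "truth value is ambiguous" behaviour of array-like objects.
        raise ValueError("truth value of an elementwise result is ambiguous")


def is_column_match(scope_value, local_name):
    """Guarded check in the style of the diff: trust only plain results."""
    check = scope_value == local_name
    # Test the type first so that bool() is never called on a container.
    return (not isinstance(check, ElementwiseValues)) and bool(check)
```

With plain strings the comparison behaves as expected; with the stand-in container the `isinstance` guard short-circuits before the ambiguous truth test is reached.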

Member

I'm not too familiar with this part of the codebase, but is there a way to check whether it's local without checking whether it's a Series?

Member Author

The errors in the docs build were specific to Series (either local_name or self.env.scope.get("column")).
One example is below.
[Screenshot 2022-10-31 at 11 08 42 PM]

Member

Can the self.env.scope.get("column") Series name just be compared to the local_name?

Member Author

@debnathshoham Oct 31, 2022

So something like:

  • check whether it is a Series
  • if so, compare self.env.scope["column"].name with local_name
  • if not, compare self.env.scope["column"] directly

Sorry, but how is that any better?

EDIT: Or did you mean to compare only the Series.name? In most cases, though, they aren't Series.
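
The three steps listed in this comment can be sketched as a small helper. This is only an illustration of the alternative under discussion, not code from the PR: `NamedValues` is a hypothetical stand-in for an object that carries a `.name` (as a Series does), and `column_matches` is an assumed helper name.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class NamedValues:
    """Hypothetical stand-in for a Series-like object with a `.name`."""

    name: Any
    values: tuple = ()


def column_matches(column: Any, local_name: str) -> bool:
    # Step 1: check whether the scope entry is the Series-like stand-in.
    if isinstance(column, NamedValues):
        # Step 2: compare its name with local_name.
        return column.name == local_name
    # Step 3: otherwise compare the scope entry itself.
    return column == local_name
```

Either way a type check is still needed, which is the point the comment raises.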

Member

This may be my ignorance of this part of the codebase, but I assumed that self.env.scope["column"] would refer to a Series object and local_name to the name of that Series, so no Series check would be needed.

Member Author

Oh, no, that's not the case. Usually it's the name of a df column.

I added the Series check only to cover the Sphinx errors. Maybe someone with better Sphinx knowledge can fix this more elegantly.

+            is_local = not is_local

         res = self.env.resolve(local_name, is_local=is_local)
         self.update(res)
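
To see why the value of `is_local` matters for the `resolve` call, here is a toy sketch of a two-level lookup. It is a simplified assumption about how such a scope might behave, not pandas' `Scope` implementation; `ToyScope` and its attributes are hypothetical names, and `UndefinedVariableError` here is a local toy class named after the error in the PR title.

```python
class UndefinedVariableError(NameError):
    """Toy error type named after the one in the PR title."""


class ToyScope:
    """Hypothetical, simplified scope; not pandas' Scope class."""

    def __init__(self, local_vars, resolvers):
        self.local_vars = dict(local_vars)  # names referenced with the @ prefix
        self.resolvers = dict(resolvers)    # e.g. DataFrame column names

    def resolve(self, name, is_local):
        # Look in exactly one place, depending on is_local.
        source = self.local_vars if is_local else self.resolvers
        try:
            return source[name]
        except KeyError as err:
            raise UndefinedVariableError(name) from err
```

In this toy model, a class `A` stored among the local variables is found only when `is_local` is true; forcing `is_local` to false for such a name raises `UndefinedVariableError`, which is one plausible reading of the regression this PR addresses.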
29 changes: 15 additions & 14 deletions pandas/core/computation/pytables.py
@@ -609,20 +609,21 @@ def __repr__(self) -> str:

     def evaluate(self):
         """create and return the numexpr condition and filter"""
-        try:
-            self.condition = self.terms.prune(ConditionBinOp)
-        except AttributeError as err:
-            raise ValueError(
-                f"cannot process expression [{self.expr}], [{self}] "
-                "is not a valid condition"
-            ) from err
-        try:
-            self.filter = self.terms.prune(FilterBinOp)
-        except AttributeError as err:
-            raise ValueError(
-                f"cannot process expression [{self.expr}], [{self}] "
-                "is not a valid filter"
-            ) from err
+        if self.terms is not None:
+            try:
+                self.condition = self.terms.prune(ConditionBinOp)
+            except AttributeError as err:
+                raise ValueError(
+                    f"cannot process expression [{self.expr}], [{self}] "
+                    "is not a valid condition"
+                ) from err
+            try:
+                self.filter = self.terms.prune(FilterBinOp)
+            except AttributeError as err:
+                raise ValueError(
+                    f"cannot process expression [{self.expr}], [{self}] "
+                    "is not a valid filter"
+                ) from err

         return self.condition, self.filter

13 changes: 12 additions & 1 deletion pandas/tests/computation/test_eval.py
@@ -1846,7 +1846,7 @@ def test_negate_lt_eq_le(engine, parser):
 )
 def test_eval_no_support_column_name(request, column):
     # GH 44603
-    if column in ["True", "False", "inf", "Inf"]:
+    if column in ["True", "False"]:
         request.node.add_marker(
             pytest.mark.xfail(
                 raises=KeyError,
@@ -1861,6 +1861,17 @@ def test_eval_no_support_column_name(request, column):
     tm.assert_frame_equal(result, expected)


+def test_classname_frame_query():
+    # GH 48694
+    class A:
+        a = 1
+
+    df1 = DataFrame({"x": [1, 2, 3], "y": [4, 5, 6]})
+    df_test = DataFrame({"x": [1], "y": [4]})
+    tm.assert_frame_equal(df_test, df1.query(f"x=={A.a}"))
+    tm.assert_frame_equal(df_test, df1.query("x==@A.a"))
+
+
 @td.skip_array_manager_not_yet_implemented
 def test_set_inplace(using_copy_on_write):
     # https://github.com/pandas-dev/pandas/issues/47449