
BUG: Using datetime global in query expression raises UndefinedVariableError #50442


Open
3 tasks done
phillipuniverse opened this issue Dec 26, 2022 · 4 comments
Labels
Bug expressions pd.eval, query

Comments

@phillipuniverse

phillipuniverse commented Dec 26, 2022

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from datetime import date

from pandas import DataFrame


def test_reproduction():
    unfiltered = DataFrame(
        {
            "planned_date": {
                "0": date.fromisoformat("2020-10-01"),
                "1": date.fromisoformat("2020-08-01"),
            },
        }
    )
    filtered = unfiltered.query(
        'planned_date >= datetime.strptime("2020-09-01", "%Y-%m-%d").date()'
    ).copy().reset_index(drop=True)
    assert (
        filtered.to_json()
        == unfiltered.iloc[[0]].reset_index(drop=True).to_json()
    )

This raises the following exception, whereas it works on versions before 1.5.0:

            except KeyError as err:
>               raise UndefinedVariableError(key, is_local) from err
E               pandas.errors.UndefinedVariableError: name 'datetime' is not defined

Issue Description

The behavior change in #47273 appears to prevent using the datetime global in a query string passed to eval. I'm also not sure what the right way is to access globals within an eval now. Is there a better way to write this query?
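One way to sidestep the global lookup, sketched here as an assumption rather than an official recommendation: evaluate the datetime call in ordinary Python and reference only the resulting date value with @. The helper name filter_on_or_after is hypothetical. engine="python" is passed explicitly because the column holds plain date objects (object dtype), and the report above was produced without numexpr installed.

```python
from datetime import date, datetime

from pandas import DataFrame


def filter_on_or_after(df: DataFrame, cutoff_str: str) -> DataFrame:
    # Hypothetical helper: build the comparison value outside the query string.
    cutoff = datetime.strptime(cutoff_str, "%Y-%m-%d").date()
    # '@cutoff' tells query() to resolve the name from this function's local scope.
    return df.query("planned_date >= @cutoff", engine="python").reset_index(drop=True)


unfiltered = DataFrame(
    {"planned_date": [date.fromisoformat("2020-10-01"), date.fromisoformat("2020-08-01")]}
)
filtered = filter_on_or_after(unfiltered, "2020-09-01")
print(filtered)
```

Keeping the query string down to column names, operators and @-prefixed values means nothing in it depends on which globals query() happens to expose.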

Expected Behavior

Test passes

Installed Versions

INSTALLED VERSIONS

commit : 91111fd
python : 3.9.16.final.0
python-bits : 64
OS : Darwin
OS-release : 22.1.0
Version : Darwin Kernel Version 22.1.0: Sun Oct 9 20:14:54 PDT 2022; root:xnu-8792.41.9~2/RELEASE_X86_64
machine : x86_64
processor : i386
byteorder : little
LC_ALL : en_US.UTF-8
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.5.1
numpy : 1.23.0
pytz : 2022.1
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.1.2
Cython : None
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.0
html5lib : None
pymysql : 1.0.2
psycopg2 : 2.9.3
jinja2 : None
IPython : None
pandas_datareader: None
bs4 : 4.11.1
bottleneck : None
brotli : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.5.2
numba : None
numexpr : None
odfpy : None
openpyxl : 3.0.10
pandas_gbq : None
pyarrow : 10.0.1
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.5.4
snappy : None
sqlalchemy : 1.4.39
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
zstandard : None
tzdata : None

@phillipuniverse phillipuniverse added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Dec 26, 2022
@phillipuniverse
Author

phillipuniverse commented Dec 26, 2022

Possibly resolved by #49248? This is similar to an old issue from 2017, #16283.

@phillipuniverse
Author

phillipuniverse commented Dec 26, 2022

A workaround is to pass the default globals to the query as resolvers, wrapped in a single-element tuple. This version of the above works:

from datetime import date

from pandas import DataFrame
# DEFAULT_GLOBALS lives in the private pandas.core namespace and may change between releases.
from pandas.core.computation.scope import DEFAULT_GLOBALS


def test_reproduction():
    unfiltered = DataFrame(
        {
            "planned_date": {
                "0": date.fromisoformat("2020-10-01"),
                "1": date.fromisoformat("2020-08-01"),
            },
        }
    )
    # Passing DEFAULT_GLOBALS as an extra resolver makes "datetime" resolvable again.
    filtered = unfiltered.query(
        'planned_date >= datetime.strptime("2020-09-01", "%Y-%m-%d").date()',
        resolvers=(DEFAULT_GLOBALS,),
    ).copy().reset_index(drop=True)
    assert (
        filtered.to_json()
        == unfiltered.iloc[[0]].reset_index(drop=True).to_json()
    )
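A variant of the workaround above, sketched under the assumption that only datetime is needed in the expression: build the resolver mapping yourself instead of importing DEFAULT_GLOBALS from a private module. The name extra_names is illustrative, and engine="python" is set explicitly because the column is object dtype.

```python
from datetime import date, datetime

from pandas import DataFrame

unfiltered = DataFrame(
    {"planned_date": [date.fromisoformat("2020-10-01"), date.fromisoformat("2020-08-01")]}
)

# Same mechanism as the DEFAULT_GLOBALS workaround, but the mapping holds only
# the one name the expression uses. query() combines these resolvers with the
# DataFrame's own column and index resolvers.
extra_names = {"datetime": datetime}
filtered = unfiltered.query(
    'planned_date >= datetime.strptime("2020-09-01", "%Y-%m-%d").date()',
    resolvers=(extra_names,),
    engine="python",
).reset_index(drop=True)
print(filtered)
```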

@lakhvirss

lakhvirss commented Jan 18, 2023

Do you know why the resolvers argument is needed? It fixed my issue with using .query() on a Timestamp; this was never needed in pandas < 1.4.3.

from pandas import DataFrame, Timestamp
from pandas.core.computation.scope import DEFAULT_GLOBALS

sample_df = DataFrame.from_records([
    {'t': Timestamp('2023-01-04'), 'name': 'A', 'value': 100},
    {'t': Timestamp('2023-01-05'), 'name': 'A', 'value': 200},
    {'t': Timestamp('2023-01-04'), 'name': 'B', 'value': 300},
    {'t': Timestamp('2023-01-05'), 'name': 'B', 'value': 400},
])

# Does not work -> ValueError: "Timestamp" is not a supported function
sample_df.query('t==Timestamp("2023-01-04")')

# Works.
sample_df.query('t==Timestamp("2023-01-04")', resolvers=(DEFAULT_GLOBALS, ))


Many thanks,
L
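For the Timestamp case, a sketch that avoids both resolvers and the private DEFAULT_GLOBALS import, assuming t is a datetime64 column (as it is when built from Timestamp objects): either compare against a date string, which pandas interprets as a timestamp for a datetime column, or build the Timestamp outside the query and reference it with @. The variable name cutoff is illustrative.

```python
from pandas import DataFrame, Timestamp

sample_df = DataFrame.from_records([
    {'t': Timestamp('2023-01-04'), 'name': 'A', 'value': 100},
    {'t': Timestamp('2023-01-05'), 'name': 'A', 'value': 200},
    {'t': Timestamp('2023-01-04'), 'name': 'B', 'value': 300},
    {'t': Timestamp('2023-01-05'), 'name': 'B', 'value': 400},
])

# Option 1: a string literal; compared with a datetime64 column it is
# converted to a Timestamp.
by_string = sample_df.query('t == "2023-01-04"')

# Option 2: construct the Timestamp in Python and pass it in with '@'.
cutoff = Timestamp("2023-01-04")
by_local = sample_df.query("t == @cutoff")

print(by_string)
print(by_local)
```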

@topper-123
Contributor

I think you have to prefix the variable with @, i.e. 'planned_date >= @datetime.strptime("2020-09-01", "%Y-%m-%d").date()' (assuming datetime is in the environment).

However, I tried that and it still didn't work. I think this happens because datetime is in DEFAULT_GLOBALS.

pandas/core/computation/ops.py:110:

if local_name in self.env.scope and isinstance(
    self.env.scope[local_name], type
):
    is_local = False

I think is_local should probably not be False here, though I haven't checked thoroughly. It's worth investigating.
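The lookup described above can be modelled in a few lines of plain Python. This is a simplified, hypothetical sketch, not pandas code: resolve_term and UndefinedName are illustrative names. It also rests on an assumption that may not hold in every pandas version: once resolvers exist (the DataFrame's columns and index, plus anything passed via resolvers=), a non-local name is looked up only there.

```python
import datetime
from collections import ChainMap


class UndefinedName(KeyError):
    """Toy stand-in for pandas.errors.UndefinedVariableError."""


def resolve_term(name, is_local, scope, resolvers):
    """Hypothetical, simplified model of how a query() name is resolved.

    scope     -- the caller's locals/globals (what '@name' refers to)
    resolvers -- DataFrame columns/index plus any user-supplied resolvers
    """
    # The check quoted from ops.py: a name bound to a class is forced to be
    # non-local, even when the expression wrote it as '@name'.
    if name in scope and isinstance(scope[name], type):
        is_local = False
    try:
        if is_local:
            return scope[name]
        if resolvers:
            # Assumption: with resolvers present, non-local names are only
            # searched there, never in the caller's scope.
            return resolvers[name]
        return scope[name]
    except KeyError as err:
        raise UndefinedName(name) from err


caller_scope = {"datetime": datetime.datetime, "cutoff": datetime.date(2020, 9, 1)}
columns = ChainMap({"planned_date": ["2020-10-01", "2020-08-01"]})

# '@cutoff' holds an instance, so it is resolved from the caller's scope.
print(resolve_term("cutoff", True, caller_scope, columns))

# '@datetime' holds a class, is demoted to non-local, and is absent from the columns.
try:
    resolve_term("datetime", True, caller_scope, columns)
except UndefinedName as exc:
    print("undefined:", exc)

# Adding a resolver mapping (as the DEFAULT_GLOBALS workaround does) makes it resolvable.
with_globals = ChainMap({"datetime": datetime.datetime}, *columns.maps)
print(resolve_term("datetime", True, caller_scope, with_globals))
```

In this model, dropping the type check would let '@datetime' resolve from the caller's scope, which is the change suggested above.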

@topper-123 topper-123 added expressions pd.eval, query and removed Needs Triage Issue that has not been reviewed by a pandas team member labels May 15, 2023