DOC: Confusing 'resolvers' kwarg documentation for DataFrame.query() #34966

Closed
eric-brunner opened this issue Jun 24, 2020 · 1 comment · Fixed by #45134
Labels
Docs expressions pd.eval, query

@eric-brunner

Location of the documentation

According to the DataFrame.query() documentation
(https://pandas.pydata.org/docs/dev/reference/api/pandas.DataFrame.query.html#pandas.DataFrame.query)
the 'resolvers' keyword is accepted, and its behaviour is documented under DataFrame.eval() (https://pandas.pydata.org/docs/dev/reference/api/pandas.eval.html#pandas.eval).

Documentation problem

The keyword's description states:
"A list of objects implementing the __getitem__ special method that you can use to inject an additional collection of namespaces to use for variable lookup. [...]"

But the default namespaces used by DataFrame.eval() are actually replaced, not extended, by the namespace collections passed in. A basic example:

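import numpy as np
from pandas import DataFrame, Series

df = DataFrame(np.random.random((5, 3)))
mapping = Series([4, 3, 2, 1, 0], index=[0, 1, 2, 3, 4])

# Works: 'index' is found through the default resolvers.
print(df.query('index==1'))
# Works: 'map' is found through the user-supplied resolver.
print(df.query('map==1', resolvers=({'map': mapping},)))
# Fails on v1.0.4: 'index' can no longer be resolved.
print(df.query('index==1', resolvers=({'map': mapping},)))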

The last line raises an UndefinedVariableError, because 'index' is no longer part of the namespace collection.

This is caused by the condition if resolvers is None: in DataFrame.eval() source code (v1.0.4, frame.py, line 3335):

...
        inplace = validate_bool_kwarg(inplace, "inplace")
        resolvers = kwargs.pop("resolvers", None)
        kwargs["level"] = kwargs.pop("level", 0) + 1
        if resolvers is None:
            index_resolvers = self._get_index_resolvers()
            column_resolvers = self._get_cleaned_column_resolvers()
            resolvers = column_resolvers, index_resolvers
        if "target" not in kwargs:
            kwargs["target"] = self
        kwargs["resolvers"] = kwargs.get("resolvers", ()) + tuple(resolvers)
...
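
To make the effect of that pop/get sequence concrete, here is a minimal sketch that mimics the quoted logic with plain dicts instead of real pandas resolvers. The function name build_resolvers and the placeholder dicts are hypothetical and exist only for illustration; they are not part of pandas.

```python
def build_resolvers(kwargs, default_resolvers):
    """Mimic the resolver handling quoted above, using plain dicts.

    `default_resolvers` stands in for the (column, index) resolvers that
    DataFrame.eval() builds internally. Hypothetical helper, not pandas API.
    """
    kwargs = dict(kwargs)  # work on a copy so the caller's dict is untouched
    resolvers = kwargs.pop("resolvers", None)
    if resolvers is None:
        # Defaults are used only when the caller passed no resolvers at all.
        resolvers = default_resolvers
    # "resolvers" was popped above, so this .get() always falls back to ().
    return kwargs.get("resolvers", ()) + tuple(resolvers)


# Placeholder namespaces (assumed values, for illustration only).
defaults = ({"col_a": "column values"}, {"index": "index values"})
user = ({"map": "user values"},)

no_user = build_resolvers({}, defaults)
with_user = build_resolvers({"resolvers": user}, defaults)

print(no_user == defaults)                      # defaults kept when nothing is passed
print(any("index" in r for r in with_user))     # is 'index' still resolvable?
```

With user resolvers supplied, the returned tuple contains only the user's namespaces, which matches the UndefinedVariableError for 'index' described above.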

Suggested fix for documentation

Therefore, to match the actual behaviour, the documentation for the 'resolvers' kwarg of DataFrame.eval() should read:
"A list of objects implementing the __getitem__ special method that you can use to replace the default collection of namespaces to use for variable lookup. [...]"

OR

Is this a bug? Should it be addressed?

@eric-brunner added the Docs and Needs Triage labels on Jun 24, 2020
@jbrockmendel added the expressions (pd.eval, query) label and removed the Needs Triage label on Sep 3, 2020
@bubblingoak
Contributor

Just ran into this same issue. It seems like a bug to me, as I think overriding the default resolvers just makes pandas.DataFrame.eval equivalent to pandas.eval. The code itself is also contradictory: it first removes the resolvers key with dict.pop, then attempts to read it back with dict.get, as though to combine the two tuples of resolvers.

I've made an MR to address this here: #45134
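
For illustration, this is roughly what "combining the two tuples" could look like, again with plain dicts rather than pandas objects. This is a sketch under assumptions and not necessarily what #45134 does: combine_resolvers is a hypothetical helper, and the ordering (user namespaces searched before the defaults) is an assumed choice.

```python
from collections import ChainMap


def combine_resolvers(user_resolvers, default_resolvers):
    """Return user resolvers followed by the defaults (hypothetical helper).

    Assumption: user-supplied namespaces are searched first, so they can
    shadow a default name, while the defaults stay reachable.
    """
    return tuple(user_resolvers or ()) + tuple(default_resolvers)


# Placeholder namespaces standing in for the column and index resolvers.
defaults = ({"col_a": "column values"}, {"index": "index values"})
user = ({"map": "user values"},)

combined = combine_resolvers(user, defaults)

# ChainMap searches its maps left to right, similar to a layered scope.
scope = ChainMap(*combined)
print(scope["map"])    # found in the user namespace
print(scope["index"])  # still found in the default namespace
```

Under this approach, an expression such as 'index==1' stays resolvable even when extra resolvers are passed, which is the "inject an additional collection" behaviour the current docstring describes.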

@jreback added this to the 1.4 milestone on Dec 31, 2021