Skip to content

BUG: query on columns with characters like # in its name #59296

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

aram-cedarwood
Copy link
Contributor

@aram-cedarwood aram-cedarwood commented Jul 21, 2024


This PR first added parsing.py: #28215


The issue is that none of the libraries tokenize, ast, or numexpr handle # the way we want it to.

tokenize is used here:

return tokenize.untokenize(f(x) for x in tokenize_string(source))
token_generator = tokenize.generate_tokens(line_reader)

ast is used here:
node = ast.fix_missing_locations(ast.parse(clean))

numexpr is used here:
return ne.evaluate(s, local_dict=scope)

@rhshadrach rhshadrach added Bug expressions pd.eval, query labels Jul 23, 2024
@aram-cedarwood aram-cedarwood force-pushed the query_column_with_sharp_sign_in_name branch from 2ef2fdb to 101c71d Compare July 27, 2024 18:50
@aram-cedarwood aram-cedarwood requested a review from jreback July 27, 2024 20:11
@aram-cedarwood aram-cedarwood marked this pull request as draft July 27, 2024 21:59
@aram-cedarwood aram-cedarwood marked this pull request as ready for review July 27, 2024 22:21
@aram-cedarwood aram-cedarwood marked this pull request as draft July 27, 2024 22:27
@aram-cedarwood aram-cedarwood marked this pull request as ready for review July 27, 2024 23:48
@aram-cedarwood aram-cedarwood force-pushed the query_column_with_sharp_sign_in_name branch from 7286029 to bf69392 Compare July 30, 2024 11:12
@aram-cedarwood aram-cedarwood force-pushed the query_column_with_sharp_sign_in_name branch from 1743a58 to 612a6ff Compare August 6, 2024 16:14
@aram-cedarwood aram-cedarwood requested a review from WillAyd August 7, 2024 04:44
Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments / questions but overall this looks great. @jreback in case you have more feedback on your end

# end of a double-quoted string
elif (parse_state == ParseState.IN_DOUBLE_QUOTE) and (s[i - 1] != "\\"):
parse_state = ParseState.DEFAULT
substr += char
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming these strings don't get too large, but would definitely be better for performance to collect each element you loop over into a list and join at the end, rather than using add assignment

@aram-cedarwood aram-cedarwood force-pushed the query_column_with_sharp_sign_in_name branch from 5ce3201 to 5e0631d Compare August 9, 2024 01:15
@aram-cedarwood aram-cedarwood requested a review from WillAyd August 9, 2024 01:41
@WillAyd
Copy link
Member

WillAyd commented Aug 9, 2024

@aram-cinnamon for the pre-commit issue you can see if you find anything of note in https://github.com/pre-commit/pygrep-hooks/issues, but I am doubtful that there is any real solve. You may just need to add exclude: ^pandas/tests/frame/test_query_eval.py to the rst-inline-touching-normal hook in the .pre-commit-config.yaml file

@aram-cedarwood
Copy link
Contributor Author

Thanks so much Will!!

@aram-cedarwood
Copy link
Contributor Author

Hm tests still failing. Will look into it soon

@WillAyd
Copy link
Member

WillAyd commented Aug 10, 2024

Hmm the failure on the Future Infer Strings job is kind of tricky, and tangential to what you are doing. For now I would just add the @pytest.mark.xfail(using_string_dtype(), reason="TODO(infer_string)") decorator that you see on some other tests

Copy link
Member

@WillAyd WillAyd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm @mroeschke

@WillAyd WillAyd merged commit 7c726e9 into pandas-dev:main Aug 20, 2024
47 checks passed
@WillAyd
Copy link
Member

WillAyd commented Aug 20, 2024

Thanks @aram-cinnamon - very impressive!

@tgpaul-lm
Copy link

tgpaul-lm commented Sep 30, 2024

Great work! Had been hoping to see this in the next patch release which was 2.2.3 (released 20-sep-2024), but didn't make it. Any thoughts when this might make its way into an official release?

@WillAyd
Copy link
Member

WillAyd commented Sep 30, 2024

It will be part of the 3.0 release

@tqa236 tqa236 mentioned this pull request Feb 5, 2025
5 tasks
@mroeschke mroeschke added this to the 3.0 milestone Feb 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug expressions pd.eval, query
Projects
None yet
6 participants