Skip to content

Some cleanups of merge_asof tests and error code #26249

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
5 tasks
chrish42 opened this issue Apr 30, 2019 · 6 comments
Closed
5 tasks

Some cleanups of merge_asof tests and error code #26249

chrish42 opened this issue Apr 30, 2019 · 6 comments
Labels
Clean Reshaping Concat, Merge/Join, Stack/Unstack, Explode Testing pandas testing functions or related to the test suite

Comments

@chrish42
Copy link
Contributor

chrish42 commented Apr 30, 2019

This is a follow-up to #26242, for the following points:

  • Add test in test_merge_asof.py that checks that merge_asof() raises an exception if called on unordered categories for a merge key.
  • Remove docstrings in tests in test_merge_asof.py
  • Add test to check that merge_asof works when one of the merge keys is an ordered categorical
  • Raise MergeError sooner (rather than the ValueError that happens later) when asked to do a merge and one of the keys is an unordered categorical.
  • use repr(x.dtype) instead of x.dtype as input to .format() in exception messages everywhere in merge.py
@WillAyd WillAyd added the Reshaping Concat, Merge/Join, Stack/Unstack, Explode label Apr 30, 2019
@chrish42
Copy link
Contributor Author

chrish42 commented Apr 30, 2019

There's a fifth potential cleanup. Currently, most of the tests use the "ticker data" example with a merge_asof(trades, quotes, on='time', by='ticker'), but with the "ticker" column left as object dtype. This leaves on the table a large number of tests that could exercise the by=[category dtype] code paths.

I see potentially two options here:

  1. Just convert the "ticker" column to be a categorical. Pro: one-line change. Con: many fewer tests for the by=[object dtype] case.
  2. Refactor the ticker data to be a pytest parametrized fixture, and then parametrize that fixture to generate both "object" and "category" variants for the "trades" and "quotes" dataframes. Pro: both cases now tested. Con: bigger change.

What do you think? If option 2 sounds good, now would be a good time for me to do it at least, given that I've been looking at that code lately (because I've been doing asof merges on dataframes that contain a lot of categorical data).

@chrish42
Copy link
Contributor Author

Maybe this would also be a good first step to then work on #16454?

@gfyoung gfyoung added the Testing pandas testing functions or related to the test suite label May 2, 2019
@gfyoung
Copy link
Member

gfyoung commented May 2, 2019

I like option 2, as it's more comprehensive.

@chrish42
Copy link
Contributor Author

chrish42 commented May 3, 2019

@gfyoung Me too, especially since I'm interested in having merge_asof() working better with categorical data. Just looking for some approval that this direction is good with the Pandas developers before reworking a bunch of the tests there., and also that it's okay to use the more advanced pytest features in tests for Pandas.

@WillAyd
Copy link
Member

WillAyd commented May 3, 2019

@chrish42 I'd say go for it! We use purest parametrized fixtures in other places as well so nothing out of the ordinary

@mroeschke
Copy link
Member

I think this has been cleaned up over the years so closing. Can open individual tickets for remaining issues is something isnt addressed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Clean Reshaping Concat, Merge/Join, Stack/Unstack, Explode Testing pandas testing functions or related to the test suite
Projects
None yet
Development

No branches or pull requests

4 participants