-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
TST: refactored test_factorize #32311
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TST: refactored test_factorize #32311
Conversation
pandas/tests/base/test_ops.py
Outdated
exp_arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4], np.intp) | ||
codes, uniques = n.factorize(sort=False) | ||
tm.assert_numpy_array_equal(codes, exp_arr) | ||
# CI: on linux 32bit the dtype is int32, otherwise int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a bug itself; is there an open issue for it? If not can you open one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could be related to #31856
I think I addressed all your comments @WillAyd |
expected_codes = [expected_uniques_list.index(val) for val in obj] | ||
expected_codes = np.asarray(expected_codes, dtype=np.intp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you not just use np.take here instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use expected_uniques.take is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think I can use take
here. I want to construct an array containing the indices in expected_uniques
of the values of obj
.
# given
obj = Series([1, 2, 1, 3, 5])
expected_uniques = obj.unique() # array([1, 2, 3, 5])
# needed
expected_codes = array([0, 1, 0, 2, 3])
I could only use take
if already have the indices and need the values. I basically need the reverse of take
.
I guess I could use where
somehow, but it will probably be more complex than just using vanilla python list.index()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just use pd.factorize
then
In [7]: import pandas as pd
In [8]: obj = pd.Series([1, 2, 1, 3, 5])
In [10]: pd.factorize(obj)
Out[10]: (array([0, 1, 0, 2, 3]), Int64Index([1, 2, 3, 5], dtype='int64'))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm testing factorize here, so I need an alternative implementation 😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok you can actually use .get_loc
but a simple impl is better; can you add a comment explaining what you are doing (factorizing)
expected_codes = [expected_uniques_list.index(val) for val in obj] | ||
expected_codes = np.asarray(expected_codes, dtype=np.intp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use expected_uniques.take is better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small comment. ping on green.
expected_codes = [expected_uniques_list.index(val) for val in obj] | ||
expected_codes = np.asarray(expected_codes, dtype=np.intp) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok you can actually use .get_loc
but a simple impl is better; can you add a comment explaining what you are doing (factorizing)
@jreback took care of your comment and CI is green now |
Thanks @SaturnFromTitan |
TST: refactored test_factorize (pandas-dev#32311)
part of #23877
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff