TST: refactored test_factorize #32311

Merged
WillAyd merged 6 commits into pandas-dev:master on Mar 7, 2020

Conversation

SaturnFromTitan
Contributor

part of #23877

  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff

@SaturnFromTitan SaturnFromTitan changed the title refactored test_factorize TST: refactored test_factorize Feb 27, 2020
exp_arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4], np.intp)
codes, uniques = n.factorize(sort=False)
tm.assert_numpy_array_equal(codes, exp_arr)
# CI: on linux 32bit the dtype is int32, otherwise int64
Member

This seems like a bug itself; is there an open issue for it? If not can you open one?

Contributor Author

Could be related to #31856
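
For context on the dtype comment under review, here is a minimal sketch of why the expected array is built with `np.intp` rather than a fixed width. It assumes the codes under test come back as `np.intp`, which is what the test's expected array implies; the variable names are illustrative only.

```python
import numpy as np

# np.intp is NumPy's index-sized integer: 32 bits on 32-bit builds,
# 64 bits on 64-bit builds.
intp_bits = np.dtype(np.intp).itemsize * 8

# Building the expectation with np.intp follows the platform automatically.
exp_arr = np.array([0, 1, 2, 0, 1], dtype=np.intp)

# A hard-coded int64 expectation only matches np.intp on 64-bit builds.
matches_int64 = np.dtype(np.int64) == exp_arr.dtype
```

On a 32-bit build `matches_int64` would be `False`, which is the kind of mismatch the CI comment describes.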

@WillAyd WillAyd added the Testing pandas testing functions or related to the test suite label Feb 27, 2020
@SaturnFromTitan
Contributor Author

I think I addressed all your comments @WillAyd

Comment on lines +574 to +575
expected_codes = [expected_uniques_list.index(val) for val in obj]
expected_codes = np.asarray(expected_codes, dtype=np.intp)
Member

Can you not just use np.take here instead?

Contributor

You can use expected_uniques.take, which is better.

Contributor Author

I don't think I can use take here. I want to construct an array containing the indices in expected_uniques of the values of obj.

# given
obj = Series([1, 2, 1, 3, 5])
expected_uniques = obj.unique()  # array([1, 2, 3, 5])

# needed
expected_codes = array([0, 1, 0, 2, 3])

I could only use take if I already had the indices and needed the values. I basically need the reverse of take.

I guess I could use where somehow, but it would probably be more complex than just using vanilla Python list.index().
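
A minimal sketch of the distinction drawn in this comment, reusing the example values above. `uniques_list` and `roundtrip` are illustrative names, not taken from the PR.

```python
import numpy as np
import pandas as pd

obj = pd.Series([1, 2, 1, 3, 5])
expected_uniques = obj.unique()  # unique values in order of first appearance

# Reference factorization: look up each value's position among the uniques.
uniques_list = list(expected_uniques)
expected_codes = np.asarray(
    [uniques_list.index(val) for val in obj], dtype=np.intp
)

# take goes the other way: given codes, it recovers the original values.
roundtrip = expected_uniques.take(expected_codes)
```

The `list.index` lookup is quadratic in the worst case, which is acceptable for small test fixtures like this one.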

Member

I think you can just use pd.factorize then

In [7]: import pandas as pd
In [8]: obj = pd.Series([1, 2, 1, 3, 5])
In [10]: pd.factorize(obj)
Out[10]: (array([0, 1, 0, 2, 3]), Int64Index([1, 2, 3, 5], dtype='int64'))

Contributor Author

I'm testing factorize here, so I need an alternative implementation 😄

Contributor

OK, you can actually use .get_loc, but a simple implementation is better. Can you add a comment explaining what you are doing (factorizing)?
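
For comparison, a sketch of what the `.get_loc` route might look like, assuming the uniques are wrapped in a `pd.Index`. The vectorised `Index.get_indexer` call is an addition for illustration and was not suggested in the thread.

```python
import numpy as np
import pandas as pd

obj = pd.Series([1, 2, 1, 3, 5])
uniques_index = pd.Index(obj.unique())

# Per-element lookup: get_loc returns an integer position when the
# index holds no duplicates.
codes_via_get_loc = np.asarray(
    [uniques_index.get_loc(val) for val in obj], dtype=np.intp
)

# Vectorised equivalent: positions of all values in one call.
codes_via_get_indexer = uniques_index.get_indexer(obj)
```

Both variants depend on pandas indexing internals, which is one reason a plain Python implementation can be the more independent reference in a test of `factorize`.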

Member

@WillAyd WillAyd left a comment

lgtm

Contributor

@jreback jreback left a comment

small comment. ping on green.

@jreback jreback added this to the 1.1 milestone Mar 4, 2020
@SaturnFromTitan
Contributor Author

@jreback took care of your comment and CI is green now

@SaturnFromTitan
Contributor Author

@jreback @WillAyd CI is green with two approvals. Can it be merged?

@WillAyd WillAyd merged commit 3d08aa5 into pandas-dev:master Mar 7, 2020
@WillAyd
Member

WillAyd commented Mar 7, 2020

Thanks @SaturnFromTitan

sthagen added a commit to sthagen/pandas-dev-pandas that referenced this pull request Mar 7, 2020
SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020
3 participants