TST: refactored test_factorize #32311

SaturnFromTitan · 2020-02-27T21:22:58Z

tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff

WillAyd · 2020-02-27T22:45:39Z

pandas/tests/base/test_ops.py

-            exp_arr = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4], np.intp)
-            codes, uniques = n.factorize(sort=False)
-            tm.assert_numpy_array_equal(codes, exp_arr)
+        # CI: on linux 32bit the dtype is int32, otherwise int64


This seems like a bug in itself; is there an open issue for it? If not, can you open one?

Could be related to #31856
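For context on the dtype comment in the diff above: np.intp is NumPy's pointer-sized integer type, so on common platforms it is 32 bits wide on a 32-bit build and 64 bits wide on a 64-bit build. The following is only a minimal sketch (the variable names are illustrative and not taken from the PR) of why building the expected array with np.intp can avoid a per-platform branch in a test:

```python
import struct

import numpy as np

# Width of a C pointer for this interpreter, in bits (typically 32 or 64).
pointer_bits = struct.calcsize("P") * 8

# Width of np.intp in bits; on common platforms this matches the pointer width.
intp_bits = np.dtype(np.intp).itemsize * 8

# An expected-codes array declared as np.intp follows the platform width,
# so the same literal can be used on 32-bit and 64-bit builds.
expected = np.array([0, 1, 0, 2, 3], dtype=np.intp)

print(pointer_bits, intp_bits, expected.dtype)
```

Whether factorize returning a platform-dependent dtype is itself a bug is the open question in the comment above; the sketch only shows what np.intp means.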

SaturnFromTitan · 2020-02-29T10:13:52Z

I think I addressed all your comments @WillAyd

WillAyd · 2020-03-03T01:10:07Z

pandas/tests/base/test_ops.py

+        expected_codes = [expected_uniques_list.index(val) for val in obj]
+        expected_codes = np.asarray(expected_codes, dtype=np.intp)


Can you not just use np.take here instead?

Using expected_uniques.take would be better.

I don't think I can use take here. I want to construct an array containing the indices in expected_uniques of the values of obj.

    # given
    obj = Series([1, 2, 1, 3, 5])
    expected_uniques = obj.unique()  # array([1, 2, 3, 5])
    # needed
    expected_codes = array([0, 1, 0, 2, 3])

I could only use take if I already had the indices and needed the values. I basically need the reverse of take.

I guess I could use where somehow, but it would probably be more complex than just using plain Python list.index().
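To make the "reverse of take" point concrete, here is a small sketch using only NumPy. The arrays mirror the example in this thread; the names codes and roundtrip are illustrative. take maps positions to values, whereas list.index maps values to positions, which is the direction needed here:

```python
import numpy as np

obj = np.array([1, 2, 1, 3, 5])
# Unique values in order of first appearance, as in the example above.
uniques = np.array([1, 2, 3, 5])

# values -> positions: look up each value's position in the uniques.
uniques_list = uniques.tolist()
codes = np.asarray([uniques_list.index(v) for v in obj.tolist()], dtype=np.intp)

# positions -> values: take goes the other way and needs the codes as input.
roundtrip = uniques.take(codes)

print(codes, roundtrip)
```

So take can verify the codes once they exist, but it cannot produce them from the values.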

I think you can just use pd.factorize then

    In [7]: import pandas as pd
    In [8]: obj = pd.Series([1, 2, 1, 3, 5])
    In [10]: pd.factorize(obj)
    Out[10]: (array([0, 1, 0, 2, 3]), Int64Index([1, 2, 3, 5], dtype='int64'))

I'm testing factorize here, so I need an alternative implementation 😄

OK. You could actually use .get_loc, but a simple implementation is better. Can you add a comment explaining what you are doing (factorizing)?
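As a rough sketch of what this thread converges on (not the exact test code from the PR), the expected codes can be built by hand with list.index, commented as a manual factorization, and then compared against factorize. The Index.get_loc variant mentioned above is shown alongside for comparison; the variable names other than expected_codes and expected_uniques_list are illustrative:

```python
import numpy as np
import pandas as pd

obj = pd.Series([1, 2, 1, 3, 5])
expected_uniques = obj.unique()
expected_uniques_list = list(expected_uniques)

# Factorize by hand: each value's code is its position among the uniques.
# This is deliberately independent of factorize, which is the code under test.
expected_codes = [expected_uniques_list.index(val) for val in obj]
expected_codes = np.asarray(expected_codes, dtype=np.intp)

# Alternative lookup through Index.get_loc (valid here because the uniques
# contain no duplicates, so get_loc returns a single integer position).
uniques_index = pd.Index(expected_uniques)
codes_via_get_loc = np.asarray(
    [uniques_index.get_loc(val) for val in obj], dtype=np.intp
)

codes, uniques = obj.factorize(sort=False)
np.testing.assert_array_equal(codes, expected_codes)
np.testing.assert_array_equal(codes_via_get_loc, expected_codes)
```

The list.index version is quadratic in the worst case, which is usually acceptable for small test fixtures and keeps the reference implementation easy to read.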

jreback · 2020-03-03T02:46:12Z

pandas/tests/base/test_ops.py

+        expected_codes = [expected_uniques_list.index(val) for val in obj]
+        expected_codes = np.asarray(expected_codes, dtype=np.intp)


Using expected_uniques.take would be better.

WillAyd

lgtm

jreback

small comment. ping on green.

jreback · 2020-03-04T14:07:01Z

pandas/tests/base/test_ops.py

+        expected_codes = [expected_uniques_list.index(val) for val in obj]
+        expected_codes = np.asarray(expected_codes, dtype=np.intp)


OK. You could actually use .get_loc, but a simple implementation is better. Can you add a comment explaining what you are doing (factorizing)?

SaturnFromTitan · 2020-03-05T16:00:06Z

@jreback I took care of your comment, and CI is green now.

SaturnFromTitan · 2020-03-07T09:26:08Z

@jreback @WillAyd CI is green with two approvals. Can it be merged?

WillAyd · 2020-03-07T18:35:35Z

Thanks @SaturnFromTitan

TST: refactored test_factorize (pandas-dev#32311)

refactored test_factorize

e419141

SaturnFromTitan changed the title from "refactored test_factorize" to "TST: refactored test_factorize" Feb 27, 2020

fixing ci

cdca771

WillAyd requested changes Feb 27, 2020

WillAyd added the Testing label (pandas testing functions or related to the test suite) Feb 27, 2020

switched to using intp instead of workaround to make CI happy

1b4825b

WillAyd reviewed Mar 3, 2020

jreback requested changes Mar 3, 2020

WillAyd approved these changes Mar 3, 2020

jreback approved these changes Mar 4, 2020

jreback added this to the 1.1 milestone Mar 4, 2020

review comments

9c22a50

SaturnFromTitan mentioned this pull request Mar 4, 2020

TST: Split and simplify test_value_counts_unique_nunique #32281

Merged

SaturnFromTitan added 2 commits March 4, 2020 18:33

Merge branch 'master' into fixturize-test_factorize

d26d0fa

Merge branch 'master' into fixturize-test_factorize

76d73cc

WillAyd merged commit 3d08aa5 into pandas-dev:master Mar 7, 2020

sthagen added a commit to sthagen/pandas-dev-pandas that referenced this pull request Mar 7, 2020

Merge pull request #82 from pandas-dev/master

8523cfa

TST: refactored test_factorize (pandas-dev#32311)

SeeminSyed pushed a commit to CSCD01-team01/pandas that referenced this pull request Mar 22, 2020

TST: refactored test_factorize (pandas-dev#32311)

a709fbd
