PERF/COMPAT: define platform int to np.intp #13972

chris-b1 · 2016-08-12T00:42:13Z

AFAIK this only affects 64 bit python on Windows.

numpy wants an np.intp (i8 on Windows) as a indexer for take, but pandas defines a "platform int" as a np.int_ (i4 on Windows). This hits performance twice, because we often start with i8, cast to i4, then numpy will cast back to i8 in its take.

This is an alternative to #13924 - there I explored replacing ndarray.take with our take_nd fully, but this approach solves the perf problem without much pain or risking new segfaults.

I'd still need to adjust a bunch of tests here to pass on Windows. This is an API change for "advanced" methods like get_indexer, but I don't think anything is necessary beyond a doc note?

ASV:

    before     after       ratio
  [4a80521 ] [3ced5d5 ]
     3.49ms     2.86ms      0.82  indexing.series_take_dtindex.time_series_take_dtindex
     3.28ms     3.03ms      0.92  indexing.series_take_intindex.time_series_take_intindex
    15.16ms    14.57ms      0.96  join_merge.join_dataframe_index_single_key_bigger.time_join_dataframe_index_single_key_bigger
    19.81ms    16.62ms      0.84  join_merge.join_dataframe_index_single_key_bigger_sort.time_join_dataframe_index_single_key_bigger_sort
    14.05ms    13.63ms      0.97  join_merge.join_dataframe_index_single_key_small.time_join_dataframe_index_single_key_small
     4.83ms     4.64ms      0.96  join_merge.join_dataframe_integer_2key.time_join_dataframe_integer_2key
     1.80ms     1.63ms      0.90  join_merge.join_dataframe_integer_key.time_join_dataframe_integer_key
   633.31us   570.37us      0.90  join_merge.join_non_unique_equal.time_join_non_unique_equal
      2.39s      1.36s      0.57  join_merge.left_outer_join_index.time_left_outer_join_index

codecov-io · 2016-08-12T01:22:23Z

Current coverage is 85.29% (diff: 100%)

Merging #13972 into master will increase coverage by 0.02%

@@             master     #13972   diff @@
==========================================
  Files           139        139          
  Lines         50219      50245    +26   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42819      42854    +35   
+ Misses         7400       7391     -9   
  Partials          0          0

Powered by Codecov. Last update 4a80521...322b11a

jreback · 2016-08-12T13:13:46Z

doc/source/whatsnew/v0.19.0.txt

@@ -767,6 +767,40 @@ Note that the limitation is applied to ``fill_value`` which default is ``np.nan`
 - Bug in ``SparseSeries.abs`` incorrectly keeps negative ``fill_value`` (:issue:`13853`)
 - Bug in single row slicing on multi-type ``SparseDataFrame``s, types were previously forced to float (:issue:`13917`)

+Indexer dtype Changes
+^^^^^^^^^^^^^^^^^^^^^


add the issue ref here.

jreback · 2016-08-12T13:16:02Z

yes this looks reasonable. go ahead an make ready on windows. Also add an asv as above.

jreback · 2016-08-13T22:16:24Z

pls update the docs as indicated. otherwise lgtm.

chris-b1 · 2016-08-14T14:32:13Z

What else were you looking for in the docs? I did add an issue ref in the first paragraph.

jreback · 2016-08-15T11:31:04Z

ok this looks fine. can you run this on 32-bit linux as well and see that it comes up clean (IIRC you tests on windows so that should be good).

chris-b1 · 2016-08-15T23:08:16Z

The changes I just pushed were to get this passing on 32 bit windows. I don't have anything handy to test 32 bit Linux, I can probably set up a VM, although I think it should generally act the same 32 bit Windows (np.int_ == np.intp == np.int32)

jreback · 2016-08-15T23:20:52Z

ok I'll run on 32 bit tomorrow

I just use a macosx / vm

jreback · 2016-08-16T10:45:04Z

This currently fails on 32-bit linux (before your PR). I recall had to do something on windows to get this to pass. Something wrong somewhere. Your PR fixes this! yeah!

======================================================================
FAIL: test_alignment_non_pandas (pandas.tests.frame.test_operators.TestDataFrameOperators)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jreback/pandas/pandas/tests/frame/test_operators.py", line 1210, in test_alignment_non_pandas
    Series([1, 2, 3], index=df.index))
  File "/home/jreback/pandas/pandas/util/testing.py", line 1154, in assert_series_equal
    assert_attr_equal('dtype', left, right)
  File "/home/jreback/pandas/pandas/util/testing.py", line 878, in assert_attr_equal
    left_attr, right_attr)
  File "/home/jreback/pandas/pandas/util/testing.py", line 1018, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Attributes are different

Attribute "dtype" are different
[left]:  int32
[right]: int64

So as-is this passes on linux32 (and fixes the above).

lmk when good to go.

chris-b1 · 2016-08-16T15:50:39Z

Cool - fixed the lint issue so this should be good to go.

jreback · 2016-08-17T10:21:40Z

thanks!

PERF/COMPAT: define platform int to np.intp

3ced5d5

sinhrks added Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode Windows Windows OS labels Aug 12, 2016

jreback reviewed Aug 12, 2016
View reviewed changes

adjust test for platform independence

84f38b2

jreback added this to the 0.19.0 milestone Aug 15, 2016

adjust for 32bit

fc80938

chris-b1 mentioned this pull request Aug 15, 2016

CLN/PERF: remove ndarray.take and platform int conversions #13924

Closed

lint fixup

322b11a

chris-b1 changed the title ~~PERF/COMPAT: define platform int to np.intp (WIP)~~ PERF/COMPAT: define platform int to np.intp Aug 16, 2016

jreback closed this in 0780443 Aug 17, 2016

chris-b1 deleted the platform-int branch August 17, 2016 22:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PERF/COMPAT: define platform int to np.intp #13972

PERF/COMPAT: define platform int to np.intp #13972

chris-b1 commented Aug 12, 2016 •

edited

Loading

codecov-io commented Aug 12, 2016 •

edited

Loading

jreback Aug 12, 2016

jreback commented Aug 12, 2016

jreback commented Aug 13, 2016

chris-b1 commented Aug 14, 2016

jreback commented Aug 15, 2016

chris-b1 commented Aug 15, 2016

jreback commented Aug 15, 2016

jreback commented Aug 16, 2016

chris-b1 commented Aug 16, 2016

jreback commented Aug 17, 2016

PERF/COMPAT: define platform int to np.intp #13972

PERF/COMPAT: define platform int to np.intp #13972

Conversation

chris-b1 commented Aug 12, 2016 • edited Loading

codecov-io commented Aug 12, 2016 • edited Loading

Current coverage is 85.29% (diff: 100%)

jreback Aug 12, 2016

Choose a reason for hiding this comment

jreback commented Aug 12, 2016

jreback commented Aug 13, 2016

chris-b1 commented Aug 14, 2016

jreback commented Aug 15, 2016

chris-b1 commented Aug 15, 2016

jreback commented Aug 15, 2016

jreback commented Aug 16, 2016

chris-b1 commented Aug 16, 2016

jreback commented Aug 17, 2016

chris-b1 commented Aug 12, 2016 •

edited

Loading

codecov-io commented Aug 12, 2016 •

edited

Loading