Skip to content

PERF/COMPAT: define platform int to np.intp #13972

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

chris-b1
Copy link
Contributor

@chris-b1 chris-b1 commented Aug 12, 2016

AFAIK this only affects 64 bit python on Windows.

numpy wants an np.intp (i8 on Windows) as a indexer for take, but pandas defines a "platform int" as a np.int_ (i4 on Windows). This hits performance twice, because we often start with i8, cast to i4, then numpy will cast back to i8 in its take.

This is an alternative to #13924 - there I explored replacing ndarray.take with our take_nd fully, but this approach solves the perf problem without much pain or risking new segfaults.

I'd still need to adjust a bunch of tests here to pass on Windows. This is an API change for "advanced" methods like get_indexer, but I don't think anything is necessary beyond a doc note?

ASV:

    before     after       ratio
  [4a80521 ] [3ced5d5 ]
     3.49ms     2.86ms      0.82  indexing.series_take_dtindex.time_series_take_dtindex
     3.28ms     3.03ms      0.92  indexing.series_take_intindex.time_series_take_intindex
    15.16ms    14.57ms      0.96  join_merge.join_dataframe_index_single_key_bigger.time_join_dataframe_index_single_key_bigger
    19.81ms    16.62ms      0.84  join_merge.join_dataframe_index_single_key_bigger_sort.time_join_dataframe_index_single_key_bigger_sort
    14.05ms    13.63ms      0.97  join_merge.join_dataframe_index_single_key_small.time_join_dataframe_index_single_key_small
     4.83ms     4.64ms      0.96  join_merge.join_dataframe_integer_2key.time_join_dataframe_integer_2key
     1.80ms     1.63ms      0.90  join_merge.join_dataframe_integer_key.time_join_dataframe_integer_key
   633.31us   570.37us      0.90  join_merge.join_non_unique_equal.time_join_non_unique_equal
      2.39s      1.36s      0.57  join_merge.left_outer_join_index.time_left_outer_join_index

@codecov-io
Copy link

codecov-io commented Aug 12, 2016

Current coverage is 85.29% (diff: 100%)

Merging #13972 into master will increase coverage by 0.02%

@@             master     #13972   diff @@
==========================================
  Files           139        139          
  Lines         50219      50245    +26   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          42819      42854    +35   
+ Misses         7400       7391     -9   
  Partials          0          0          

Powered by Codecov. Last update 4a80521...322b11a

@sinhrks sinhrks added Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode Windows Windows OS labels Aug 12, 2016
@@ -767,6 +767,40 @@ Note that the limitation is applied to ``fill_value`` which default is ``np.nan`
- Bug in ``SparseSeries.abs`` incorrectly keeps negative ``fill_value`` (:issue:`13853`)
- Bug in single row slicing on multi-type ``SparseDataFrame``s, types were previously forced to float (:issue:`13917`)

Indexer dtype Changes
^^^^^^^^^^^^^^^^^^^^^
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add the issue ref here.

@jreback
Copy link
Contributor

jreback commented Aug 12, 2016

yes this looks reasonable. go ahead an make ready on windows. Also add an asv as above.

@jreback
Copy link
Contributor

jreback commented Aug 13, 2016

pls update the docs as indicated. otherwise lgtm.

@chris-b1
Copy link
Contributor Author

What else were you looking for in the docs? I did add an issue ref in the first paragraph.

@jreback
Copy link
Contributor

jreback commented Aug 15, 2016

ok this looks fine. can you run this on 32-bit linux as well and see that it comes up clean (IIRC you tests on windows so that should be good).

@jreback jreback added this to the 0.19.0 milestone Aug 15, 2016
@chris-b1
Copy link
Contributor Author

The changes I just pushed were to get this passing on 32 bit windows. I don't have anything handy to test 32 bit Linux, I can probably set up a VM, although I think it should generally act the same 32 bit Windows (np.int_ == np.intp == np.int32)

@jreback
Copy link
Contributor

jreback commented Aug 15, 2016

ok I'll run on 32 bit tomorrow

I just use a macosx / vm

@jreback
Copy link
Contributor

jreback commented Aug 16, 2016

This currently fails on 32-bit linux (before your PR). I recall had to do something on windows to get this to pass. Something wrong somewhere. Your PR fixes this! yeah!

======================================================================
FAIL: test_alignment_non_pandas (pandas.tests.frame.test_operators.TestDataFrameOperators)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jreback/pandas/pandas/tests/frame/test_operators.py", line 1210, in test_alignment_non_pandas
    Series([1, 2, 3], index=df.index))
  File "/home/jreback/pandas/pandas/util/testing.py", line 1154, in assert_series_equal
    assert_attr_equal('dtype', left, right)
  File "/home/jreback/pandas/pandas/util/testing.py", line 878, in assert_attr_equal
    left_attr, right_attr)
  File "/home/jreback/pandas/pandas/util/testing.py", line 1018, in raise_assert_detail
    raise AssertionError(msg)
AssertionError: Attributes are different

Attribute "dtype" are different
[left]:  int32
[right]: int64

So as-is this passes on linux32 (and fixes the above).

lmk when good to go.

@chris-b1
Copy link
Contributor Author

Cool - fixed the lint issue so this should be good to go.

@chris-b1 chris-b1 changed the title PERF/COMPAT: define platform int to np.intp (WIP) PERF/COMPAT: define platform int to np.intp Aug 16, 2016
@jreback jreback closed this in 0780443 Aug 17, 2016
@jreback
Copy link
Contributor

jreback commented Aug 17, 2016

thanks!

@chris-b1 chris-b1 deleted the platform-int branch August 17, 2016 22:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Memory or execution speed performance Reshaping Concat, Merge/Join, Stack/Unstack, Explode Windows Windows OS
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants