PERF: improve conversion to BooleanArray from int/float array #30095

ethanywang · 2019-12-05T21:43:27Z

closes PERF: improve conversion to BooleanArray from int/float array #29838
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

jreback

can you add this issue number onto the list for adding boolean array in the whatsnew.

can you add some tests specifically for this (we might have them, but pls point them out)

pandas/core/arrays/boolean.py

ethanywang · 2019-12-06T05:27:40Z

Related Test:

pandas/pandas/tests/arrays/test_boolean.py

Lines 136 to 155 in 2d5455c

    
    def test_to_boolean_array_from_integer_array():
        result = pd.array(np.array([1, 0, 1, 0]), dtype="boolean")
        expected = pd.array([True, False, True, False], dtype="boolean")
        tm.assert_extension_array_equal(result, expected)

        # with missing values
        result = pd.array(np.array([1, 0, 1, None]), dtype="boolean")
        expected = pd.array([True, False, True, None], dtype="boolean")
        tm.assert_extension_array_equal(result, expected)


    def test_to_boolean_array_from_float_array():
        result = pd.array(np.array([1.0, 0.0, 1.0, 0.0]), dtype="boolean")
        expected = pd.array([True, False, True, False], dtype="boolean")
        tm.assert_extension_array_equal(result, expected)

        # with missing values
        result = pd.array(np.array([1.0, 0.0, 1.0, np.nan]), dtype="boolean")
        expected = pd.array([True, False, True, None], dtype="boolean")
        tm.assert_extension_array_equal(result, expected)

pandas/pandas/tests/arrays/test_boolean.py

Line 108 in 2d5455c

(np.array([np.nan, np.nan], dtype=float), [None, None]),
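For readers unfamiliar with what these tests exercise, here is a rough NumPy-only sketch of coercing an int/float array into a boolean data array plus a missing-value mask. It is an illustration, not the code in pandas/core/arrays/boolean.py; the helper name `coerce_to_bool_and_mask` and the error messages are made up for this example.

```python
import numpy as np


def coerce_to_bool_and_mask(values):
    """Hypothetical helper: split an int/float ndarray into (data, mask)."""
    values = np.asarray(values)
    if values.dtype.kind not in "iuf":
        raise TypeError("this sketch only handles integer and float arrays")

    # Missing entries (NaN) go into the mask; integer arrays cannot hold NaN.
    if values.dtype.kind == "f":
        mask = np.isnan(values)
    else:
        mask = np.zeros(values.shape, dtype=bool)

    # Convert the non-missing entries; masked slots keep a False placeholder.
    data = np.zeros(values.shape, dtype=bool)
    data[~mask] = values[~mask].astype(bool)

    # Only 0 and 1 round-trip through bool; reject anything else (2, 0.5, ...).
    if not np.all(data[~mask].astype(values.dtype) == values[~mask]):
        raise TypeError("values must be 0 or 1 (or NaN for float input)")
    return data, mask


data, mask = coerce_to_bool_and_mask(np.array([1.0, 0.0, 1.0, np.nan]))
```

The round-trip check is the part that distinguishes a bool-like array from an arbitrary numeric one, which is why the sketch compares the converted values back against the input.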

pandas/core/arrays/boolean.py

jreback · 2019-12-06T23:30:11Z

ok lgtm. can you do a simple benchmark on this and show it (+1 if you can add it to the asvs)

WillAyd

lgtm as well outside of @jreback comments

ethanywang · 2019-12-07T17:26:38Z

I used pytest-benchmark for a simple benchmark, as I'm not quite sure where I should write the benchmark for the boolean array in the asv folder.

Original Branch:

------------------------------------------------------------------------------------------------ benchmark: 2 tests ------------------------------------------------------------------------------------------------
Name (time in us)                          Min                   Max                Mean              StdDev              Median                 IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_from_integer_array     632.0190 (1.0)      4,040.8400 (1.0)      959.2070 (1.11)     343.6619 (1.04)     859.9870 (1.13)     310.8663 (1.28)        86;37        1.0425 (0.90)        829           1
test_benchmark_from_float_array       650.8890 (1.03)     4,129.7300 (1.02)     863.9706 (1.0)      330.2377 (1.0)      761.4645 (1.0)      242.4200 (1.0)         96;64        1.1574 (1.0)         978           1
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

PR Branch:

------------------------------------------------------------------------------------------------ benchmark: 2 tests -----------------------------------------------------------------------------------------------
Name (time in us)                          Min                   Max                Mean              StdDev              Median                IQR            Outliers  OPS (Kops/s)            Rounds  Iterations
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_benchmark_from_float_array       580.4190 (1.0)      2,836.9650 (1.53)     687.1475 (1.02)     196.7836 (1.69)     611.1900 (1.0)      97.8967 (2.22)      117;146        1.4553 (0.98)       1359           1
test_benchmark_from_integer_array     609.3710 (1.05)     1,856.8680 (1.0)      675.1754 (1.0)      116.3234 (1.0)      637.5215 (1.04)     44.0175 (1.0)         62;95        1.4811 (1.0)         760           1
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Is it now okay to merge? @jreback

jreback · 2019-12-08T17:32:51Z

@ethanywang see the asv docs here: https://dev.pandas.io/docs/development/contributing.html#running-the-performance-test-suite

can you construct some asvs which add some benchmarks (and then show the results here)

you can create a new file in benchmarks/

call it array.py (and then use pd.array for the construction).
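As a rough sketch of what such a file could contain: asv times any method whose name starts with `time_` and calls `setup` beforehand. The class and method names below match the benchmark IDs reported later in this thread; the input data is an assumption.

```python
import numpy as np
import pandas as pd


class BooleanArray:
    def setup(self):
        # Small 0/1 inputs; the actual sizes used in the PR are not shown here.
        self.values_float = np.array([1.0, 0.0, 1.0, 0.0])
        self.values_integer = np.array([1, 0, 1, 0])

    def time_from_float_array(self):
        pd.array(self.values_float, dtype="boolean")

    def time_from_integer_array(self):
        pd.array(self.values_integer, dtype="boolean")
```

Keeping array creation in `setup` means only the `pd.array` call itself is timed.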

ethanywang · 2019-12-08T18:26:49Z

@jreback Using the asv benchmark, the results are:

       before           after         ratio
     [c0f6428b]       [3dcbbd61]
     <master>         <int_float_to_boolean>
-        90.7±1μs       60.2±0.7μs     0.66  array.BooleanArray.time_from_float_array
-        95.0±4μs         60.5±2μs     0.64  array.BooleanArray.time_from_integer_array

SOME BENCHMARKS HAVE CHANGED SIGNIFICANTLY.
PERFORMANCE INCREASED.
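For reference, the ratio column is the "after" time divided by the "before" time, so a value below 1 means the PR branch is faster. A quick check of the two rows above (the helper name `asv_ratio` is just for illustration):

```python
def asv_ratio(before_us, after_us):
    """Return after/before, rounded to two decimals as in the asv table."""
    return round(after_us / before_us, 2)


print(asv_ratio(90.7, 60.2))  # 0.66 (time_from_float_array)
print(asv_ratio(95.0, 60.5))  # 0.64 (time_from_integer_array)
```

In other words, both conversions take roughly a third less time on the PR branch.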

jreback · 2019-12-08T18:34:42Z

perfect @ethanywang

would take a follow up PR with similar asvs for IntegerArray and StringArray

(we might have some in series somewhere for IntegerArray already; those can be moved)

ethanywang · 2019-12-08T18:36:33Z

@jreback So you mean I can remove the array.py in the asv_benchmark folder, and not submit it in this PR?

jreback · 2019-12-08T19:06:33Z

> @jreback So you mean I can remove the array.py in the asv_benchmark folder, and not submit it in this PR?

no, keep it in this PR. in a follow up PR i would like to add asv constructions for IntegerArray and StringArray in array.py

we may have some construction benchmarks already for Integer dtypes in Series which we can move

jorisvandenbossche · 2019-12-09T09:41:43Z

@ethanywang Thanks a lot!

ethanywang added 2 commits December 5, 2019 16:35

PERF: improve conversion to BooleanArray from int/float array

ee64bd6

Merge branch 'master' into int_float_to_boolean

4993578

ethanywang marked this pull request as ready for review December 5, 2019 22:15

jreback added the ExtensionArray Extending pandas with custom dtypes or arrays. label Dec 5, 2019

jreback requested changes Dec 5, 2019

pandas/core/arrays/boolean.py

ethanywang added 3 commits December 6, 2019 00:19

DOC: add issue number into whatsnew list

3da324e

PERF: remove redundant copy() in booleanArray

788820f

TST: Update some test of boolean array from int/float np.ndarray

2d5455c

ethanywang requested a review from jreback December 6, 2019 05:28

WillAyd reviewed Dec 6, 2019

pandas/core/arrays/boolean.py

STY: Change values.dtype comparation

5c3e4aa

jreback added this to the 1.0 milestone Dec 6, 2019

jreback added the Performance Memory or execution speed performance label Dec 6, 2019

ethanywang requested a review from WillAyd December 7, 2019 00:38

WillAyd approved these changes Dec 7, 2019

ethanywang added 2 commits December 8, 2019 13:11

TST: Add exception test from np.array

66fa559

ASV: add array benchmark

3dcbbd6

jreback approved these changes Dec 8, 2019

jorisvandenbossche approved these changes Dec 9, 2019

jorisvandenbossche merged commit fb50258 into pandas-dev:master Dec 9, 2019

ethanywang deleted the int_float_to_boolean branch December 9, 2019 16:24

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

PERF: improve conversion to BooleanArray from int/float array (pandas…

2b184ca

…-dev#30095)

proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019

PERF: improve conversion to BooleanArray from int/float array (pandas…

02770f4

…-dev#30095)
