PERF: improved clip performance #16364

jreback · 2017-05-16T02:33:08Z

closes #15400
In [1]: np.random.seed(1234)

In [2]: s = pd.Series(np.random.randn(50))

master

In [3]: %timeit s.clip(0, 1)
1.65 ms ± 48.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

PR

In [3]: %timeit s.clip(0, 1)
124 µs ± 2.79 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Probably as good as we can do for now: we still have two `where` ops (numpy does this in a single loop), plus a mask check and fill, and the final construction.

Still, about 13x faster (1.65 ms vs. 124 µs).
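For readers following along, here is a minimal sketch of the steps named above (two `where` ops, a mask check and fill, and the final construction). It is an illustration only, not the code merged in this PR: the helper name `clip_with_scalar_sketch` is hypothetical, and it assumes a numeric Series with scalar bounds.

```python
import numpy as np
import pandas as pd


def clip_with_scalar_sketch(series, lower=None, upper=None):
    """Hypothetical helper: clip a numeric Series to scalar bounds on the raw array."""
    values = series.values.copy()   # work on a private copy of the raw ndarray
    mask = pd.isnull(values)        # remember where the missing values are

    if upper is not None:
        values = np.where(values >= upper, upper, values)  # first where op
    if lower is not None:
        values = np.where(values <= lower, lower, values)  # second where op

    if mask.any():                  # mask check and fill
        values[mask] = np.nan
    return pd.Series(values, index=series.index)  # final construction


s = pd.Series([-0.5, 0.25, np.nan, 1.75])
clipped = clip_with_scalar_sketch(s, 0, 1)
print(clipped)
```

The sketch skips the warning handling raised in review and makes no attempt to preserve mixed dtypes.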

codecov · 2017-05-16T10:47:31Z

Codecov Report

Merging #16364 into master will decrease coverage by 0.01%.
The diff coverage is 94.11%.

@@            Coverage Diff             @@
##           master   #16364      +/-   ##
==========================================
- Coverage   90.38%   90.36%   -0.02%     
==========================================
  Files         161      161              
  Lines       50916    50933      +17     
==========================================
+ Hits        46021    46028       +7     
- Misses       4895     4905      +10

Flag	Coverage Δ
#multiple	`88.14% <94.11%> (ø)`	⬆️
#single	`40.21% <5.88%> (-0.12%)`	⬇️

Impacted Files	Coverage Δ
pandas/core/generic.py	`91.96% <94.11%> (+0.01%)`	⬆️
pandas/io/gbq.py	`25% <0%> (-58.34%)`	⬇️
pandas/core/frame.py	`97.68% <0%> (-0.1%)`	⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d92f06a...62843f8.

jorisvandenbossche

There is a slight change in behaviour: the new implementation does not preserve heterogeneous data types (e.g. int/float).
This should not hold back these perf improvements, but we might consider keeping it for 0.21 (this is also not a regression).
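A small sketch of how the dtype loss can arise, assuming the fast path works on the raw `.values` array of a mixed int/float frame. The frame and column names are made up for illustration, and this is not the PR's code path verbatim.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one int column and one float column.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

raw = df.values                     # a single 2-D array, upcast to float64
clipped_raw = np.where(raw >= 2, 2, raw)

# Rebuilding from the 2-D array cannot recover the per-column dtypes.
result = pd.DataFrame(clipped_raw, index=df.index, columns=df.columns)

print(df.dtypes)      # a is an integer dtype, b is float64
print(result.dtypes)  # both columns are float64
```

The values are clipped correctly; only the int dtype of column `a` is lost.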

jorisvandenbossche · 2017-05-16T21:06:18Z

pandas/core/generic.py

+        result = self.values
+        mask = isnull(result)
+        if upper is not None:
+            result = np.where(result >= upper, upper, result)


I think this needs a `with np.errstate(...)` block, as we are working with the raw array:

In [8]: pd.Series([0, np.nan, 2]).clip(0, 1)
/home/joris/scipy/pandas/pandas/core/generic.py:4117: RuntimeWarning: invalid value encountered in greater_equal
  result = np.where(result >= upper, upper, result)
/home/joris/scipy/pandas/pandas/core/generic.py:4119: RuntimeWarning: invalid value encountered in less_equal
  result = np.where(result <= lower, lower, result)
Out[8]:
0    0.0
1    NaN
2    1.0
dtype: float64
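A minimal sketch of the suggested guard, using the same input as the session above. Whether the comparison warns at all may depend on the NumPy version, so the example only shows the wrapping and the result, not the warning itself.

```python
import numpy as np

values = np.array([0.0, np.nan, 2.0])
lower, upper = 0, 1

# Silence "invalid value" floating-point warnings for the NaN comparisons only.
with np.errstate(invalid="ignore"):
    result = np.where(values >= upper, upper, values)
    result = np.where(result <= lower, lower, result)

print(result)
```

Comparisons against NaN evaluate to False, so the NaN entry passes through both `where` calls unchanged.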

jorisvandenbossche · 2017-05-16T21:06:48Z

pandas/core/generic.py

+    def _clip_with_scalar(self, lower, upper):
+
+        if ((lower is not None and np.any(isnull(lower))) or
+                (upper is not None and np.any(isnull(upper)))):


Are the `np.any` calls needed here, given that lower/upper are already confirmed to be scalars?
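To illustrate the question, a short sketch of how `pd.isnull` behaves for scalar versus list-like bounds (the sample values are arbitrary). For scalars the `np.any` wrapper does not change the answer; it only matters if a list-like bound can reach this check.

```python
import numpy as np
import pandas as pd

# For scalar bounds, isnull already returns a single boolean.
for bound in (0, 1.5, np.nan, None):
    plain = pd.isnull(bound)
    wrapped = np.any(pd.isnull(bound))
    assert bool(plain) == bool(wrapped)

# For a list-like bound, isnull is elementwise and needs a reduction
# before it can be used in an `if`.
list_like = [0.0, np.nan]
elementwise = pd.isnull(list_like)
has_null = bool(np.any(elementwise))
print(elementwise, has_null)
```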

jorisvandenbossche · 2017-05-16T21:18:41Z

> new implementation does not preserve heterogeneous data types

In principle we could add a check for that (in the `if` statement that decides whether to take this path), but I am not sure that is worth it.

jreback · 2017-05-16T22:03:45Z

Yeah, this is a limitation of the current methodology.

Let me see what I can do.

jreback added the Performance (memory or execution speed performance) label May 16, 2017

jreback added this to the 0.20.2 milestone May 16, 2017

PERF: improved clip performance

62843f8

closes pandas-dev#15400

jreback force-pushed the clip branch from 6efa1c8 to 62843f8 May 16, 2017 10:14

jreback merged commit 42e2a87 into pandas-dev:master May 16, 2017

TomAugspurger added a commit to TomAugspurger/pandas that referenced this pull request May 16, 2017

TST: Add test for clip-na

b546752

Additional test cases for pandas-dev#16364 when upper and / or lower is nan.

TomAugspurger mentioned this pull request May 16, 2017

TST: Add test for clip-na #16369

Merged

jorisvandenbossche reviewed May 16, 2017

jreback pushed a commit that referenced this pull request May 16, 2017

TST: Add test for clip-na (#16369)

9c8337a

Additional test cases for #16364 when upper and / or lower is nan.

jreback added a commit to jreback/pandas that referenced this pull request May 16, 2017

TST: followup to pandas-dev#16364, catch errstate warnings

ffbb0b5

jreback added a commit that referenced this pull request May 17, 2017

TST: followup to #16364, catch errstate warnings (#16373)

e97865e

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017

PERF: improved clip performance (pandas-dev#16364)

a4730d5

closes pandas-dev#15400

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017

TST: Add test for clip-na (pandas-dev#16369)

6b05e16

Additional test cases for pandas-dev#16364 when upper and / or lower is nan.

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017

TST: followup to pandas-dev#16364, catch errstate warnings (pandas-de…

04ab907

…v#16373)

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request May 29, 2017

PERF: improved clip performance (pandas-dev#16364)

41d90dc

closes pandas-dev#15400 (cherry picked from commit 42e2a87)

TomAugspurger pushed a commit to TomAugspurger/pandas that referenced this pull request May 29, 2017

TST: followup to pandas-dev#16364, catch errstate warnings (pandas-de…

a495669

…v#16373) (cherry picked from commit e97865e)

TomAugspurger added Backported and removed Needs Backport labels May 30, 2017

TomAugspurger pushed a commit that referenced this pull request May 30, 2017

PERF: improved clip performance (#16364)

f16141f

closes #15400 (cherry picked from commit 42e2a87)

TomAugspurger pushed a commit that referenced this pull request May 30, 2017

TST: followup to #16364, catch errstate warnings (#16373)

fef4136

(cherry picked from commit e97865e)

stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017

PERF: improved clip performance (pandas-dev#16364)

4c6b1c9

closes pandas-dev#15400

stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017

TST: Add test for clip-na (pandas-dev#16369)

15f33e0

Additional test cases for pandas-dev#16364 when upper and / or lower is nan.

stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017

TST: followup to pandas-dev#16364, catch errstate warnings (pandas-de…

3667eb3

…v#16373)
