ENH: Allow parameters method and min_periods in DataFrame.corrwith() #15573

anthonyho · 2017-03-04T23:18:17Z

Added new keyword parameters for DataFrame.corrwith(), which allows methods other than Pearson to be used. See #9490.

closes Allow method keyword for DataFrame.corrwith() #9490
tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry
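
To make the proposed semantics concrete, here is a minimal sketch of column-wise correlation with a selectable method and a minimum number of valid pairs. It is not the code from this PR: `corrwith_sketch` is a hypothetical helper built on the existing `Series.corr(other, method=..., min_periods=...)`, and it only illustrates what `method` and `min_periods` would mean for `DataFrame.corrwith()`.

```python
import numpy as np
import pandas as pd


def corrwith_sketch(left, right, method="pearson", min_periods=1):
    """Hypothetical helper: correlate matching columns of two DataFrames."""
    # Keep only the rows and columns the two frames share.
    left, right = left.align(right, join="inner")
    # Delegate each column pair to Series.corr, which already accepts
    # method ('pearson', 'kendall', 'spearman') and min_periods.
    return pd.Series(
        {
            col: left[col].corr(right[col], method=method, min_periods=min_periods)
            for col in left.columns
        }
    )


df1 = pd.DataFrame({"a": [1.0, 2.0, 3.0, 4.0], "b": [1.0, 2.0, np.nan, np.nan]})
df2 = pd.DataFrame({"a": [2.0, 4.0, 6.0, 8.0], "b": [1.0, 3.0, 5.0, 7.0]})

loose = corrwith_sketch(df1, df2)                  # any number of valid pairs
strict = corrwith_sketch(df1, df2, min_periods=3)  # column "b" has only 2 pairs
```

With `min_periods=3`, column `b` comes back as NaN because only two rows have values in both frames. Note that in pandas the `'kendall'` and `'spearman'` methods rely on SciPy being installed.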

jreback · 2017-03-04T23:26:31Z

doc/source/computation.rst

@@ -157,6 +157,7 @@ objects.
   df2 = pd.DataFrame(np.random.randn(4, 4), index=index[:4], columns=columns)
   df1.corrwith(df2)
   df2.corrwith(df1, axis=1)
+   df2.corrwith(df1, axis=1, method='kendall')


add a versionadded tag (and a small comment here)

jreback · 2017-03-04T23:28:18Z

pandas/core/frame.py

-
-        correl = num / dom
+        correl = Series({col: nanops.nancorr(left[col].values,
+                                             right[col].values,


this is going to be very slow

we need to rework nancorr to do this instead

anthonyho · 2017-03-05

I think the new implementation (which calls nancorr, which in turn calls the numpy/scipy correlation functions) is actually significantly faster than the current one (which computes the Pearson correlation manually using DataFrame.mean(), DataFrame.sum(), and DataFrame.std()).

For example:

Current implementation:

>>> import pandas as pd; import timeit
>>> pd.__version__
u'0.19.2'
>>> iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
>>> timeit.timeit(lambda: iris.corrwith(iris), number=10000)
50.891642808914185
>>> timeit.timeit(lambda: iris.T.corrwith(iris.T), number=10000)
42.0677649974823

New implementation:

>>> import pandas as pd; import timeit
>>> pd.__version__
'0.19.0+539.g0b77680.dirty'
>>> iris = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
>>> timeit.timeit(lambda: iris.corrwith(iris, method='pearson'), number=10000)
28.622286081314087
>>> timeit.timeit(lambda: iris.T.corrwith(iris.T, method='pearson'), number=10000)
21.898916959762573

I'm pretty new to this, so please let me know if I'm missing anything here.
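
For readers comparing the two approaches discussed here, the following is a rough sketch of each, not the actual pandas source. `corrwith_vectorized` mimics the "manual" Pearson computation from whole-frame `mean()`, `sum()` and `std()`; `corrwith_per_column` mimics looping over columns and calling a correlation routine on each pair. Both function names are hypothetical.

```python
import numpy as np
import pandas as pd


def corrwith_vectorized(left, right):
    """Hypothetical sketch: Pearson correlation from whole-frame reductions."""
    # Propagate NaN so both frames are missing in exactly the same cells.
    left = left + right * 0
    right = right + left * 0
    ldem = left - left.mean()
    rdem = right - right.mean()
    num = (ldem * rdem).sum()
    dom = (left.count() - 1) * left.std() * right.std()
    return num / dom


def corrwith_per_column(left, right):
    """Hypothetical sketch: one correlation call per column pair."""
    return pd.Series({col: left[col].corr(right[col]) for col in left.columns})


rng = np.random.RandomState(0)
a = pd.DataFrame(rng.randn(50, 4), columns=list("wxyz"))
b = pd.DataFrame(rng.randn(50, 4), columns=list("wxyz"))

vec = corrwith_vectorized(a, b)
loop = corrwith_per_column(a, b)
```

The two give the same numbers for Pearson. The difference is cost: the vectorized version does a fixed number of whole-frame operations, while the loop does one Python-level call per column, which is why the number of columns matters for timing.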

jreback · 2017-03-05

look through the existing benchmarks and please add some asv benchmarks as appropriate

include both wide and tall data

on wide data this will be slower
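
A benchmark along the lines requested might look like the sketch below. It follows the usual asv conventions (a class with `params`, `param_names`, `setup` and `time_*` methods), but the class name, the frame shapes and the choice of file it would live in are assumptions for illustration, not something taken from the pandas benchmark suite.

```python
import numpy as np
import pandas as pd


class CorrWithShapes:
    """Hypothetical asv benchmark: corrwith on tall and wide frames."""

    # asv runs each time_* method once per parameter value.
    params = ["tall", "wide"]
    param_names = ["shape"]

    def setup(self, shape):
        # Tall: many rows, few columns. Wide: few rows, many columns.
        nrows, ncols = (10000, 10) if shape == "tall" else (10, 1000)
        rng = np.random.RandomState(1234)
        self.df1 = pd.DataFrame(rng.randn(nrows, ncols))
        self.df2 = pd.DataFrame(rng.randn(nrows, ncols))

    def time_corrwith(self, shape):
        self.df1.corrwith(self.df2)
```

Running this before and after a change (for example with `asv continuous`) would show whether a per-column implementation regresses on the wide case.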

jreback · 2017-04-03T15:15:00Z

can you update

jreback · 2017-05-07T14:07:20Z

can you rebase, add some benchmarks to asv and show them.

jreback · 2017-06-10T19:05:10Z

can you rebase and update?

jreback · 2017-08-17T10:29:05Z

closing as stale

ENH: Allow parameters method and min_periods in DataFrame.corrwith() (pandas-dev#9490)

d7e03eb

jreback requested changes Mar 4, 2017

jreback added the Numeric Operations (Arithmetic, Comparison, and Logical operations) and Enhancement labels May 7, 2017

jreback closed this Aug 17, 2017


anthonyho commented Mar 4, 2017 (edited)
jreback Mar 4, 2017
jreback Mar 4, 2017
anthonyho Mar 5, 2017 (edited)
jreback Mar 5, 2017
jreback commented Apr 3, 2017
jreback commented May 7, 2017
jreback commented Jun 10, 2017
jreback commented Aug 17, 2017
