DataFrame.mean takes a very long time with mixed dtype columns #6662

Closed
cpcloud opened this issue Mar 18, 2014 · 4 comments

Comments

@cpcloud
Member

cpcloud commented Mar 18, 2014

If a frame has numeric columns whose mean I want to compute plus at least one string column, DataFrame.mean takes far too long. If the frame is large enough, I can't even interrupt the process with Ctrl-C. Interestingly, the string columns are dropped from the final result anyway. With 10000 rows, the perf difference is about a factor of 800:

In [18]: n = 10000

In [19]: df = DataFrame(randn(n, 2), columns=list('ab'))

In [20]: df['c'] = [pd.util.testing.rands(5) for _ in xrange(n)]

In [21]: df.head(10)
Out[21]:
        a       b      c
0  1.0393  0.5719  AVi6V
1  0.6642  0.7441  mtqXk
2 -1.1552  0.1583  euUoo
3  0.7759  0.7647  cAAk2
4 -0.4958  0.4079  TYRRj
5 -0.7168 -1.1523  YT34i
6  1.5557 -1.7054  vXtgM
7  0.2898 -0.4858  2Rs1P
8  0.3752  0.2802  4UUz1
9 -0.2449 -2.3170  Bbue3

[10 rows x 3 columns]

In [22]: timeit df.mean()
10 loops, best of 3: 48.6 ms per loop

In [23]: dfnum = df[['a', 'b']]

In [24]: timeit dfnum.mean()
10000 loops, best of 3: 61.6 µs per loop

In [25]: 48.6 * 1000 / 61.6
Out[25]: 788.961038961039 # this is huge
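For anyone rerunning this on a current stack, here is a rough sketch of the same setup. It is not the original session: `xrange` and `pd.util.testing.rands` are swapped for a NumPy random generator (the `rng` and `letters` names are just illustrative), and the numeric columns are picked with `select_dtypes` instead of being listed by hand. No timings are claimed here; they will depend on the pandas version.

```python
import numpy as np
import pandas as pd

n = 10_000
rng = np.random.default_rng(0)

# Two float columns, mirroring columns 'a' and 'b' in the report.
df = pd.DataFrame(rng.standard_normal((n, 2)), columns=list("ab"))

# One string column of random 5-letter strings (stand-in for rands(5)).
letters = np.array(list("abcdefghijklmnopqrstuvwxyz"))
chars = rng.choice(letters, size=(n, 5))
df["c"] = ["".join(row) for row in chars]

# Keep only the numeric columns before reducing, like dfnum = df[['a', 'b']].
dfnum = df.select_dtypes(include="number")
means = dfnum.mean()
```

Selecting the numeric columns up front sidesteps the string column entirely, at the cost of building an intermediate frame.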
@cpcloud
Member Author

cpcloud commented Mar 18, 2014

Hm, I see the numeric_only flag. I guess that's the solution.
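A minimal sketch of what passing the flag looks like, using a tiny made-up frame rather than the one from the report. Whether it removes the slowdown described above depends on the pandas version, so treat it as an illustration of the call, not a benchmark.

```python
import pandas as pd

# Small illustrative frame: two numeric columns and one string column.
df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "c": ["x", "y", "z"]}
)

# numeric_only=True asks the reduction to consider only numeric columns,
# so the string column 'c' is left out of the result.
means = df.mean(numeric_only=True)
```

The result is a Series indexed by the numeric column names only.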

cpcloud closed this as completed Mar 18, 2014

@jreback
Contributor

jreback commented Mar 18, 2014

see #4787, should prob change the default

@jreback
Contributor

jreback commented Mar 18, 2014

@cpcloud you want to do that issue? pretty straightforward I think

@cpcloud
Member Author

cpcloud commented Mar 18, 2014

Sure
