Pandas quantile function very slow #11623
Comments
This seems to be because it computes the quantiles series by series, so computing 10k quantiles, as this example does, is going to have a lot of overhead. This was presumably done as a simplification to handle different types such as Timestamp. It also handles nulls by default (as do most pandas functions), which also affects performance (lots of [...]). Ultimately [...]
These can simply be done block-by-block. We do this with almost all other functions already.
I think the null-handling prevents trivial application even block-by-block. See my revised comment.
There's an [...] NumPy >= 1.9, so it would require some special casing.
Kevin
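To illustrate the null-handling point, here is a minimal sketch. It assumes the NumPy function under discussion is `np.nanpercentile` (the name is missing from the comment above, so this is a guess); that function was added in NumPy 1.9. The array shape and the share of missing values are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
values = rng.standard_normal((1_000, 20))
# Replace roughly 10% of the entries with NaN (illustrative data only)
values[rng.random(values.shape) < 0.1] = np.nan

df = pd.DataFrame(values)

# pandas skips NaN values when computing a quantile
pandas_median = df.quantile(0.5)

# np.nanpercentile also ignores NaN and reduces the whole 2-D array in one call
numpy_median = np.nanpercentile(values, 50, axis=0)

print(np.allclose(pandas_median.to_numpy(), numpy_median))
```

Both calls use linear interpolation by default, so the per-column results should agree up to floating-point error.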
@sinhrks it needs to be done on a block basis. Then it will be the same.
Another bottleneck is the transposition caused by [...]
NumPy handles the 2-D case just fine, so when it is done by blocks, we just transpose, take the quantile, and transpose back (e.g. kind of like what [...]
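A rough sketch of the 2-D idea, using a single homogeneous float array as a stand-in for one internal block (this is not the pandas internals, only the NumPy behaviour the comment relies on). The quantile along each row can be written either as a transpose followed by a reduction over axis 0, or directly as a reduction over axis 1.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame(rng.standard_normal((200, 8)))

# One homogeneous 2-D array, standing in for a single float block
block = df.to_numpy()

# Row-wise quantile written as: transpose, then reduce over axis 0
row_q_via_transpose = np.percentile(block.T, 75, axis=0)

# The same reduction without the transpose
row_q_direct = np.percentile(block, 75, axis=1)

# Reference result from pandas
pandas_row_q = df.quantile(0.75, axis=1)

print(np.allclose(row_q_via_transpose, row_q_direct))
print(np.allclose(row_q_direct, pandas_row_q.to_numpy()))
```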
after #13122
REGR: series quantile with nan closes pandas-dev#11623 closes pandas-dev#13098
On a similar note, I see that describe() seems unnecessarily slow. Computing individual components is much faster.
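One way to check the claim about `describe()` is to build the same summary from the individual reductions and compare. The sketch below does only the equivalence check; the frame size and column names are made up for illustration, and any speed difference will depend on the pandas version and the data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
df = pd.DataFrame(rng.standard_normal((10_000, 5)), columns=list("abcde"))

# The all-in-one summary
summary = df.describe()

# The same statistics computed one component at a time
by_hand = pd.DataFrame(
    {
        "count": df.count(),
        "mean": df.mean(),
        "std": df.std(),
        "min": df.min(),
        "25%": df.quantile(0.25),
        "50%": df.quantile(0.50),
        "75%": df.quantile(0.75),
        "max": df.max(),
    }
).T  # rows become the statistics, columns stay a..e

print(np.allclose(by_hand.to_numpy(), summary.to_numpy()))
```

Wrapping each approach in `timeit.timeit` would give a local comparison of the two.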
The quantile function is almost 10 000 times slower than the equivalent percentile function in numpy. See code below:
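The original snippet is not preserved here. The following is a stand-in sketch of the kind of comparison described, not the reporter's code: the frame shape, the quantile, and the repeat count are all assumptions. The measured ratio depends heavily on the pandas and NumPy versions, and on releases that include the fix referenced in this thread the gap may be far smaller than reported.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
n_rows, n_cols = 100, 2_000  # hypothetical shape: many columns, few rows
df = pd.DataFrame(rng.standard_normal((n_rows, n_cols)))


def pandas_quantile():
    # One quantile per column through the pandas API
    return df.quantile(0.5)


def numpy_percentile():
    # The same reduction on the underlying 2-D array
    return np.percentile(df.to_numpy(), 50, axis=0)


# Time a few repetitions of each approach
t_pandas = timeit.timeit(pandas_quantile, number=5)
t_numpy = timeit.timeit(numpy_percentile, number=5)

print(f"pandas: {t_pandas:.4f}s  numpy: {t_numpy:.4f}s")
print("results match:", np.allclose(pandas_quantile().to_numpy(), numpy_percentile()))
```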