element-wise value comparison noticeably slower for large-scale dataframes #28617

Closed
leo4183 opened this issue Sep 25, 2019 · 1 comment
leo4183 commented Sep 25, 2019

Problem Description

Element-wise value comparison is noticeably slower on large DataFrames than the same comparison on the underlying NumPy array. In older pandas versions the operation ran at an acceptable speed.

import numpy as np
import pandas as pd
x = pd.DataFrame(np.random.rand(3000*4000).reshape((3000,-1)))

# pandas 0.25.1
%timeit x>0.0 
1.72 s ± 3.23 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

# pandas 0.17.1
%timeit x>0.0
115 ms ± 114 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit x.values>0.0
13.1 ms ± 148 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
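
For anyone reproducing this outside IPython, here is a minimal sketch of the same measurement using the standard-library `timeit` module instead of the `%timeit` magic. The function name `benchmark_comparison` and its parameters are illustrative and not part of the report; the numbers it prints will depend on the installed pandas version and the machine.

```python
import timeit

import numpy as np
import pandas as pd


def benchmark_comparison(n_rows=3000, n_cols=4000, number=3):
    """Return average seconds per call for DataFrame vs. ndarray comparison.

    Hypothetical helper for illustration; defaults match the shape in the report.
    """
    x = pd.DataFrame(np.random.rand(n_rows * n_cols).reshape((n_rows, -1)))

    # timeit.timeit returns the total time for `number` calls, so divide by it.
    frame_s = timeit.timeit(lambda: x > 0.0, number=number) / number
    array_s = timeit.timeit(lambda: x.values > 0.0, number=number) / number
    return {"pandas": pd.__version__, "frame_s": frame_s, "array_s": array_s}


# A small shape keeps this sketch quick to run; call benchmark_comparison()
# with its defaults (3000 x 4000) to match the size used in the report.
result = benchmark_comparison(n_rows=300, n_cols=400)
print(result)
```

Running the function with its default arguments under different pandas versions should give a comparison along the lines of the timings above.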

INSTALLED VERSIONS

commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-80.7.2.el8_0.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 0.25.1
numpy : 1.17.2

@TomAugspurger
Contributor

Duplicate of #24990. @jbrockmendel is working on this, but it's a large project. It should be done for 1.0.
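
In the meantime, one possible workaround (a sketch, not something proposed in this thread) is to run the comparison on the underlying array, as in the `x.values>0.0` timing from the report, and then re-attach the original labels. The helper name `gt_via_numpy` is hypothetical, and the sketch assumes a frame with a single numeric dtype; on mixed-dtype frames `.values` upcasts to a common dtype, which may be slow or change the result.

```python
import numpy as np
import pandas as pd


def gt_via_numpy(df, threshold):
    """Compare on the underlying ndarray, then rebuild a labeled DataFrame.

    Hypothetical helper; assumes df holds a single numeric dtype.
    """
    mask = df.values > threshold
    return pd.DataFrame(mask, index=df.index, columns=df.columns)


# Small frame so the sketch runs quickly; the report used 3000 x 4000.
x = pd.DataFrame(np.random.rand(300 * 400).reshape((300, -1)))
fast = gt_via_numpy(x, 0.0)

# Sanity check: the result should match the plain DataFrame comparison.
matches = fast.equals(x > 0.0)
print(matches)
```

Whether this is actually faster depends on the pandas version in use, so it is worth timing both forms before adopting it.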
