DataFrame __getitem__ ~100X slower on Pandas 0.19.1 vs 0.18.1 possibly getitem caching? #14930
Comments
I have a fix for this.
I changed this 3c96442 to make it lazy
@jreback Thanks for looking into this. 👍 Can you provide some info about Was this regression due to something that improved performance of row-based index but possibly hurt columnar access?
see the referenced issues. They explain all.
@dragoljub short answer is this. When the The solution was to do our uniqueness checking at the same time as we are already checking for sortedness (e.g. the However it seems that this was each time constructing the index mapping (bad). cc @llllllllll
@jreback I took a look at your proposed fix. I think it looks correct, but I think there might be a simpler fix that just makes
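To make the idea in the comment above concrete, here is a rough sketch of checking uniqueness during the sortedness pass and only building a hash-based structure lazily, at most once. This is not pandas' actual engine code: the class `LazyIndexChecks`, its attributes, and the `mapping_builds` counter are all hypothetical names used for illustration.

```python
class LazyIndexChecks:
    """Hypothetical stand-in for an index engine; not pandas' real implementation."""

    def __init__(self, values):
        self._values = list(values)
        self._checked = False
        self._monotonic = None
        self._unique = None        # None means "not known yet"
        self.mapping_builds = 0    # counts how often the expensive fallback runs

    def _scan(self):
        # Single pass over adjacent pairs, run at most once and then cached.
        if self._checked:
            return
        increasing = True
        strictly = True
        for prev, cur in zip(self._values, self._values[1:]):
            if cur < prev:
                increasing = False
                break
            if cur == prev:
                strictly = False
        self._monotonic = increasing
        if increasing:
            # In sorted data duplicates are adjacent, so the answer is definitive.
            self._unique = strictly
        self._checked = True

    @property
    def is_monotonic_increasing(self):
        self._scan()
        return self._monotonic

    @property
    def is_unique(self):
        self._scan()
        if self._unique is None:
            # Unsorted data: fall back to a hash-based check, once, and cache it.
            self.mapping_builds += 1
            self._unique = len(set(self._values)) == len(self._values)
        return self._unique


sorted_idx = LazyIndexChecks(range(1000))
unsorted_idx = LazyIndexChecks([3, 1, 2])
print(sorted_idx.is_unique, sorted_idx.mapping_builds)
print(unsorted_idx.is_unique, unsorted_idx.is_unique, unsorted_idx.mapping_builds)
```

The point of the sketch is that repeated `is_unique` lookups never rebuild anything: sorted input is answered by the scan alone, and unsorted input pays for the hash-based check a single time.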
@jreback Thanks for the explanation and quick response time on this. 😄 This will be a great last-minute perf fix for 0.19.2!
closes pandas-dev#14930 Author: Jeff Reback <[email protected]> Closes pandas-dev#14933 from jreback/perf and squashes the following commits: dc32b39 [Jeff Reback] PERF: fix getitem unique_check / initialization issue (cherry picked from commit 07c83ee)
closes pandas-dev#14930 Author: Jeff Reback <[email protected]> Closes pandas-dev#14933 from jreback/perf and squashes the following commits: dc32b39 [Jeff Reback] PERF: fix getitem unique_check / initialization issue
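The slowdown named in the title can be timed with a sketch along these lines. The frame shape, the column names, and the helper `time_getitem` are assumptions made for illustration, not the original report's code; running it under two pandas versions and comparing the printed per-access time is one way to see whether column `__getitem__` regressed.

```python
import timeit

import numpy as np
import pandas as pd

# Assumed test frame: a modest number of rows and many columns.
columns = ["c{}".format(i) for i in range(200)]
df = pd.DataFrame(np.zeros((100, 200)), columns=columns)


def time_getitem(frame, repeat=3):
    """Return the best observed time (seconds) per single-column access."""
    cols = list(frame.columns)

    def access_all():
        for col in cols:
            frame[col]  # exercises DataFrame.__getitem__

    best = min(timeit.repeat(access_all, number=1, repeat=repeat))
    return best / len(cols)


per_access = time_getitem(df)
print(pd.__version__, per_access)
```

Taking the minimum over a few repeats reduces noise from unrelated system activity, though absolute numbers will still vary by machine.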
Problem description
It appears that get_item_cache() or __contains__ may have something to do with it. This affects other functionality such as df.info(), which is now also ~100X slower.
Expected Output
Output of pd.show_versions()
pandas: 0.19.1
nose: 1.3.7
pip: 8.1.1
setuptools: 20.10.1
Cython: 0.24.1
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: 0.7.1
IPython: 4.2.0
sphinx: 1.3.6
patsy: 0.4.0
dateutil: 2.5.0
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.1
openpyxl: 2.2.5
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: None
lxml: 3.6.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.12
pymysql: None
psycopg2: None
jinja2: 2.8
boto: None
pandas_datareader: None