Query errors in Windows #12023

Closed
jlopezpena opened this issue Jan 12, 2016 · 7 comments
Labels
Numeric Operations (Arithmetic, Comparison, and Logical operations); Windows (Windows OS)

Comments

@jlopezpena

We are observing inconsistent behaviour of the DataFrame.query method under Windows.

I enclose a smallish (3600 rows) dataframe where we find the issue.

The following code fails randomly under Windows, but seems to work smoothly on Mac:

import pandas as pd

df = pd.read_csv("pd_bug.txt")

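# For every value of A, query() should select the same rows as boolean indexing.
for x in df.A.unique():
    try:
        assert len(df.query('(A == @x) | (B == @x)')) == len(df[(df.A == x) | (df.B == x)])
    except AssertionError:
        # Print each value for which the two row counts disagree.
        print(x)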

We are observing a failure every other run (often with two or three values failing).
Here is the output of pd.show_versions() on the failing machine:

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.11.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 94 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.17.1
nose: 1.3.7
pip: 7.1.2
setuptools: 19.1.1
Cython: 0.23.4
numpy: 1.10.1
scipy: 0.16.0
statsmodels: 0.6.1
IPython: 4.0.1
sphinx: 1.3.1
patsy: 0.4.0
dateutil: 2.4.2
pytz: 2015.7
blosc: None
bottleneck: 1.0.0
tables: 3.2.2
numexpr: 2.4.4
matplotlib: 1.5.0
openpyxl: 2.2.6
xlrd: 0.9.4
xlwt: 1.0.0
xlsxwriter: 0.7.7
lxml: 3.4.4
bs4: 4.4.1
html5lib: None
httplib2: None
apiclient: None
sqlalchemy: 1.0.9
pymysql: None
psycopg2: None
Jinja2: None

pd_bug.txt

@sinhrks
Member

sinhrks commented Jan 12, 2016

Thanks for the report.

I received a similar report personally, which seemed to be caused by referencing a local variable with @. Unfortunately I couldn't reproduce the issue (I was using a Mac at the time).

Does your problem only occur with queries that include @?

sinhrks added the Numeric Operations and Windows labels Jan 12, 2016
@jlopezpena
Author

We have only observed it on queries using @. Since the failure happens at random values, it is very hard to test whether it also fails without the @ operator.
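
One way to probe this, sketched under the assumption that the compared values are plain numbers or strings: format each value into the query string as a literal instead of referencing it with @. The helper name `mismatches_without_at` and the sample frame are hypothetical; the attached pd_bug.txt is not reproduced here.

```python
import pandas as pd


def mismatches_without_at(df):
    """Compare query() against boolean indexing without using @ references."""
    failed = []
    # tolist() yields plain Python scalars, whose repr() is a valid literal.
    for x in df["A"].unique().tolist():
        expr = "(A == {0!r}) | (B == {0!r})".format(x)
        expected = len(df[(df["A"] == x) | (df["B"] == x)])
        if len(df.query(expr)) != expected:
            failed.append(x)
    return failed


# Made-up stand-in data (an assumption, not the reporter's file).
sample = pd.DataFrame({"A": [1, 2, 3, 1, 2], "B": [3, 3, 1, 2, 2]})
failed = mismatches_without_at(sample)
print(failed)
```

If this list stays empty over repeated runs while the @ version keeps failing, that would point at local-variable handling; if both fail, the expression engine is the more likely culprit.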

@jreback
Contributor

jreback commented Jan 12, 2016

this is numexpr 2.4.4 which is a bit buggy

see #11743

update to 2.4.6 and all will be well
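
A small sketch for checking which numexpr an environment actually picks up. The helper names `parse_version` and `is_suspect` are hypothetical, and treating every release below 2.4.6 as suspect is an assumption drawn only from the comment above, not a statement about which releases contain the bug.

```python
import re


def parse_version(text):
    """Turn a dotted version such as '2.4.4' into a tuple of up to three ints."""
    numbers = re.findall(r"\d+", text)[:3]
    return tuple(int(n) for n in numbers)


def is_suspect(version_text, fixed_in=(2, 4, 6)):
    """Return True if the version sorts before the release named as fixed."""
    return parse_version(version_text) < fixed_in


try:
    import numexpr
    installed = numexpr.__version__
except ImportError:
    installed = None

if installed is None:
    print("numexpr is not installed")
elif is_suspect(installed):
    print("numexpr {} predates 2.4.6; consider upgrading".format(installed))
else:
    print("numexpr {} is 2.4.6 or newer".format(installed))
```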

jreback closed this as completed Jan 12, 2016
@jlopezpena
Author

According to your comment on #11743, shouldn't the indexing issue only happen for >500K elements?
My example here is much smaller.

Anyway, if it is a numexpr issue, I cannot fix it on the offending machine until 2.4.6 gets pushed to conda (the pip version requires a different numpy that fails pip-installing on Windows).

@jreback
Contributor

jreback commented Jan 12, 2016

you can remove numexpr and it will work
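
If uninstalling numexpr is inconvenient, a possible alternative is to ask `query` for the Python engine on a per-call basis through its `engine` keyword. This is only a sketch: it assumes the fault sits in the numexpr engine as suggested above, and the helper name `find_mismatches` and the sample frame are made up for illustration.

```python
import pandas as pd


def find_mismatches(df):
    """Return values of A for which query() and boolean indexing disagree."""
    mismatches = []
    for x in df["A"].unique():
        # engine="python" evaluates the expression without numexpr.
        via_query = df.query("(A == @x) | (B == @x)", engine="python")
        via_mask = df[(df["A"] == x) | (df["B"] == x)]
        if len(via_query) != len(via_mask):
            mismatches.append(x)
    return mismatches


# Made-up stand-in data (an assumption, not the reporter's file).
demo = pd.DataFrame({"A": [1, 2, 3, 1, 2], "B": [3, 3, 1, 2, 2]})
mismatches = find_mismatches(demo)
print(mismatches)
```

The Python engine is typically slower on large frames, so this is better suited to confirming the diagnosis than to being a permanent setting.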

@jreback
Contributor

jreback commented Jan 13, 2016

note that numexpr was just updated in conda to 2.4.6

@jlopezpena
Author

Just noticed it a few minutes ago. Will test with the upgraded version and check that everything works. Thanks!
