
test_missing_value_conversion on ubuntu 13.10 32bit KeyError: 2147483647 #8968


Closed
yarikoptic opened this issue Dec 2, 2014 · 6 comments · Fixed by #8982
Labels
IO Stata read_stata, to_stata Testing pandas testing functions or related to the test suite

@yarikoptic
Contributor

======================================================================
ERROR: test_missing_value_conversion (pandas.io.tests.test_stata.TestStata)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/buildd/pandas-0.15.1+git125-ge463818/debian/tmp/usr/lib/python3/dist-packages/pandas/io/tests/test_stata.py", line 651, in test_missing_value_conversion
    parsed_113 = read_stata(self.dta17_113, convert_missing=True)
  File "/tmp/buildd/pandas-0.15.1+git125-ge463818/debian/tmp/usr/lib/python3/dist-packages/pandas/io/stata.py", line 69, in read_stata
    order_categoricals)
  File "/tmp/buildd/pandas-0.15.1+git125-ge463818/debian/tmp/usr/lib/python3/dist-packages/pandas/io/stata.py", line 1278, in data
    missing_value = StataMissingValue(um)
  File "/tmp/buildd/pandas-0.15.1+git125-ge463818/debian/tmp/usr/lib/python3/dist-packages/pandas/io/stata.py", line 646, in __init__
    self._str = self.MISSING_VALUES[value]
KeyError: 2147483647

This doesn't happen on amd64.

INSTALLED VERSIONS
------------------
commit: None
python: 3.3.2.final.0
python-bits: 32
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C
LANG: C

pandas: 0.15.1.dev
nose: 1.3.0
Cython: 0.19
numpy: 1.7.1
scipy: 0.12.0
statsmodels: None
IPython: None
sphinx: 1.1.3
patsy: None
dateutil: 2.0
pytz: 2012c
bottleneck: None
tables: None
numexpr: None
matplotlib: 1.2.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: 4.2.0
html5lib: None
httplib2: None
apiclient: None
rpy2: None
sqlalchemy: None
pymysql: None
psycopg2: None
@jreback
Contributor

jreback commented Dec 3, 2014

cc @bashtage

@jreback jreback added IO Stata read_stata, to_stata Testing pandas testing functions or related to the test suite labels Dec 3, 2014
@jreback jreback added this to the 0.15.2 milestone Dec 3, 2014
@bashtage
Contributor

bashtage commented Dec 3, 2014

Bizarre. This is straight Python:

MISSING_VALUES = {}
bases = (101, 32741, 2147483621)
for b in bases:
    MISSING_VALUES[b] = '.'
    for i in range(1, 27):
        MISSING_VALUES[i + b] = '.' + chr(96 + i)

This certainly appears to create MISSING_VALUES[2147483621 + 26] (2147483647).

I don't have any 32-bit system to test on. I suppose that it is a 32-bit issue, since this is the largest 32-bit integer.
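The claim above can be checked directly. The following is a minimal sketch in plain Python that rebuilds the table from the snippet and looks up the key from the traceback; the variable names `label_for_max` and `entry_count` are illustrative only.

```python
# Rebuild the lookup table exactly as in the snippet above.
MISSING_VALUES = {}
bases = (101, 32741, 2147483621)
for b in bases:
    MISSING_VALUES[b] = '.'
    for i in range(1, 27):
        MISSING_VALUES[i + b] = '.' + chr(96 + i)

# The key from the traceback is present when looked up with a plain int.
label_for_max = MISSING_VALUES[2147483647]

# Three bases, each with '.' plus 26 lettered codes.
entry_count = len(MISSING_VALUES)
```

With a plain Python `int` the lookup succeeds, which suggests the failure depends on the type of the key passed in rather than on how the table is built.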

@jreback
Contributor

jreback commented Dec 3, 2014

So on 32-bit,
the max value is np.iinfo(np.int32).max (which is that number).

You cannot add to it (well, you can, but it squelches the overflow), and it's 'undefined' IIRC.
However, I have seen cases where the max value 'wraps' around, so the max value is one less than that, as shown here.
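To illustrate what "wrapping" means for a fixed-width 32-bit integer, here is a small sketch using the standard library's `ctypes.c_int32` as a stand-in. This is an assumption made for illustration: it is not NumPy, and NumPy's own overflow behaviour (including whether it warns) may differ by version and platform.

```python
import ctypes

# Largest signed 32-bit value, the same number as np.iinfo(np.int32).max.
INT32_MAX = 2**31 - 1

# Python ints are arbitrary precision, so this addition does not overflow.
unbounded = INT32_MAX + 1

# ctypes.c_int32 keeps only the low 32 bits and performs no overflow check,
# so the value wraps around to the most negative 32-bit integer.
wrapped = ctypes.c_int32(INT32_MAX + 1).value
```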

Are these bases a Stata thing?

@bashtage
Contributor

bashtage commented Dec 3, 2014

Yes, these are a Stata choice to use the highest integer values of each type to represent missing values. From what I can see, it should be perfectly fine since it is not adding anything to this value.
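The relationship between the bases and the type maxima can be sketched as follows. The assumption here, inferred from the arithmetic rather than stated in the thread, is that the three bases correspond to signed 8-, 16- and 32-bit integers.

```python
# Bases from the snippet earlier in the thread.
bases = (101, 32741, 2147483621)

# Assumed integer widths (in bits) for the three bases.
widths = (8, 16, 32)

# Each type reserves its top 27 values: '.' plus '.a' through '.z'.
reserved_codes = 27

derived_bases = tuple((2 ** (bits - 1) - 1) - (reserved_codes - 1) for bits in widths)
top_keys = tuple(b + 26 for b in bases)
```

The largest key for each base is exactly the type maximum, so building the table never needs a value beyond the 32-bit range.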

@bashtage
Contributor

bashtage commented Dec 3, 2014

Just some thoughts: I am guessing that hash(np.int32(2147483647)) is not the same as hash(int(2147483647)). Perhaps numpy is treating it as 2147483647L, which might have a different hash on 32-bit. I suppose one way to work around this would be to explicitly cast any integer values to int when looking up (any missing < 2147483647 must be an int).
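A rough sketch of this hypothesis and the proposed cast is below. `MismatchedHashInt` is a hypothetical stand-in that compares equal to an int but hashes differently; it is not how NumPy behaves on any particular platform. `lookup_missing_label` is likewise an illustrative helper, not the function in pandas.

```python
import sys

MISSING_VALUES = {}
for base in (101, 32741, 2147483621):
    MISSING_VALUES[base] = '.'
    for i in range(1, 27):
        MISSING_VALUES[base + i] = '.' + chr(96 + i)

INT32_MAX = 2147483647

# CPython hashes a non-negative int as value % sys.hash_info.modulus,
# so the hash of this key depends on the build's modulus.
expected_int_hash = INT32_MAX % sys.hash_info.modulus


class MismatchedHashInt(int):
    """Hypothetical stand-in: equal to an int, but with a different hash."""

    def __hash__(self):
        return int.__hash__(self) + 1


def lookup_missing_label(value):
    """Illustrative helper: cast to a plain int before the dict lookup."""
    return MISSING_VALUES[int(value)]


odd = MismatchedHashInt(INT32_MAX)

# A dict lookup needs a matching hash as well as equality, so this fails.
try:
    MISSING_VALUES[odd]
    direct_lookup_failed = False
except KeyError:
    direct_lookup_failed = True

# Casting to int restores the ordinary int hash and the lookup succeeds.
label = lookup_missing_label(odd)
```

If the guess is right, the cast costs little and makes the lookup independent of how the incoming scalar type computes its hash.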

bashtage added a commit to bashtage/pandas that referenced this issue Dec 6, 2014
Force conversion to integer for missing values when
they must be integer to avoid hash errors on 32 bit
platforms.

closes pandas-dev#8968
@bashtage
Contributor

bashtage commented Dec 6, 2014

@jreback Should be ready
