Skip to content

FIX: Workaround for integer hashing #8982

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 6, 2014

Conversation

bashtage
Copy link
Contributor

@bashtage bashtage commented Dec 3, 2014

Force conversion to integer for missing values when
they must be integer to avoid hash errors on 32 bit
platforms.

closes #8968

@bashtage
Copy link
Contributor Author

bashtage commented Dec 3, 2014

@yarikoptic This might fix it by forcing int. I'm not completely sure if this isn't a NumPy bug (or at least idiosyncracy) on 32 bit platforms with hashing or casting this value to native Python types.

@jreback
Copy link
Contributor

jreback commented Dec 3, 2014

iirc this is a numpyism

that dtype hashes are not the same

@jreback jreback added IO Stata read_stata, to_stata Testing pandas testing functions or related to the test suite labels Dec 3, 2014
@jreback jreback added this to the 0.15.2 milestone Dec 3, 2014
@jreback
Copy link
Contributor

jreback commented Dec 4, 2014

@yarikoptic can you try this out?

@jreback
Copy link
Contributor

jreback commented Dec 5, 2014

well i'll merge and see what happens

@bashtage thanks

@@ -643,6 +643,7 @@ class StataMissingValue(StringMixin):

def __init__(self, value):
self._value = value
value = int(value) if value < 2147483648 else value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe put this as 2147......L e.g. long?

can you test somehow?

@bashtage
Copy link
Contributor Author

bashtage commented Dec 5, 2014

There is some cleaning to do in the stata code, so I'll think about how to make this more robust in the future - perhaps making all integer values always longs when creating the dictionary, and then doing the same when doing a look up might be the more stable across bitnesses.

@jreback
Copy link
Contributor

jreback commented Dec 5, 2014

ok, if this fixes @yarikoptic then ok to put in

@bashtage bashtage force-pushed the stata-large-int-missing-value branch from 04d4308 to b7fc3c6 Compare December 6, 2014 03:58
@bashtage
Copy link
Contributor Author

bashtage commented Dec 6, 2014

Have made a few minor changes to ensure that the hashes in the missing value dictionary should always have the correct type for any bitness.

@bashtage bashtage force-pushed the stata-large-int-missing-value branch 2 times, most recently from bd85b73 to 802917c Compare December 6, 2014 04:56
Force conversion to integer for missing values when
they must be integer to avoid hash errors on 32 bit
platforms.

closes pandas-dev#8968
jreback added a commit that referenced this pull request Dec 6, 2014
@jreback jreback merged commit fa914cc into pandas-dev:master Dec 6, 2014
@bashtage bashtage deleted the stata-large-int-missing-value branch January 18, 2015 22:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IO Stata read_stata, to_stata Testing pandas testing functions or related to the test suite
Projects
None yet
Development

Successfully merging this pull request may close these issues.

test_missing_value_conversion on ubuntu 13.10 32bit KeyError: 2147483647
2 participants