Skip to content

to_stata uint16 #7397

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

to_stata uint16 #7397

wants to merge 2 commits into from

Conversation

dmsul
Copy link

@dmsul dmsul commented Jun 9, 2014

Simple changes to io/stata.py to write unsigned integers to Stata files. Sorry about the poor commit messages--I'm still new to git and wasn't able to correct it cleanly.

(See issue #7365)

@jreback
Copy link
Contributor

jreback commented Jun 9, 2014

you need to add some tests in io/tests/test_stata.py

you create a frame with several dtypes then round trip and compare with the original

these tests should fail (with uint16 dtypes) before your fix and pass after

@cpcloud cpcloud changed the title Tostata uint16 to_stata uint16 Jun 9, 2014
@jreback jreback added this to the 0.15.0 milestone Jun 14, 2014
@@ -990,11 +990,11 @@ def _dtype_to_stata_type(dtype):
return chr(255)
elif dtype == np.float32:
return chr(254)
elif dtype == np.int32:
elif dtype in (np.int32, np.uint32):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not safe and can lead to data loss. Only supported Stata data types should be used here, and Stata does not support unsigned types. The correct method is to first perform all casting and then use only the case, Stata-safe data types when writing the data type.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change should not be necessary since this function is called after

_cast_to_stata_types which cannot return any uint types.

@bashtage
Copy link
Contributor

I'm not sure this is a good idea since Stata doesn't support any unsigned data types. In particular some of the code which uses the numpy datatypes in this commit will result in data loss.

The only place where uint* handling should be added is in the casting code. Once the columns have been cast to supported Stata datatypes then only the supported Stata datatypes should be used when writing the Stata data types (e.g. char(253))

@@ -230,13 +230,13 @@ def _cast_to_stata_types(data):
ws = ''
for col in data:
dtype = data[col].dtype
if dtype == np.int8:
if dtype in (np.int8, np.uint8):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the correct behavior for this function. This function ensures that all datatypes after it is run have a trivial mapping to Stata data types. It would be simplest to simply upcast uints to the next largest int which is always safe and then the other changes in the commit are not needed.

Something simple like

if dtype==np.uint8:
    data[col] = data[col].astype(np.int16)
elif dtype==np.uint16:
    data[col] = data[col].astype(np.int32)
elif dtype in (np.uint32, np.uint64):
  # either convert to int32 if max is small enough or float64, warning

@bashtage
Copy link
Contributor

Also noticed a clear bug in (my) the code

https://github.com/dmsul/pandas/blob/tostata-uint16/pandas/io/stata.py#L244

should be

if data[col].max() <= 2 ** 53 or data[col].min() >= -2 ** 53:

The first comparison has *, not **. Would be good to fix this.

@jreback
Copy link
Contributor

jreback commented Jun 16, 2014

@dmsul ok, then this needs to be changed to coerce uint types to int (and if they are out of range, then I would raise (e.g. a >int64 value from a uint64)

@dmsul dmsul closed this Jul 17, 2014
@dmsul dmsul deleted the tostata-uint16 branch July 17, 2014 03:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions IO Stata read_stata, to_stata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants