to_stata uint16 #7397

dmsul · 2014-06-09T01:54:45Z

Simple changes to io/stata.py to write unsigned integers to Stata files. Sorry about the poor commit messages--I'm still new to git and wasn't able to correct it cleanly.

(See issue #7365)

jreback · 2014-06-09T02:00:09Z

you need to add some tests in io/tests/test_stata.py

you create a frame with several dtypes then round trip and compare with the original

these tests should fail (with uint16 dtypes) before your fix and pass after
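A minimal sketch of the kind of round-trip test described above. It is not the test that went into io/tests/test_stata.py. The helper name `stata_roundtrip` and the column names are made up for illustration. Values are compared rather than dtypes, on the assumption that the writer may store an unsigned column in a wider signed type.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def stata_roundtrip(frame):
    """Write `frame` to a temporary .dta file and read it back (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "roundtrip.dta")
        frame.to_stata(path, write_index=False)
        return pd.read_stata(path)


# A frame with several dtypes, including the unsigned ones under discussion.
original = pd.DataFrame({
    "i8": np.array([1, 2, 3], dtype=np.int8),
    "u8": np.array([0, 100, 255], dtype=np.uint8),
    "u16": np.array([0, 1000, 65535], dtype=np.uint16),
    "f64": np.array([0.5, 1.5, 2.5], dtype=np.float64),
})

result = stata_roundtrip(original)

# Collect the columns whose values changed during the round trip.
mismatches = [
    col for col in original.columns
    if not (result[col].astype(np.float64) == original[col].astype(np.float64)).all()
]
```

An empty `mismatches` list means no values were altered on the way through the file.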

bashtage · 2014-06-16T08:35:59Z

pandas/io/stata.py

@@ -990,11 +990,11 @@ def _dtype_to_stata_type(dtype):
        return chr(255)
    elif dtype == np.float32:
        return chr(254)
-    elif dtype == np.int32:
+    elif dtype in (np.int32, np.uint32):


This is not safe and can lead to data loss. Only supported Stata data types should be used here, and Stata does not support unsigned types. The correct method is to first perform all casting and then use only the cast, Stata-safe data types when writing the data type.

This change should not be necessary since this function is called after _cast_to_stata_types, which cannot return any uint types.

bashtage · 2014-06-16T08:39:20Z

I'm not sure this is a good idea since Stata doesn't support any unsigned data types. In particular some of the code which uses the numpy datatypes in this commit will result in data loss.

The only place where uint* handling should be added is in the casting code. Once the columns have been cast to supported Stata datatypes, only those supported datatypes should be used when writing the Stata type codes (e.g. chr(253)).

bashtage · 2014-06-16T08:43:34Z

pandas/io/stata.py

@@ -230,13 +230,13 @@ def _cast_to_stata_types(data):
    ws = ''
    for col in data:
        dtype = data[col].dtype
-        if dtype == np.int8:
+        if dtype in (np.int8, np.uint8):


This is not the correct behavior for this function. This function ensures that, after it has run, every datatype has a trivial mapping to a Stata data type. The simplest fix is to upcast uints to the next largest int, which is always safe; then the other changes in the commit are not needed.

Something simple like

    if dtype == np.uint8:
        data[col] = data[col].astype(np.int16)
    elif dtype == np.uint16:
        data[col] = data[col].astype(np.int32)
    elif dtype in (np.uint32, np.uint64):
        # either convert to int32 if max is small enough or float64, warning
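The upcasting idea in this comment can be sketched as a standalone function. This is an illustration under stated assumptions, not the code that was merged. The function name `upcast_unsigned` is hypothetical, and the range check uses NumPy's int32 maximum for simplicity, which ignores the values Stata reserves at the top of each integer range for missing-value codes.

```python
import warnings

import numpy as np
import pandas as pd


def upcast_unsigned(data):
    """Return a copy of `data` with unsigned columns cast to signed or float types.

    Hypothetical helper: uint8 -> int16, uint16 -> int32, and uint32/uint64
    -> int32 when the values fit, otherwise float64 with a warning.
    """
    out = data.copy()
    for col in out:
        dtype = out[col].dtype
        if dtype == np.uint8:
            out[col] = out[col].astype(np.int16)
        elif dtype == np.uint16:
            out[col] = out[col].astype(np.int32)
        elif dtype in (np.uint32, np.uint64):
            # Simplification: a real writer would use Stata's own upper bound.
            if out[col].max() <= np.iinfo(np.int32).max:
                out[col] = out[col].astype(np.int32)
            else:
                warnings.warn(
                    "Column %r does not fit in int32; converting to float64" % col
                )
                out[col] = out[col].astype(np.float64)
    return out


frame = pd.DataFrame({
    "a": np.array([0, 255], dtype=np.uint8),
    "b": np.array([0, 65535], dtype=np.uint16),
    "c": np.array([1, 2], dtype=np.uint32),
    "d": np.array([0, 4294967295], dtype=np.uint32),
})
converted = upcast_unsigned(frame)
```

Working on a copy keeps the caller's frame unchanged, which makes the before/after comparison in a test straightforward.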

bashtage · 2014-06-16T10:17:34Z

Also noticed a clear bug in the (my) code

https://github.com/dmsul/pandas/blob/tostata-uint16/pandas/io/stata.py#L244

should be

if data[col].max() <= 2 ** 53 or data[col].min() >= -2 ** 53:

The first comparison has *, not **. Would be good to fix this.
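To see why the operator matters, here is a short illustration. The variable names are chosen for this example only. It relies on the fact that a float64 has a 53-bit significand, so integers up to 2 ** 53 in magnitude convert exactly.

```python
# Multiplication versus exponentiation give very different bounds.
wrong_bound = 2 * 53   # 106
right_bound = 2 ** 53  # 9007199254740992

# Every integer up to 2 ** 53 survives a round trip through a float64 ...
exact = int(float(right_bound)) == right_bound

# ... but 2 ** 53 + 1 is not representable and is rounded on conversion.
lossy = int(float(right_bound + 1)) != right_bound + 1
```

With `2 * 53`, the check would treat any value above 106 as exceeding the precision limit.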

jreback · 2014-06-16T12:30:50Z

@dmsul ok, then this needs to be changed to coerce uint types to int (and if they are out of range, e.g. a >int64 value from a uint64, then I would raise)
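One way to read this suggestion, as a rough sketch rather than the eventual pandas implementation: widen each unsigned type to a signed type and raise when a value cannot be represented. The function name `coerce_uint_to_int` and the width mapping are assumptions for illustration. Mapping the result onto the types Stata actually supports would be a further step not shown here.

```python
import numpy as np

# Assumed mapping from unsigned item size (bytes) to a signed target type.
_SIGNED_TARGET = {1: np.int16, 2: np.int32, 4: np.int64, 8: np.int64}


def coerce_uint_to_int(values):
    """Cast an unsigned array to a signed type, raising if a value is out of range."""
    values = np.asarray(values)
    if values.dtype.kind != "u":
        return values  # nothing to do for non-unsigned input
    target = _SIGNED_TARGET[values.dtype.itemsize]
    # Compare as Python ints so a uint64 maximum is checked exactly.
    if values.size and int(values.max()) > np.iinfo(target).max:
        raise ValueError(
            "value %d does not fit in %s" % (int(values.max()), np.dtype(target).name)
        )
    return values.astype(target)


small = coerce_uint_to_int(np.array([0, 65535], dtype=np.uint16))

try:
    coerce_uint_to_int(np.array([2 ** 64 - 1], dtype=np.uint64))
    raised = False
except ValueError:
    raised = True
```

Only the uint64 case can fail here, since every smaller unsigned type fits in the next wider signed type.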

dmsul added 2 commits June 8, 2014 20:23

Add uint support to DataFrame.to_Stata, attempt 1

0003a58

Added unsigned int support to DataFrame.to_stata()

5a6ce61

cpcloud changed the title ~~Tostata uint16~~ to_stata uint16 Jun 9, 2014

jreback added Dtypes labels Jun 14, 2014

jreback added this to the 0.15.0 milestone Jun 14, 2014

bashtage reviewed Jun 16, 2014

dmsul closed this Jul 17, 2014

dmsul deleted the tostata-uint16 branch July 17, 2014 03:05
