BUG: NaN values not converted to Stata missing values (GH6684) #6685

bashtage · 2014-03-21T22:22:24Z

Stata does not handle NaNs correctly, so they must be replaced with Stata
missing values (. by default). The fix checks floating point columns for NaN
and replaces each one with the Stata numeric code for (.).

The write_index option was also being ignored (it was simply never applied). This
has been fixed, along with numerous tests that were not correct.
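
A minimal sketch of the replacement step, using only the standard library rather than the actual pandas writer code. The names `STATA_MISSING_DOUBLE` and `replace_nan` are made up for illustration, and the byte pattern is an assumption based on how the dta format encodes the double-precision (.) value:

```python
import math
import struct

# Assumption: little-endian bytes of the double that Stata reads as (.),
# i.e. 2**1023. Not copied from this PR's diff.
STATA_MISSING_DOUBLE = struct.unpack("<d", b"\x00\x00\x00\x00\x00\x00\xe0\x7f")[0]


def replace_nan(values, missing=STATA_MISSING_DOUBLE):
    """Return a copy of a float column with every NaN swapped for `missing`."""
    return [missing if math.isnan(v) else v for v in values]


column = [1.0, float("nan"), 3.5]
cleaned = replace_nan(column)
```

Non-NaN entries pass through unchanged, so only float columns that actually contain NaN are affected.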

bashtage · 2014-03-21T22:23:04Z

Fix for #6684

jreback · 2014-03-21T22:24:25Z

is the missing value indicator 'standard' or something the user would want to set? (e.g. should it be a parameter with a default?)

bashtage · 2014-03-21T22:28:44Z

There are many missing value indicators: . (dot), .a, .b, ..., .z. In principle these allow the reason for the missing value to be encoded (e.g. .a for non-response, .b for negative value, etc.). In theory these could be set, and I thought about how to do this once, but I think it probably isn't a feature that would get much use, since pandas doesn't really have a good method to handle different types of missing values.

The idea I had at the time would require passing a second DataFrame with some additional information about the missing data codes to use. Not worth the effort IMO.
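
For reference, a rough sketch of how the 27 double-precision codes could be enumerated. This is not part of the PR. The helper name `stata_missing_doubles` is hypothetical, and both the base bit pattern and the step between consecutive codes are assumptions based on the dta format description:

```python
import string
import struct

# Assumptions: (.) is the double with bit pattern 0x7FE0000000000000 (2**1023),
# and each following code (.a, .b, ...) adds 2**40 to that bit pattern.
BASE_BITS = 0x7FE0000000000000
STEP = 1 << 40


def stata_missing_doubles():
    """Map '.', '.a', ..., '.z' to the float64 value assumed to encode each."""
    codes = {}
    for i, suffix in enumerate([""] + list(string.ascii_lowercase)):
        bits = BASE_BITS + i * STEP
        # Reinterpret the 64-bit integer as a double without changing its bits.
        codes["." + suffix] = struct.unpack("<d", struct.pack("<q", bits))[0]
    return codes


codes = stata_missing_doubles()
```

All of these values are finite floats, so they survive a write as ordinary numbers; it is Stata that interprets them as missing.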

jreback · 2014-03-21T22:39:35Z

@bashtage totally fine..

jreback · 2014-03-22T03:01:34Z

looks good
can u add a release note?

bashtage · 2014-03-22T12:39:30Z

@jreback Assuming this passes, then this should do it. I looked at coverage and added a few more tests for some corner cases and removed a small amount of unreachable/unnecessary code.

jreback · 2014-03-22T20:54:10Z

@bashtage can you rebase and push again

bashtage · 2014-03-22T23:53:35Z

Rebased.

Stata does not handle NaNs correctly, so they must be replaced with Stata missing values (. by default). The fix checks floating point columns for NaN and replaces each one with the Stata numeric code for (.). One of the code paths that writes files already handled this case correctly, and its last-minute check has been removed. The write_index option was also being ignored (it was simply never applied). This has been fixed, along with numerous tests that were not correct. Also contains some additional tests for edge cases uncovered while working on the fix.

bashtage · 2014-03-23T09:58:49Z

Fixed a dictionary comprehension, so I think this is finished.

BUG: NaN values not converted to Stata missing values (GH6684)

jreback · 2014-03-23T13:41:29Z

thank you sir!

jreback · 2014-05-28T19:52:55Z

@bashtage see issue #5781 if you have a chance.

I think the reading/writing code needs to handle endianness properly (most machines are little-endian, but big-endian does exist!)

bashtage · 2014-05-29T06:44:07Z

I don't have reasonable access to a big endian machine. I would rate this as pretty low priority since Stata does not support big endian platforms (any more).
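
For what it's worth, byte order can be exercised without a big endian machine by packing with an explicit order. A small sketch with the standard library (the helper `read_double` is hypothetical and is not the change made for #5781):

```python
import struct

value = 1.5

# The same double serialized in both byte orders.
little = struct.pack("<d", value)
big = struct.pack(">d", value)


def read_double(raw, byteorder):
    """Decode 8 bytes as a double using the byte order recorded for the file."""
    fmt = "<d" if byteorder == "little" else ">d"
    return struct.unpack(fmt, raw)[0]


# Decoding with the matching order recovers the value on any host.
from_little = read_double(little, "little")
from_big = read_double(big, "big")
```

Using an explicit `<` or `>` prefix instead of the native order is what keeps the result independent of the machine running the code.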

jreback · 2014-05-29T11:09:44Z

turns out it was pretty trivial, closed by #7272

jreback added Bug labels Mar 22, 2014

jreback added this to the 0.14.0 milestone Mar 22, 2014

jreback changed the title ~~BUG: NaN values not converted to Stata missing values~~ BUG: NaN values not converted to Stata missing values (GH6684) Mar 22, 2014

jreback added the Data IO label Mar 22, 2014

jreback added a commit that referenced this pull request Mar 23, 2014

Merge pull request #6685 from bashtage/stata-world-indicators

83b1ce4

BUG: NaN values not converted to Stata missing values (GH6684)

jreback merged commit 83b1ce4 into pandas-dev:master Mar 23, 2014

bashtage deleted the stata-world-indicators branch April 5, 2014 11:11
