BUG: NaN values not converted to Stata missing values (GH6684) #6685

Merged on Mar 23, 2014 (1 commit)

Conversation

bashtage
Contributor

closes #6684

Stata does not correctly handle NaNs, so these must be replaced with Stata
missing values (. by default). The fix checks floating point columns for NaN
and replaces each NaN with the Stata numeric code for (.).
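
A minimal sketch of that replacement step, using only the standard library. This is not the code from this PR; the helper name `replace_nan` is hypothetical, and the constant assumes the dta convention that the default missing value (.) for a double is the bit pattern 0x7fe0000000000000.

```python
import math
import struct

# Assumed Stata code for the default missing value (.) in a double column:
# bit pattern 0x7fe0000000000000, stored here as little-endian bytes.
STATA_MISSING_DOUBLE = struct.unpack("<d", bytes.fromhex("000000000000e07f"))[0]


def replace_nan(values):
    """Return a copy of `values` with every NaN swapped for the Stata (.) code."""
    return [STATA_MISSING_DOUBLE if math.isnan(v) else v for v in values]


column = [1.0, float("nan"), 3.5]
cleaned = replace_nan(column)
# `cleaned` no longer contains NaN, so it can be packed as plain doubles.
packed = struct.pack("<%dd" % len(cleaned), *cleaned)
```

The point of doing the swap before packing is that the writer never has to special-case NaN at the byte level.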

The write_index option was also being ignored because the writer never acted on it.
This has been fixed, and numerous tests which were not correct have been corrected.

@bashtage
Contributor Author

Fix for #6684

@jreback
Contributor

jreback commented Mar 21, 2014

is the missing value indicator 'standard' or something the user would want to set? (e.g. should it be a parameter with a default?)

@bashtage
Contributor Author

There are many missing value indicators, ranging from . (dot) through .a, .b, ..., .z. In principle these allow the reason for the missing value to be encoded (e.g. .a for non-response, .b for negative value, etc.). In theory these could be set, and I thought about how to do this once, but I think it probably isn't a feature that would have much use since pandas doesn't really have a good method to handle different types of missing values.

The idea I had at the time would require passing a second DataFrame with some additional information about the missing data codes to use. Not worth the effort IMO.
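
For illustration only, here is a rough sketch of how the 27 codes could be enumerated for a double column. It assumes, based on my reading of the dta format, that (.) is the bit pattern 0x7fe0000000000000 and that each later code (.a, .b, ...) adds 2**40 to that pattern; the function name `stata_missing_doubles` is hypothetical and nothing like it is part of this PR.

```python
import struct

# Assumptions (not taken from this PR): base bit pattern for (.) and the
# per-code step between consecutive missing values in a double column.
BASE_BITS = 0x7FE0000000000000
STEP = 1 << 40


def stata_missing_doubles():
    """Map each label ('.', '.a', ..., '.z') to its assumed double value."""
    labels = ["."] + ["." + chr(ord("a") + i) for i in range(26)]
    codes = {}
    for k, label in enumerate(labels):
        bits = BASE_BITS + k * STEP
        # Reinterpret the 64-bit integer as an IEEE 754 double.
        codes[label] = struct.unpack("<d", struct.pack("<Q", bits))[0]
    return codes


codes = stata_missing_doubles()
```

A user-settable indicator would then amount to choosing one of these values instead of always using `codes["."]`, which is the part that needs extra per-cell information pandas does not carry.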

@jreback
Contributor

jreback commented Mar 21, 2014

@bashtage totally fine..

@jreback
Contributor

jreback commented Mar 22, 2014

looks good
can u add a release note?

@bashtage
Contributor Author

@jreback Assuming this passes, then this should do it. I looked at coverage and added a few more tests for some corner cases and removed a small amount of unreachable/unnecessary code.

@jreback jreback added this to the 0.14.0 milestone Mar 22, 2014
@jreback jreback changed the title BUG: NaN values not converted to Stata missing values BUG: NaN values not converted to Stata missing values (GH6684) Mar 22, 2014
@jreback
Contributor

jreback commented Mar 22, 2014

@bashtage can you rebase and push again

@bashtage
Contributor Author

Rebased.

Stata does not correctly handle NaNs, and so these must be replaced with Stata
missing values (. by default).  The fix checks floating point columns for NaN
and replaces these with the Stata numeric code for (.).  One of the code paths
which writes files correctly handled this case, and this last-minute check was
removed.

The write_index option was also being ignored by omission. This has been fixed
and numerous tests which were not correct have been fixed.

Also contains some additional tests for edge cases related to the
fix.
@bashtage
Contributor Author

Fixed a dictionary comprehension, so I think this is finished.

jreback added a commit that referenced this pull request Mar 23, 2014
BUG: NaN values not converted to Stata missing values (GH6684)
@jreback jreback merged commit 83b1ce4 into pandas-dev:master Mar 23, 2014
@jreback
Contributor

jreback commented Mar 23, 2014

thank you sir!

@bashtage bashtage deleted the stata-world-indicators branch April 5, 2014 11:11
@jreback
Contributor

jreback commented May 28, 2014

@bashtage see issue #5781 if you have a chance.

I think the reading/writing needs to handle endianness properly (most machines are little-endian, but big-endian does exist!)

@bashtage
Contributor Author

I don't have reasonable access to a big endian machine. I would rate this as pretty low priority since Stata does not support big endian platforms (any more).
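
For what it's worth, the byte-order question can be checked without big-endian hardware, since `struct` takes an explicit byte-order prefix. A small sketch (the helper `read_double` is hypothetical, not something in the Stata reader):

```python
import struct


def read_double(raw, byteorder):
    """Decode 8 bytes as a double; byteorder is '<' (little) or '>' (big)."""
    return struct.unpack(byteorder + "d", raw)[0]


value = 1.5
little = struct.pack("<d", value)  # bytes as a little-endian writer emits them
big = struct.pack(">d", value)     # bytes as a big-endian writer emits them

# Decoding with the matching order round-trips; the wrong order does not.
ok_little = read_double(little, "<")
ok_big = read_double(big, ">")
wrong = read_double(big, "<")
```

As long as the reader and writer always pass an explicit prefix rather than relying on native order, the result should not depend on the machine running the code.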

@jreback
Contributor

jreback commented May 29, 2014

turns out was pretty trivial, closed by #7272

Labels: Bug, IO Data, Missing-data
Development

Successfully merging this pull request may close these issues.

Bug: Export to Stata NaN not converted to "."