BUG: NaN values not converted to Stata missing values (GH6684) #6685
Fix for #6684 |
Is the missing value indicator 'standard' or something the user would want to set? (e.g. should it be a parameter with a default?) |
There are many missing value indicators, ranging from . (dot) through .a, .b, ..., .z. In principle these allow the reason for the missing value to be encoded (e.g. .a for non-response, .b for negative value, etc.). In theory these could be set, and I thought about how to do this once, but I think it probably isn't a feature that would have much use, since pandas doesn't really have a good method to handle different types of missing values. The idea I had at the time would require passing a second DataFrame with some additional information about the missing data codes to use. Not worth the effort IMO. |
@bashtage totally fine.. |
looks good |
@jreback Assuming this passes, then this should do it. I looked at coverage and added a few more tests for some corner cases and removed a small amount of unreachable/unnecessary code. |
@bashtage can you rebase and push again |
Rebased. |
Stata does not correctly handle NaNs, so these must be replaced with Stata missing values (. by default). The fix checks floating point columns for NaN and replaces these with the Stata numeric code for (.). One of the code paths that writes files already handled this case correctly, so its last-minute check was removed. The write_index option was also being ignored by omission. This has been fixed, and numerous tests which were not correct have been fixed. Also contains some additional tests for edge cases uncovered while working on the fix.
Fixed a dictionary comprehension, so I think this is finished. |
BUG: NaN values not converted to Stata missing values (GH6684)
thank you sir! |
I don't have reasonable access to a big endian machine. I would rate this as pretty low priority since Stata does not support big endian platforms (any more). |
turns out was pretty trivial, closed by #7272 |
closes #6684
Stata does not correctly handle NaNs, and so these must be replaced with Stata
missing values (. by default). The fix checks floating point columns for nan
and replaces these with the Stata numeric code for (.).
The write_index option was also being ignored by omission. This has been fixed and
numerous tests which were not correct have been fixed.
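
The NaN replacement described above can be illustrated with a minimal, standard-library-only sketch. It is not the code from this PR: the helper name `replace_nan` and the `STATA_MISSING` table are hypothetical, and the byte patterns are taken as assumptions from the .dta format's encoding of the default missing value (.) for float and double columns.

```python
import math
import struct

# Assumed encodings of Stata's default missing value (.):
# 0x7f000000 for float and 0x7fe0000000000000 for double,
# written here as little-endian byte strings.
STATA_MISSING = {
    "f": struct.unpack("<f", b"\x00\x00\x00\x7f")[0],
    "d": struct.unpack("<d", b"\x00\x00\x00\x00\x00\x00\xe0\x7f")[0],
}


def replace_nan(values, fmt="d"):
    """Hypothetical helper: return a copy with NaN swapped for Stata's '.'."""
    missing = STATA_MISSING[fmt]
    return [missing if math.isnan(v) else v for v in values]


# One floating point column containing a NaN.
column = [1.0, float("nan"), 3.5]
cleaned = replace_nan(column)
```

After the call, `cleaned` contains no NaN; the second entry holds the large finite double that Stata reads back as (.), while the other values are unchanged.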