Bug: Export to Stata NaN not converted to "." #6684

ozak · 2014-03-21T18:30:20Z

Hi,

I noticed that when exporting data to Stata, NaN values are not always converted to Stata missing values (".") but are instead left blank. This confuses Stata: neither the destring command nor replace value=. if value==. fixes the problem.

As an example, I downloaded the World Development Indicators and used the following commands to export National Savings to a CSV file and a Stata file:

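import pandas as pd
import os
# Load the WDI workbook and keep only the gross domestic savings indicator
dfwdi = pd.read_excel('WDI.xlsx', 'Data')
dfwdi.columns
dfout = dfwdi.loc[dfwdi['Indicator Code'] == 'NY.GDS.TOTL.ZS']
dfout
# Year column names of the form savyr1960, savyr1961, ...
cols = ['savyr' + str(i) for i in range(1960, dfwdi.columns.values[-1] + 1)]
dfout.reset_index(inplace=True, drop=True)
# Write the same data as CSV and as a Stata file
dfout.to_csv('sav.csv', index=False)
dfout.to_stata('sav.dta', write_index=False)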

If you import the data into Stata (I am using v.13) and run the following commands, things fail.

use "sav.dta", clear

* Correct number of missing values
summ savyr2000
reg savyr2010 savyr2000

* Correct countries identified as missing
tab code if savyr==.

* Replace missing values with "."
* One cannot replace the missing values that are not presented as "."
replace savyr2010=. if savyr==""
* Use "." to identify
replace savyr2010=. if savyr==.

* Perform analysis again
summ savyr2000
reg savyr2010 savyr2000

* Still fails

As you can see, Stata does not perform the analysis even though it correctly recognizes the missing values; not all of them are presented as ".". If one imports the CSV version into Stata and runs the same initial commands, it works fine.

import delimited "sav.csv"

* Correct number of missing values
summ savyr2000
reg savyr2010 savyr2000

Furthermore, for some reason the index is still present in the Stata file, even though I used the write_index=False option.

I am using Enthought's Canopy distribution on OSX Mavericks with Pandas '0.13.1'. Haven't tried on other Python dists.

jreback · 2014-03-21T18:32:09Z

cc @bashtage

we have tests for this?

@ozak can you try on master? There have been many fixes to Stata reading/writing.

ozak · 2014-03-21T18:34:35Z

@jreback can you explain a little more what you'd like me to do...I am still a newbie in the GitHub problem solving scheme.

bashtage · 2014-03-21T18:39:02Z

This will need to be compared to master in pandas (the pre-release of 0.14). I did a lot of work around missing values, and there were (iirc) some issues regarding some data types (e.g. doubles).

jreback · 2014-03-21T19:12:29Z

@ozak I was suggesting building with master from the main repo

http://pandas.pydata.org/developers.html#working-with-the-code

then you could help explore where the error is

bashtage · 2014-03-21T21:22:01Z

@ozak @jreback There is definitely something wrong in master. The NaNs are appearing as 1.#QNAN. After a bit of looking, it seems that Stata does not support NaNs and expects a missing value rather than a NaN. This should be simple to fix, at least ignoring performance considerations.
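
To illustrate the idea, here is a rough, standard-library-only sketch of swapping NaN for Stata's "." sentinel before the doubles are packed into bytes. It is not the code from the patch: the names STATA_DOUBLE_MISSING, nan_to_stata_missing and pack_doubles are hypothetical, and it assumes the double-precision missing value described in the .dta format documentation (2**1023, little-endian).

```python
import math
import struct

# Assumption: Stata's generic missing value "." for doubles is 2**1023,
# stored little-endian as the bytes 00 00 00 00 00 00 e0 7f.
STATA_DOUBLE_MISSING = struct.unpack("<d", b"\x00\x00\x00\x00\x00\x00\xe0\x7f")[0]


def nan_to_stata_missing(values):
    """Return a copy of values with every NaN replaced by the "." sentinel."""
    return [STATA_DOUBLE_MISSING if math.isnan(v) else v for v in values]


def pack_doubles(values):
    """Pack floats as little-endian doubles after replacing NaNs (hypothetical helper)."""
    return b"".join(struct.pack("<d", v) for v in nan_to_stata_missing(values))


# Made-up sample column with one missing observation
column = [21.5, float("nan"), 18.2]
cleaned = nan_to_stata_missing(column)
```

A pure-Python loop like this is slow on large frames, which is the performance consideration mentioned above; a vectorized replacement would be the natural choice in practice.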

jreback · 2014-03-21T21:57:26Z

ok will mark as a bug

bashtage · 2014-03-21T21:58:17Z

Writing last test for patch now.

ozak · 2014-03-22T02:00:19Z

Wow that was fast! I guess this means this is solved?

bashtage · 2014-03-23T10:00:01Z

@ozak Once the referenced patch gets pulled into master, master (and later 0.14) will not have this issue.
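
If you want to check a given pandas build yourself, a small round trip along these lines should do it. This is only a sketch with made-up data (the country_id and savyr2010 columns are placeholders, not the WDI file), and it assumes a reasonably recent pandas where Series.isna is available.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Made-up frame with one NaN in a float column
df = pd.DataFrame({"country_id": [1, 2, 3], "savyr2010": [21.5, np.nan, 18.2]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sav.dta")
    # Write without the index, then read the file straight back
    df.to_stata(path, write_index=False)
    roundtrip = pd.read_stata(path)

# The NaN should come back as a missing value, and no index column should appear
n_missing_before = int(df["savyr2010"].isna().sum())
n_missing_after = int(roundtrip["savyr2010"].isna().sum())
columns_after = list(roundtrip.columns)
```

If n_missing_after matches n_missing_before and columns_after contains only the original columns, the file was written with proper missing values and without the index. Opening the file in Stata is still the definitive check.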

jreback added Missing-data labels Mar 21, 2014

jreback added this to the 0.14.0 milestone Mar 21, 2014

jreback added the Bug label Mar 21, 2014

bashtage mentioned this issue Mar 21, 2014

BUG: NaN values not converted to Stata missing values (GH6684) #6685

Merged

jreback closed this as completed in #6685 Mar 23, 2014
