Allow timestamp option for StataWriter.write_file() #6545

Closed
bquistorff opened this issue Mar 4, 2014 · 4 comments · Fixed by #6553
Labels: Enhancement, IO Data (IO issues that don't fit into a more specific label)

Comments

@bquistorff
Contributor

This is a combined feature request & minor bug notice.
Feature Request: I would like to write code that produces byte-for-byte reproducible output. To that end, I want to write Stata dta files with a blank (or constant) timestamp. It would be nice if write_file() accepted a timestamp (or some option to zero it out).

Bug: In an attempt to do this myself, I made my own version of StataWriter.write_file() where the only difference is that I call the internal _write_header() function with a constant timestamp. But that triggers the following bug.

import pandas as pd
import numpy as np
from pandas.io.stata import StataWriter
import datetime

df = pd.DataFrame(np.random.randn(6,4),index=list('abcdef'),columns=list('ABCD'))
writer = StataWriter('output.dta', df)
fktime_stamp = datetime.datetime.now()
writer._write_header(time_stamp=fktime_stamp)
# rest of write_file()

This produces the following error:

  File "C:\Program Files\Python27\lib\site-packages\pandas\io\stata.py", line 1057, in _write_header
    elif not isinstance(time_stamp, datetime):
TypeError: isinstance() arg 2 must be a class, type, or tuple of classes and types

My system details are:

>>> pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.2.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: AMD64 Family 16 Model 6 Stepping 3, AuthenticAMD
byteorder: little
LC_ALL: None
LANG: None

pandas: 0.13.1
Cython: None
numpy: 1.8.1
scipy: None
statsmodels: None
IPython: None
sphinx: None
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: None
bottleneck: None
tables: None
numexpr: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None

@jreback
Contributor

jreback commented Mar 4, 2014

going to merge #6335 shortly to fix some basic issues. then pls revisit

@jseabold
Contributor

jseabold commented Mar 5, 2014

It should be isinstance(..., datetime.datetime) for starters.
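The difference between the two checks can be reproduced without pandas. This is a minimal sketch, assuming (as the traceback suggests) that the failing code refers to the `datetime` module rather than the `datetime.datetime` class; the variable names are only illustrative.

```python
import datetime

stamp = datetime.datetime(2014, 3, 4)

# Buggy check: `datetime` is the module here, not a class, so
# isinstance() raises TypeError instead of returning a bool.
try:
    isinstance(stamp, datetime)
    module_check_failed = False
except TypeError:
    module_check_failed = True

# Corrected check: compare against the class inside the module.
is_datetime = isinstance(stamp, datetime.datetime)

print(module_check_failed, is_datetime)
```

The same error would not occur if the file used `from datetime import datetime`, since the bare name would then refer to the class.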

@bashtage
Contributor

bashtage commented Mar 5, 2014

@jseabold hit the nail on the head.

@jreback I've put together a patch that will allow the time_stamp to be set from to_stata if there is any demand for this.

FWIW, this code is unreachable in normal use. There are a couple of other file properties that aren't exposed externally (e.g. an 80-character description string).
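For readers who want the reproducible output requested in the original post, here is a rough sketch of how a `time_stamp` argument on `to_stata` could be used. It assumes a pandas version that includes such a patch and a Python version with `tempfile.TemporaryDirectory`; the helper `write_with_fixed_timestamp`, the sample data, and the chosen date are illustrative and not taken from the thread.

```python
import datetime
import os
import tempfile

import pandas as pd


def write_with_fixed_timestamp(df, path, stamp):
    """Write df as a Stata file with a caller-chosen timestamp; return its bytes."""
    df.to_stata(path, time_stamp=stamp, write_index=False)
    with open(path, "rb") as fh:
        return fh.read()


# Any constant datetime works; this value is arbitrary.
FIXED_STAMP = datetime.datetime(2014, 3, 4, 0, 0)
df = pd.DataFrame({"A": [1.0, 2.0, 3.0], "B": [4.0, 5.0, 6.0]})

with tempfile.TemporaryDirectory() as tmp:
    first = write_with_fixed_timestamp(df, os.path.join(tmp, "first.dta"), FIXED_STAMP)
    second = write_with_fixed_timestamp(df, os.path.join(tmp, "second.dta"), FIXED_STAMP)

# With the timestamp pinned, the two files should match byte for byte.
identical = first == second
print(identical)
```

Without the `time_stamp` argument, the writer falls back to the current time, so two runs a minute or more apart would be expected to differ in the header.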

@bashtage
Contributor

bashtage commented Mar 5, 2014

This was a small fix since I didn't have to refresh my memory, so I have submitted a PR.

jreback added this to the 0.14.0 milestone Mar 5, 2014