ENH: Add class to write dta format 117 files #20844

bashtage · 2018-04-27T14:52:28Z

Add export for dta 117 files, which adds support for long strings
Refactor StataWriter to simplify the new writer

closes #16450

closes to_stata: Fixed width strings in Stata .dta files are limited to 244 (or fewer) #16450
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry
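
The fixed-width limit named in #16450 can be illustrated with a small stdlib-only sketch. This is not the pandas implementation: the helper name `columns_exceeding_fixed_width`, the latin-1 encoding, and the plain-dict table layout are all assumptions made for illustration. Only the 244 figure comes from the issue title.

```python
def columns_exceeding_fixed_width(table, limit=244, encoding="latin-1"):
    """Return names of string columns whose longest value exceeds ``limit`` bytes.

    ``table`` maps column name -> list of values (a hypothetical layout,
    standing in for a DataFrame).
    """
    too_long = []
    for name, values in table.items():
        strings = [v for v in values if isinstance(v, str)]
        if not strings:
            continue  # not a string column, nothing to check
        longest = max(len(s.encode(encoding)) for s in strings)
        if longest > limit:
            too_long.append(name)
    return too_long


table = {
    "id": [1, 2],
    "short": ["a", "bb"],
    "notes": ["x" * 300, "y"],
}
print(columns_exceeding_fixed_width(table))
```

Columns reported this way would be natural candidates for the `convert_strl` argument when writing with `version=117`.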

pep8speaks · 2018-04-27T14:52:32Z

Hello @bashtage! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on May 01, 2018 at 18:57 Hours UTC

codecov · 2018-04-27T17:49:18Z

Codecov Report

❗ No coverage uploaded for pull request base (master@ade293d).
The diff coverage is 80%.

@@            Coverage Diff            @@
##             master   #20844   +/-   ##
=========================================
  Coverage          ?   91.78%           
=========================================
  Files             ?      153           
  Lines             ?    49349           
  Branches          ?        0           
=========================================
  Hits              ?    45295           
  Misses            ?     4054           
  Partials          ?        0

Flag	Coverage Δ
#multiple	`90.17% <80%> (?)`
#single	`41.93% <0%> (?)`

Impacted Files	Coverage Δ
pandas/core/frame.py	`97.13% <80%> (ø)`

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update ade293d...2d54ded.

bashtage · 2018-04-29T09:38:46Z

Is it intentional that stata.py is not covered? Seems a bit strange.

jreback · 2018-04-29T23:59:41Z

Is it intentional that stata.py is not covered? Seems a bit strange.

what does this mean? the coverage that prints on the issue is odd (and has always been that way), but you can click thru and see the actual reports.

jreback · 2018-04-29T23:55:47Z

pandas/core/frame.py

+
+        convert_strl : list, optional
+            List of column names to convert to string columns to Stata StrL
+            format. Only available if version is 117.  Storing strings in the


jreback · 2018-04-30T00:00:47Z

pandas/core/frame.py

@@ -1801,6 +1802,23 @@ def to_stata(self, fname, convert_dates=None, write_index=True,

            .. versionadded:: 0.19.0

+        version : {114, 117}
+            dta version to use in the output file.  Version 114 can be
+            read by Stata 10 and later.  Version 117 can be read by Stata 13


should we just support version 117 and forward? how old are these respective versions?

The old version is supported from 2007+. The new version is 2013+. If it was only possible to support 1 version I would stick with the old version since compatibility is more important than features for an export format IMO.

The biggest advantage of this PR is that it provides a stepping stone to supporting future export formats which have more useful features for full compatibility with pandas, like unicode support.

how much simplification do we get from only supporting the new format? (2013 is pretty 'old' though)

Would probably drop ~100-200 lines from the base StataWriter class that are overridden by methods in the StataWriter117 class. The formats are fairly similar in how the binary blob parts are stored, so that code is all shared.
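
The override pattern discussed here can be sketched as follows. The class names and byte layouts are invented for illustration and are not the real dta layouts or the pandas classes. The sketch only shows a subclass replacing the header step while inheriting the shared data-writing step.

```python
import io
import struct


class SketchWriter114:
    """Toy base writer: a flat header followed by shared data bytes."""

    release = 114

    def _header(self, nvar):
        # Hypothetical flat header: one byte for the release, two for nvar.
        return struct.pack("<BH", self.release, nvar)

    def _data(self, rows):
        # Shared by both formats in this sketch: each value as a 4-byte int.
        return b"".join(struct.pack("<i", v) for row in rows for v in row)

    def write(self, rows):
        nvar = len(rows[0]) if rows else 0
        buf = io.BytesIO()
        buf.write(self._header(nvar))
        buf.write(self._data(rows))
        return buf.getvalue()


class SketchWriter117(SketchWriter114):
    """Toy subclass: only the header step is overridden."""

    release = 117

    def _header(self, nvar):
        # Same fields as the base class, wrapped in a tag.
        return b"<header>" + struct.pack("<BH", self.release, nvar) + b"</header>"


rows = [(1, 2), (3, 4)]
old = SketchWriter114().write(rows)
new = SketchWriter117().write(rows)
```

Dropping the older format would remove the base-class versions of the overridden steps, which is the saving estimated above.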

bashtage · 2018-04-30T07:19:17Z

what does this mean? the coverage that prints on the issue is odd (and has always been that way), but you can click thru and see the actual reports.

For some reason it isn't showing up in codecov at all.

See

https://codecov.io/gh/pandas-dev/pandas/pull/20844/tree/pandas/io

for the coverage of this PR. stata.py is missing, as are other files in pandas/io.

The master coverage is also wrong.

https://codecov.io/gh/pandas-dev/pandas/tree/master/pandas/io

Might be a bug in codecov.

excel, pickle, stata, and sql are all missing.

jreback

minor comments. Might consider renaming StataWriter to StataWriter114 (and essentially make it private). Otherwise lgtm.

jreback · 2018-05-01T10:33:22Z

pandas/io/stata.py

+        # byteorder
+        bio.write(self._tag(byteorder == ">" and "MSF" or "LSF", 'byteorder'))
+        # number of vars, 2 bytes
+        assert self.nvar < 2 ** 16


maybe add a blank line before comments, easier to read
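
For readers unfamiliar with the `and ... or ...` idiom in the hunk above, here is a minimal standalone sketch of the same two header fields using a conditional expression. The function names `tag` and `header_fragment` are hypothetical, and the `K` tag name for the variable count is an assumption about the tagged header layout. This is not the code from the PR.

```python
import struct


def tag(value, name):
    """Wrap ``value`` (bytes or str) in <name>...</name>, returning bytes."""
    if isinstance(value, str):
        value = value.encode("utf-8")
    raw_name = name.encode("utf-8")
    return b"<" + raw_name + b">" + value + b"</" + raw_name + b">"


def header_fragment(byteorder, nvar):
    """Build the byteorder and variable-count tags (illustrative only)."""
    if byteorder not in ("<", ">"):
        raise ValueError("byteorder must be '<' or '>', got %r" % (byteorder,))
    if not 0 <= nvar < 2 ** 16:
        raise ValueError("nvar must fit in 2 bytes, got %r" % (nvar,))

    # byteorder: conditional expression instead of `and ... or ...`
    order = "MSF" if byteorder == ">" else "LSF"
    out = tag(order, "byteorder")

    # number of vars, 2 bytes, packed in the chosen byte order
    out += tag(struct.pack(byteorder + "H", nvar), "K")
    return out


print(header_fragment(">", 3))
```

The sketch raises `ValueError` rather than using `assert`, since assertions are skipped when Python runs with `-O`. Whether that matters for the writer is a judgment call.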

jreback · 2018-05-01T10:33:38Z

pandas/tests/io/test_stata.py


 import numpy as np
+import pytest
+from pandas._libs.tslib import NaT


import from pandas

bashtage · 2018-05-01T10:47:02Z

minor comments. Might consider renaming StataWriter to StataWriter114 (and essentially make it private). Otherwise lgtm.

I think this would need a deprecation cycle since StataWriter has been public. Can do this if you want.

bashtage · 2018-05-01T14:24:26Z

@jreback Green and ready.

TomAugspurger · 2018-05-01T14:28:58Z

pandas/core/frame.py

@@ -1801,6 +1802,23 @@ def to_stata(self, fname, convert_dates=None, write_index=True,

            .. versionadded:: 0.19.0

+        version : {114, 117}


Hmm, should these be strings? Is this exposed anywhere in Stata itself? Do they use integers? (when I see version number, I think string).

Huh, https://www.stata.com/support/faqs/data-management/save-for-previous-version/ seems to suggest that Stata uses integers? version(13). OK then, let's follow that.

Could use 10 and 13 which are the Stata release versions.

TomAugspurger

Couple small comments. OK to do as a followup (I'm tagging the RC momentarily.)

TomAugspurger · 2018-05-01T17:53:44Z

pandas/core/frame.py

-        writer = StataWriter(fname, self, convert_dates=convert_dates,
+        kwargs = {}
+        if version not in (114, 117):
+            raise ValueError('Only formats 114 and 117 supported.')


Would be nice to include the user passed version in the error message.

I can push in a little bit.
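
One way the suggestion could look, as a standalone sketch. The helper name `check_dta_version` and the exact message wording are hypothetical, not what was eventually merged.

```python
def check_dta_version(version, supported=(114, 117)):
    """Return ``version`` if supported, else raise with the offending value."""
    if version not in supported:
        raise ValueError(
            "Only formats {} are supported, got version={!r}.".format(
                " and ".join(str(v) for v in supported), version
            )
        )
    return version


try:
    check_dta_version("117")  # a string, not the integer 117
except ValueError as err:
    print(err)
```

Using `{!r}` in the message makes it visible when a caller passes the string `'117'` instead of the integer.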

TomAugspurger · 2018-05-01T17:59:13Z

pandas/tests/io/test_stata.py

+                             columns=['long1' * 10, 'long', 1])
+        original.index.name = 'index'
+
+        with warnings.catch_warnings(record=True) as w:  # noqa


What warnings are you catching here?

An invalid name. The writer munges the name to comply with Stata rules and issues a warning.
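
The behaviour described in this reply can be approximated with a stdlib-only sketch. The function `stata_safe_name` and the `InvalidNameWarning` class are hypothetical stand-ins, and the rules shown (letters, digits and underscores only, no leading digit, at most 32 characters) are a simplification rather than the writer's actual logic.

```python
import re
import warnings


class InvalidNameWarning(Warning):
    """Hypothetical warning category used only in this sketch."""


def stata_safe_name(name, max_len=32):
    """Return a cleaned-up variable name, warning if it had to change."""
    original = str(name)
    # Replace anything that is not a letter, digit, or underscore
    safe = re.sub(r"[^A-Za-z0-9_]", "_", original)
    # Names may not be empty or start with a digit
    if not safe or safe[0].isdigit():
        safe = "_" + safe
    safe = safe[:max_len]
    if safe != original:
        warnings.warn("%r renamed to %r" % (original, safe), InvalidNameWarning)
    return safe


# Same column labels as the test above: one too long, one fine, one integer
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    names = [stata_safe_name(c) for c in ["long1" * 10, "long", 1]]

print(names)
print(len(caught))
```

Here the over-long name and the integer label each trigger a warning, which is why a test of this shape wraps the write in `warnings.catch_warnings`.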

TomAugspurger · 2018-05-01T18:02:11Z

@jreback are you OK with merging this as is? I'll fix the git conflict on merge.

jreback · 2018-05-01T18:48:37Z

yep this is fine
if we need follow ups after RC that is fine

TomAugspurger · 2018-05-01T18:55:52Z

K. @bashtage I'm going to fix the conflict and then merge. A followup PR would be welcome.

bashtage force-pushed the strl-support branch from 9717de4 to d3e7634 Compare April 28, 2018 11:56

bashtage changed the title ~~ENH: Add class to write da format 117 files~~ ENH: Add class to write dta format 117 files Apr 28, 2018

bashtage force-pushed the strl-support branch 4 times, most recently from 927beda to e87e64c Compare April 29, 2018 09:09

jreback added the IO Stata read_stata, to_stata label Apr 29, 2018

jreback requested changes Apr 30, 2018

bashtage force-pushed the strl-support branch 5 times, most recently from 13d2897 to d13e32d Compare May 1, 2018 06:58

bashtage added 2 commits May 1, 2018 08:33

ENH: Add class to write dta format 117 files

d54541a

Add export for dta 117 files, which adds support for long strings
Refactor StataWriter to simplify the new writer
closes pandas-dev#16450

DOC: Clean up doc strings

900c9f7

Fix typo
Enhance compliance of related docstrings using validator

bashtage force-pushed the strl-support branch from d13e32d to 831c9eb Compare May 1, 2018 07:33

BUG: Fix bugs in stata

a5f1653

Fix incorrect skipping in strl writer
Fix incorrect byteorder when exporting bigendian
Fix incorrect byteorder parsing when importing bigendian
Improve test coverage for errors

bashtage force-pushed the strl-support branch from 831c9eb to a5f1653 Compare May 1, 2018 08:54

jreback requested changes May 1, 2018

jreback added this to the 0.23.0 milestone May 1, 2018

jreback added the Enhancement label May 1, 2018

CLN: Fix imports

4397ae7

Fix import location and add whitespace

TomAugspurger mentioned this pull request May 1, 2018

RLS: 0.23.0 #20531

Closed

71 tasks

TomAugspurger reviewed May 1, 2018

TomAugspurger approved these changes May 1, 2018

Merge remote-tracking branch 'upstream/master' into bashtage-strl-sup…

2d54ded

…port

TomAugspurger merged commit 93e7123 into pandas-dev:master May 1, 2018

bashtage deleted the strl-support branch June 2, 2018 10:33
