Commit b3ff5eb

Merge pull request #9601 from kshedden/stata_doc
Fix several stata doc issues
2 parents 3039533 + 08297e6 commit b3ff5eb

1 file changed (+19 -17 lines changed)


doc/source/io.rst (+19 -17)

@@ -3779,15 +3779,15 @@ into a .dta file. The format version of this file is always 115 (Stata 12).
    df = DataFrame(randn(10, 2), columns=list('AB'))
    df.to_stata('stata.dta')
 
-*Stata* data files have limited data type support; only strings with 244 or
-fewer characters, ``int8``, ``int16``, ``int32``, ``float32` and ``float64``
-can be stored
-in ``.dta`` files. Additionally, *Stata* reserves certain values to represent
-missing data. Exporting a non-missing value that is outside of the
-permitted range in Stata for a particular data type will retype the variable
-to the next larger size. For example, ``int8`` values are restricted to lie
-between -127 and 100 in Stata, and so variables with values above 100 will
-trigger a conversion to ``int16``. ``nan`` values in floating points data
+*Stata* data files have limited data type support; only strings with
+244 or fewer characters, ``int8``, ``int16``, ``int32``, ``float32``
+and ``float64`` can be stored in ``.dta`` files. Additionally,
+*Stata* reserves certain values to represent missing data. Exporting a
+non-missing value that is outside of the permitted range in Stata for
+a particular data type will retype the variable to the next larger
+size. For example, ``int8`` values are restricted to lie between -127
+and 100 in Stata, and so variables with values above 100 will trigger
+a conversion to ``int16``. ``nan`` values in floating points data
 types are stored as the basic missing data type (``.`` in *Stata*).
 
 .. note::

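The retyping rule described in the hunk above can be illustrated with a small plain-Python sketch. The helper `retype_for_stata` is hypothetical and is not part of pandas. The int8 bounds come from the text; the int16 and int32 bounds (-32767 to 32740 and -2147483647 to 2147483620) are taken from Stata's documented storage ranges rather than from this diff, and falling back to `float64` when even int32 overflows is an assumption.

```python
# Hypothetical sketch (not the pandas implementation) of "retype the variable
# to the next larger size" when values fall outside Stata's permitted range.
# Values above each upper bound are reserved by Stata for missing-value codes.

# Permitted non-missing ranges, assumed from Stata's documentation.
STATA_INT_RANGES = [
    ("int8", -127, 100),
    ("int16", -32767, 32740),
    ("int32", -2147483647, 2147483620),
]


def retype_for_stata(dtype, values):
    """Return the storage dtype name a column would need in a .dta file.

    Starts at the column's current dtype and moves to the next larger
    integer type until every value fits. Assumed fallback: float64.
    Raises ValueError for dtypes not in the table (e.g. "int64").
    """
    names = [name for name, _, _ in STATA_INT_RANGES]
    start = names.index(dtype)
    lo, hi = min(values), max(values)
    for name, type_min, type_max in STATA_INT_RANGES[start:]:
        if type_min <= lo and hi <= type_max:
            return name
    return "float64"


print(retype_for_stata("int8", [0, 100]))  # int8: all values in range
print(retype_for_stata("int8", [0, 101]))  # int16: 101 is above 100
print(retype_for_stata("int32", [2**31]))  # float64: beyond int32's bound
```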
@@ -3810,7 +3810,7 @@ outside of this range, the variable is cast to ``int16``.
 
 .. warning::
 
-   :class:`~pandas.io.stata.StataWriter`` and
+   :class:`~pandas.io.stata.StataWriter` and
    :func:`~pandas.core.frame.DataFrame.to_stata` only support fixed width
    strings containing up to 244 characters, a limitation imposed by the version
    115 dta file format. Attempting to write *Stata* dta files with strings

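One way to stay within the 244-character limit described in the warning is to check string columns before calling `to_stata`. The function `too_long_for_stata` below is a hypothetical helper, not a pandas API; it measures length in characters with `len()`, following the warning's wording.

```python
# Hypothetical pre-flight check (not part of pandas): report string values
# that exceed the 244-character limit of the version 115 dta format.
MAX_STATA_STR_LEN = 244


def too_long_for_stata(values):
    """Return (position, length) pairs for strings longer than the limit."""
    return [
        (i, len(v))
        for i, v in enumerate(values)
        if isinstance(v, str) and len(v) > MAX_STATA_STR_LEN
    ]


column = ["short", "x" * 244, "y" * 245]
print(too_long_for_stata(column))  # [(2, 245)]
```

An empty result means every string in the column fits the fixed-width limit.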
@@ -3836,9 +3836,11 @@ Specifying a ``chunksize`` yields a
 read ``chunksize`` lines from the file at a time. The ``StataReader``
 object can be used as an iterator.
 
-   reader = pd.read_stata('stata.dta', chunksize=1000)
-   for df in reader:
-       do_something(df)
+.. ipython:: python
+
+   reader = pd.read_stata('stata.dta', chunksize=3)
+   for df in reader:
+       print(df.shape)
 
 For more fine-grained control, use ``iterator=True`` and specify
 ``chunksize`` with each call to

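The revised example iterates over the reader three rows at a time. A plain-Python sketch of the chunk boundaries one might expect, assuming the file holds the 10 rows written by `to_stata` earlier and that the reader yields consecutive, non-overlapping blocks. `iter_chunks` is illustrative only and is not how `StataReader` is implemented.

```python
# Illustrative sketch (not StataReader): split rows into consecutive blocks
# of at most `chunksize` rows; the final block may be shorter.
def iter_chunks(rows, chunksize):
    for start in range(0, len(rows), chunksize):
        yield rows[start:start + chunksize]


rows = list(range(10))  # stand-in for a 10-row data file
for chunk in iter_chunks(rows, 3):
    print(len(chunk))  # prints 3, 3, 3, then 1
```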
@@ -3847,8 +3849,8 @@ For more fine-grained control, use ``iterator=True`` and specify
 .. ipython:: python
 
    reader = pd.read_stata('stata.dta', iterator=True)
-   chunk1 = reader.read(10)
-   chunk2 = reader.read(20)
+   chunk1 = reader.read(5)
+   chunk2 = reader.read(5)
 
 Currently the ``index`` is retrieved as a column.
 

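With `iterator=True`, the reader keeps a read position between calls to `read`. The hypothetical `RowReader` class below mimics only that bookkeeping on a plain list and is not `StataReader`. Assuming a 10-row file, the two revised `read(5)` calls would together cover every row.

```python
# Hypothetical stand-in (not pandas.io.stata.StataReader): each read(n)
# returns the next n rows and advances an internal position.
class RowReader:
    def __init__(self, rows):
        self._rows = rows
        self._pos = 0

    def read(self, nrows):
        chunk = self._rows[self._pos:self._pos + nrows]
        self._pos += len(chunk)
        return chunk


reader = RowReader(list(range(10)))
chunk1 = reader.read(5)  # [0, 1, 2, 3, 4]
chunk2 = reader.read(5)  # [5, 6, 7, 8, 9]
```

Once the position reaches the end, further calls return an empty list in this sketch.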
@@ -3861,15 +3863,15 @@ The parameter ``convert_missing`` indicates whether missing value
 representations in Stata should be preserved. If ``False`` (the default),
 missing values are represented as ``np.nan``. If ``True``, missing values are
 represented using ``StataMissingValue`` objects, and columns containing missing
-values will have ```object`` data type.
+values will have ``object`` data type.
 
 :func:`~pandas.read_stata` and :class:`~pandas.io.stata.StataReader` supports .dta
 formats 104, 105, 108, 113-115 (Stata 10-12) and 117 (Stata 13+).
 
 .. note::
 
    Setting ``preserve_dtypes=False`` will upcast to the standard pandas data types:
-   ``int64`` for all integer types and ``float64`` for floating poitn data. By default,
+   ``int64`` for all integer types and ``float64`` for floating point data. By default,
    the Stata data types are preserved when importing.
 
 .. ipython:: python

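The `convert_missing` behaviour can be made concrete for a single `int8` value. This sketch assumes Stata's documented byte missing-value codes (101 for `.` and 102 through 127 for `.a` through `.z`), which are not listed in this diff; `decode_stata_byte` is hypothetical, and a plain string stands in for a `StataMissingValue` object. Because a column decoded with `convert_missing=True` mixes numbers with such markers, it cannot keep a numeric dtype, which is consistent with the ``object`` dtype described in the hunk.

```python
import math
import string

# Hypothetical decoder (not the pandas implementation) for one Stata int8
# ("byte") value. Assumed codes: 101 is ".", 102-127 are ".a" through ".z".
BYTE_MISSING_START = 101


def decode_stata_byte(value, convert_missing=False):
    if value < BYTE_MISSING_START:
        return value  # ordinary, non-missing value
    if not convert_missing:
        return float("nan")  # mirrors the default: missing becomes NaN
    offset = value - BYTE_MISSING_START
    # Offset 0 is the system missing value; offsets 1-26 map to .a-.z.
    return "." if offset == 0 else "." + string.ascii_lowercase[offset - 1]


print(decode_stata_byte(7))                          # 7
print(math.isnan(decode_stata_byte(101)))            # True
print(decode_stata_byte(101, convert_missing=True))  # .
print(decode_stata_byte(127, convert_missing=True))  # .z
```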
