@@ -3779,15 +3779,15 @@ into a .dta file. The format version of this file is always 115 (Stata 12).
    df = DataFrame(randn(10, 2), columns=list('AB'))
    df.to_stata('stata.dta')

-*Stata* data files have limited data type support; only strings with 244 or
-fewer characters, ``int8``, ``int16``, ``int32``, ``float32` and ``float64``
-can be stored
-in ``.dta`` files. Additionally, *Stata* reserves certain values to represent
-missing data. Exporting a non-missing value that is outside of the
-permitted range in Stata for a particular data type will retype the variable
-to the next larger size. For example, ``int8`` values are restricted to lie
-between -127 and 100 in Stata, and so variables with values above 100 will
-trigger a conversion to ``int16``. ``nan`` values in floating points data
+*Stata* data files have limited data type support; only strings with
+244 or fewer characters, ``int8``, ``int16``, ``int32``, ``float32``
+and ``float64`` can be stored in ``.dta`` files. Additionally,
+*Stata* reserves certain values to represent missing data. Exporting a
+non-missing value that is outside of the permitted range in Stata for
+a particular data type will retype the variable to the next larger
+size. For example, ``int8`` values are restricted to lie between -127
+and 100 in Stata, and so variables with values above 100 will trigger
+a conversion to ``int16``. ``nan`` values in floating points data
 types are stored as the basic missing data type (``.`` in *Stata*).

 .. note::
@@ -3810,7 +3810,7 @@ outside of this range, the variable is cast to ``int16``.

 .. warning::

-   :class:`~pandas.io.stata.StataWriter`` and
+   :class:`~pandas.io.stata.StataWriter` and
    :func:`~pandas.core.frame.DataFrame.to_stata` only support fixed width
    strings containing up to 244 characters, a limitation imposed by the version
    115 dta file format. Attempting to write *Stata* dta files with strings
@@ -3836,9 +3836,11 @@ Specifying a ``chunksize`` yields a
 read ``chunksize`` lines from the file at a time. The ``StataReader``
 object can be used as an iterator.

-   reader = pd.read_stata('stata.dta', chunksize=1000)
-   for df in reader:
-       do_something(df)
+.. ipython:: python
+
+   reader = pd.read_stata('stata.dta', chunksize=3)
+   for df in reader:
+       print(df.shape)

 For more fine-grained control, use ``iterator=True`` and specify
 ``chunksize`` with each call to
@@ -3847,8 +3849,8 @@ For more fine-grained control, use ``iterator=True`` and specify
 .. ipython:: python

    reader = pd.read_stata('stata.dta', iterator=True)
-   chunk1 = reader.read(10)
-   chunk2 = reader.read(20)
+   chunk1 = reader.read(5)
+   chunk2 = reader.read(5)

 Currently the ``index`` is retrieved as a column.

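Review note: the two hunks above change the doc examples so they run against the 10-row ``stata.dta`` written earlier in the section. A minimal standalone sketch of the same two reading styles follows. It assumes a recent pandas in which the reader returned by ``pd.read_stata`` can be used as a context manager; the temporary path and the sample frame are illustrative and not part of the patch.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Illustrative sample data: 10 rows, two float columns (mirrors the doc example).
df = pd.DataFrame(np.random.randn(10, 2), columns=list("AB"))

# Write to a temporary location instead of the working directory (assumption).
path = os.path.join(tempfile.mkdtemp(), "stata.dta")
df.to_stata(path)

# Style 1: chunksize makes read_stata return a reader that can be iterated.
with pd.read_stata(path, chunksize=3) as reader:
    chunk_rows = [len(chunk) for chunk in reader]

# Style 2: iterator=True returns a reader; each read(n) call continues
# from where the previous call stopped.
with pd.read_stata(path, iterator=True) as reader:
    chunk1 = reader.read(5)
    chunk2 = reader.read(5)

print(chunk_rows)
print(len(chunk1), len(chunk2))
```

With ``chunksize=3`` the last chunk is shorter than the others because 10 is not a multiple of 3, which is why printing ``df.shape`` in the doc example is a useful illustration.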
@@ -3861,15 +3863,15 @@ The parameter ``convert_missing`` indicates whether missing value
 representations in Stata should be preserved. If ``False`` (the default),
 missing values are represented as ``np.nan``. If ``True``, missing values are
 represented using ``StataMissingValue`` objects, and columns containing missing
-values will have ```object`` data type.
+values will have ``object`` data type.

 :func:`~pandas.read_stata` and :class:`~pandas.io.stata.StataReader` supports .dta
 formats 104, 105, 108, 113-115 (Stata 10-12) and 117 (Stata 13+).

 .. note::

    Setting ``preserve_dtypes=False`` will upcast to the standard pandas data types:
-   ``int64`` for all integer types and ``float64`` for floating poitn data. By default,
+   ``int64`` for all integer types and ``float64`` for floating point data. By default,
    the Stata data types are preserved when importing.

 .. ipython:: python
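Review note: the reworded paragraphs describe two dtype behaviours: retyping on export when an ``int8`` value exceeds Stata's permitted range, and upcasting on import with ``preserve_dtypes=False``. The sketch below is one way to check both by hand. The column names ``small`` and ``large`` and the temporary path are hypothetical, and the behaviour is as described in the text above rather than independently guaranteed for every pandas version.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical columns: "small" stays inside Stata's int8 range (-127 to 100),
# while "large" contains 120, which is above the permitted maximum of 100.
df = pd.DataFrame(
    {
        "small": np.array([1, 50, 100], dtype=np.int8),
        "large": np.array([1, 50, 120], dtype=np.int8),
    }
)

path = os.path.join(tempfile.mkdtemp(), "dtypes.dta")
df.to_stata(path, write_index=False)

# Default import keeps the Stata storage types.
preserved = pd.read_stata(path)

# preserve_dtypes=False upcasts integer columns to int64.
upcast = pd.read_stata(path, preserve_dtypes=False)

print(preserved.dtypes)
print(upcast.dtypes)
```

Comparing the two printed dtype listings shows which columns were retyped on export and which were only widened on import.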