@@ -3821,22 +3821,41 @@ outside of this range, the variable is cast to ``int16``.
Reading from Stata format
~~~~~~~~~~~~~~~~~~~~~~~~~

- The top-level function ``read_stata`` will read a dta files
- and return a DataFrame. Alternatively, the class :class:`~pandas.io.stata.StataReader`
- can be used if more granular access is required. :class:`~pandas.io.stata.StataReader`
- reads the header of the dta file at initialization. The method
- :func:`~pandas.io.stata.StataReader.data` reads and converts observations to a DataFrame.
+ The top-level function ``read_stata`` will read a dta file and return
+ either a DataFrame or a :class:`~pandas.io.stata.StataReader` that can
+ be used to read the file incrementally.

.. ipython:: python

   pd.read_stata('stata.dta')

+ .. versionadded:: 0.16.0
+
+ Specifying a ``chunksize`` yields a
+ :class:`~pandas.io.stata.StataReader` instance that can be used to
+ read ``chunksize`` lines from the file at a time. The ``StataReader``
+ object can be used as an iterator.
+
+   reader = pd.read_stata('stata.dta', chunksize=1000)
+   for df in reader:
+       do_something(df)
+
+ For more fine-grained control, use ``iterator=True`` and specify
+ ``chunksize`` with each call to
+ :func:`~pandas.io.stata.StataReader.read`.
+
+ .. ipython:: python
+
+    reader = pd.read_stata('stata.dta', iterator=True)
+    chunk1 = reader.read(10)
+    chunk2 = reader.read(20)
+
Currently the ``index`` is retrieved as a column.

The parameter ``convert_categoricals`` indicates whether value labels should be
read and used to create a ``Categorical`` variable from them. Value labels can
- also be retrieved by the function ``variable_labels``, which requires data to be
- called before use (see ``pandas.io.stata.StataReader``).
+ also be retrieved by the function ``value_labels``, which requires :func:`~pandas.io.stata.StataReader.read`
+ to be called before use.

The parameter ``convert_missing`` indicates whether missing value
representations in Stata should be preserved. If ``False`` (the default),
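
The two reading modes described in the added documentation can be tried end to end with a small round trip. The following is a minimal sketch, not part of the patch: the sample frame, the `example.dta` file name, and the temporary directory are made up for illustration, and it assumes a reasonably recent pandas release in which `StataReader` can also be used as a context manager.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data; any small DataFrame would do.
frame = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                      "y": [0.5, 1.5, 2.5, 3.5, 4.5]})

with tempfile.TemporaryDirectory() as tmp_dir:
    # Illustrative file name, written only so there is something to read back.
    path = os.path.join(tmp_dir, "example.dta")
    frame.to_stata(path, write_index=False)

    # Mode 1: chunksize makes read_stata return a StataReader that
    # yields one DataFrame per chunk when iterated.
    with pd.read_stata(path, chunksize=2) as reader:
        chunk_shapes = [chunk.shape for chunk in reader]

    # Mode 2: iterator=True returns a StataReader whose read() takes
    # the number of rows to fetch on each call.
    with pd.read_stata(path, iterator=True) as reader:
        first = reader.read(2)
        rest = reader.read(3)

print(chunk_shapes)
print(first["x"].tolist(), rest["x"].tolist())
```

With five rows and `chunksize=2`, the last chunk is shorter than the others, so code that consumes the iterator should not assume that every chunk has exactly `chunksize` rows.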