
Commit d155623

CLN: added io.api for i/o importing functions
moved excel functionality out of io.parsers to io.excel
added read_excel top-level function aliases from pandas.io.excel
added read_stata top-level function, to_stata DataFrame method aliases from pandas.io.stata
removed read_dta (replaced by read_stata)
added read_sql top-level function, to_sql DataFrame method aliases from pandas.io.sql
DOC: doc updates for all the above and intro section to io.rst
1 parent 09e47bc commit d155623

21 files changed (+663 / -507 lines)

RELEASE.rst

+6
@@ -35,6 +35,7 @@ pandas 0.11.1
   GH3606_)
 - Support for reading Amazon S3 files. (GH3504_)
 - Added module for reading and writing Stata files: pandas.io.stata (GH1512_)
+  which includes a ``to_stata`` DataFrame method and a ``read_stata`` top-level reader
 - Added support for writing in ``to_csv`` and reading in ``read_csv``,
   multi-index columns. The ``header`` option in ``read_csv`` now accepts a
   list of the rows from which to read the index. Added the option,
@@ -104,6 +105,11 @@ pandas 0.11.1
   does not control triggering of summary, similar to < 0.11.0.
 - Add the keyword ``allow_duplicates`` to ``DataFrame.insert`` to allow a duplicate column
   to be inserted if ``True``, default is ``False`` (same as prior to 0.11.1) (GH3679_)
+- I/O API changes
+
+  - added ``pandas.io.api`` for I/O imports
+  - moved ``Excel`` support to ``pandas.io.excel``
+  - added the top-level ``pd.read_sql`` function and the ``to_sql`` DataFrame method

 **Bug Fixes**

doc/source/10min.rst

+1-2
@@ -699,8 +699,7 @@ Reading from an excel file

 .. ipython:: python

-   xls = ExcelFile('foo.xlsx')
-   xls.parse('sheet1', index_col=None, na_values=['NA'])
+   read_excel('foo.xlsx', 'sheet1', index_col=None, na_values=['NA'])

 .. ipython:: python
    :suppress:

doc/source/api.rst

+30-1
@@ -48,7 +48,20 @@ File IO

    read_table
    read_csv
-   ExcelFile.parse
+
+.. currentmodule:: pandas.io.excel
+
+.. autosummary::
+   :toctree: generated/
+
+   read_excel
+
+.. currentmodule:: pandas.io.stata
+
+.. autosummary::
+   :toctree: generated/
+
+   read_stata

 .. currentmodule:: pandas.io.html

@@ -57,15 +70,29 @@ File IO

    read_html

+SQL
+~~~
+
+.. currentmodule:: pandas.io.sql
+
+.. autosummary::
+   :toctree: generated/
+
+   read_sql
+
 HDFStore: PyTables (HDF5)
 ~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. currentmodule:: pandas.io.pytables

 .. autosummary::
    :toctree: generated/

+   read_hdf
    HDFStore.put
+   HDFStore.append
    HDFStore.get
+   HDFStore.select

 Standard moving window functions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -532,9 +559,11 @@ Serialization / IO / Conversion
    DataFrame.load
    DataFrame.save
    DataFrame.to_csv
+   DataFrame.to_hdf
    DataFrame.to_dict
    DataFrame.to_excel
    DataFrame.to_html
+   DataFrame.to_stata
    DataFrame.to_records
    DataFrame.to_sparse
    DataFrame.to_string

doc/source/io.rst

+30-32
@@ -9,6 +9,7 @@
    import csv
    from StringIO import StringIO
    import pandas as pd
+   ExcelWriter = pd.ExcelWriter

    import numpy as np
    np.random.seed(123456)
@@ -27,6 +28,18 @@
 IO Tools (Text, CSV, HDF5, ...)
 *******************************

+The pandas I/O API is a set of top-level ``reader`` functions, accessed like ``pd.read_csv()``, that generally return a ``pandas``
+object. The corresponding ``writer`` functions are object methods, accessed like ``df.to_csv()``.
+
+.. csv-table::
+   :widths: 12, 15, 15, 15, 15
+   :delim: ;
+
+   Reader; ``read_csv``; ``read_excel``; ``read_hdf``; ``read_sql``
+   Writer; ``to_csv``; ``to_excel``; ``to_hdf``; ``to_sql``
+   Reader; ``read_html``; ``read_stata``; ``read_clipboard`` ;
+   Writer; ``to_html``; ``to_stata``; ``to_clipboard`` ;
+
 .. _io.read_csv_table:

 CSV & Text files
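The reader/writer table can be checked against an installed pandas. The sketch below assumes a recent pandas release rather than 0.11.1, and only looks names up with `hasattr`, so optional backends such as `xlrd`, `openpyxl` or PyTables need not be installed. `IO_PAIRS` and `missing_io_names` are illustrative names, not part of pandas.

```python
import pandas as pd

# Reader/writer pairs copied from the csv-table in the io.rst intro.
IO_PAIRS = [
    ("read_csv", "to_csv"),
    ("read_excel", "to_excel"),
    ("read_hdf", "to_hdf"),
    ("read_sql", "to_sql"),
    ("read_html", "to_html"),
    ("read_stata", "to_stata"),
    ("read_clipboard", "to_clipboard"),
]


def missing_io_names():
    """Return every table entry that is not exposed where the table says."""
    missing = []
    for reader, writer in IO_PAIRS:
        if not hasattr(pd, reader):  # readers are top-level functions
            missing.append(reader)
        if not hasattr(pd.DataFrame, writer):  # writers are DataFrame methods
            missing.append(writer)
    return missing


print(missing_io_names())
```

An empty list means every reader in the table is reachable as `pd.<name>` and every writer as a `DataFrame` method.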
@@ -971,44 +984,35 @@ And then import the data directly to a DataFrame by calling:
 Excel files
 -----------

-The ``ExcelFile`` class can read an Excel 2003 file using the ``xlrd`` Python
+The ``read_excel`` function can read an Excel 2003 file using the ``xlrd`` Python
 module and use the same parsing code as the above to convert tabular data into
 a DataFrame. See the :ref:`cookbook<cookbook.excel>` for some
 advanced strategies

-To use it, create the ``ExcelFile`` object:
-
-.. code-block:: python
-
-   xls = ExcelFile('path_to_file.xls')
-
-Then use the ``parse`` instance method with a sheetname, then use the same
-additional arguments as the parsers above:
-
 .. code-block:: python

-   xls.parse('Sheet1', index_col=None, na_values=['NA'])
+   read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])

 To read sheets from an Excel 2007 file, you can pass a filename with a ``.xlsx``
 extension, in which case the ``openpyxl`` module will be used to read the file.

 It is often the case that users will insert columns to do temporary computations
-in Excel and you may not want to read in those columns. `ExcelFile.parse` takes
+in Excel and you may not want to read in those columns. `read_excel` takes
 a `parse_cols` keyword to allow you to specify a subset of columns to parse.

 If `parse_cols` is an integer, then it is assumed to indicate the last column
 to be parsed.

 .. code-block:: python

-   xls.parse('Sheet1', parse_cols=2, index_col=None, na_values=['NA'])
+   read_excel('path_to_file.xls', 'Sheet1', parse_cols=2, index_col=None, na_values=['NA'])

 If `parse_cols` is a list of integers, then it is assumed to be the file column
 indices to be parsed.

 .. code-block:: python

-   xls.parse('Sheet1', parse_cols=[0, 2, 3], index_col=None, na_values=['NA'])
+   read_excel('path_to_file.xls', 'Sheet1', parse_cols=[0, 2, 3], index_col=None, na_values=['NA'])

 To write a DataFrame object to a sheet of an Excel file, you can use the
 ``to_excel`` instance method. The arguments are largely the same as ``to_csv``
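The two `parse_cols` rules above (an integer names the last column to parse, a list gives column positions) can be mimicked on a frame that is already in memory, without needing an Excel file or backend. `select_parse_cols` is a hypothetical helper, not part of pandas; keeping listed columns in sheet order is an assumption of this sketch. Later pandas releases replaced this keyword with `usecols`.

```python
import pandas as pd


def select_parse_cols(df, parse_cols=None):
    """Hypothetical helper mimicking the documented ``parse_cols`` rules.

    * ``None``: keep every column.
    * ``int``: the position of the last column to keep (inclusive).
    * list of ints: keep exactly those positions, in sheet order.
    """
    if parse_cols is None:
        return df
    if isinstance(parse_cols, int):
        positions = list(range(parse_cols + 1))  # 0..parse_cols inclusive
    else:
        positions = sorted(parse_cols)
    return df.iloc[:, positions]


# Stand-in for a sheet that has already been read into memory.
sheet = pd.DataFrame([[1, 2, 3, 4, 5]], columns=list("ABCDE"))

print(list(select_parse_cols(sheet, 2).columns))          # ['A', 'B', 'C']
print(list(select_parse_cols(sheet, [0, 2, 3]).columns))  # ['A', 'C', 'D']
```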
@@ -1883,16 +1887,13 @@ Writing to STATA format

 .. _io.StataWriter:

-The function :func:'~pandas.io.StataWriter.write_file' will write a DataFrame
-into a .dta file. The format version of this file is always the latest one,
-115.
+The ``to_stata`` DataFrame method writes a DataFrame into a .dta file.
+The format version of this file is always the latest one, 115.

 .. ipython:: python

-   from pandas.io.stata import StataWriter
    df = DataFrame(randn(10,2),columns=list('AB'))
-   writer = StataWriter('stata.dta',df)
-   writer.write_file()
+   df.to_stata('stata.dta')

 Reading from STATA format
 ~~~~~~~~~~~~~~~~~~~~~~~~~
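A round trip makes the `to_stata` / `read_stata` pairing concrete, including the index that comes back as a column on read. This is a sketch against a recent pandas release: recent versions let the caller choose the .dta format through a `version` argument, so the sketch does not check the format number, and the file is written to a temporary directory.

```python
import os
import tempfile

import pandas as pd

# Small stand-in for the randn(10, 2) frame used in the docs.
df = pd.DataFrame({"A": [0.5, 1.5, 2.5], "B": [-1.0, 0.0, 1.0]})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "stata.dta")
    # Writer: a DataFrame method; by default the index is written too.
    df.to_stata(path)
    # Reader: the top-level counterpart, returning a new DataFrame.
    back = pd.read_stata(path)

# The written index comes back as an ordinary column named "index".
print(list(back.columns))

# One way to put it back in place.
restored = back.set_index("index")
```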
@@ -1901,24 +1902,21 @@ Reading from STATA format

 .. versionadded:: 0.11.1

-The class StataReader will read the header of the given dta file at
-initialization. Its function :func:'~pandas.io.StataReader.data' will
-read the observations, converting them to a DataFrame which is returned:
+The top-level function ``read_stata`` reads a dta format file
+and returns a DataFrame:

 .. ipython:: python

-   from pandas.io.stata import StataReader
-   reader = StataReader('stata.dta')
-   reader.data()
+   pd.read_stata('stata.dta')

-The parameter convert_categoricals indicates wheter value labels should be
-read and used to create a Categorical variable from them. Value labels can
-also be retrieved by the function variable_labels, which requires data to be
-called before.
+Currently, the ``index`` is retrieved as a column on read back.

-The StataReader supports .dta Formats 104, 105, 108, 113-115.
+The parameter ``convert_categoricals`` indicates whether value labels should be
+read and used to create a ``Categorical`` variable from them. Value labels can
+also be retrieved by the method ``value_labels``, which requires ``data`` to be
+called first (see ``pandas.io.stata.StataReader``).

-Alternatively, the function :func:'~pandas.io.read_stata' can be used
+The StataReader supports .dta formats 104, 105, 108, 113-115.

 .. ipython:: python
    :suppress:
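The `convert_categoricals` switch can be seen by writing a labelled column and reading it back both ways. The sketch assumes a recent pandas release, where `to_stata` stores a `Categorical` column as integer codes plus Stata value labels; that writer-side support is an assumption about later versions, not something the 0.11.1 text above describes. The column name `level` is made up for the example.

```python
import os
import tempfile

import pandas as pd

# A categorical column is written to .dta as codes plus value labels.
df = pd.DataFrame({"level": pd.Categorical(["low", "high", "low"])})

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "labels.dta")
    df.to_stata(path, write_index=False)
    # Default: value labels are applied, giving a Categorical column.
    labelled = pd.read_stata(path)
    # convert_categoricals=False keeps the stored integer codes instead.
    raw = pd.read_stata(path, convert_categoricals=False)

print(labelled["level"].dtype)
print(raw["level"].dtype)
```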

doc/source/v0.10.0.txt

+5
@@ -1,5 +1,10 @@
 .. _whatsnew_0100:

+.. ipython:: python
+   :suppress:
+
+   from StringIO import StringIO
+
 v0.10.0 (December 17, 2012)
 ---------------------------

doc/source/v0.11.1.txt

+40-2
@@ -6,6 +6,19 @@ v0.11.1 (??)
 This is a minor release from 0.11.0 and includes several new features and
 enhancements along with a large number of bug fixes.

+The I/O API is now much more consistent, with a set of top-level reading
+functions, e.g. ``pd.read_csv``, and counterpart writers available as
+object methods, e.g. ``df.to_csv``:
+
+.. csv-table::
+   :widths: 12, 15, 15, 15, 15
+   :delim: ;
+
+   Reader; ``read_csv``; ``read_excel``; ``read_hdf``; ``read_sql``
+   Writer; ``to_csv``; ``to_excel``; ``to_hdf``; ``to_sql``
+   Reader; ``read_html``; ``read_stata``; ``read_clipboard`` ;
+   Writer; ``to_html``; ``to_stata``; ``to_clipboard`` ;
+
 API changes
 ~~~~~~~~~~~

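A minimal CSV round trip shows the split between top-level readers and writer methods. `StringIO` stands in for a file; under Python 3 it lives in the `io` module rather than the `StringIO` module imported by the docs above. The sketch assumes a recent pandas release, where `to_csv` called without a path returns the CSV text.

```python
from io import StringIO

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": ["a", "b", "c"]})

# Writer: an object method. Without a path, to_csv returns the CSV text.
text = df.to_csv(index=False)

# Reader: a top-level function that parses the text into a new DataFrame.
back = pd.read_csv(StringIO(text))

print(back)
```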
@@ -74,6 +87,29 @@ API changes
 - Add the keyword ``allow_duplicates`` to ``DataFrame.insert`` to allow a duplicate column
   to be inserted if ``True``, default is ``False`` (same as prior to 0.11.1) (GH3679_)

+- I/O API
+
+  - added top-level function ``read_excel`` to replace the following
+    (the original API remains available as well)
+
+    .. code-block:: python
+
+       xls = ExcelFile('path_to_file.xls')
+       xls.parse('Sheet1', index_col=None, na_values=['NA'])
+
+    with
+
+    .. code-block:: python
+
+       read_excel('path_to_file.xls', 'Sheet1', index_col=None, na_values=['NA'])
+
+  - added top-level function ``read_sql`` that is equivalent to the following:
+
+    .. code-block:: python
+
+       from pandas.io.sql import read_frame
+       read_frame(....)
+
 Enhancements
 ~~~~~~~~~~~~

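The `read_sql` entry can be exercised against an in-memory SQLite database using Python's built-in `sqlite3` module. The sketch is written for a recent pandas release and uses only `pd.read_sql` and `DataFrame.to_sql`, not the older `read_frame` call; the table name `items` and its columns are made up for the example.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database; pandas accepts a sqlite3 connection directly.
con = sqlite3.connect(":memory:")

df = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})

# Writer: a DataFrame method that creates and fills the table.
df.to_sql("items", con, index=False)

# Reader: a top-level function taking a query and a connection.
result = pd.read_sql("SELECT id, name FROM items WHERE id >= 2 ORDER BY id", con)

con.close()
print(result)
```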
@@ -109,6 +145,8 @@ Enhancements
   a list or tuple.

 - Added module for reading and writing Stata files: pandas.io.stata (GH1512_)
+  accessible via the ``read_stata`` top-level function for reading
+  and the ``to_stata`` DataFrame method for writing

 - ``DataFrame.replace()`` now allows regular expressions on contained
   ``Series`` with object dtype. See the examples section in the regular docs
@@ -218,7 +256,7 @@ Bug Fixes
 .. ipython :: python

    df = DataFrame({'a': list('ab..'), 'b': [1, 2, 3, 4]})
-   df.replace(regex=r'\s*\.\s*', value=nan)
+   df.replace(regex=r'\s*\.\s*', value=np.nan)

 to replace all occurrences of the string ``'.'`` with zero or more
 instances of surrounding whitespace with ``NaN``.
@@ -227,7 +265,7 @@ Bug Fixes

 .. ipython :: python

-   df.replace('.', nan)
+   df.replace('.', np.nan)

 to replace all occurrences of the string ``'.'`` with ``NaN``.

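The two corrected `replace` calls differ once a cell has whitespace around the dot: the regex form also matches `' . '`, while the literal form only matches a cell that is exactly `'.'`. A small sketch, with `np.nan` imported explicitly as in the fixed docs and assuming a recent pandas release:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": ["a", " . ", ".", "b"], "b": [1, 2, 3, 4]})

# Regex form: a '.' with optional surrounding whitespace becomes NaN,
# so both " . " and "." are replaced.
by_regex = df.replace(regex=r"\s*\.\s*", value=np.nan)

# Literal form: only cells exactly equal to "." are replaced.
by_literal = df.replace(".", np.nan)

print(by_regex)
print(by_literal)
```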
pandas/__init__.py

+1-5
@@ -28,12 +28,8 @@
 from pandas.sparse.api import *
 from pandas.stats.api import *
 from pandas.tseries.api import *
+from pandas.io.api import *

-from pandas.io.parsers import (read_csv, read_table, read_clipboard,
-                               read_fwf, to_clipboard, ExcelFile,
-                               ExcelWriter)
-from pandas.io.pytables import HDFStore, Term, get_store, read_hdf
-from pandas.io.html import read_html
 from pandas.util.testing import debug

 from pandas.tools.describe import value_range
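With the hand-written import list replaced by `from pandas.io.api import *`, each top-level reader should be the very object that `pandas.io.api` exports. Assuming the installed pandas still ships `pandas.io.api`, an identity check is a cheap way to confirm that nothing wraps or shadows the re-exported names; `READERS` and `reexport_mismatches` are illustrative, not pandas API.

```python
import pandas as pd
import pandas.io.api as io_api

# Readers that the io.rst table lists as top-level functions.
READERS = [
    "read_csv",
    "read_excel",
    "read_hdf",
    "read_html",
    "read_stata",
    "read_sql",
    "read_clipboard",
]


def reexport_mismatches(names):
    """Return names whose top-level object is not the one in pandas.io.api."""
    return [n for n in names if getattr(pd, n) is not getattr(io_api, n)]


print(reexport_mismatches(READERS))
```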
