Commit b846abf

Merge pull request #3248 from y-p/doc_io_cleanups
DOC: io.rst cleanups
2 parents: f021e07 + 6e1c95e

1 file changed: +30 -28 lines changed

doc/source/io.rst

@@ -35,7 +35,7 @@ CSV & Text files
 The two workhorse functions for reading text files (a.k.a. flat files) are
 :func:`~pandas.io.parsers.read_csv` and :func:`~pandas.io.parsers.read_table`.
 They both use the same parsing code to intelligently convert tabular
-data into a DataFrame object. See the :ref:`cookbook<cookbook.csv>`
+data into a DataFrame object. See the :ref:`cookbook<cookbook.csv>`
 for some advanced strategies

 They can take a number of arguments:
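
For illustration, a minimal sketch of the two readers this hunk describes (the file names are hypothetical):

    import pandas as pd

    # read_csv defaults to sep=',' while read_table defaults to sep='\t';
    # both funnel into the same parsing code and return a DataFrame
    df1 = pd.read_csv('foo.csv')
    df2 = pd.read_table('bar.txt')
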
@@ -917,7 +917,7 @@ Excel files

 The ``ExcelFile`` class can read an Excel 2003 file using the ``xlrd`` Python
 module and use the same parsing code as the above to convert tabular data into
-a DataFrame. See the :ref:`cookbook<cookbook.excel>` for some
+a DataFrame. See the :ref:`cookbook<cookbook.excel>` for some
 advanced strategies

 To use it, create the ``ExcelFile`` object:
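
As a rough, hedged sketch of the workflow this hunk documents (the file and sheet names are hypothetical):

    from pandas import ExcelFile

    xls = ExcelFile('path_to_file.xls')  # reading .xls requires the xlrd module
    df = xls.parse('Sheet1')             # parse one sheet into a DataFrame
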
@@ -1248,9 +1248,8 @@ greater than the date 20000102 and the minor_axis must be A or B`
 store
 store.select('wp', [ Term('major_axis>20000102'), Term('minor_axis', '=', ['A', 'B']) ])

-The ``columns`` keyword can be supplied to select to filter a list of
-the return columns, this is equivalent to passing a
-``Term('columns', list_of_columns_to_filter)``
+The ``columns`` keyword can be supplied to select a list of columns to be returned,
+this is equivalent to passing a ``Term('columns', list_of_columns_to_filter)``:

 .. ipython:: python
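
A sketch of the equivalence stated in this hunk, assuming an open ``store`` holding a table ``'df'`` with columns A through D (the ``Term`` import path follows the era's docs):

    from pandas.io.pytables import Term

    # the two selections below are equivalent; both return only columns A and B
    store.select('df', columns=['A', 'B'])
    store.select('df', [Term('columns', ['A', 'B'])])
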
@@ -1323,7 +1322,7 @@ be data_columns
 # this is in-memory version of this type of selection
 df_dc[(df_dc.B > 0) & (df_dc.C > 0) & (df_dc.string == 'foo')]

-# we have automagically created this index and that the B/C/string/string2
+# we have automagically created this index and the B/C/string/string2
 # columns are stored separately as ``PyTables`` columns
 store.root.df_dc.table
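
For context, a hedged sketch of how a table like ``df_dc`` is written with queryable columns in the first place (the frame ``df_dc`` itself is assumed; names mirror the surrounding docs):

    # mark these frame columns as data_columns so they can be queried on disk
    store.append('df_dc', df_dc, data_columns=['B', 'C', 'string', 'string2'])
    store.select('df_dc', [Term('B>0'), Term('string', '=', 'foo')])
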
@@ -1395,9 +1394,9 @@ New in 0.10.1 are the methods ``append_to_multple`` and
 ``select_as_multiple``, that can perform appending/selecting from
 multiple tables at once. The idea is to have one table (call it the
 selector table) that you index most/all of the columns, and perform your
-queries. The other table(s) are data tables that are indexed the same as
-the selector table. You can then perform a very fast query on the
-selector table, yet get lots of data back. This method works similar to
+queries. The other table(s) are data tables with an index matching the
+selector table's index. You can then perform a very fast query
+on the selector table, yet get lots of data back. This method works similar to
 having a very wide table, but is more efficient in terms of queries.

 Note, **THE USER IS RESPONSIBLE FOR SYNCHRONIZING THE TABLES**. This
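
A minimal sketch of the selector/data split described here, assuming a DataFrame ``df`` and hypothetical table names ``df1_mt``/``df2_mt``:

    # A and B go to the selector table; remaining columns to the data table
    store.append_to_multiple({'df1_mt': ['A', 'B'], 'df2_mt': None},
                             df, selector='df1_mt')

    # fast query on the selector table, rows assembled from both tables
    store.select_as_multiple(['df1_mt', 'df2_mt'],
                             where=['A>0', 'B>0'], selector='df1_mt')
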
@@ -1437,8 +1436,8 @@ deleting rows, it is important to understand the ``PyTables`` deletes
 rows by erasing the rows, then **moving** the following data. Thus
 deleting can potentially be a very expensive operation depending on the
 orientation of your data. This is especially true in higher dimensional
-objects (``Panel`` and ``Panel4D``). To get optimal deletion speed, it
-pays to have the dimension you are deleting be the first of the
+objects (``Panel`` and ``Panel4D``). To get optimal performance, it's
+worthwhile to have the dimension you are deleting be the first of the
 ``indexables``.

 Data is ordered (on the disk) in terms of the ``indexables``. Here's a
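
To make the deletion discussion concrete, a hedged example against the ``wp`` panel used earlier; ``major_axis`` is the first of the ``indexables`` there, so this is the cheap orientation:

    # deletes whole chunks along the leading indexable, avoiding large moves
    store.remove('wp', 'major_axis>20000102')
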
@@ -1500,8 +1499,8 @@ off file compression for a specific table by passing ``complevel=0``

 **ptrepack**

-``PyTables`` offer better write performance when compressed after
-writing them, as opposed to turning on compression at the very
+``PyTables`` offers better write performance when tables are compressed after
+they are written, as opposed to turning on compression at the very
 beginning. You can use the supplied ``PyTables`` utility
 ``ptrepack``. In addition, ``ptrepack`` can change compression levels
 after the fact.
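
For reference, a hypothetical ``ptrepack`` invocation of the kind the paragraph suggests (file names and option values are placeholders, not prescribed settings):

    ptrepack --chunkshape=auto --complevel=9 --complib=blosc uncompressed.h5 compressed.h5
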
@@ -1615,10 +1614,10 @@ format store like this:
 Backwards Compatibility
 ~~~~~~~~~~~~~~~~~~~~~~~

-0.10.1 of ``HDFStore`` is backwards compatible for reading tables
-created in a prior version of pandas however, query terms using the
+0.10.1 of ``HDFStore`` can read tables created in a prior version of pandas,
+however query terms using the
 prior (undocumented) methodology are unsupported. ``HDFStore`` will
-issue a warning if you try to use a prior-version format file. You must
+issue a warning if you try to use a legacy-format file. You must
 read in the entire file and write it out using the new format, using the
 method ``copy`` to take advantage of the updates. The group attribute
 ``pandas_version`` contains the version information. ``copy`` takes a
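
A sketch of the upgrade path described here, assuming a legacy file named ``legacy.h5``; the exact keyword set of ``copy`` is cut off by the hunk boundary, so treat the call as illustrative:

    import pandas as pd

    store = pd.HDFStore('legacy.h5', 'r')
    new_store = store.copy('new_format.h5')  # rewrite data using the new format
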
@@ -1658,11 +1657,11 @@ Performance
 query (potentially very large amounts of data). Write times are
 generally longer as compared with regular stores. Query times can
 be quite fast, especially on an indexed axis.
-- You can pass ``chunksize=an integer`` to ``append``, to change the
-  writing chunksize (default is 50000). This will signficantly lower
+- You can pass ``chunksize=<int>`` to ``append``, specifying the
+  write chunksize (default is 50000). This will signficantly lower
   your memory usage on writing.
-- You can pass ``expectedrows=an integer`` to the first ``append``,
-  to set the TOTAL number of expectedrows that ``PyTables`` will
+- You can pass ``expectedrows=<int>`` to the first ``append``,
+  to set the TOTAL number of expected rows that ``PyTables`` will
   expected. This will optimize read/write performance.
 - Duplicate rows can be written to tables, but are filtered out in
   selection (with the last items being selected; thus a table is
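
Putting the ``chunksize`` and ``expectedrows`` bullets above together, a hedged sketch (the numbers are arbitrary placeholders, not recommendations):

    # expectedrows sizes the table up front; chunksize bounds write memory
    store.append('df', df, expectedrows=1000000, chunksize=100000)
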
@@ -1688,7 +1687,7 @@ HDFStore supports ``Panel4D`` storage.
 These, by default, index the three axes ``items, major_axis,
 minor_axis``. On an ``AppendableTable`` it is possible to setup with the
 first append a different indexing scheme, depending on how you want to
-store your data. Pass the ``axes`` keyword with a list of dimension
+store your data. Pass the ``axes`` keyword with a list of dimensions
 (currently must by exactly 1 less than the total dimensions of the
 object). This cannot be changed after table creation.
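
A hedged illustration of the ``axes`` keyword for a ``Panel4D`` (one less than the object's total dimensions, as the text requires; ``p4d`` is an assumed Panel4D):

    # index on labels/major_axis/minor_axis instead of the default three axes
    store.append('p4d', p4d, axes=['labels', 'major_axis', 'minor_axis'])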

@@ -1712,12 +1711,14 @@ SQL Queries
 -----------

 The :mod:`pandas.io.sql` module provides a collection of query wrappers to both
-facilitate data retrieval and to reduce dependency on DB-specific API. There
+facilitate data retrieval and to reduce dependency on DB-specific API. These
 wrappers only support the Python database adapters which respect the `Python
-DB-API <http://www.python.org/dev/peps/pep-0249/>`_. See some
+DB-API <http://www.python.org/dev/peps/pep-0249/>`_. See some
 :ref:`cookbook examples <cookbook.sql>` for some advanced strategies

-Suppose you want to query some data with different types from a table such as:
+For example, suppose you want to query some data with different types from a
+table such as:
+

 +-----+------------+-------+-------+-------+
 | id  | Date       | Col_1 | Col_2 | Col_3 |
@@ -1729,8 +1730,9 @@ Suppose you want to query some data with different types from a table such as:
 | 63  | 2012-10-20 | Z     | 5.73  | True  |
 +-----+------------+-------+-------+-------+

+
 Functions from :mod:`pandas.io.sql` can extract some data into a DataFrame. In
-the following example, we use `SQlite <http://www.sqlite.org/>`_ SQL database
+the following example, we use the `SQlite <http://www.sqlite.org/>`_ SQL database
 engine. You can use a temporary SQLite database where data are stored in
 "memory". Just do:
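
The connection the hunk leads into would look something like this minimal sketch (the standard-library ``sqlite3`` module, with an in-memory database):

    import sqlite3
    cnx = sqlite3.connect(':memory:')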

@@ -1779,7 +1781,7 @@ You can also specify the name of the column as the DataFrame index:
 sql.read_frame("SELECT * FROM data;", cnx, index_col='id')
 sql.read_frame("SELECT * FROM data;", cnx, index_col='date')

-Of course, you can specify more "complex" query.
+Of course, you can specify a more "complex" query.

 .. ipython:: python
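
A hedged example of such a "complex" query, reusing the ``data`` table and ``cnx`` connection from the surrounding docs (the WHERE clause is an arbitrary illustration):

    from pandas.io import sql

    sql.read_frame("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", cnx)
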
@@ -1794,8 +1796,8 @@ Of course, you can specify a more "complex" query.

 There are a few other available functions:

-- ``tquery`` returns list of tuples corresponding to each row.
-- ``uquery`` does the same thing as tquery, but instead of returning results,
+- ``tquery`` returns a list of tuples corresponding to each row.
+- ``uquery`` does the same thing as tquery, but instead of returning results
   it returns the number of related rows.
 - ``write_frame`` writes records stored in a DataFrame into the SQL table.
 - ``has_table`` checks if a given SQLite table exists.
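
To round out the list, a sketch exercising each helper against the running example (the table names and the specific statements are hypothetical):

    from pandas.io import sql

    rows = sql.tquery("SELECT * FROM data;", cnx)           # list of row tuples
    n = sql.uquery("DELETE FROM data WHERE id = 63;", cnx)  # count of affected rows
    sql.write_frame(df, name='data_copy', con=cnx)          # DataFrame -> new SQL table
    sql.has_table('data_copy', cnx)                         # True if the table exists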
