@@ -35,7 +35,7 @@ CSV & Text files
The two workhorse functions for reading text files (a.k.a. flat files) are
:func:`~pandas.io.parsers.read_csv` and :func:`~pandas.io.parsers.read_table`.
They both use the same parsing code to intelligently convert tabular
- data into a DataFrame object. See the :ref:`cookbook<cookbook.csv>`
+ data into a DataFrame object. See the :ref:`cookbook<cookbook.csv>`
for some advanced strategies

They can take a number of arguments:
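
As a minimal sketch of the two calls (the file names here are hypothetical):

.. code-block:: python

   import pandas as pd

   # comma-separated input; use the first column as the row index
   df = pd.read_csv('foo.csv', index_col=0)

   # read_table is the same parser with a tab delimiter by default
   df = pd.read_table('foo.tsv')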
@@ -917,7 +917,7 @@ Excel files

The ``ExcelFile`` class can read an Excel 2003 file using the ``xlrd`` Python
module and use the same parsing code as the above to convert tabular data into
- a DataFrame. See the :ref:`cookbook<cookbook.excel>` for some
+ a DataFrame. See the :ref:`cookbook<cookbook.excel>` for some
advanced strategies

To use it, create the ``ExcelFile`` object:
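
For instance (the file and sheet names below are placeholders):

.. code-block:: python

   from pandas import ExcelFile

   # open the workbook once, then parse individual sheets into DataFrames
   xls = ExcelFile('path_to_file.xls')
   df = xls.parse('Sheet1', index_col=None, na_values=['NA'])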
@@ -1248,9 +1248,8 @@ greater than the date 20000102 and the minor_axis must be A or B`
store
store.select('wp', [Term('major_axis>20000102'), Term('minor_axis', '=', ['A', 'B'])])

- The ``columns`` keyword can be supplied to select to filter a list of
- the return columns, this is equivalent to passing a
- ``Term('columns', list_of_columns_to_filter)``
+ The ``columns`` keyword can be supplied to select a list of columns to be returned;
+ this is equivalent to passing a ``Term('columns', list_of_columns_to_filter)``:

.. ipython:: python

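For example (assuming the store holds a table ``df`` with columns ``A``, ``B`` and ``C``):

.. code-block:: python

   # read back only columns A and B of the stored table
   store.select('df', columns=['A', 'B'])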
@@ -1323,7 +1322,7 @@ be data_columns
# this is in-memory version of this type of selection
df_dc[(df_dc.B > 0) & (df_dc.C > 0) & (df_dc.string == 'foo')]

- # we have automagically created this index and that the B/C/string/string2
+ # we have automagically created this index and the B/C/string/string2
# columns are stored separately as ``PyTables`` columns
store.root.df_dc.table

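For context, data columns are declared when the frame is first appended; a sketch using the ``df_dc`` frame from the surrounding example:

.. code-block:: python

   # index B, C, string and string2 as individually queryable columns
   store.append('df_dc', df_dc, data_columns=['B', 'C', 'string', 'string2'])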
@@ -1395,9 +1394,9 @@ New in 0.10.1 are the methods ``append_to_multiple`` and
``select_as_multiple``, that can perform appending/selecting from
multiple tables at once. The idea is to have one table (call it the
selector table) that you index most/all of the columns, and perform your
- queries. The other table(s) are data tables that are indexed the same as
- the selector table. You can then perform a very fast query on the
- selector table, yet get lots of data back. This method works similar to
+ queries. The other table(s) are data tables with an index matching the
+ selector table's index. You can then perform a very fast query
+ on the selector table, yet get lots of data back. This method works similarly to
having a very wide table, but is more efficient in terms of queries.

Note, **THE USER IS RESPONSIBLE FOR SYNCHRONIZING THE TABLES**. This
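
A sketch of the pattern (the table names and column split are hypothetical):

.. code-block:: python

   # columns A and B go to the selector table, the rest to a data table
   store.append_to_multiple({'df1_mt': ['A', 'B'], 'df2_mt': None},
                            df, selector='df1_mt')

   # query the selector, returning matching rows from both tables
   store.select_as_multiple(['df1_mt', 'df2_mt'], where=['A>0'],
                            selector='df1_mt')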
@@ -1437,8 +1436,8 @@ deleting rows, it is important to understand that ``PyTables`` deletes
rows by erasing the rows, then **moving** the following data. Thus
deleting can potentially be a very expensive operation depending on the
orientation of your data. This is especially true in higher dimensional
- objects (``Panel`` and ``Panel4D``). To get optimal deletion speed, it
- pays to have the dimension you are deleting be the first of the
+ objects (``Panel`` and ``Panel4D``). To get optimal performance, it's
+ worthwhile to have the dimension you are deleting be the first of the
``indexables``.

Data is ordered (on the disk) in terms of the ``indexables``. Here's a
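
For instance, deleting along the first indexable is the cheap case (a sketch using the ``wp`` panel from the earlier examples):

.. code-block:: python

   # rows along major_axis are contiguous on disk, so this delete is cheap
   store.remove('wp', Term('major_axis>20000102'))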
@@ -1500,8 +1499,8 @@ off file compression for a specific table by passing ``complevel=0``

**ptrepack**

- ``PyTables`` offer better write performance when compressed after
- writing them, as opposed to turning on compression at the very
+ ``PyTables`` offers better write performance when tables are compressed after
+ they are written, as opposed to turning on compression at the very
beginning. You can use the supplied ``PyTables`` utility
``ptrepack``. In addition, ``ptrepack`` can change compression levels
after the fact.
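
An example invocation (the flags shown are common choices, not requirements):

.. code-block:: console

   # rewrite an existing store with blosc compression, keeping any indexes
   ptrepack --chunkshape=auto --propindexes --complevel=9 --complib=blosc in.h5 out.h5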
@@ -1615,10 +1614,10 @@ format store like this:
Backwards Compatibility
~~~~~~~~~~~~~~~~~~~~~~~

- 0.10.1 of ``HDFStore`` is backwards compatible for reading tables
- created in a prior version of pandas however, query terms using the
+ 0.10.1 of ``HDFStore`` can read tables created in a prior version of pandas;
+ however, query terms using the
prior (undocumented) methodology are unsupported. ``HDFStore`` will
- issue a warning if you try to use a prior-version format file. You must
+ issue a warning if you try to use a legacy-format file. You must
read in the entire file and write it out using the new format, using the
method ``copy`` to take advantage of the updates. The group attribute
``pandas_version`` contains the version information. ``copy`` takes a
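
A sketch of the upgrade path (the file names are placeholders):

.. code-block:: python

   # open the legacy file read-only, then rewrite it in the new format
   legacy_store = HDFStore('legacy_table.h5', 'r')
   new_store = legacy_store.copy('store_new.h5')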
@@ -1658,11 +1657,11 @@ Performance
query (potentially very large amounts of data). Write times are
generally longer as compared with regular stores. Query times can
be quite fast, especially on an indexed axis.
- - You can pass ``chunksize=an integer`` to ``append``, to change the
- writing chunksize (default is 50000). This will signficantly lower
+ - You can pass ``chunksize=<int>`` to ``append``, specifying the
+ write chunksize (default is 50000). This will significantly lower
your memory usage on writing.
- - You can pass ``expectedrows=an integer`` to the first ``append``,
- to set the TOTAL number of expectedrows that ``PyTables`` will
+ - You can pass ``expectedrows=<int>`` to the first ``append``,
+ to set the TOTAL number of rows that ``PyTables`` will
expect. This will optimize read/write performance.
- Duplicate rows can be written to tables, but are filtered out in
selection (with the last items being selected; thus a table is
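
Both knobs on a single append might look like this (the numbers are illustrative):

.. code-block:: python

   # write in 10,000-row chunks; pre-size the table for ~1 million rows
   store.append('df', df, chunksize=10000, expectedrows=1000000)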
@@ -1688,7 +1687,7 @@ HDFStore supports ``Panel4D`` storage.
These, by default, index the three axes ``items, major_axis,
minor_axis``. On an ``AppendableTable`` it is possible to set up with the
first append a different indexing scheme, depending on how you want to
- store your data. Pass the ``axes`` keyword with a list of dimension
+ store your data. Pass the ``axes`` keyword with a list of dimensions
(currently must be exactly 1 less than the total dimensions of the
object). This cannot be changed after table creation.

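For example (``p4d`` is a hypothetical ``Panel4D``):

.. code-block:: python

   # index on labels/major_axis/minor_axis instead of the default axes
   store.append('p4d', p4d, axes=['labels', 'major_axis', 'minor_axis'])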
@@ -1712,12 +1711,14 @@ SQL Queries
-----------

The :mod:`pandas.io.sql` module provides a collection of query wrappers to both
- facilitate data retrieval and to reduce dependency on DB-specific API. There
+ facilitate data retrieval and to reduce dependency on DB-specific API. These
wrappers only support the Python database adapters which respect the `Python
- DB-API <http://www.python.org/dev/peps/pep-0249/>`_. See some
+ DB-API <http://www.python.org/dev/peps/pep-0249/>`_. See the
:ref:`cookbook examples <cookbook.sql>` for some advanced strategies

- Suppose you want to query some data with different types from a table such as:
+ For example, suppose you want to query some data with different types from a
+ table such as:
+

+-----+------------+-------+-------+-------+
| id  | Date       | Col_1 | Col_2 | Col_3 |
@@ -1729,8 +1730,9 @@ Suppose you want to query some data with different types from a table such as:
| 63  | 2012-10-20 | Z     | 5.73  | True  |
+-----+------------+-------+-------+-------+

+
Functions from :mod:`pandas.io.sql` can extract some data into a DataFrame. In
- the following example, we use `SQlite <http://www.sqlite.org/>`_ SQL database
+ the following example, we use the `SQLite <http://www.sqlite.org/>`_ SQL database
engine. You can use a temporary SQLite database where data are stored in
"memory". Just do:

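A minimal sketch of that in-memory connection (``cnx`` is the connection object used in the examples that follow):

.. code-block:: python

   import sqlite3
   from pandas.io import sql

   # the database lives in memory; nothing is written to disk
   cnx = sqlite3.connect(':memory:')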
@@ -1779,7 +1781,7 @@ You can also specify the name of the column as the DataFrame index:
sql.read_frame("SELECT * FROM data;", cnx, index_col='id')
sql.read_frame("SELECT * FROM data;", cnx, index_col='date')

- Of course, you can specify more "complex" query.
+ Of course, you can specify a more "complex" query.

.. ipython:: python

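For instance, selecting specific columns with a WHERE clause (the query itself is illustrative):

.. code-block:: python

   sql.read_frame("SELECT id, Col_1, Col_2 FROM data WHERE id = 42;", cnx)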
@@ -1794,8 +1796,8 @@ Of course, you can specify more "complex" query.

There are a few other available functions:

- - ``tquery`` returns list of tuples corresponding to each row.
- - ``uquery`` does the same thing as tquery, but instead of returning results,
+ - ``tquery`` returns a list of tuples corresponding to each row.
+ - ``uquery`` does the same thing as tquery, but instead of returning results
it returns the number of related rows.
- ``write_frame`` writes records stored in a DataFrame into the SQL table.
- ``has_table`` checks if a given SQLite table exists.
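
A rough sketch of these helpers (the DataFrame ``df`` and the table names are illustrative):

.. code-block:: python

   rows = sql.tquery("SELECT * FROM data;", cnx)             # list of row tuples
   n = sql.uquery("DELETE FROM data WHERE Col_2 < 0;", cnx)  # affected row count
   sql.write_frame(df, name='data2', con=cnx)                # DataFrame -> SQL table
   sql.has_table('data2', cnx)                               # True if the table exists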