Commit c96f31c

DOC: pytables iterator update
1 parent c9013b8 commit c96f31c

1 file changed

+21
-4
lines changed

doc/source/io.rst

+21-4
@@ -2574,10 +2574,27 @@ The default is 50,000 rows returned in a chunk.
    for df in read_hdf('store.h5','df', chunsize=3):
        print(df)
 
-Note, that the chunksize keyword applies to the **returned** rows. So if you
-are doing a query, then that set will be subdivided and returned in the
-iterator. Keep in mind that if you do not pass a ``where`` selection criteria
-then the ``nrows`` of the table are considered.
+Note, that the chunksize keyword applies to the **source** rows. So if you
+are doing a query, then the chunksize will subdivide the total rows in the table
+and the query applied, returning an iterator on potentially unequal sized chunks.
+
+Here is a recipe for generating a query and using it to create equal sized return
+chunks.
+
+.. ipython:: python
+
+   dfeq = DataFrame({'number': np.arange(1,11)})
+   dfeq
+
+   store.append('dfeq', dfeq, data_columns=['number'])
+
+   def chunks(l, n):
+       return [l[i:i+n] for i in xrange(0, len(l), n)]
+
+   evens = [2,4,6,8,10]
+   coordinates = store.select_as_coordinates('dfeq','number=evens')
+   for c in chunks(coordinates, 2):
+       print store.select('dfeq',where=c)
 
 Advanced Queries
 ~~~~~~~~~~~~~~~~
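
The difference between chunking the source rows and chunking the query result, as described in the added lines, can be modelled in plain Python. This is only a sketch under stated assumptions: a list stands in for the ``dfeq`` table, a list comprehension stands in for ``select_as_coordinates``, and no pandas or PyTables calls are made. The ``chunks`` helper follows the recipe but uses ``range``, so it runs on Python 3.

```python
# Plain-Python model of the two chunking strategies. Nothing here touches
# pandas or PyTables; `number` mirrors the dfeq column from the recipe.
number = list(range(1, 11))   # source rows: 1..10
evens = {2, 4, 6, 8, 10}      # the query: number is in evens


def chunks(seq, n):
    """Split seq into consecutive pieces of at most n items."""
    return [seq[i:i + n] for i in range(0, len(seq), n)]


# Strategy 1: chunk the source rows, then apply the query inside each chunk.
# This models chunksize applying to source rows, so result sizes can differ.
per_source_chunk = [
    [v for v in block if v in evens] for block in chunks(number, 3)
]

# Strategy 2: resolve the query to row positions first, then chunk those.
# This models select_as_coordinates followed by select(where=coordinates).
coordinates = [i for i, v in enumerate(number) if v in evens]
per_coordinate_chunk = [
    [number[i] for i in block] for block in chunks(coordinates, 2)
]

print(per_source_chunk)      # unequal sizes
print(per_coordinate_chunk)  # equal sizes, except possibly the last chunk
```

With a source chunk size of 3, the first strategy yields pieces of 1, 2, 1 and 1 matching rows. Chunking the coordinates by 2 yields pieces of 2, 2 and 1, so only the final chunk can come up short.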
