Commit a3400a4

jreback authored and jeffreystarr committed
DOC: add space considerations to IO methods in performance section
1 parent 0322852 commit a3400a4

1 file changed: +48 -9 lines changed
doc/source/io.rst

+48-9
@@ -3442,6 +3442,16 @@ Performance Considerations
 
 This is an informal comparison of various IO methods, using pandas 0.13.1.
 
+.. code-block:: python
+
+   In [3]: df = DataFrame(randn(1000000,2),columns=list('AB'))
+   <class 'pandas.core.frame.DataFrame'>
+   Int64Index: 1000000 entries, 0 to 999999
+   Data columns (total 2 columns):
+   A 1000000 non-null values
+   B 1000000 non-null values
+   dtypes: float64(2)
+
 
 Writing
 
@@ -3453,9 +3463,15 @@ Writing
    In [15]: %timeit test_hdf_fixed_write(df)
    1 loops, best of 3: 237 ms per loop
 
+   In [26]: %timeit test_hdf_fixed_write_compress(df)
+   1 loops, best of 3: 245 ms per loop
+
    In [16]: %timeit test_hdf_table_write(df)
    1 loops, best of 3: 901 ms per loop
 
+   In [27]: %timeit test_hdf_table_write_compress(df)
+   1 loops, best of 3: 952 ms per loop
+
    In [17]: %timeit test_csv_write(df)
    1 loops, best of 3: 3.44 s per loop
 
@@ -3469,12 +3485,29 @@ Reading
    In [19]: %timeit test_hdf_fixed_read()
    10 loops, best of 3: 19.1 ms per loop
 
+   In [28]: %timeit test_hdf_fixed_read_compress()
+   10 loops, best of 3: 36.3 ms per loop
+
    In [20]: %timeit test_hdf_table_read()
    10 loops, best of 3: 39 ms per loop
 
+   In [29]: %timeit test_hdf_table_read_compress()
+   10 loops, best of 3: 60.6 ms per loop
+
    In [22]: %timeit test_csv_read()
    1 loops, best of 3: 620 ms per loop
 
+Space on disk (in bytes)
+
+.. code-block:: python
+
+   25843712 Apr 8 14:11 test.sql
+   24007368 Apr 8 14:11 test_fixed.hdf
+   15580682 Apr 8 14:11 test_fixed_compress.hdf
+   24458444 Apr 8 14:11 test_table.hdf
+   16797283 Apr 8 14:11 test_table_compress.hdf
+   46152810 Apr 8 14:11 test.csv
+
 And here's the code
 
 .. code-block:: python
@@ -3483,13 +3516,7 @@ And here's the code
    import os
    from pandas.io import sql
 
-   In [3]: df = DataFrame(randn(1000000,2),columns=list('AB'))
-   <class 'pandas.core.frame.DataFrame'>
-   Int64Index: 1000000 entries, 0 to 999999
-   Data columns (total 2 columns):
-   A 1000000 non-null values
-   B 1000000 non-null values
-   dtypes: float64(2)
+   df = DataFrame(randn(1000000,2),columns=list('AB'))
 
    def test_sql_write(df):
        if os.path.exists('test.sql'):
@@ -3509,15 +3536,27 @@ And here's the code
    def test_hdf_fixed_read():
        pd.read_hdf('test_fixed.hdf','test')
 
+   def test_hdf_fixed_write_compress(df):
+       df.to_hdf('test_fixed_compress.hdf','test',mode='w',complib='blosc')
+
+   def test_hdf_fixed_read_compress():
+       pd.read_hdf('test_fixed_compress.hdf','test')
+
    def test_hdf_table_write(df):
        df.to_hdf('test_table.hdf','test',mode='w',format='table')
 
    def test_hdf_table_read():
        pd.read_hdf('test_table.hdf','test')
 
-   def test_csv_read():
-       pd.read_csv('test.csv',index_col=0)
+   def test_hdf_table_write_compress(df):
+       df.to_hdf('test_table_compress.hdf','test',mode='w',complib='blosc',format='table')
+
+   def test_hdf_table_read_compress():
+       pd.read_hdf('test_table_compress.hdf','test')
 
    def test_csv_write(df):
        df.to_csv('test.csv',mode='w')
 
+   def test_csv_read():
+       pd.read_csv('test.csv',index_col=0)
+
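
The byte counts and read timings added in this commit are easier to compare as ratios. The sketch below copies the figures verbatim from the diff and derives the compressed-to-uncompressed size ratio and the read slowdown for each HDF format. The names `sizes`, `read_ms` and `ratio` are illustrative only and are not part of pandas.

```python
# Byte counts copied from the "Space on disk (in bytes)" listing in the diff.
sizes = {
    "test.sql": 25843712,
    "test_fixed.hdf": 24007368,
    "test_fixed_compress.hdf": 15580682,
    "test_table.hdf": 24458444,
    "test_table_compress.hdf": 16797283,
    "test.csv": 46152810,
}

# Read timings in milliseconds, copied from the "Reading" hunk of the diff.
read_ms = {
    "fixed": 19.1,
    "fixed_compress": 36.3,
    "table": 39.0,
    "table_compress": 60.6,
}


def ratio(numerator, denominator):
    """Return numerator / denominator rounded to two decimals."""
    return round(numerator / denominator, 2)


# Size of the blosc-compressed file relative to the uncompressed one.
fixed_size_ratio = ratio(sizes["test_fixed_compress.hdf"], sizes["test_fixed.hdf"])
table_size_ratio = ratio(sizes["test_table_compress.hdf"], sizes["test_table.hdf"])

# How much longer the compressed file takes to read.
fixed_read_slowdown = ratio(read_ms["fixed_compress"], read_ms["fixed"])
table_read_slowdown = ratio(read_ms["table_compress"], read_ms["table"])

print("fixed: size ratio", fixed_size_ratio, "read slowdown", fixed_read_slowdown)
print("table: size ratio", table_size_ratio, "read slowdown", table_read_slowdown)
```

On these numbers, blosc shrinks both HDF files by roughly a third, while reads take about 1.5 to 1.9 times as long. The figures come from a single informal run, so treat the ratios as indicative rather than general.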

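The "Space on disk" listing in the diff looks like pasted directory output. A minimal sketch of how timing and file size could be collected together is shown below, assuming any writer that takes a target path. `measure_write` and `write_plain_text` are hypothetical helpers introduced for illustration, and the stand-in writer uses only the standard library so the sketch runs without pandas or PyTables.

```python
import os
import tempfile
import timeit


def measure_write(write_fn, path, repeat=3):
    """Return (best_seconds, size_in_bytes) for a writer that creates `path`.

    `write_fn` is called once per repetition with `path` as its only argument,
    mirroring the "best of 3" reporting used by %timeit in the diff.
    """
    best = min(timeit.repeat(lambda: write_fn(path), number=1, repeat=repeat))
    return best, os.path.getsize(path)


def write_plain_text(path):
    # Hypothetical stand-in writer. With pandas available, something like
    # lambda p: df.to_hdf(p, 'test', mode='w') could be passed instead.
    with open(path, "w") as fh:
        for i in range(1000):
            fh.write(f"{i},{i * 0.5}\n")


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "test.csv")
    seconds, size = measure_write(write_plain_text, target)
    print(f"{size} bytes, best of 3: {seconds * 1e3:.3f} ms")
```

Returning the size alongside the timing keeps the two measurements tied to the same file, which avoids mismatches when benchmark files are overwritten between runs.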