
Commit e9c5fd2

committed
docs update
1 parent d50e430 commit e9c5fd2

File tree

3 files changed: +68 −73 lines


doc/source/io.rst

+38 −31
@@ -2908,56 +2908,63 @@ any pickled pandas object (or any other pickled object) from file:
    import os
    os.remove('foo.pkl')
 
-The ``to_pickle`` and ``read_pickle`` methods can read and write compressed pickle files.
-For ``read_pickle`` method, ``compression`` parameter can be one of
-{``'infer'``, ``'gzip'``, ``'bz2'``, ``'zip'``, ``'xz'``, ``None``}, default ``'infer'``.
-If 'infer', then use gzip, bz2, zip, or xz if filename ends in '.gz', '.bz2', '.zip', or
-'.xz', respectively. If using 'zip', the ZIP file must contain only one data file to be
-read in. Set to ``None`` for no decompression.
-``to_pickle`` works in a similar way, except that 'zip' format is not supported. If the
-filename ends with '.zip', an exception will be raised.
+.. warning::
+
+   Loading pickled data received from untrusted sources can be unsafe.
+
+   See: http://docs.python.org/2.7/library/pickle.html
+
+.. warning::
+
+   Several internal refactorings, 0.13 (:ref:`Series Refactoring <whatsnew_0130.refactoring>`), and 0.15 (:ref:`Index Refactoring <whatsnew_0150.refactoring>`),
+   preserve compatibility with pickles created prior to these versions. However, these must
+   be read with ``pd.read_pickle``, rather than the default python ``pickle.load``.
+   See `this question <http://stackoverflow.com/questions/20444593/pandas-compiled-from-source-default-pickle-behavior-changed>`__
+   for a detailed explanation.
+
+.. note::
+
+   These methods were previously ``pd.save`` and ``pd.load``, prior to 0.12.0, and are now deprecated.
+
+.. _io.pickle.compression:
+
+Read/Write compressed pickle files
+''''''''''''''''''''''''''''''''''
+
+.. versionadded:: 0.20.0
 
-.. versionadded:: 0.20.0
+:func:`read_pickle`, :meth:`DataFrame.to_pickle` and :meth:`Series.to_pickle` can read
+and write compressed pickle files. The ``gzip``, ``bz2`` and ``xz`` compression types
+support both reading and writing. The ``zip`` format supports reading only, and the
+archive must contain only one data file to be read in.
+The compression type can be passed as an explicit parameter or be inferred from the
+file extension. If ``'infer'``, use ``gzip``, ``bz2``, ``zip`` or ``xz`` if the filename
+ends in ``'.gz'``, ``'.bz2'``, ``'.zip'`` or ``'.xz'``, respectively.
 
 .. ipython:: python
 
    df = pd.DataFrame({
       'A': np.random.randn(1000),
       'B': np.random.randn(1000),
       'C': np.random.randn(1000)})
-   df.to_pickle("data.pkl.xz")
-   df.to_pickle("data.pkl.compress", compression="gzip")
+   df.to_pickle("data.pkl.compress", compression="gzip")  # explicit compression type
+   df.to_pickle("data.pkl.xz", compression="infer")       # infer compression type from extension
+   df.to_pickle("data.pkl.gz")                            # default, using "infer"
    df["A"].to_pickle("s1.pkl.bz2")
 
-   df = pd.read_pickle("data.pkl.xz")
    df = pd.read_pickle("data.pkl.compress", compression="gzip")
+   df = pd.read_pickle("data.pkl.xz", compression="infer")
+   df = pd.read_pickle("data.pkl.gz")
    s = pd.read_pickle("s1.pkl.bz2")
 
 .. ipython:: python
    :suppress:
    import os
-   os.remove("data.pkl.xz")
    os.remove("data.pkl.compress")
+   os.remove("data.pkl.xz")
+   os.remove("data.pkl.gz")
    os.remove("s1.pkl.bz2")
 
-.. warning::
-
-   Loading pickled data received from untrusted sources can be unsafe.
-
-   See: http://docs.python.org/2.7/library/pickle.html
-
-.. warning::
-
-   Several internal refactorings, 0.13 (:ref:`Series Refactoring <whatsnew_0130.refactoring>`), and 0.15 (:ref:`Index Refactoring <whatsnew_0150.refactoring>`),
-   preserve compatibility with pickles created prior to these versions. However, these must
-   be read with ``pd.read_pickle``, rather than the default python ``pickle.load``.
-   See `this question <http://stackoverflow.com/questions/20444593/pandas-compiled-from-source-default-pickle-behavior-changed>`__
-   for a detailed explanation.
-
-.. note::
-
-   These methods were previously ``pd.save`` and ``pd.load``, prior to 0.12.0, and are now deprecated.
-
 .. _io.msgpack:
 
 msgpack
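The inference rule the new docs describe (map ``.gz``/``.bz2``/``.zip``/``.xz`` extensions to a compression type, explicit values win over ``'infer'``) can be sketched in a few lines. ``infer_compression`` here is a hypothetical stand-alone helper for illustration, not pandas' internal API:

```python
# Minimal sketch of the extension-based compression inference rule.
import os

_EXTENSION_MAP = {'.gz': 'gzip', '.bz2': 'bz2', '.zip': 'zip', '.xz': 'xz'}


def infer_compression(path, compression='infer'):
    """Resolve 'infer' to a concrete compression type via the file extension."""
    if compression != 'infer':
        return compression  # an explicit type (or None) takes precedence
    ext = os.path.splitext(path)[1]
    return _EXTENSION_MAP.get(ext)  # unknown extension -> None (no compression)


print(infer_compression("data.pkl.xz"))         # xz
print(infer_compression("data.pkl"))            # None
print(infer_compression("data.pkl.gz", "bz2"))  # bz2
```

Note that only the final extension matters, which is why ``data.pkl.gz`` infers ``gzip`` while plain ``data.pkl`` gets no compression.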

doc/source/whatsnew/v0.20.0.txt

+14 −8
@@ -97,36 +97,42 @@ support for bz2 compression in the python 2 c-engine improved (:issue:`14874`).
    df = pd.read_table(url, compression='bz2')  # explicitly specify compression
    df.head(2)
 
-.. _whatsnew_0200.enhancements.uint64_support:
+.. _whatsnew_0200.enhancements.pickle_compression:
 
 Pickle file I/O now supports compression
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-``read_pickle`` and ``to_pickle`` can now read from and write to compressed
-pickle files. Compression methods can be explicit parameter or be inferred
-from file extension.
+:func:`read_pickle`, :meth:`DataFrame.to_pickle` and :meth:`Series.to_pickle`
+can now read from and write to compressed pickle files. The compression method
+can be an explicit parameter or be inferred from the file extension.
+See :ref:`Read/Write compressed pickle files <io.pickle.compression>`.
 
 .. ipython:: python
 
    df = pd.DataFrame({
       'A': np.random.randn(1000),
       'B': np.random.randn(1000),
       'C': np.random.randn(1000)})
-   df.to_pickle("data.pkl.xz")
-   df.to_pickle("data.pkl.compress", compression="gzip")
+   df.to_pickle("data.pkl.compress", compression="gzip")  # explicit compression type
+   df.to_pickle("data.pkl.xz", compression="infer")       # infer compression type from extension
+   df.to_pickle("data.pkl.gz")                            # default, using "infer"
    df["A"].to_pickle("s1.pkl.bz2")
 
-   df = pd.read_pickle("data.pkl.xz")
    df = pd.read_pickle("data.pkl.compress", compression="gzip")
+   df = pd.read_pickle("data.pkl.xz", compression="infer")
+   df = pd.read_pickle("data.pkl.gz")
    s = pd.read_pickle("s1.pkl.bz2")
 
 .. ipython:: python
    :suppress:
   import os
-   os.remove("data.pkl.xz")
    os.remove("data.pkl.compress")
+   os.remove("data.pkl.xz")
+   os.remove("data.pkl.gz")
    os.remove("s1.pkl.bz2")
 
+.. _whatsnew_0200.enhancements.uint64_support:
+
 UInt64 Support Improved
 ^^^^^^^^^^^^^^^^^^^^^^^

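Under the hood, the round trip this release note describes amounts to pickling through a compressing file object. A stdlib-only sketch of the same idea (pandas wires this up internally; the dict and the ``data.pkl.gz`` name are just stand-ins for a DataFrame and its target path):

```python
# Pickle an object into a gzip-compressed file and read it back,
# mirroring df.to_pickle("data.pkl.gz") / pd.read_pickle("data.pkl.gz").
import gzip
import os
import pickle
import tempfile

data = {"A": [1.0, 2.0], "B": [3.0, 4.0]}  # stand-in for a DataFrame

path = os.path.join(tempfile.mkdtemp(), "data.pkl.gz")
with gzip.open(path, "wb") as f:   # compress on write
    pickle.dump(data, f)
with gzip.open(path, "rb") as f:   # decompress on read
    restored = pickle.load(f)

assert restored == data
os.remove(path)
```

Swapping ``gzip.open`` for ``bz2.open`` or ``lzma.open`` gives the ``bz2`` and ``xz`` variants; only ``zip`` needs archive handling, which is why it is read-only.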
pandas/tests/io/test_pickle.py

+16 −34
@@ -17,7 +17,6 @@
 import os
 from distutils.version import LooseVersion
 import pandas as pd
-import numpy as np
 from pandas import Index
 from pandas.compat import is_platform_little_endian
 import pandas
@@ -391,12 +390,16 @@ def test_write_explicit(compression):
 
     with tm.ensure_clean(path1) as p1, tm.ensure_clean(path2) as p2:
         df = tm.makeDataFrame()
+
         # write to compressed file
         df.to_pickle(p1, compression=compression)
+
         # decompress
         decompress_file(p1, p2, compression=compression)
+
         # read decompressed file
         df2 = pd.read_pickle(p2, compression=None)
+
         tm.assert_frame_equal(df, df2)
 
 
@@ -425,12 +428,16 @@ def test_write_infer(ext):
 
     with tm.ensure_clean(path1) as p1, tm.ensure_clean(path2) as p2:
         df = tm.makeDataFrame()
+
         # write to compressed file by inferred compression method
         df.to_pickle(p1)
+
         # decompress
         decompress_file(p1, p2, compression=compression)
+
         # read decompressed file
         df2 = pd.read_pickle(p2, compression=None)
+
         tm.assert_frame_equal(df, df2)
 
 
@@ -446,12 +453,16 @@ def test_read_explicit(compression):
 
     with tm.ensure_clean(path1) as p1, tm.ensure_clean(path2) as p2:
         df = tm.makeDataFrame()
+
         # write to uncompressed file
         df.to_pickle(p1, compression=None)
+
         # compress
         compress_file(p1, p2, compression=compression)
+
         # read compressed file
         df2 = pd.read_pickle(p2, compression=compression)
+
         tm.assert_frame_equal(df, df2)
 
 
@@ -472,43 +483,14 @@ def test_read_infer(ext):
 
     with tm.ensure_clean(path1) as p1, tm.ensure_clean(path2) as p2:
         df = tm.makeDataFrame()
+
         # write to uncompressed file
         df.to_pickle(p1, compression=None)
+
         # compress
         compress_file(p1, p2, compression=compression)
+
         # read compressed file by inferred compression method
         df2 = pd.read_pickle(p2)
-        tm.assert_frame_equal(df, df2)
-
 
-
-
-
-
-
-
-
-
-
-
-
-
-
-def notest_zip():
-    df = pd.DataFrame({
-        'A': np.random.randn(100).repeat(10),
-        'B': np.random.randn(100).repeat(10),
-        'C': np.random.randn(100).repeat(10)})
-    os.chdir("d:\\test")
-
-    df.to_pickle("data.raw")
-    compress_file("data.raw", "data.zip", "zip")
-    compress_file("data.raw", "data.xz", "xz")
-    compress_file("data.raw", "data.bz2", "bz2")
-    compress_file("data.raw", "data.gz", "gzip")
-
-    decompress_file("data.zip", "data.zip.raw", "zip")
-    decompress_file("data.xz", "data.xz.raw", "xz")
-    decompress_file("data.bz2", "data.bz2.raw", "bz2")
-    decompress_file("data.gz", "data.gz.raw", "gzip")
+        tm.assert_frame_equal(df, df2)
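The read-side tests rely on a ``compress_file`` helper (defined elsewhere in the test module) to wrap an uncompressed pickle before ``read_pickle`` sees it. For the ``zip`` case — the one format that is read-only and limited to a single data file — that step can be sketched with the stdlib alone (the file names and the dict payload are illustrative, standing in for ``tm.makeDataFrame()``):

```python
# Store a pickle as the single member of a ZIP archive, then read it back,
# mirroring the zip branch of the tests' compress_file helper.
import os
import pickle
import tempfile
import zipfile

payload = {"A": [1, 2, 3]}  # stand-in for tm.makeDataFrame()
tmpdir = tempfile.mkdtemp()
raw = os.path.join(tmpdir, "data.raw")
zipped = os.path.join(tmpdir, "data.zip")

with open(raw, "wb") as f:  # write the uncompressed pickle
    pickle.dump(payload, f)

with zipfile.ZipFile(zipped, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.write(raw, arcname="data.raw")  # exactly one member, as the docs require

with zipfile.ZipFile(zipped) as zf:
    names = zf.namelist()
    assert len(names) == 1             # multiple members would be rejected
    restored = pickle.loads(zf.read(names[0]))

assert restored == payload
```

Writing the archive is done here with ``zipfile`` directly because ``to_pickle`` does not support the ``zip`` compression type; only ``read_pickle`` accepts it.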
