Skip to content

ENH: Adding 'protocol' parameter to 'to_pickle'. #16252

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 11 commits into from

Conversation

jbschiratti
Copy link
Contributor

This PR aims at adding an optional protocol parameter to the function to_pickle.
Closes #14488. If needed, I can update the corresponding test (pandas/tests/io/test_pickle.py).

@jreback
Copy link
Contributor

jreback commented May 5, 2017

tests are the first thing you should write.

@jreback jreback added Compat pandas objects compatability with Numpy or Python functions IO Data IO issues that don't fit into a more specific label Enhancement labels May 6, 2017
@jreback jreback changed the title Adding 'protocol' parameter to 'to_pickle'. ENH: Adding 'protocol' parameter to 'to_pickle'. May 6, 2017
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

needs some tests. with valid & invalid protocol passed.

@@ -50,6 +50,7 @@
from pandas.compat.numpy import function as nv
from pandas.compat import (map, zip, lzip, lrange, string_types,
isidentifier, set_function_name)
from pandas.compat import cPickle as pkl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move to the previous line

@@ -1354,11 +1356,15 @@ def to_pickle(self, path, compression='infer'):
File path
compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
a string representing the compression to use in the output file
protocol : int
Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (Pickle module).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

version added tag (the compression one goes above this), your tag should read 0.21.0

@@ -18,6 +18,9 @@ def to_pickle(obj, path, compression='infer'):
File path
compression : {'infer', 'gzip', 'bz2', 'xz', None}, default 'infer'
a string representing the compression to use in the output file
protocol : int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@jreback
Copy link
Contributor

jreback commented May 6, 2017

also add a whatsnew in enhancements for 0.21.0

@jbschiratti
Copy link
Contributor Author

I am facing an issue when writing the tests.
I Installed a pandas-dev environment as suggested in http://pandas.pydata.org/pandas-docs/stable/contributing.html. When I try to run pytests on pandas/tests/io/test_pickle.py, I get the following error :

 % pytest test_pickle.py                                                                                                                                                                                             ~/pandas/pandas/tests/io
============================================================================================================ test session starts =============================================================================================================
platform linux -- Python 3.6.1, pytest-3.0.7, py-1.4.33, pluggy-0.4.0
rootdir: /cal/homes/jbschiratti/pandas, inifile: setup.cfg
plugins: cov-2.3.1
collected 42 items 

test_pickle.py ..........................................Traceback (most recent call last):
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/bin/pytest", line 6, in <module>
    sys.exit(pytest.main())
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/config.py", line 57, in main
    return config.hook.pytest_cmdline_main(config=config)
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 745, in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 339, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 334, in <lambda>
    _MultiCall(methods, kwargs, hook.spec_opts).execute()
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 614, in execute
    res = hook_impl.function(*args)
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/main.py", line 127, in pytest_cmdline_main
    return wrap_session(config, _main)
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/main.py", line 122, in wrap_session
    exitstatus=session.exitstatus)
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 745, in __call__
    return self._hookexec(self, self._nonwrappers + self._wrappers, kwargs)
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 339, in _hookexec
    return self._inner_hookexec(hook, methods, kwargs)
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 334, in <lambda>
    _MultiCall(methods, kwargs, hook.spec_opts).execute()
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 613, in execute
    return _wrapped_call(hook_impl.function(*args), self.execute)
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 250, in _wrapped_call
    wrap_controller.send(call_outcome)
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/terminal.py", line 358, in pytest_sessionfinish
    outcome.get_result()
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 279, in get_result
    raise ex[1].with_traceback(ex[2])
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 265, in __init__
    self.result = func()
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/vendored_packages/pluggy.py", line 614, in execute
    res = hook_impl.function(*args)
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/cacheprovider.py", line 152, in pytest_sessionfinish
    prev_failed = config.cache.get("cache/lastfailed", None) is not None
  File "/cal/homes/jbschiratti/anaconda3/envs/pandas_dev/lib/python3.6/site-packages/_pytest/cacheprovider.py", line 57, in get
    return json.load(f)
AttributeError: module 'json' has no attribute 'load'

Is it because it indirectly calls the wrong json module ?

@TomAugspurger
Copy link
Contributor

Try changing to the root of the git repo and running pandas/tests/io/test_pickle.py

@jbschiratti
Copy link
Contributor Author

@TomAugspurger It solved my problem. Thanks!

@codecov
Copy link

codecov bot commented May 11, 2017

Codecov Report

Merging #16252 into master will decrease coverage by <.01%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16252      +/-   ##
==========================================
- Coverage   90.33%   90.32%   -0.01%     
==========================================
  Files         164      164              
  Lines       50890    50891       +1     
==========================================
- Hits        45972    45969       -3     
- Misses       4918     4922       +4
Flag Coverage Δ
#multiple 88.11% <100%> (+0.01%) ⬆️
#single 40.3% <60%> (-0.1%) ⬇️
Impacted Files Coverage Δ
pandas/core/generic.py 91.72% <100%> (ø) ⬆️
pandas/io/pickle.py 79.54% <100%> (ø) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.58% <0%> (-0.1%) ⬇️
pandas/core/indexes/datetimes.py 95.23% <0%> (-0.1%) ⬇️
pandas/core/common.py 91.03% <0%> (+0.34%) ⬆️
pandas/_version.py 44.65% <0%> (+1.9%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a31c96d...04bc5c2. Read the comment docs.

@codecov
Copy link

codecov bot commented May 11, 2017

Codecov Report

Merging #16252 into master will increase coverage by 0.04%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16252      +/-   ##
==========================================
+ Coverage   90.33%   90.38%   +0.04%     
==========================================
  Files         164      161       -3     
  Lines       50890    50967      +77     
==========================================
+ Hits        45972    46067      +95     
+ Misses       4918     4900      -18
Flag Coverage Δ
#multiple 88.16% <100%> (+0.05%) ⬆️
#single 40.2% <50%> (-0.21%) ⬇️
Impacted Files Coverage Δ
pandas/io/pickle.py 80.43% <100%> (+0.88%) ⬆️
pandas/core/generic.py 92.13% <100%> (+0.4%) ⬆️
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/io/msgpack/_version.py 44.86% <0%> (-55.14%) ⬇️
pandas/core/util/hashing.py 92.96% <0%> (-6.08%) ⬇️
pandas/core/computation/expressions.py 90.47% <0%> (-2.21%) ⬇️
pandas/core/indexes/category.py 98.18% <0%> (-0.31%) ⬇️
pandas/core/indexes/multi.py 96.56% <0%> (-0.17%) ⬇️
pandas/core/series.py 94.71% <0%> (-0.09%) ⬇️
pandas/io/formats/format.py 95.66% <0%> (ø) ⬆️
... and 36 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a31c96d...8eb660d. Read the comment docs.

@jreback
Copy link
Contributor

jreback commented May 11, 2017

this needs some tests.

@jbschiratti
Copy link
Contributor Author

I'm on it. I'll make a new commit, very soon, with the necessary tests.

@jbschiratti
Copy link
Contributor Author

@jreback I added some tests for valid and invalid values of the protocol parameter. Please, have a look and tell me if more tests are needed or if there is a problem with the code.

@jbschiratti
Copy link
Contributor Author

@jreback : Do you think that more tests are needed ?

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm.

@@ -27,6 +27,28 @@ New features
Other Enhancements
^^^^^^^^^^^^^^^^^^

Pickle file I/O protocol parameter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need this section. Can just add a single line with the issue refernce.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed the 'whatsnew' file accordingly.

@jreback
Copy link
Contributor

jreback commented May 17, 2017

small linting failure.

@jreback jreback added this to the 0.21.0 milestone May 17, 2017
@jbschiratti
Copy link
Contributor Author

jbschiratti commented May 17, 2017

Oupps! Fixed the linting failure. Should be OK now.

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. minor comments. ping on green.

this parameter depend on the version of Python. For Python 2.x,
possible values are 0, 1, 2. For Python>=3.0, 3 is a valid value.
For Python >= 3.4, 4 is a valid value.A negative value for the
protocol parameter is equivalent to setting its value to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a link to the pickle docs where this is defined.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, I added the url.

@@ -27,7 +27,7 @@ New features
Other Enhancements
^^^^^^^^^^^^^^^^^^


- Added protocol parameter to :func:`to_pickle`. The 'protocol' parameter defaults to HIGHEST_PROTOCOL. (:issue:`16252`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:func:to_pickle has gained a protocol parameter.....

Int which indicates which protocol should be used by the pickler,
default HIGHEST_PROTOCOL (Pickle module). The possible values for
this parameter depend on the version of Python. For Python 2.x,
possible values are 0, 1, 2. For Python>=3.0, 3 is a valid value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

@jbschiratti
Copy link
Contributor Author

With my previous commit, all tests were successful. After modifying docstrings and the whatsnew file, test_clipboard is failing, according to Travis. Is there something wrong with my recent changes ?

@@ -27,7 +27,8 @@ New features
Other Enhancements
^^^^^^^^^^^^^^^^^^


- :func:`to_pickle` has gained a protocol parameter (:issue:`16252`). By default,
this parameter is set to HIGHEST_PROTOCOL (see https://docs.python.org/3/library/pickle.html, 12.1.2).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this render? e.g. we normally use HIGHEST_PROTOCOL <url....>__

@jreback
Copy link
Contributor

jreback commented May 18, 2017

I restarted that CI job. sometimes that tests fails :<

minor comment on docs. just ping when pushed.

@jbschiratti
Copy link
Contributor Author

@jreback : I pushed the changes.

@jreback
Copy link
Contributor

jreback commented May 18, 2017

thanks!

@jreback jreback closed this in 539de79 May 18, 2017
@jreback
Copy link
Contributor

jreback commented May 18, 2017

keep em coming!

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017
This PR aims at adding an optional `protocol` parameter to the
function `to_pickle`.   Closes pandas-dev#14488. If needed, I can update the
corresponding test (`pandas/tests/io/test_pickle.py`).

Author: Jean-Baptiste Schiratti <[email protected]>

Closes pandas-dev#16252 from jbschiratti/pickle_protocol and squashes the following commits:

8eb660d [Jean-Baptiste Schiratti] Minor change on whatsnew.
20a854d [Jean-Baptiste Schiratti] Added ref for protocol parameter + edited whatsnew.
14bc485 [Jean-Baptiste Schiratti] Fix : removed unused import.
7631146 [Jean-Baptiste Schiratti] Fix : added issue number.
460ca0c [Jean-Baptiste Schiratti] Shortened paragraph addded in 'whatsnew'.
352220b [Jean-Baptiste Schiratti] Fix : Fixed error message in 'test_read_bad_versions'.
9c9d38f [Jean-Baptiste Schiratti] Added enhancement to 'whatsnew' file.
35f8d18 [Jean-Baptiste Schiratti] Added tests for new 'protocol' parameter in 'to_pickle'.
4bf0386 [Jean-Baptiste Schiratti] Added docstring for negative protocol parameter.
04bc5c2 [Jean-Baptiste Schiratti] Added 'versionadded' tag, improved docstring + fixed import.
66a35e8 [Jean-Baptiste Schiratti] Added 'protocol' parameter to 'to_pickle'.
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
This PR aims at adding an optional `protocol` parameter to the
function `to_pickle`.   Closes pandas-dev#14488. If needed, I can update the
corresponding test (`pandas/tests/io/test_pickle.py`).

Author: Jean-Baptiste Schiratti <[email protected]>

Closes pandas-dev#16252 from jbschiratti/pickle_protocol and squashes the following commits:

8eb660d [Jean-Baptiste Schiratti] Minor change on whatsnew.
20a854d [Jean-Baptiste Schiratti] Added ref for protocol parameter + edited whatsnew.
14bc485 [Jean-Baptiste Schiratti] Fix : removed unused import.
7631146 [Jean-Baptiste Schiratti] Fix : added issue number.
460ca0c [Jean-Baptiste Schiratti] Shortened paragraph addded in 'whatsnew'.
352220b [Jean-Baptiste Schiratti] Fix : Fixed error message in 'test_read_bad_versions'.
9c9d38f [Jean-Baptiste Schiratti] Added enhancement to 'whatsnew' file.
35f8d18 [Jean-Baptiste Schiratti] Added tests for new 'protocol' parameter in 'to_pickle'.
4bf0386 [Jean-Baptiste Schiratti] Added docstring for negative protocol parameter.
04bc5c2 [Jean-Baptiste Schiratti] Added 'versionadded' tag, improved docstring + fixed import.
66a35e8 [Jean-Baptiste Schiratti] Added 'protocol' parameter to 'to_pickle'.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions Enhancement IO Data IO issues that don't fit into a more specific label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants