Skip to content

CLN: Exception in pickle loading #28645

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Nov 2, 2019
Merged

Conversation

jbrockmendel
Copy link
Member

  • closes #xxxx
  • tests added / passed
  • passes black pandas
  • passes git diff upstream/master -u -- "*.py" | flake8 --diff
  • whatsnew entry

@jbrockmendel
Copy link
Member Author

@jorisvandenbossche @datapythonista since the CI failure is in web_and_docs. It looks like there is a problem trying to unpickle something from pytables? is this something we should have a regular test for?

@datapythonista
Copy link
Member

The error is in the v0.8 release note I think. May be there is some code there doing things that are not valid anymore?

Otherwise adding a new tests sounds good.

@jorisvandenbossche
Copy link
Member

Well, the docs is just code that is ran with the latest pandas. So there is a line of code that is working on master and failing with this branch (the pytables thing you see at the end is something else (ignored), you need to scroll up in the logs to find the actual error).

It's this (assuming 'pandas' is your local cloned repo):

In [7]: pd.read_pickle('pandas/doc/data/fx_prices') 
Out[7]: 
                EU      FR      GR      IT
DATE                                      
1980-01-01     NaN  4.0405  1.7246  804.74
1980-02-01     NaN  4.0963  1.7482  810.01
1980-03-01     NaN  4.3149  1.8519  860.13
1980-04-01     NaN  4.3536  1.8776  876.41
1980-05-01     NaN  4.1808  1.7913  843.23
...            ...     ...     ...     ...
2012-01-01  1.2910     NaN     NaN     NaN
2012-02-01  1.3238     NaN     NaN     NaN
2012-03-01  1.3208     NaN     NaN     NaN
2012-04-01  1.3160     NaN     NaN     NaN
2012-05-01  1.2806     NaN     NaN     NaN

[389 rows x 4 columns]

So we have a pickle file that reads fine on master, and the changes in this branch broke that.

Now, I assume the pickle file was written with pandas 0.8.0, so we don't guarantee support for that anymore, so it might not be a problem that you are breaking it (although t might be useful to understand why, to be sure it will not be a problem for more recent versions of pandas)

The error on this branch is:

>>>-------------------------------------------------------------------------
Exception in /home/vsts/work/1/s/doc/source/whatsnew/v0.8.0.rst at block ending on line None
Specify :okexcept: as an option in the ipython:: block to suppress this message
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-99c214b0f98c> in <module>
----> 1 fx = pd.read_pickle('data/fx_prices')

~/work/1/s/pandas/io/pickle.py in read_pickle(path, compression)
    156             # We want to silence any warnings about, e.g. moved modules.
    157             warnings.simplefilter("ignore", Warning)
--> 158             return pickle.load(f)
    159     except excs_to_catch:
    160         # e.g.

TypeError: _reconstruct: First argument must be a sub-type of ndarray
<<<-------------------------------------------------------------------------

try:
return pc.load(f, encoding=None)
except Exception:
except UnicodeDecodeError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't remove that code in 0.8.0 whatsnew? (e.g. make it a code-block)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this worked

@jreback jreback added the Clean label Oct 3, 2019
@jreback jreback added this to the 1.0 milestone Oct 3, 2019
@jorisvandenbossche
Copy link
Member

Note that by the changes in this PR, we now have a pickle file generated by pandas that we cannot longer read. So instead of just making this a code block, I would at least investigate a bit the reason why this happens to make sure this is not a valid use case that is now breaking.

@jreback
Copy link
Contributor

jreback commented Oct 17, 2019

Note that by the changes in this PR, we now have a pickle file generated by pandas that we cannot longer read. So instead of just making this a code block, I would at least investigate a bit the reason why this happens to make sure this is not a valid use case that is now breaking.

not sure I undertstand this last point. we no longer guarntee compatiblity for pickles < 0.20.3 (we did this in 0.25), so not sure this is an issue at all.

@jorisvandenbossche
Copy link
Member

so not sure this is an issue at all.

I am not sure either. But if we are not sure, we should investigate it before just removing the code. The pickle file contains a dataframe with just float columns, so doesn't seem something that should break to me.

@jorisvandenbossche
Copy link
Member

So since it was about encoding errors, I tried to write a simple pickle file with pandas 0.24.2 in Python 2, and read it with master and this branch:

In [1]: pd.__version__
Out[1]: u'0.24.2'

In [2]: df = pd.DataFrame(np.random.randn(10, 3), columns=['a', 'b', 'c'])

In [3]: df.to_pickle("test_py27.pkl")

and reading with master works, with this branch I get:

In [1]: pd.read_pickle("test_py27.pkl")
---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
<ipython-input-1-708cbd47a9cb> in <module>
----> 1 pd.read_pickle("test_py27.pkl")

~/scipy/pandas/pandas/io/pickle.py in read_pickle(path, compression)
    156             # We want to silence any warnings about, e.g. moved modules.
    157             warnings.simplefilter("ignore", Warning)
--> 158             return pickle.load(f)
    159     except excs_to_catch:
    160         # e.g.

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc6 in position 2: ordinal not in range(128)

@jbrockmendel
Copy link
Member Author

@jorisvandenbossche excellent, can you make a test case out of this example? i'm fine with pinning it into this branch or seeing it go in first

@jbrockmendel
Copy link
Member Author

updated with a test based on Joris's example

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! small comment


import pandas as pd
fx = pd.read_pickle('data/fx_prices')
import matplotlib.pyplot as plt

``Series.plot`` now supports a ``secondary_y`` option:

.. ipython:: python
.. code-block:: python
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you turn those again into ipython blocks now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel do we need to revert this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was reverted and then un-reverted, following #28645 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool

@jorisvandenbossche
Copy link
Member

@jbrockmendel the doc example still gives problems after restoring the py2 compat?
Note that it might not need to be fixed. If the error is now related to the fact that it is an old file (and not to the py2 unicode-decoding), that's fine to not fix (we do not have any guarantee to support pickle files from such old pandas versions)

@jbrockmendel
Copy link
Member Author

the doc example still gives problems after restoring the py2 compat?

Yes. Re-disabling sounds goods.

@jbrockmendel
Copy link
Member Author

re-disabled and green

Copy link
Member

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jbrockmendel
Copy link
Member Author

any outstanding issues here?

@jreback jreback merged commit 69771ec into pandas-dev:master Nov 2, 2019
@jbrockmendel jbrockmendel deleted the faster_pickle branch November 2, 2019 15:50
Reksbril pushed a commit to Reksbril/pandas that referenced this pull request Nov 18, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
proost pushed a commit to proost/pandas that referenced this pull request Dec 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants