BUG: pd.NA doesn't pickle/unpickle faithfully #31847
Comments
Could you include a reproducible example? I could not reproduce this on master:
|
@MarcoGorelli I can upload a sample, if that will suffice, but I do not know how to reproduce |
@tsoernes if you run the two lines I posted above, do you get the same output? |
@MarcoGorelli Yes |
Here is a sample of that column with 2 rows. It is a zipped pickle file (GitHub only allows zips).
|
@tsoernes I'm afraid we can't accept raw pickle files in bug reports, as they could be unsafe. Please remove the attachment from your message :) Could you please paste the output of |
|
Thanks @tsoernes TBH I still can't reproduce the issue though
|
I can't either, when going via a dictionary.
On Tue, Feb 11, 2020 at 11:57 PM Marco Gorelli wrote:
Thanks @tsoernes TBH I still can't reproduce the issue though
>>> pd.Series({268: ['Fintech'], 269: pd.NA}).isna()
268 False
269 True
dtype: bool
|
When pickling/unpickling, I can reproduce this:
So apparently, when unpickling, it doesn't return the same singleton. |
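The original snippet for this comment was lost, so here is a minimal sketch of the same effect using only the standard library. `SentinelType` and `SENTINEL` are hypothetical stand-ins, not the pandas `NAType` or `pd.NA`; the assumption is that the sentinel class defines no custom pickling hooks, which matches the behaviour described above.

```python
import pickle


class SentinelType:
    """Hypothetical stand-in for a sentinel class; not the pandas NAType."""


# The one instance that calling code compares against with `is`.
SENTINEL = SentinelType()

# Default pickling stores a reference to the class plus the instance state,
# so loading builds a brand-new object instead of returning SENTINEL.
restored = pickle.loads(pickle.dumps(SENTINEL))

same_type = type(restored) is SentinelType
same_object = restored is SENTINEL
print(same_type, same_object)  # prints: True False
```

Any code that detects the sentinel by identity (`value is SENTINEL`) will therefore miss the unpickled copy, even though its type is unchanged.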
So we should probably explicitly implement methods for pickling/unpickling on the NA class |
According to https://docs.python.org/3/library/pickle.html#object.__reduce__,
> If a string is returned, the string should be interpreted as the name of a global variable. It should be the object’s local name relative to its module; the pickle module searches the module namespace to determine the object’s module. This behaviour is typically useful for singletons.
Closes pandas-dev#31847
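As a rough illustration of the quoted behaviour, the sketch below has `__reduce__` return the name of a module-level global. `MissingType` and `MISSING` are hypothetical names used only for this example; this is not the actual pandas implementation, just the general pattern the pickle documentation describes.

```python
import pickle


class MissingType:
    """Hypothetical singleton-style sentinel, used only for illustration."""

    def __reduce__(self):
        # Returning a string tells pickle to store a reference to the
        # module-level global with this name, not the object's state.
        return "MISSING"


# The global that the string returned by __reduce__ refers to.
MISSING = MissingType()

# Unpickling looks the name up in the defining module, so it yields
# the existing object rather than constructing a new one.
restored = pickle.loads(pickle.dumps(MISSING))
print(restored is MISSING)  # prints: True
```

Note that pickle verifies at dump time that the named global is the very object being pickled, so only the instance bound to `MISSING` can be pickled this way.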
A simple example of the problem:
This can also cause exceptions when working with dtypes other than object.
The following function replaces the incorrect NA values in place. It operates one column at a time to preserve dtypes. flake8 complains about comparing types rather than using `isinstance`.
import pandas as pd

def fix_wrong_na(df):
    for column in df.columns:
        # Match every value whose type is that of pd.NA, including unpickled copies
        isna_mask = df[column].apply(type) == type(pd.NA)
        df.loc[isna_mask, column] = pd.NA
|
Code Sample, a copy-pastable example if possible
Output of pd.show_versions()
INSTALLED VERSIONS
commit : None
python : 3.7.5.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.16-200.fc30.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : nb_NO.UTF-8
LOCALE : nb_NO.UTF-8
pandas : 1.0.1
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.0
pip : 19.3.1
setuptools : 41.6.0.post20191030
Cython : 0.29.13
pytest : 5.2.2
hypothesis : None
sphinx : 2.2.1
blosc : None
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : 2.8.4 (dt dec pq3 ext lo64)
jinja2 : 2.10.3
IPython : 7.9.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : None
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 2.2.3
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : 0.15.1
pytables : None
pytest : 5.2.2
pyxlsb : None
s3fs : None
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.5.2
tabulate : 0.8.5
xarray : None
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2
numba : 0.46.0