Commit b61d387

levinzimmermann authored and nfdiary committed
pandas: Fix unpickle np arrays with py2+pd>0.19.x
Pandas 0.20.0 introduced a bug fix [1] which changed the behaviour of 'DataFrame.to_records()': the dtype names of the resulting record array are now unicode if the data frame's column names were unicode. Before this bug fix the dtype names were str, no matter whether the column names were str or unicode. Unfortunately, np unpickle breaks if dtype names are unicode [2]. Since many of our data frame columns are unicode, loading arrays often fails. In python3 this isn't a problem anymore, so until we have migrated we work around it with a simple monkey patch to pandas, which basically reverts the mentioned bug fix.

[1] pandas-dev/pandas#11879

[2] Small example to reproduce this error:

    import os
    import numpy as np
    import pandas as pd

    r = pd.DataFrame({u'A':[1,2,3]}).to_records()
    a = np.ndarray(shape=r.shape, dtype=r.dtype.fields)
    p = "t"
    try:
      os.remove(p)
    except:
      pass
    with open(p, 'wb') as f:
      np.save(f, a)
    with open(p, 'rb') as f:
      np.load(f)

/reviewed-on https://lab.nexedi.com/nexedi/erp5/merge_requests/1738
/reviewed-by @jerome @klaus
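The normalization and round trip described in the commit message can be sketched without pandas. This is an illustration only, assuming Python 3 and an installed NumPy; the helper name `with_native_str_names`, the sample array, and the in-memory buffer are hypothetical and not part of the commit. The helper rebuilds a structured dtype with `str` field names, which is the same idea the monkey patch applies to the result of `to_records()`, and the array is then saved and reloaded. Under Python 3 the `str()` call changes nothing, which matches the commit's remark that the problem only exists on Python 2.

```python
import io

import numpy as np


def with_native_str_names(arr):
    """Return a view of ``arr`` whose dtype field names are native ``str``.

    Hypothetical helper for illustration; assumes a flat structured dtype
    whose ``descr`` entries are plain ``(name, format)`` pairs.
    """
    new_dtype = np.dtype([(str(name), fmt) for name, fmt in arr.dtype.descr])
    return arr.view(new_dtype)


# A small structured array standing in for ``DataFrame.to_records()`` output.
records = np.array(
    [(0, 1), (1, 2), (2, 3)],
    dtype=[(u"index", "<i8"), (u"A", "<i8")],
)
normalized = with_native_str_names(records)

# Round-trip through the .npy format using an in-memory buffer.
buffer = io.BytesIO()
np.save(buffer, normalized)
buffer.seek(0)
loaded = np.load(buffer)
```

After the round trip, `loaded` carries the same field names and values as `normalized`.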
1 parent ae04f1a commit b61d387

2 files changed: +47 -1 lines changed

product/ERP5Type/__init__.py

+1 -1

@@ -32,7 +32,7 @@
 """
 from __future__ import absolute_import
 from App.config import getConfiguration
-from .patches import python, globalrequest
+from .patches import python, globalrequest, Pandas
 import six
 if six.PY2:
   from .patches import pylint

product/ERP5Type/patches/Pandas.py

+46

@@ -0,0 +1,46 @@
+##############################################################################
+#
+# Copyright (c) 2023 Nexedi SARL and Contributors. All Rights Reserved.
+#
+# WARNING: This program as such is intended to be used by professional
+# programmers who take the whole responsibility of assessing all potential
+# consequences resulting from its eventual inadequacies and bugs
+# End users who are looking for a ready-to-use solution with commercial
+# guarantees and support are strongly advised to contract a Free Software
+# Service Company
+#
+# This program is Free Software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+#
+##############################################################################
+
+import numpy as np
+
+try:
+  import pandas as pd
+except ImportError:
+  pass
+else:
+  # This monkey-patch reverts https://github.com/pandas-dev/pandas/commit/25dcff59
+  #
+  # We're often using unicode strings in DataFrame column names,
+  # which makes it impossible to unpickle np arrays. With python3
+  # this isn't a problem anymore, so we should remove this once ERP5
+  # is fully migrated to Python3 only support.
+  pd_DataFrame_to_records = pd.DataFrame.to_records
+  def DataFrame_to_records(*args, **kwargs):
+    record = pd_DataFrame_to_records(*args, **kwargs)
+    record.dtype = np.dtype([(str(k), v) for k, v in record.dtype.descr])
+    return record
+  pd.DataFrame.to_records = DataFrame_to_records
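
The patch above follows a wrap-and-replace pattern: keep a reference to the original method, define a wrapper that post-processes its result, and assign the wrapper back onto the class. A minimal standard-library sketch of that pattern follows. The `Frame` class and every name in it are hypothetical stand-ins, not pandas APIs, and the post-processing step (forcing names to `str`) only mirrors the intent of the real patch.

```python
class Frame(object):
    """Hypothetical stand-in for a third-party class such as pd.DataFrame."""

    def __init__(self, columns):
        self.columns = columns

    def to_records(self):
        # Pretend the library returns (name, position) pairs and leaves
        # the name type untouched.
        return [(name, index) for index, name in enumerate(self.columns)]


# Keep a reference to the original method before replacing it.
original_to_records = Frame.to_records


def patched_to_records(*args, **kwargs):
    records = original_to_records(*args, **kwargs)
    # Post-process the result: force every name to the native str type.
    return [(str(name), value) for name, value in records]


# Install the wrapper on the class so every instance picks it up.
Frame.to_records = patched_to_records

result = Frame([u"A", u"B"]).to_records()
```

Because the wrapper forwards `*args` and `**kwargs` unchanged, callers see the original signature; only the returned value differs.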
