pd.read_json() leads to a segmentation fault #32383
Can you post a copy/paste-able example (https://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports)?
Yes I can! Apologies for not doing so sooner. Thanks for the helpful MCVE post. This leads to a segmentation fault on my machine:
import pandas as pd
df = pd.DataFrame(
{'_id': {'0': '0'},
'category': {'0': 'Goods'},
'recommender_id': {'0': '3'},
'recommender_name_jp': {'0': '浦田'},
'recommender_name_en': {'0': 'Urata'},
'name_jp': {'0': '博多人形(松尾吉将まつお よしまさ)'},
'name_en': {'0': 'Hakata Dolls Matsuo'}
})
df_json = df.to_json()
pd.read_json(df_json)
But this does not:
import json
import pandas as pd
df = pd.DataFrame(
{'_id': {'0': '0'},
'category': {'0': 'Goods'},
'recommender_id': {'0': '3'},
'recommender_name_jp': {'0': '浦田'},
'recommender_name_en': {'0': 'Urata'},
'name_jp': {'0': '博多人形(松尾吉将まつお よしまさ)'},
'name_en': {'0': 'Hakata Dolls Matsuo'}
})
df_json = df.to_json()
df_dict = json.loads(df_json)
pd.DataFrame().from_dict(df_dict)
And, unfortunately, there is no helpful traceback. All I get is a
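When a crash kills the interpreter without a Python traceback, the standard-library `faulthandler` module can sometimes recover one. The following is a minimal sketch, not something posted in this thread: it runs a snippet in a child interpreter started with `-X faulthandler`, so a fatal signal does not take down the calling process. The helper name `run_with_faulthandler` is hypothetical.

```python
import subprocess
import sys


def run_with_faulthandler(snippet):
    """Run `snippet` in a fresh interpreter with faulthandler enabled.

    run_with_faulthandler is a hypothetical helper name, not a pandas API.
    """
    return subprocess.run(
        # -X faulthandler makes the child dump its Python traceback to stderr
        # when it receives a fatal signal such as SIGSEGV or SIGBUS.
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode the captured output as str
    )
```

Passing the failing reproduction as a string and then inspecting `result.returncode` and `result.stderr` should show where the child was when it crashed. On POSIX systems a negative return code means the child was terminated by a signal; Windows reports crashes differently, so treat that check as an assumption about the platform.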
Hmm, I can't reproduce this locally, so let's try to narrow down the example. Do you need all of the entries in that dict to get the segfault, or can you remove some? (I'd speculate that the ASCII entries can be removed.)
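One way to narrow the example down is to drop one column at a time and keep the reduction only if the crash persists. The sketch below is an illustration under stated assumptions, not code from the thread: `CHILD_TEMPLATE`, `crashes`, and `minimal_crashing_columns` are hypothetical names, each trial runs in a child interpreter so a segfault cannot kill the search, and the "killed by a signal" check assumes a POSIX system.

```python
import json
import subprocess
import sys

# Hypothetical child script: rebuilds the frame from a JSON payload passed as
# the first command-line argument and repeats the round trip from the report.
CHILD_TEMPLATE = """
import json, sys
import pandas as pd
data = json.loads(sys.argv[1])
pd.read_json(pd.DataFrame(data).to_json())
"""


def crashes(data, template=CHILD_TEMPLATE):
    """Return True if the child interpreter was terminated by a signal."""
    proc = subprocess.run(
        [sys.executable, "-c", template, json.dumps(data)],
        capture_output=True,
    )
    # Assumption: on POSIX a negative return code means death by signal
    # (for example SIGSEGV or SIGBUS). Windows uses different exit codes.
    return proc.returncode < 0


def minimal_crashing_columns(data, crashes_fn=crashes):
    """Greedily drop columns while the crash still reproduces."""
    kept = dict(data)
    for key in list(data):
        trial = {k: v for k, v in kept.items() if k != key}
        # Keep the smaller dict only if it is non-empty and still crashes.
        if trial and crashes_fn(trial):
            kept = trial
    return kept
```

Calling `minimal_crashing_columns` on the dict from the report would return the columns that could not be removed without losing the crash. The search is greedy, so it finds one small failing subset rather than guaranteeing the smallest possible one.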
I also can't reproduce using the code in #32383 (comment) (using pandas master).
@gambingo can you try on pandas master?
Can't reproduce this either on master with OSX. Something might have been fixed on master in the meantime. I suppose it could use a test.
Cannot reproduce this on pandas master, v1.0.1, or v0.25.3 in a Windows environment.
Code Sample
The following code leads to either a segmentation fault or a bus error:
But the following workaround fixes it:
Problem description
I am storing several jsonified dataframes in MongoDB. Trying to go straight from the JSON to the dataframe leads to either a segmentation fault or a bus error. I am storing several dataframes, but confusingly it only happens with one dataframe and not the others. The problem dataframe is identical in nature to the other dataframes (same column names and data types). Unfortunately, it's not shareable, but I'm happy to answer any questions I can.
Output of pd.show_versions()
Note: I am using the latest pandas, 1.0.1. I downgraded to pandas 0.25.3 and had the same issue.
pandas : 1.0.1
numpy : 1.18.1
pytz : 2019.3
dateutil : 2.8.1
pip : 20.0.2
setuptools : 45.1.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.1
IPython : 7.11.1
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.3
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
pytest : None
pyxlsb : None
s3fs : None
scipy : 1.4.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None
numba : None