BUG: Segfault when calling df.to_json(orient='records') and numpy.datetime64 being serialized #58160
System Information: OS: macOS-13.5-arm64-arm-64bit

Test case I used:

```python
# Create a Pandas DataFrame
ts = pd.Timestamp('2020-02-21T00:00:00.000000000')

# Print the DataFrame
print("DataFrame:")

# Save the DataFrame as a JSON file
json_filename = "data.json"
```
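The snippet above is only a fragment of the original report; the DataFrame construction between the lines was not preserved by the page. A complete sketch of the kind of call the issue title describes might look like the following (the `EventDate` column name and the DataFrame contents are assumptions, not taken from the report):

```python
import pandas as pd

# A timestamp like the one shown in the report
ts = pd.Timestamp('2020-02-21T00:00:00.000000000')

# Hypothetical DataFrame holding that timestamp; the original
# report's actual DataFrame construction was not preserved.
df = pd.DataFrame({'EventDate': [ts]})

print("DataFrame:")
print(df)

# Serialize with the orient named in the issue title
out = df.to_json(orient='records')
print(out)
```

On a working build this prints a JSON array of records with the timestamp encoded as epoch milliseconds (the default `date_format` for this orient); the bug is that certain dataframes segfault at this call instead.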
data.json content:
As I said in the bug report, the approximate reproduction code doesn't cause a segfault. I also included the code from my application next to where the segfault happens, along with debugging print statements. I think I'm constructing a dataframe with the same values as the failing one, but the failing dataframe is the result of a series of summary functions applied to a dataframe, with all of the summary values assembled into a single dataframe. The memory layout of the failing dataframe is probably different from that of the value-equivalent dataframe. I don't know how to succinctly reproduce the failing dataframe. I'm happy to dig into this problem with whoever is interested; you might get an interesting test case out of it.
I created a pickle file with a simplified dataframe that can reproduce the segmentation fault: simplified_df.pckl. Here is the test script:

```python
import pandas as pd

simplified_df = pd.read_pickle('simplified_df.pckl')
pd.show_versions()
print("simplified_df")
print(simplified_df)
print("-" * 80)
print("simplified_df.dtypes")
print(simplified_df.dtypes)
print("-" * 80)
vc_bad_row = simplified_df['EventDate'].loc['value_counts']
print("vc_bad_row")
print(vc_bad_row)
print("-" * 80)
print("type(vc_bad_row)")
print(type(vc_bad_row))
print("-" * 80)
print("vc_bad_row.dtype")
print(vc_bad_row.dtype)
print("-" * 80)
print("before to_json")
simplified_df.to_json(orient='table')
print("after to_json")
```

which outputs the following on my machine:
I now have a minimal reproducible example:
This looks to be fixed on main, so it appears to have been addressed since 2.0.3; closing.

```python
In [1]: import pandas as pd
   ...: import numpy as np
   ...: bad_df = pd.DataFrame({'column': {'foo': 8, 'bad_val': np.datetime64('2005-02-25')}})
   ...: bad_df.to_json(orient='table')
Out[1]: '{"schema":{"fields":[{"name":"index","type":"string"},{"name":"column","type":"string"}],"primaryKey":["index"],"pandas_version":"1.4.0"},"data":[{"index":"foo","column":8},{"index":"bad_val","column":"2005-02-25T00:00:00.000"}]}'
```
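For readers stuck on an older pandas where this still segfaults, one possible workaround is to cast the offending object column to strings before serializing, so the JSON writer never encounters a raw `numpy.datetime64` inside an object array. This is a sketch, not an official fix from the pandas team:

```python
import numpy as np
import pandas as pd

# Same shape as the minimal reproduction: an object column mixing
# an int with a raw numpy.datetime64 scalar.
bad_df = pd.DataFrame({'column': {'foo': 8, 'bad_val': np.datetime64('2005-02-25')}})

# Workaround sketch: stringify the object column before to_json so
# the serializer only ever sees plain Python strings.
safe_df = bad_df.astype({'column': str})
result = safe_df.to_json(orient='table')
print(result)
```

The trade-off is that the datetime loses its type information in the output schema, which may or may not matter for the consumer of the JSON.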
Pandas version checks

- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
This fails at the `df.to_json` call. Here is the debugging info that was output:
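When debugging cases like this, it can help to confirm whether an object-dtype column is hiding raw `numpy.datetime64` scalars, since those are what trip up the serializer. A diagnostic sketch, reusing the data from the minimal reproduction above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'column': {'foo': 8, 'bad_val': np.datetime64('2005-02-25')}})

# The column dtype alone ('object') does not reveal the problem;
# mapping each element to its concrete type does.
print(df['column'].dtype)
print(df['column'].map(type).tolist())
```

If `numpy.datetime64` shows up in the per-element types of an object column, that column is a candidate for the segfault described in this issue.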