BUG: read_excel leads to segfault #40321

Closed
1 task
tharwan opened this issue Mar 9, 2021 · 2 comments
Labels
Bug Segfault Non-Recoverable Error

Comments

@tharwan

tharwan commented Mar 9, 2021

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • (optional) I have confirmed this bug exists on the master branch of pandas.


Note: Please read this guide detailing how to provide the necessary information for us to reproduce your bug.

Code Sample, a copy-pastable example

from pathlib import Path
import pandas as pd

filename = 'file.xlsm'
template_df = pd.read_excel(filename, sheet_name="Muster", usecols="A:F", skiprows=1)
template_df.index = template_df["Beschreibung"]

path = 'file2.xlsx'
df = pd.read_excel(path, sheet_name="Januar", usecols="A:F", skiprows=1)
df.index = df["Beschreibung"]

template_df.loc['Fee EPEX ID Kundenhandel', "Short/Kauf/NEG"] = 0
template_df.update(df)
template_df.loc["Direktvermarktung Vertragspreis", "Long/Verkauf/POS"]

Problem description

First of all, I know that the example is not a proper minimal snippet, and I cannot provide the Excel files in question. However, I have no idea how to debug this issue any further.

If I run the script above with python -X faulthandler I get:

Current thread 0x00007f76b30a3280 (most recent call first):
  File ".../.venv/lib/python3.8/site-packages/pandas/core/frame.py", line 3133 in _get_value
  File ".../.venv/lib/python3.8/site-packages/pandas/core/indexing.py", line 888 in __getitem__
  File ".../debug.py", line 22 in <module>
fish: “python -X faulthandler .../de…” terminated by signal SIGSEGV (Address boundary error)
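When passing -X faulthandler on the command line is inconvenient, the same traceback dump can be enabled from inside the script. This is a minimal sketch using only the standard-library faulthandler module; it is not specific to pandas, and it only reports where the crash happened rather than preventing it.

```python
import faulthandler

# Install handlers for fatal signals (SIGSEGV, SIGFPE, SIGABRT, SIGBUS, SIGILL).
# If one is received, the Python traceback is written to stderr before the
# process terminates.
faulthandler.enable()

# Sanity check that the handler is installed.
handler_installed = faulthandler.is_enabled()
```

Placing the call at the very top of the script ensures the handler is active before any extension-module code runs.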

From what I can deduce, there is something wrong with the template_df DataFrame. I was trying to generate an equivalent copy of it without copying the memory, so I ran it through

template_df = pd.DataFrame.from_dict(template_df.to_dict())

This does indeed seem to help. However by accident I discovered just doing

d = template_df.to_dict() 

also fixes it. As if to_dict() would somehow alter the DataFrame?

So I might be completely on the wrong track. Any help on how to drill down on this would be appreciated.

Expected Output

No segfault :-)

Output of pd.show_versions()

INSTALLED VERSIONS

commit : f2c8480
python : 3.8.2.final.0
python-bits : 64
OS : Linux
OS-release : 5.4.0-65-generic
Version : #73-Ubuntu SMP Mon Jan 18 17:25:17 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 1.2.3
numpy : 1.20.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.0.1
setuptools : 52.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : 1.3.7
lxml.etree : 4.6.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.21.0
pandas_datareader: None
bs4 : None
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.3.4
numexpr : None
odfpy : None
openpyxl : 3.0.6
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : None
sqlalchemy : 1.3.23
tables : None
tabulate : None
xarray : None
xlrd : 1.2.0
xlwt : None
numba : None

@tharwan added the Bug and Needs Triage labels Mar 9, 2021
@mzeitlin11
Member

Thanks for the report @tharwan. My best guess upon a quick glance is that the issue here is related to the line
df.index = df["Beschreibung"] (see #34364 for something similar). The same workaround should hopefully work here (using set_index instead of directly assigning). Since the index is immutable, direct assignment with a mutable object should generally be avoided (but clearly we should handle it more gracefully than a segfault).
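The suggested workaround can be sketched as follows. The two DataFrames below are hypothetical stand-ins for the Excel sheets, which are not available: the column names come from the report, but every value is made up. The sketch only illustrates replacing the direct index assignment with set_index; whether that avoids the segfault with the original files has not been verified here.

```python
import pandas as pd

# Hypothetical in-memory replacements for the two read_excel calls.
template_df = pd.DataFrame(
    {
        "Beschreibung": [
            "Fee EPEX ID Kundenhandel",
            "Direktvermarktung Vertragspreis",
        ],
        "Short/Kauf/NEG": [1.0, 2.0],
        "Long/Verkauf/POS": [3.0, 4.0],
    }
)
df = pd.DataFrame(
    {
        "Beschreibung": ["Direktvermarktung Vertragspreis"],
        "Short/Kauf/NEG": [20.0],
        "Long/Verkauf/POS": [40.0],
    }
)

# Instead of `frame.index = frame["Beschreibung"]`, let set_index build the
# index. By default the column is moved into the index; pass drop=False to
# keep it as a regular column as well.
template_df = template_df.set_index("Beschreibung")
df = df.set_index("Beschreibung")

# Same steps as in the report, now on frames indexed via set_index.
template_df.loc["Fee EPEX ID Kundenhandel", "Short/Kauf/NEG"] = 0
template_df.update(df)
value = template_df.loc["Direktvermarktung Vertragspreis", "Long/Verkauf/POS"]
```

set_index returns a new DataFrame, so the result has to be assigned back as shown.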

@mzeitlin11 added the Segfault label and removed the Needs Triage label Mar 19, 2021
@mzeitlin11
Member

@tharwan going to close in favor of #34364, but please reopen if you think it is a different issue.
