BUG: Pandas doesn't release the lock when a corrupted file is fed in. pandas==1.2.1 worked fine but pandas==1.2.4 has this issue. #41778

Closed
SAH-UJA opened this issue Jun 2, 2021 · 2 comments · Fixed by #41806
Labels
Bug IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version

@SAH-UJA
SAH-UJA commented Jun 2, 2021

Please refer to the following code snippet.

import pandas as pd
import os

# Creating a corrupted file
with open('a.xlsx', 'w') as f:
    pass

# Reading using pd.ExcelFile
data = pd.ExcelFile('a.xlsx', engine='openpyxl')

# Deleting the file
os.remove('a.xlsx')

Here I create an empty .xlsx file. This is not a valid way to create an xlsx file, but it illustrates the problem. When I then read it with pd.ExcelFile, I get zipfile.BadZipFile: File is not a zip file, which is expected. However, when I later try to delete the file, I get PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'a.xlsx'. This does not happen with pandas==1.2.1 and lower versions; the current version, pandas==1.2.4, has this bug.

@SAH-UJA SAH-UJA added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 2, 2021
@twoertwein
Member

Thank you for your report! pandas 1.2 encourages using a context manager with pd.ExcelFile so that resources are closed when errors occur.

Does the following work for you?

from pathlib import Path
import pandas as pd

file = Path("a.xlsx")
file.touch()

with pd.ExcelFile(file, engine='openpyxl') as _:
    pass

file.unlink()

@lithomas1 lithomas1 added IO Excel read_excel, to_excel Regression Functionality that used to work in a prior pandas version and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jun 3, 2021
@jreback jreback added this to the 1.2.5 milestone Jun 3, 2021
@SAH-UJA
Author

SAH-UJA commented Jun 4, 2021

@twoertwein I tried the snippet you suggested, but file.unlink() raises the same exception: PermissionError: [WinError 32] The process cannot access the file because it is being used by another process: 'a.xlsx'. As I understand it, the context manager calls .close() on exit, but here the constructor raises before the object is returned, so .close() is never called. I therefore think pandas should release the lock internally while handling the exception. This is just my way of looking at the problem; I would love to get your feedback.
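
The reasoning above can be illustrated without pandas. The sketch below is only an assumption-laden stand-in: LeakyReader and SafeReader are hypothetical classes, not pandas internals, and the module-level `opened` list exists purely so the handle can be inspected afterwards. It shows that when a constructor raises, the `with` statement never reaches `__exit__`, so a handle opened earlier in the constructor stays open unless the constructor closes it itself.

```python
import os
import tempfile
import zipfile

opened = []  # every handle the readers open, kept so it can be inspected later


class LeakyReader:
    """Hypothetical reader: opens a handle, then parses it as a zip archive."""

    def __init__(self, path):
        self.handle = open(path, "rb")
        opened.append(self.handle)
        # Raises zipfile.BadZipFile for an empty file, leaving the handle open.
        self.book = zipfile.ZipFile(self.handle)

    def close(self):
        self.handle.close()

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self.close()


class SafeReader(LeakyReader):
    """Same reader, but it closes its own handle when parsing fails."""

    def __init__(self, path):
        self.handle = open(path, "rb")
        opened.append(self.handle)
        try:
            self.book = zipfile.ZipFile(self.handle)
        except Exception:
            self.handle.close()
            raise


def try_read(reader_cls):
    """Read an empty .xlsx and report whether the handle was closed on failure."""
    fd, path = tempfile.mkstemp(suffix=".xlsx")
    os.close(fd)
    try:
        with reader_cls(path):
            pass  # never reached: the constructor raises first
    except zipfile.BadZipFile:
        pass
    handle = opened[-1]
    was_closed = handle.closed
    handle.close()  # release it so the file can be removed on Windows
    os.remove(path)
    return was_closed


leaky_closed = try_read(LeakyReader)
safe_closed = try_read(SafeReader)
print(leaky_closed, safe_closed)
```

In this sketch the only difference between the two readers is the try/except inside the constructor, which is the kind of internal cleanup the comment above asks for. Whether pandas can be fixed in the same way is not something this example establishes.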
